The Book Structure Extraction Competition with the Resurgence Software at Caen University


Emmanuel Giguet and Nadine Lucas
GREYC Cnrs, Caen Basse Normandie University
BP 5186 F CAEN Cedex France

Abstract. The GREYC Island team participated for the first time in the Structure Extraction Competition, part of the INEX Book track, with the Resurgence software. We used a minimal strategy primarily based on top-down document representation. The main idea is to use a model describing relationships between elements of the document structure. Chapters are represented and implemented by frontiers between chapters. The page is also used as a unit. The periphery-center relationship is calculated on the entire document and reflected on each page. The strong points of the approach are that it deals with the entire document; it handles books without ToCs, and titles that are not represented in the ToC (e.g. preface); it is not dependent on lexicon, hence tolerant to OCR errors and language independent; it is simple and fast.

1 Introduction

The GREYC Island team participated for the first time in the Book Structure Extraction Competition held at ICDAR in 2009 and part of the INEX evaluations [1]. The Resurgence software was modified to this end. This software was designed to handle various document formats, in order to process academic articles (mainly in pdf format) and news articles (mainly in HTML format) in various text parsing tasks [2]. We decided to join the INEX competition because the team was also interested in handling voluminous documents, such as textbooks.

The experiment was conducted over one month. It was run from pdf documents to keep control of the entire process. The document content is extracted using the pdf2xml software [3]. The evaluation rules were not thoroughly studied, for we simply wished to check whether we were able to handle large corpora of voluminous documents.

The huge memory needed to handle books was indeed a serious obstacle, compared with the ease of handling academic articles. The OCR texts were also difficult to cope with. Therefore, Resurgence was modified in order to handle the corpus. We could not propagate our principles to all the levels of the book hierarchy at once. We consequently focused on chapter detection.

In the following, we explain the main difficulties, our strategy and the results on the INEX book corpus. We provide corrected results obtained after a few modifications were made. In the last section, we discuss the advantages of our method and make proposals for future competitions.

S. Geva, J. Kamps, and A. Trotman (Eds.): INEX 2009, LNCS 6203, pp , Springer-Verlag Berlin Heidelberg 2010

2 Our Book Structure Extraction Method

2.1 Challenges

In the first stage of the experiment, the huge memory needed to handle books proved to be a serious hindrance: pdf2xml required up to 8 GB of memory and Resurgence required up to 2 GB to parse the content of large books (> 150 MB). This was because the whole content of the book was stored in memory. The underlying algorithms did not actually require the whole content to be available at once. They were designed that way because they were meant to process short documents. Therefore, Resurgence was modified in order to load the necessary pages only. The objective was to allow processing on ordinary laptop computers.

The fact that the corpus consisted of OCR documents also challenged our previous program, which detected the structure of electronic academic articles. A new branch in Resurgence had to be written in order to be tolerant to OCR documents.

We had no time to propagate our document parsing principles to all the levels of the book hierarchy at once. We consequently focused on chapter detection.

2.2 Strategy

Very few principles were tested in this experiment. The strategy in Resurgence is based on document positional representation, and does not rely on the table of contents (ToC). This means that the whole document is considered first. Then document constituents are considered top-down (by successive subdivision), with focus on the middle part (main body). The document is thus the unit that can ultimately be broken down into pages.

The main idea is to use a model describing relationships between elements of the document structure. The model is a periphery-center dichotomy. The periphery-center relationship is calculated on the entire document and reflected on each page. The algorithm aims at retrieving the main content of the book, bounded by annex material such as preface and postface with a different layout. It ultimately retrieves the page body in a page, surrounded by margins [2].

However, for this first experiment, we focused on chapter title detection, so that the program detects only one level, i.e. chapter titles. Chapter title detection throughout the document was conducted using a sliding window, which is used to detect chapter transitions. The window captures a four-page context with a look-ahead of one page and a look-behind of two pages.

The underlying idea is that a chapter begins after a blank, or at least is found in a relatively empty zone at the top of a page. The half-page fill rate is the simple cue used to decide on a chapter transition. The beginning of a chapter is detected by one of the two patterns below, where i is the page where a chapter starts. Figures 1 and 2 illustrate the two patterns.

Pattern 1:
- top and bottom of page i-2 equally filled
- bottom of page i-1 less filled than top of page i-1
- top of page i less filled than bottom of page i
- top and bottom of page i+1 equally filled

Fig. 1. View of the four-page sliding window to detect chapter beginning. Pattern 1 matches. Excerpt from 2009 book id = 00AF1EE1CC79B277.

Fig. 2. View of the four-page sliding window to detect chapter announced by a blank page. Pattern 2 matches. Excerpt from 2009 book id = 00AF1EE1CC79B277.

Pattern 2:
- any content for page i-2
- empty page i-1
- top of page i less filled than bottom of page i
- top and bottom of page i+1 equally filled

Chapter title extraction is made from the first third of the beginning page. The model assumes that the title begins at the top of the page. The end of the title was not carefully looked for. The title is roughly delineated by a constraint rule allowing a number of lines containing at most 40 words.

2.3 Experiment

The program detected only chapter titles. No attempt was made to find the subtitles. The three runs were not very different, since runs 2 and 3 amount to post-processing of the ToC generated by run 1.
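The two page-fill patterns of Section 2.2 can be sketched as follows. This is a minimal illustration, not the Resurgence implementation: the PageFill type, the helper names and the two thresholds (TOLERANCE, EMPTY) are assumptions introduced here, since the paper does not state how "equally filled", "less filled" or "empty" were quantified. The sketch also keeps only four pages in memory at a time, in the spirit of Section 2.1.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class PageFill:
    """Fill rates (0.0 to 1.0) of the top and bottom halves of one page."""
    top: float
    bottom: float


# Both thresholds are assumptions made for this sketch.
TOLERANCE = 0.15  # halves closer than this count as "equally filled"
EMPTY = 0.02      # halves at or below this count as empty


def equally_filled(p: PageFill) -> bool:
    return abs(p.top - p.bottom) <= TOLERANCE


def is_empty(p: PageFill) -> bool:
    return p.top <= EMPTY and p.bottom <= EMPTY


def top_less_filled(p: PageFill) -> bool:
    return p.top < p.bottom - TOLERANCE


def bottom_less_filled(p: PageFill) -> bool:
    return p.bottom < p.top - TOLERANCE


def matches_pattern_1(window: List[PageFill]) -> bool:
    """Previous chapter ends mid-page, new chapter starts low on page i."""
    before2, before1, page, after = window
    return (equally_filled(before2) and bottom_less_filled(before1)
            and top_less_filled(page) and equally_filled(after))


def matches_pattern_2(window: List[PageFill]) -> bool:
    """Chapter announced by a blank page; page i-2 may hold any content."""
    _, before1, page, after = window
    return is_empty(before1) and top_less_filled(page) and equally_filled(after)


def chapter_starts(pages: Iterable[PageFill]) -> Iterator[int]:
    """Yield the 0-based index i of each page where a chapter seems to start."""
    window: deque = deque(maxlen=4)  # pages i-2, i-1, i, i+1
    for index, fill in enumerate(pages):
        window.append(fill)
        if len(window) == 4:
            current = list(window)
            if matches_pattern_1(current) or matches_pattern_2(current):
                yield index - 1  # page i is the third page of the window


full = PageFill(0.9, 0.9)
chapter_end = PageFill(0.9, 0.3)
chapter_begin = PageFill(0.2, 0.9)
blank = PageFill(0.0, 0.0)
book = [full, full, chapter_end, chapter_begin, full,
        full, blank, chapter_begin, full]
print(list(chapter_starts(book)))  # prints [3, 7]
```

Because the window needs two pages of look-behind and one of look-ahead, a chapter starting on the first two pages or on the last page of a book cannot be matched by this sketch.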

- Run 1 was based on minimal rules as stated above.
- Run 2 was the same, plus removal of white space at the beginning and end of the title (trim).
- Run 3 was the same, plus trim, plus pruning of lower-case lines following a would-be title in upper case.

2.4 Commented Results

The entire corpus was handled. The results were equally poor for the three runs. This was due to a page numbering bug where p = p-1. The intriguing non-zero value of 0,08% came from rare cases where the page contained two chapters (two poems).

Table 1. Book Structure Extraction official evaluation 1

RunID        Participant                           F-measure (complete entries)
MDCS         Microsoft Development Center Serbia   41,51%
XRCE-run2    Xerox Research Centre Europe          28,47%
XRCE-run1    Xerox Research Centre Europe          27,72%
XRCE-run3    Xerox Research Centre Europe          27,33%
Noopsis      Noopsis inc.                          8,32%
GREYC-run1   GREYC - University of Caen, France    0,08%
GREYC-run2   GREYC - University of Caen, France    0,08%
GREYC-run3   GREYC - University of Caen, France    0,08%

Table 2. Detailed results for GREYC

                             Precision   Recall    F-Measure
Titles                       19,83%      13,60%    13,63%
Levels                       16,48%      12,08%    11,85%
Links                        1,04%       0,14%     0,23%
Complete entries             0,40%       0,05%     0,08%
Entries disregarding depth   1,04%       0,14%     0,23%

The results were recomputed in the INEX grid after correcting the unfortunate page number shift (Table 3). The alternative evaluation grid suggested by [4, 5] was also applied. In Table 4, the GREYC run corrected for the p = p-1 shift is reported under the name "GREYC-1".

doucet/structureextraction2009/ 2 AlternativeResults.html

Table 3. GREYC results with page numbering correction

         precision   recall   F-measure
run-1    10,41       7,41     7,66
run-2    10,56       7,61     7,85
run-3    11,22       7,61     8,02

Table 4. Alternative evaluation. XRCE Link-based Measure; columns: Links (Precision, Recall, F1), Accuracy for valid links (Title, Level); rows: MDCS, XRCE-run, XRCE-run, XRCE-run, Noopsis, GREYC, GREYC.

The results still suffered from insufficient attention to the evaluation rules. Notably, the title hierarchy is not represented, which impairs recall. Titles were roughly segmented on the right side, which impairs precision. Title accuracy is also very low for the same reason. However, level accuracy offsets the poor results reflected in the F1 measure. The idea behind level accuracy is that good results at a given level are more satisfying than errors scattered everywhere. The accuracy for the chapter level, which was the only level we attempted, was 73,2%, the second highest. It means that few chapter beginnings were missed by Resurgence.

Errors reflect both non-responses and wrong responses. Our system returned 80 non-responses for chapters, out of 527 in the sample, and very few wrong responses. Chapter titles starting in the second half of the page were missed, as well as some chapters where the title was not clearly contrasted against the background.

2.5 Corrections after Official Competition

A simple corrective strategy was applied in order to better compare methods. First, the page number bug was corrected. Second, a new feature boosted precision. In a supplementary run (run 4), both the page number shift and chapter title detection were amended. The right end of the title is detected by calculating the line height disruption: a contrast between the line height of the would-be title and the line height of the rest of the page. These corrections result in better precision, as shown in Table 5 (line GREYC-2) with the XRCE link-based measure. The recall rate is not improved because the subtitles are still not looked for.
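The line height disruption used in run 4 can be illustrated with a short sketch. It is not the Resurgence code: the function name, the use of the median as the reference body line height, the contrast factor and the cap on title lines are all assumptions made for illustration, since the paper only states that the title line height is contrasted with that of the rest of the page.

```python
from statistics import median
from typing import Sequence


def title_line_count(line_heights: Sequence[float],
                     contrast: float = 1.2,
                     max_lines: int = 5) -> int:
    """Count the leading lines of a page that look like title lines.

    line_heights lists the height of each text line, top to bottom.
    A leading line counts as part of the title while it is at least
    `contrast` times taller than the median line height of the page.
    Both `contrast` and `max_lines` are hypothetical parameters.
    """
    if not line_heights:
        return 0
    body_height = median(line_heights)  # assumed proxy for body text height
    count = 0
    for height in line_heights[:max_lines]:
        if height < contrast * body_height:
            break  # disruption: the line height drops to body size
        count += 1
    return count


# A two-line title set in 24-unit lines above 12-unit body text.
page = [24, 24, 12, 12, 12, 12, 12, 12]
print(title_line_count(page))  # prints 2
```

Cutting the title at the first drop in line height replaces the coarse 40-word limit of the earlier runs, which is consistent with the precision gain reported for GREYC-2.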

Table 5. Corrected run (GREYC-2) with better title extraction compared with previous results. XRCE Link-based Measure; columns: Links (Precision, Recall, F1), Accuracy for valid links (Title, Level); rows: GREYC, GREYC, GREYC.

Table 6 ranks the final results of the Resurgence program (GREYC-2) against the known performance of the other participants. The measure is the XRCE Link-based measure.

Table 6. Best alternative evaluation. XRCE Link-based Measure; columns: Links (Precision, Recall, F1), Accuracy for valid links (Title, Level); rows: MDCS, XRCE-run, GREYC, Noopsis.

3 Discussion

The experiment was preliminary. We were pleased to be able to handle the entire corpus and go through evaluation, since only four competitors out of eleven finished [1]. The results were very poor, even if small corrections significantly improved them. In addition to the unfortunate page shift, the low recall is due to the fact that the hierarchy of titles was not addressed, as mentioned earlier. This will be addressed in the future.

3.1 Reflections on the Experiment

On the scientific side, the importance of quick and light means to handle the corpus was salient. The pdf2xml program, for instance, returned too much information for our needs and was expensive to run. We wish to contribute to the improvement of the software on these aspects.

Although the results are poor, they showed some strong points of the Resurgence program, which is based on relative position and differential principles. We intend to explore this direction further. The advantages are the following:
- The program deals with the entire document, not only the table of contents;
- It handles books without ToCs, and titles that are not represented in the ToC (e.g. preface);
- It is dependent on typographical position, which is very stable in the corpus;
- It is not dependent on lexicon, hence tolerant to OCR errors and language independent;
- Last, it is simple and fast.

Some examples below underline our point. They illustrate problems that are met in classical literal approaches but are avoided by the positional solution.

Example 1. Varying forms for the string "chapter" due to OCR errors, handled by Resurgence:

CHAPTEK CHAPTEE CH^PTEE CHAP TEE CHA 1 TKR C II APT Kit (MI A I TKIl C II A P T E II C H A P TEH C H A P T E R C II A P T E U Oil A PTKR

Since no expectations bear on language-dependent forms, chapter titles can be extracted from any language. A reader can detect a posteriori that a book is written in French (first series) or in English (second series):

TABLE DES MATIÈRES DEUXIEME PARTIE CHAPITRE TROISIÈME PARTIE QUATRIÈME PARTIE PREFACE CHAPTER TABLE OF CONTENTS INTRODUCTION APPENDIX

Since position is used instead of a list of expected and memorized forms, fairly common strings are extracted, such as CHAPTER or SECTION, but also uncommon ones, such as PSALM or SONNET. When chapters have no numbering and no prefix such as "chapter", they are found as well, for instance a plain title such as "Christmas Day".

Resurgence did not rely on the numbering of chapters, which is an important source of OCR errors, as in the following series. Such titles were therefore retrieved as they were by our robust extractor.

II HI IV V VI SECTION VI SECTION VII SECTION YTIT SECTION IX SKOTIOX XMI SECTION XV SECTION XVI THE FIRST SERMON THE SECOND SERMON THE THIRD SERMON THE FOURTH SERMON CHAPTEE TWELFTH CHAPTER THIRTEENTH CHAPTER FOURTEENTH

The approach we used was minimal, but reflects an original breakthrough to improve robustness without sacrificing quality.

3.2 Proposals

Concerning evaluation rules, generally speaking, it is unclear whether the ground truth depends on the book or on the ToC. If the ToC is the reference, it is an error to extract prefaces, for instance. Participants using the whole text as the main reference would be penalized if they extracted the whole hierarchy of titles as it appears in the book when the ToC represents only the higher levels, as is often the case.

Concerning details, it should be clear whether or when the prefix indicating the book hierarchy level (Chapter, Section, and so on) and the numbering should be part of the extracted result. As mentioned earlier, and as can be seen in Figure 1, the chapter title is not necessarily preceded by such mentions, but in other cases there is no specific chapter title and only a number. The ground truth is not clear either on the case of the extracted title: sometimes the case differs between the ToC and the actual title in the book.

It would be very useful to provide results by title depth (level), as suggested by [4], because it seems that providing complete results for one or more levels would be more satisfying than missing some items at all levels. It is important to get coherent and comparable text spans for many tasks, such as indexing, helping navigation or text mining.

The reason why the beginning and end of the titles are overrepresented in the evaluation scores is not clear, and a more straightforward edit distance for extracted titles should be provided.

There is also a bias introduced by a semi-automatically constructed ground truth. Manual annotation is still to be conducted to improve the ground truth quality, but it is time-consuming. We had technical difficulties meeting that requirement in summer. It might be a better idea to open annotation to a larger audience and for a longer period of time.

It might be a good idea to give the bounding box containing the title as a reference for the ground truth. This solution would resolve conflicts between manual annotation and automatic annotation, leaving man or machine to read and interpret the content of the bounding box. It would also alleviate conflicts between ToC-based and text-based approaches.

The corpus provided for the INEX Book track is very valuable: it is the only available corpus offering full books. Although it comprises old printed books only, it is interesting because it provides various examples of layout.

References

1. Doucet, A., Kazai, G.: ICDAR 2009 Book Structure Extraction Competition. In: 10th International Conference on Document Analysis and Recognition ICDAR 2009, Barcelona, Spain, pp IEEE, Los Alamitos (2009)
2. Giguet, E., Lucas, N., Chircu, C.: Le projet Resurgence: Recouvrement de la structure logique des documents électroniques. In: JEP-TALN-RECITAL 08 Avignon (2008)
3. Déjean, H.: pdf2xml open source software (2010), (last visited March 2010)
4. Déjean, H., Meunier, J.-L.: XRCE Participation to the Book Structure Task. In: Geva, S., Kamps, J., Trotman, A. (eds.) INEX LNCS, vol. 5631, pp Springer, Heidelberg (2009) doi: /
5. Déjean, H., Meunier, J.-L.: XRCE Participation to the Book Structure Task. In: Geva, S., Kamps, J., Trotman, A. (eds.) INEX LNCS, vol. 5631, pp Springer, Heidelberg (2009)


No more than six tables, pictures or figures can be considered for the paper version, although Archaeometry Thank you for your interest in Archaeometry, we look forward to receiving your paper. We are aiming for the printed edition of Archaeometry to publish papers of no more than 15 pages. We have

More information

Easy Chair Online Conference Submission, Tracking and Distribution Process: Getting Started

Easy Chair Online Conference Submission, Tracking and Distribution Process: Getting Started Easy Chair Online Conference Submission, Tracking and Distribution Process: Getting Started AMS WMC 2014 Click on play to begin show AMS Conference Information You can always access information about the

More information

Time-Surfer: Time-Based Graphical Access to Document Content

Time-Surfer: Time-Based Graphical Access to Document Content Time-Surfer: Time-Based Graphical Access to Document Content Hector Llorens 1,EstelaSaquete 1,BorjaNavarro 1,andRobertGaizauskas 2 1 University of Alicante, Spain {hllorens,stela,borja}@dlsi.ua.es 2 University

More information

Numbered Sequence Detection in Documents

Numbered Sequence Detection in Documents Numbered Sequence Detection in Documents Hervé Déjean Xerox Research Centre Europe Herve.Dejean@xrce.xerox.com ABSTRACT We present in this work a method to detect numbered sequences in a document. The

More information

Migrating to the new IBM WebSphere Commerce Suite Platform. The Intelligent Approach for the E-Commerce Transition ELLUMINIS CONSULTING GROUP

Migrating to the new IBM WebSphere Commerce Suite Platform. The Intelligent Approach for the E-Commerce Transition ELLUMINIS CONSULTING GROUP WHITEPAPER ELLUMINIS CONSULTING GROUP The Intelligent Approach for the E-Commerce Transition Migrating to the new IBM WebSphere Commerce Suite Platform AN ELLUMINIS CONSULTING GROUP WHITEPAPER Migrating

More information

Since its earliest days about 14 years ago Access has been a relational

Since its earliest days about 14 years ago Access has been a relational Storing and Displaying Data in Access Since its earliest days about 14 years ago Access has been a relational database program, storing data in tables and using its own queries, forms, and reports to sort,

More information

A team LEAP Response is required for this event and must be submitted with the event entry (see LEAP Program).

A team LEAP Response is required for this event and must be submitted with the event entry (see LEAP Program). WEBSITE DESIGN OVERVIEW Participants are required to design, build, and launch a website that features the team's ability to incorporate the elements of website design, graphic layout, and proper coding

More information

Pattern Recognition Using Graph Theory

Pattern Recognition Using Graph Theory ISSN: 2278 0211 (Online) Pattern Recognition Using Graph Theory Aditya Doshi Department of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India Manmohan Jangid Department of

More information

Informativeness for Adhoc IR Evaluation:

Informativeness for Adhoc IR Evaluation: Informativeness for Adhoc IR Evaluation: A measure that prevents assessing individual documents Romain Deveaud 1, Véronique Moriceau 2, Josiane Mothe 3, and Eric SanJuan 1 1 LIA, Univ. Avignon, France,

More information

Sample Question Paper. Software Testing (ETIT 414)

Sample Question Paper. Software Testing (ETIT 414) Sample Question Paper Software Testing (ETIT 414) Q 1 i) What is functional testing? This type of testing ignores the internal parts and focus on the output is as per requirement or not. Black-box type

More information

Decomposing and Sketching 3D Objects by Curve Skeleton Processing

Decomposing and Sketching 3D Objects by Curve Skeleton Processing Decomposing and Sketching 3D Objects by Curve Skeleton Processing Luca Serino, Carlo Arcelli, and Gabriella Sanniti di Baja Institute of Cybernetics E. Caianiello, CNR, Naples, Italy {l.serino,c.arcelli,g.sannitidibaja}@cib.na.cnr.it

More information

Exploiting On-Chip Data Transfers for Improving Performance of Chip-Scale Multiprocessors

Exploiting On-Chip Data Transfers for Improving Performance of Chip-Scale Multiprocessors Exploiting On-Chip Data Transfers for Improving Performance of Chip-Scale Multiprocessors G. Chen 1, M. Kandemir 1, I. Kolcu 2, and A. Choudhary 3 1 Pennsylvania State University, PA 16802, USA 2 UMIST,

More information

A Rule Chaining Architecture Using a Correlation Matrix Memory. James Austin, Stephen Hobson, Nathan Burles, and Simon O Keefe

A Rule Chaining Architecture Using a Correlation Matrix Memory. James Austin, Stephen Hobson, Nathan Burles, and Simon O Keefe A Rule Chaining Architecture Using a Correlation Matrix Memory James Austin, Stephen Hobson, Nathan Burles, and Simon O Keefe Advanced Computer Architectures Group, Department of Computer Science, University

More information

Combining Classifiers for Web Violent Content Detection and Filtering

Combining Classifiers for Web Violent Content Detection and Filtering Combining Classifiers for Web Violent Content Detection and Filtering Radhouane Guermazi 1, Mohamed Hammami 2, and Abdelmajid Ben Hamadou 1 1 Miracl-Isims, Route Mharza Km 1 BP 1030 Sfax Tunisie 2 Miracl-Fss,

More information

Guideline for Creating Accessible Public Documents 1

Guideline for Creating Accessible Public Documents 1 Guideline for Creating Accessible Public Documents 1 I. Word Documents 2 Estimates indicate that in the United States, 12.5 million people rely on some sort of assistive technology to access electronic

More information

FIGURE DETECTION AND PART LABEL EXTRACTION FROM PATENT DRAWING IMAGES. Jaco Cronje

FIGURE DETECTION AND PART LABEL EXTRACTION FROM PATENT DRAWING IMAGES. Jaco Cronje FIGURE DETECTION AND PART LABEL EXTRACTION FROM PATENT DRAWING IMAGES Jaco Cronje Council for Scientific and Industrial Research, Pretoria, South Africa Email: jcronje@csir.co.za ABSTRACT The US Patent

More information

Lab 3: Digitizing in ArcMap

Lab 3: Digitizing in ArcMap Lab 3: Digitizing in ArcMap What You ll Learn: In this Lab you ll be introduced to basic digitizing techniques using ArcMap. You should read Chapter 4 in the GIS Fundamentals textbook before starting this

More information

Designing and Printing Address Labels

Designing and Printing Address Labels Designing and Printing Address Labels This file will show you one way to use your computer for producing stick-on address labels, helping you to reduce the time involved in preparing the year's set of

More information

Analyzing PDFs with Citavi 6

Analyzing PDFs with Citavi 6 Analyzing PDFs with Citavi 6 Introduction Just Like on Paper... 2 Methods in Detail Highlight Only (Yellow)... 3 Highlighting with a Main Idea (Red)... 4 Adding Direct Quotations (Blue)... 5 Adding Indirect

More information

GUIDE TO CERTIFICATION

GUIDE TO CERTIFICATION GUIDE TO CERTIFICATION December 2017 *Note this document is temporary, and the content will soon appear on peer.gbci.org, at the latest November 30, 2015.* CONGRATULATIONS ON YOUR DECISION TO PURSUE PEER

More information

Making Accessible Documents. Microsoft Office: Word, PowerPoint

Making Accessible Documents. Microsoft Office: Word, PowerPoint Making Accessible Documents Microsoft Office: Word, PowerPoint Purpose of Instruction Provide tips and strategies on creating documents accessible to individuals with disabilities. Accessibility tools

More information

THE weighting functions of information retrieval [1], [2]

THE weighting functions of information retrieval [1], [2] A Comparative Study of MySQL Functions for XML Element Retrieval Chuleerat Jaruskulchai, Member, IAENG, and Tanakorn Wichaiwong, Member, IAENG Abstract Due to the ever increasing information available

More information

Chapter 2 Text Processing with the Command Line Interface

Chapter 2 Text Processing with the Command Line Interface Chapter 2 Text Processing with the Command Line Interface Abstract This chapter aims to help demystify the command line interface that is commonly used in UNIX and UNIX-like systems such as Linux and Mac

More information

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISSN : 2456-3307 Some Issues in Application of NLP to Intelligent

More information

ABBYY FineReader 14. User s Guide ABBYY Production LLC. All rights reserved.

ABBYY FineReader 14. User s Guide ABBYY Production LLC. All rights reserved. ABBYY FineReader 14 User s Guide 2017 ABBYY Production LLC All rights reserved Information in this document is subject to change without notice and does not bear any commitment on the part of ABBYY The

More information

Parallel Concordancing and Translation. Michael Barlow

Parallel Concordancing and Translation. Michael Barlow [Translating and the Computer 26, November 2004 [London: Aslib, 2004] Parallel Concordancing and Translation Michael Barlow Dept. of Applied Language Studies and Linguistics University of Auckland Auckland,

More information

Volume-4, Issue-1,May Accepted and Published Manuscript

Volume-4, Issue-1,May Accepted and Published Manuscript Available online at International Journal of Research Publications Volume-4, Issue-1,May 2018 Accepted and Published Manuscript Comparison of Website Evaluation after Ranking Improvement and Implementation

More information

A Step-by-step guide to creating a Professional PowerPoint Presentation

A Step-by-step guide to creating a Professional PowerPoint Presentation Quick introduction to Microsoft PowerPoint A Step-by-step guide to creating a Professional PowerPoint Presentation Created by Cruse Control creative services Tel +44 (0) 1923 842 295 training@crusecontrol.com

More information

Book Recommendation based on Social Information

Book Recommendation based on Social Information Book Recommendation based on Social Information Chahinez Benkoussas and Patrice Bellot LSIS Aix-Marseille University chahinez.benkoussas@lsis.org patrice.bellot@lsis.org Abstract : In this paper, we present

More information

AROMA results for OAEI 2009

AROMA results for OAEI 2009 AROMA results for OAEI 2009 Jérôme David 1 Université Pierre-Mendès-France, Grenoble Laboratoire d Informatique de Grenoble INRIA Rhône-Alpes, Montbonnot Saint-Martin, France Jerome.David-at-inrialpes.fr

More information

Using XML Logical Structure to Retrieve (Multimedia) Objects

Using XML Logical Structure to Retrieve (Multimedia) Objects Using XML Logical Structure to Retrieve (Multimedia) Objects Zhigang Kong and Mounia Lalmas Queen Mary, University of London {cskzg,mounia}@dcs.qmul.ac.uk Abstract. This paper investigates the use of the

More information

Grid. Skeletal framework to organize information making it clear and optimally accessible

Grid. Skeletal framework to organize information making it clear and optimally accessible Grid Skeletal framework to organize information making it clear and optimally accessible Space When typographic elements introduced in space > divisions Letterform: centered=motionless; off-center > velocity;

More information

The 12 most common newsletter design mistakes

The 12 most common newsletter design mistakes The 12 most common newsletter design mistakes www.targetmarketingnetwork.com By: Roger C. Parker Your newsletter s success depends on its design. An attractive, easy to read newsletter encourages readers

More information

Modeling and Simulation in Scilab/Scicos with ScicosLab 4.4

Modeling and Simulation in Scilab/Scicos with ScicosLab 4.4 Modeling and Simulation in Scilab/Scicos with ScicosLab 4.4 Stephen L. Campbell, Jean-Philippe Chancelier and Ramine Nikoukhah Modeling and Simulation in Scilab/Scicos with ScicosLab 4.4 Second Edition

More information

Graph Matching: Fast Candidate Elimination Using Machine Learning Techniques

Graph Matching: Fast Candidate Elimination Using Machine Learning Techniques Graph Matching: Fast Candidate Elimination Using Machine Learning Techniques M. Lazarescu 1,2, H. Bunke 1, and S. Venkatesh 2 1 Computer Science Department, University of Bern, Switzerland 2 School of

More information

PROMOTIONAL MARKETING

PROMOTIONAL MARKETING PROMOTIONAL MARKETING OVERVIEW Participants create marketing tools that could be used in a TSA Promotional Kit. The theme and required elements for this event will be posted on the TSA website under Competitions/Themes

More information

Revisiting the Upper Bounding Process in a Safe Branch and Bound Algorithm

Revisiting the Upper Bounding Process in a Safe Branch and Bound Algorithm Revisiting the Upper Bounding Process in a Safe Branch and Bound Algorithm Alexandre Goldsztejn 1, Yahia Lebbah 2,3, Claude Michel 3, and Michel Rueher 3 1 CNRS / Université de Nantes 2, rue de la Houssinière,

More information

Part 1. Getting Started. Chapter 1 Creating a Simple Report 3. Chapter 2 PROC REPORT: An Introduction 13. Chapter 3 Creating Breaks 57

Part 1. Getting Started. Chapter 1 Creating a Simple Report 3. Chapter 2 PROC REPORT: An Introduction 13. Chapter 3 Creating Breaks 57 Part 1 Getting Started Chapter 1 Creating a Simple Report 3 Chapter 2 PROC REPORT: An Introduction 13 Chapter 3 Creating Breaks 57 Chapter 4 Only in the LISTING Destination 75 Chapter 5 Creating and Modifying

More information

Simulation of Zhang Suen Algorithm using Feed- Forward Neural Networks

Simulation of Zhang Suen Algorithm using Feed- Forward Neural Networks Simulation of Zhang Suen Algorithm using Feed- Forward Neural Networks Ritika Luthra Research Scholar Chandigarh University Gulshan Goyal Associate Professor Chandigarh University ABSTRACT Image Skeletonization

More information

EBSCO Discovery Service (EDS): a Research Study for NYU Libraries!!!!!!

EBSCO Discovery Service (EDS): a Research Study for NYU Libraries!!!!!! EBSCO Discovery Service (EDS): a Research Study for NYU Libraries Juliana Culbert, Houda El-Mimouni, Nadaleen Tempelman-Kluit NYU Bobst Library, UX Department Spring 2014 Table of Contents Executive Summary

More information

Parallel Evaluation of Hopfield Neural Networks

Parallel Evaluation of Hopfield Neural Networks Parallel Evaluation of Hopfield Neural Networks Antoine Eiche, Daniel Chillet, Sebastien Pillement and Olivier Sentieys University of Rennes I / IRISA / INRIA 6 rue de Kerampont, BP 818 2232 LANNION,FRANCE

More information

Automatic visual recognition for metro surveillance

Automatic visual recognition for metro surveillance Automatic visual recognition for metro surveillance F. Cupillard, M. Thonnat, F. Brémond Orion Research Group, INRIA, Sophia Antipolis, France Abstract We propose in this paper an approach for recognizing

More information

6.001 Notes: Section 8.1

6.001 Notes: Section 8.1 6.001 Notes: Section 8.1 Slide 8.1.1 In this lecture we are going to introduce a new data type, specifically to deal with symbols. This may sound a bit odd, but if you step back, you may realize that everything

More information

Character Recognition

Character Recognition Character Recognition 5.1 INTRODUCTION Recognition is one of the important steps in image processing. There are different methods such as Histogram method, Hough transformation, Neural computing approaches

More information

Matching and Alignment: What is the Cost of User Post-match Effort?

Matching and Alignment: What is the Cost of User Post-match Effort? Matching and Alignment: What is the Cost of User Post-match Effort? (Short paper) Fabien Duchateau 1 and Zohra Bellahsene 2 and Remi Coletta 2 1 Norwegian University of Science and Technology NO-7491 Trondheim,

More information

Creating Accessible Documents in Adobe Acrobat Pro 9

Creating Accessible Documents in Adobe Acrobat Pro 9 Creating Accessible Documents in Adobe Acrobat Pro 9 Create an Electronic Copy of the Book 1. Remove the binding from the book so it can be placed in an automatic document feeder. This requires a fairly

More information

Script Characterization in the Old Slavic Documents

Script Characterization in the Old Slavic Documents Script Characterization in the Old Slavic Documents Darko Brodić 1 2, Zoran N. Milivojević,andČedomir A. Maluckov1 1 University of Belgrade, Technical Faculty in Bor, Vojske Jugoslavije 12, 19210 Bor,

More information

Automated data entry system: performance issues

Automated data entry system: performance issues Automated data entry system: performance issues George R. Thoma, Glenn Ford National Library of Medicine, Bethesda, Maryland 20894 ABSTRACT This paper discusses the performance of a system for extracting

More information

Keyword Spotting in Document Images through Word Shape Coding

Keyword Spotting in Document Images through Word Shape Coding 2009 10th International Conference on Document Analysis and Recognition Keyword Spotting in Document Images through Word Shape Coding Shuyong Bai, Linlin Li and Chew Lim Tan School of Computing, National

More information

From Passages into Elements in XML Retrieval

From Passages into Elements in XML Retrieval From Passages into Elements in XML Retrieval Kelly Y. Itakura David R. Cheriton School of Computer Science, University of Waterloo 200 Univ. Ave. W. Waterloo, ON, Canada yitakura@cs.uwaterloo.ca Charles

More information

Toward a robust 2D spatio-temporal self-organization

Toward a robust 2D spatio-temporal self-organization Toward a robust 2D spatio-temporal self-organization Thomas Girod, Laurent Bougrain and Frédéric Alexandre LORIA-INRIA Campus Scientifique - B.P. 239 F-54506 Vandœuvre-lès-Nancy Cedex, FRANCE Abstract.

More information

1.8 Database and data modelling

1.8 Database and data modelling Introduction Organizations often maintain large amounts of data, which are generated as a result of day-to-day operations. A database is an organized form of such data. It may consist of one or more related

More information

Web publishing training pack Level 3 Forms

Web publishing training pack Level 3 Forms Web publishing training pack Level 3 Forms Learning objective: Forms for submitting data - create and manage forms where data is saved in the Web Publishing System (e.g. questionnaire, registration, feedback).

More information

Content-Based Image Retrieval with LIRe and SURF on a Smartphone-Based Product Image Database

Content-Based Image Retrieval with LIRe and SURF on a Smartphone-Based Product Image Database Content-Based Image Retrieval with LIRe and SURF on a Smartphone-Based Product Image Database Kai Chen 1 and Jean Hennebert 2 1 University of Fribourg, DIVA-DIUF, Bd. de Pérolles 90, 1700 Fribourg, Switzerland

More information

Throughout the chapter, we will assume that the reader is familiar with the basics of phylogenetic trees.

Throughout the chapter, we will assume that the reader is familiar with the basics of phylogenetic trees. Chapter 7 SUPERTREE ALGORITHMS FOR NESTED TAXA Philip Daniel and Charles Semple Abstract: Keywords: Most supertree algorithms combine collections of rooted phylogenetic trees with overlapping leaf sets

More information