Faster Segmentation-Free Handwritten Chinese Text Recognition with Character Decompositions. Théodore Bluche & Ronaldo Messina
Transcription
1 Faster Segmentation-Free Handwritten Chinese Text Recognition with Character Decompositions Théodore Bluche & Ronaldo Messina
2 Handwritten Chinese Text Recognition. Main difficulty: the large number of characters in Chinese (4000+), which are also complex in shape. Recognition has long been, and still is, mostly character-based (character segmentation, then recognition). That approach is now deprecated for most scripts (Latin, Arabic, ...): we recognize text lines directly.
3 Segmentation-free Chinese Text Recognition. Reference: Messina, Ronaldo, and Jerome Louradour. "Segmentation-free handwritten Chinese text recognition with LSTM-RNN." Document Analysis and Recognition (ICDAR), International Conference on. IEEE. Recognition from line images with MDLSTM-RNNs. Training with CTC (4000+ labels).
4 Segmentation-free Chinese Text Recognition. The size of the character set is an issue! The last linear layer involves a product with a huge matrix, and the softmax has to normalize a large number of activations. This makes the model very slow for Chinese text, compared to Latin or Arabic scripts!
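The cost argument on this slide can be illustrated with rough arithmetic. The sketch below counts the multiply-adds of the last linear layer and the activations the softmax has to normalize. The hidden size, the number of frames per line and the Latin label count are hypothetical values chosen for illustration; only the 4000-label figure comes from the slides.

```python
def output_layer_cost(hidden_size, num_labels, num_frames):
    """Return (multiply-adds of the last linear layer, activations normalized by softmax)."""
    macs = hidden_size * num_labels * num_frames
    softmax_activations = num_labels * num_frames
    return macs, softmax_activations


# Hypothetical sizes, for illustration only (not taken from the talk).
HIDDEN_SIZE = 128
FRAMES_PER_LINE = 200
LATIN_LABELS = 100      # assumed order of magnitude for a Latin-script model
CHINESE_LABELS = 4000   # "4000+ labels" in the slides

chinese = output_layer_cost(HIDDEN_SIZE, CHINESE_LABELS, FRAMES_PER_LINE)
latin = output_layer_cost(HIDDEN_SIZE, LATIN_LABELS, FRAMES_PER_LINE)
ratio = chinese[0] / latin[0]

print(f"Chinese: {chinese[0]:,} multiply-adds, {chinese[1]:,} softmax activations")
print(f"Latin:   {latin[0]:,} multiply-adds, {latin[1]:,} softmax activations")
print(f"Output-layer cost ratio: {ratio:.0f}x")
```

Under these assumptions the cost of the output layer grows linearly with the number of labels, which is why shrinking the label set is attractive.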
5 Usual Time Distribution. Most of the time is spent in the first LSTM: >50% of the total time, ~20 ms / line.
6 Processing Time for Chinese HWR. The first LSTM still takes ~20 ms / line, but over 60% of the processing time is spent in the last linear layer (big matrix multiplication), collapse and even softmax takes 25%! This is prohibitive in production models.
7 How would a human do it? The model transcribes lines of text: it looks at the image and types its contents by choosing each character from a list of thousands. It would take a human quite long, too, to type a transcript on a 4000-key keyboard...
8 Not everyone can afford (or would even want to use) that keyboard!
9 Outline of this talk Chinese Input Methods - character decompositions Method Overview Results Conclusion
11 Input Methods for Chinese. Input method = a way to simplify entering Chinese text on QWERTY keyboards. Sequences of keys are mapped to Chinese characters, i.e. the alphabet is reduced from several thousand symbols to a few dozen. Two main categories: phonetic-based (e.g. pinyin; ma: 傌, 马, 亇, 么) and graphic-based, where each key (more or less) represents a component of a character (e.g. Cangjie, Wubi).
12 Cangjie. Graphical decomposition. 24 basic code units; X for collisions or "difficult to decompose" parts; Z auxiliary, for entering punctuation. Rules: most characters have 4 codes; direction; connected forms; un-connected forms... and exceptions: fixed decompositions; arbitrary codes for characters that cannot be decomposed (卍, 姊, 臼, ...).
13 Wubi. Graphical decomposition. 25 codes. Keyboard is divided according to the type of stroke (H, V, /, \, hook). At most five, but many with fewer than 4 codes. Most characters are uniquely defined. Rules for disambiguation.
14 Outline of this talk Chinese Input Methods - character decompositions Method Overview Results Conclusion
15 Proposed Method. Instead of recognizing Chinese characters, we propose to recognize their decomposition according to an input method (i.e. the neural network outputs the sequence of keys you would type to obtain the right transcription). Training: the ground truth is converted into sequences of codes; a character delimiter symbol is inserted between the codes; an MDLSTM is trained with CTC to predict the sequences of codes (we added two BLSTM layers on top of the network to help it capture the code dependencies). Recognition: the codes must be mapped back to characters. Either the sub-sequences between delimiters are mapped to characters with the input method (wrong code sequences are mapped to an error), or (more robust) a transducer accepts valid code sequences and outputs characters.
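The conversion of the ground truth into code sequences, and the simple delimiter-based mapping back to characters, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the four-entry table follows the Cangjie examples quoted in the slides, while the delimiter symbol `|`, the error placeholder and the function names are assumptions made for the example. The more robust transducer-based decoding is not shown.

```python
# Toy code table; entries follow the Cangjie examples quoted in the slides.
CODES = {"加": "KSR", "快": "PDK", "步": "YLMH", "伐": "OI"}
INVERSE = {code: char for char, code in CODES.items()}

DELIM = "|"       # hypothetical character delimiter symbol
ERROR = "\ufffd"  # hypothetical placeholder for invalid code sequences


def encode(text):
    """Convert a transcription into a flat list of codes, with a delimiter between characters."""
    symbols = []
    for i, char in enumerate(text):
        if i > 0:
            symbols.append(DELIM)
        symbols.extend(CODES[char])
    return symbols


def decode(symbols):
    """Map each sub-sequence between delimiters back to a character (or to ERROR)."""
    chars, current = [], []
    for symbol in list(symbols) + [DELIM]:  # sentinel delimiter flushes the last group
        if symbol == DELIM:
            if current:
                chars.append(INVERSE.get("".join(current), ERROR))
            current = []
        else:
            current.append(symbol)
    return "".join(chars)


targets = encode("加快步伐")
print(" ".join(targets))
print(decode(targets))
print(decode(list("KSR|PDK|YMVH|OI")))  # a wrong code sequence maps to ERROR in this toy table
```

With a full input-method table the inverse mapping is not always unique, which is one reason a language model or a transducer is useful at decoding time.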
16 Arbitrary decompositions. Goal: assess whether the network's internal representations should be related to the graphical aspect of the characters. Fixed length: we just adapt the number of "codes" to the size of the charset: 52^2 = 2704, 16^3 = 4096, 8^4 = 4096. Random assignments: similar characters do not necessarily have a similar encoding (unlike with the chosen input methods).
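A random fixed-length assignment of this kind can be sketched as below. The function names, the seed and the stand-in character set are hypothetical; the sketch only illustrates the idea of giving every character a unique, randomly chosen code of fixed length over a small alphabet, and is not the authors' procedure.

```python
import itertools
import random
import string


def min_alphabet_size(num_classes, length):
    """Smallest number of code symbols such that size ** length covers all classes."""
    size = 1
    while size ** length < num_classes:
        size += 1
    return size


def assign_random_codes(charset, alphabet, length, seed=0):
    """Give every character a distinct random code of the given fixed length."""
    all_codes = ["".join(p) for p in itertools.product(alphabet, repeat=length)]
    if len(all_codes) < len(charset):
        raise ValueError("alphabet too small for this charset and code length")
    rng = random.Random(seed)
    return dict(zip(charset, rng.sample(all_codes, len(charset))))


# 2666 is the number of training classes given in the database slide.
print(min_alphabet_size(2666, 2))  # symbols needed for length-2 codes
print(min_alphabet_size(2666, 4))  # symbols needed for length-4 codes

# Stand-in charset: a contiguous block of CJK code points, for illustration only.
charset = [chr(0x4E00 + i) for i in range(2666)]
arb2 = assign_random_codes(charset, string.ascii_letters, length=2)
print(arb2[charset[0]], arb2[charset[1]])
```

Because the codes are drawn at random, two visually similar characters will usually receive unrelated codes, which is the property this experiment is meant to probe.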
17 Examples: 加快步伐. Arb-2: Qr Xy Mi ag qz; Arb-3: EML DJC MGL MID DEI; Arb-4: GGGF EFGE DAGA FHBF CECG; cangjie: K S R P D K Y L M H O I; wubi: lkg nakg hir wat
18 Outline of this talk Chinese Input Methods - character decompositions Method Overview Results Conclusion
19 Database. CASIA off-line handwriting database HWDB2.0 - HWDB2.2 (segmented into lines); (85/15)% random split for (train/valid); 2666 character classes in the training set. Evaluation set: ICDAR 2013 Task 4, a held-out part of CASIA; 3397 lines; 1379 character classes; line-segmented.
20 Per-layer timings
21 Results (validation) Model Codes (%ED) Baseline Character (%ED) 5.1 Cangjie Wubi Arb Arb Arb ED: edit distance, or character error rate Train/valid = 85/15% CASIA-2.x
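For reference, the edit-distance metric (%ED, also reported as character error rate) can be computed as in the sketch below. This is the standard Levenshtein dynamic program, written here for illustration; it is not the evaluation script used for the reported results. It works on strings of characters as well as on lists of codes.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum number of substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalized by the reference length, in percent."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# Character level: one substituted character out of four.
print(error_rate("加快步伐", "加快迓伐"))
# Code level: the same metric applied to sequences of codes.
print(error_rate("K S R P D K".split(), "K S R P D D".split()))
```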
22 Results (test) Model NN+Map (%CER) NN+LM (%CER) Baseline Cangjie Wubi Arb Arb Arb ICDAR CER: character error rate - LM: here a character 3-gram test = ICDAR 2013 Task 4
23 Wubi sample: 加快步伐. (RNN) l k g n n w y h i i u h a t: 加 快. Correct codes: 步: h i r; 伐: w a t. "Near" codes: 淼: i i i u; 越: f h a t; 戏: c a t; 找: r a t. (RNN+LM) 加快步伐
24 Cangjie sample: 加快步伐. (RNN) K S R P D K Y M V H O I: 加 快 迓 伐. NB: "O I" maps to 3 different "characters" (亽, 仏, 伐); 步: Y L M H. (RNN+LM) 加 快 步 伐
25 Arb-2 sample: 加快步伐. (RNN) Q r X y A z b f q Z: 加 快 烛 线. NB: 步: M i; 伐: a G. (RNN+LM) 加 快 速 线
26 Outline of this talk Chinese Input Methods - character decompositions Method Overview Results Conclusion
27 Conclusion. RNNs can predict sequences of codes instead of characters. Shape matters for decompositions. We achieved a 4x speedup with a mild degradation w.r.t. the baseline. Processing speed is now on par with Latin-script languages. Smaller-footprint models.
28 Future research directions. Optimize decompositions for recognition performance (wubi/cangjie were designed for ease of use by humans). Mixed traditional/simplified Chinese models. More data re-use. Alternative encodings. Universal multi-lingual model with Unicode code points.
29 Thank you! Questions?
More information