Using Neural Cells to Improve Image Textual Line Segmentation

Patrick Schone (patrickjohn.schone@ldschurch.org)
7 February 2017
Standards Technical Conference

Overview
- Motivation
- Neural Cells for Line Counting
- Hybrid Segmentation
- Evaluation Methods

Motivation
- To do OCR/HWR, we usually need to break the page image up into lines.
- Most line segmenters use common image processing and/or statistics.
- These techniques work fine on small or homogeneous sets.
- Huge heterogeneous databases are a challenge.
- Activity or connectivity vs. intentionality.
- Deep neural nets can be taught something about humanness.
- Can they be useful for line segmentation?

Base System: PreDNN
We start with a system we call PreDNN, which has the following properties:
- Multi-swath projection
- Bi-directional
- Dynamic programming with two-pass seam carving (first detects peaks, then troughs)
- Grayscale-based
- Use of connected components to help detect false lines or falsely merged lines
- Statistical analyses to discard overgenerated peaks or troughs
We believe that PreDNN rivals the state of the art.
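A minimal sketch of the multi-swath projection idea, assuming a grayscale page stored as a list of pixel rows (0 = black ink, 255 = white). This is a generic textbook projection profile, not the PreDNN implementation; the function names `swath_profiles` and `local_peaks` are hypothetical.

```python
def swath_profiles(image, num_swaths):
    """Return one horizontal projection profile per vertical swath.

    image: list of rows, each a list of grayscale values
           (assumed 0 = black ink, 255 = white background).
    """
    width = len(image[0])
    # Column boundaries that cut the page into num_swaths vertical strips.
    bounds = [round(i * width / num_swaths) for i in range(num_swaths + 1)]
    profiles = []
    for left, right in zip(bounds, bounds[1:]):
        # Ink mass of each pixel row, restricted to this swath.
        profiles.append([sum(255 - px for px in row[left:right]) for row in image])
    return profiles


def local_peaks(profile):
    """Row indices where the ink mass is a local maximum (candidate text lines)."""
    peaks = []
    for i, value in enumerate(profile):
        before = profile[i - 1] if i > 0 else 0
        after = profile[i + 1] if i + 1 < len(profile) else 0
        if value > before and value >= after:
            peaks.append(i)
    return peaks
```

Troughs (candidate cut positions between lines) could be found the same way on the negated profile. Working per swath rather than on the whole page width keeps skewed or short lines from being averaged away.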

DNNs to Count Lines
We want to create a neural network that can determine whether we have found well-segmented lines.

| Label | Category |
|-------|----------|
| A | No-text Line |
| B | Single Text Line |
| C | Vertical Bar Only |
| D | Less Than One Text Line |
| E | Two Fragment Lines |
| F | More than one but less than two lines |
| G | Two-plus Line |

Tagged Cells: Counts

| Category | # in TRAIN | # in TEST |
|----------|------------|-----------|
| No-text Line | 7330 | 625 |
| Single Text Line | 12651 | 848 |
| Vertical Bar Only | 703 | 89 |
| Less Than One Text Line | 16813 | 1098 |
| Two Fragments | 5027 | 407 |
| More than one less than two | 8573 | 801 |
| Two-plus Line | 14162 | 1409 |
| Totals | 70.5K | 5.3K |

Tagged Cells: Counts II
- Note that the lined-ness of a cell stays the same even if the image has been rotated 180 degrees or flipped with respect to the y-axis.
- We will refer to these as the legal permutations.
- Using the legal permutations, we have 4 times as much train/test data: 280K training and 21K testing.
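The four legal permutations can be sketched as below, assuming a cell is stored as a list of pixel rows. The function names are hypothetical. Rotating by 180 degrees and flipping about the y-axis are each their own inverse and commute, which is why they generate exactly four variants (the fourth is equivalent to an upside-down flip).

```python
def rotate_180(cell):
    """Rotate a cell by 180 degrees: reverse the row order and each row."""
    return [row[::-1] for row in cell[::-1]]


def flip_about_y_axis(cell):
    """Mirror a cell left-to-right."""
    return [row[::-1] for row in cell]


def legal_permutations(cell):
    """Return the four label-preserving variants of a cell."""
    rotated = rotate_180(cell)
    return [
        cell,                        # original
        flip_about_y_axis(cell),     # mirrored
        rotated,                     # rotated 180 degrees
        flip_about_y_axis(rotated),  # rotated, then mirrored
    ]
```

Each variant keeps the line-count label of the original cell, so the tagged set grows fourfold without any new annotation.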

The DNN
- We train a Convolutional Neural Network for line count prediction.
- We use Google's TensorFlow and a variant of its MNIST digit-recognition recipe.
- For speed, however, we reduce the parameters:
  - Kernel Size = 3
  - Layer #1 = 16
  - Layer #2 = 32
  - Layer #3 = 216
- This yields a network with 91.0% accuracy.
- HOWEVER, we can get about 2% improvement by either considering the four permutations or by overlapping decision regions and voting.
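A minimal sketch of voting over the four permutations, assuming the caller supplies the permuted cells and a `classify` function that returns a category label. Both `classify` and the tie-breaking rule (first label seen wins) are assumptions for illustration; the slides do not say how ties are resolved.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first.

    labels must be a non-empty list.
    """
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label


def predict_with_permutations(cell_variants, classify):
    """Classify every permuted copy of a cell and vote on the result.

    classify is a hypothetical stand-in for the trained network.
    """
    return majority_vote([classify(variant) for variant in cell_variants])
```

For example, if the network labels three of the four variants as a single text line and one as two-plus lines, the vote returns the single-line label.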

Line Counter = Line Fixer
- Potential usage: what if we just apply the system to PreDNN's outputs and then try to correct them?
- The example is one where the line segmenter does poorly.
- Can we fix it?

Line Counter = Line Fixer
- Start off by colorizing the image.
- For each swath, cut into overlapping regions.
- Predict the color of each region.
- Use voting to predict the most likely color of each intersected area.
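The overlapping-region step can be sketched as follows, assuming regions are described by their left and right pixel bounds within a swath and that a color has already been predicted for each region. The function names, the region width, and the stride are hypothetical; the slides do not give the actual sizes.

```python
from collections import Counter


def overlapping_regions(swath_width, region_width, stride):
    """Bounds of regions covering a swath; assumes 0 < stride <= region_width."""
    starts = list(range(0, max(swath_width - region_width, 0) + 1, stride))
    if starts[-1] + region_width < swath_width:
        starts.append(swath_width - region_width)  # make sure the right edge is covered
    return [(s, min(s + region_width, swath_width)) for s in starts]


def vote_per_area(regions, colors):
    """Split the swath at every region boundary and vote inside each area.

    colors[i] is the predicted color of regions[i]. Ties go to whichever
    color Counter encountered first.
    """
    cuts = sorted({edge for region in regions for edge in region})
    areas = []
    for left, right in zip(cuts, cuts[1:]):
        votes = Counter(
            color
            for (lo, hi), color in zip(regions, colors)
            if lo <= left and right <= hi  # region fully covers this area
        )
        areas.append((left, right, votes.most_common(1)[0][0]))
    return areas
```

With a 100-pixel swath, 60-pixel regions, and a stride of 20, the middle area is covered by all three regions, so a single wrong prediction there is outvoted.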

Line Counter = Line Fixer
Compute "Greenness", where # denotes the number of cells of the given color:

$$\text{Greenness} = \frac{\#\text{GREEN} + 0.1\,(\#\text{PINK} + \#\text{PURPLE})}{\#\text{GREEN} + 0.1\,(\#\text{PINK} + \#\text{PURPLE}) + \#\{\text{BLUE}, \text{RED}, \text{YELLOW}, \text{ORANGE}\}}$$

Greenness = 69.9%
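The greenness ratio can be computed directly from the list of predicted cell colors. A minimal sketch, assuming colors are given as uppercase strings; returning 0.0 for an image with no counted cells is an assumption, since the slides do not cover that case.

```python
from collections import Counter


def greenness(cell_colors):
    """Fraction of 'good' cells: GREEN counts fully, PINK and PURPLE at 0.1."""
    counts = Counter(cell_colors)
    credit = counts["GREEN"] + 0.1 * (counts["PINK"] + counts["PURPLE"])
    penalty = sum(counts[c] for c in ("BLUE", "RED", "YELLOW", "ORANGE"))
    total = credit + penalty
    # Assumption: an image with no counted cells scores 0.0.
    return credit / total if total else 0.0
```

For instance, an image with 7 GREEN cells and 3 RED cells scores 7 / 10 = 70%.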

Line Counter = Line Fixer
We created formulas that compare each row to the one above/below.
Symptoms of a potential false split:
- Blue/Blue: Likely split
- Blue/Pink: Possible split
- Green/Pink: Slight chance of split
- Green/Green: Probably OK
- Etc.
Symptoms of a potential false merge:
- Red/Green: Likely merge
- Red/Red: Almost definite merge
- Red/Orange: Probable merge
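The symptom lists above can be sketched as lookup tables. The numeric scores below are hypothetical placeholders for the qualitative labels on the slide (the actual formulas are not given), and the sketch assumes each row has been reduced to one dominant color and that the order within a pair does not matter.

```python
# Hypothetical scores standing in for the qualitative labels on the slide.
SPLIT_SYMPTOMS = {
    ("BLUE", "BLUE"): 0.9,    # likely split
    ("BLUE", "PINK"): 0.6,    # possible split
    ("GREEN", "PINK"): 0.3,   # slight chance of split
    ("GREEN", "GREEN"): 0.0,  # probably OK
}
MERGE_SYMPTOMS = {
    ("RED", "RED"): 0.95,     # almost definite merge
    ("RED", "GREEN"): 0.8,    # likely merge
    ("RED", "ORANGE"): 0.7,   # probable merge
}


def rank_candidates(row_colors, symptoms, min_score=0.5):
    """Score each adjacent row pair, drop low scores, sort most likely first.

    Returns the index of the upper row of each surviving pair.
    """
    candidates = []
    for i in range(len(row_colors) - 1):
        pair = (row_colors[i], row_colors[i + 1])
        # Assumption: a pair is treated as unordered.
        score = symptoms.get(pair, symptoms.get(pair[::-1], 0.0))
        if score >= min_score:
            candidates.append((score, i))
    return [i for score, i in sorted(candidates, key=lambda c: -c[0])]
```

Calling `rank_candidates(colors, SPLIT_SYMPTOMS)` gives the false-split candidates, and the same call with `MERGE_SYMPTOMS` gives the false-merge candidates.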

Line Counter = Line Fixer
We handle potential false splits first, then false merges. We sort the candidates from most likely to least likely and throw out those with low scores.
For potential false splits (false merges are handled similarly):
- Start with a row pair. Offline, merge the cells and evaluate.
- If the results are better, replace the original with the new.
- If the results are worse, skip that potential pairing.
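The trial-and-accept loop for false splits can be sketched as below. Rows are assumed to be `(top, bottom)` pixel bounds, candidates are assumed to arrive already sorted from most to least likely, and `evaluate` is a hypothetical scoring function (for example, the greenness of the re-classified segmentation) supplied by the caller.

```python
def fix_false_splits(rows, candidate_pairs, evaluate):
    """Greedily merge candidate row pairs, keeping a merge only if it scores better.

    rows: list of (top, bottom) bounds, ordered top to bottom.
    candidate_pairs: list of (upper_row, lower_row) tuples, most likely first.
    evaluate: hypothetical function mapping a list of rows to a score
              where higher is better.
    """
    current = list(rows)
    best = evaluate(current)
    for upper, lower in candidate_pairs:
        if upper not in current:
            continue  # this row was already merged away
        i = current.index(upper)
        if i + 1 >= len(current) or current[i + 1] != lower:
            continue  # the pair is no longer adjacent
        # Offline trial: replace the two rows with one merged row.
        trial = current[:i] + [(upper[0], lower[1])] + current[i + 2:]
        score = evaluate(trial)
        if score > best:
            current, best = trial, score  # better: keep the merge
        # worse or equal: skip this pairing and keep the original rows
    return current
```

Because every change is scored before it is accepted, the loop can never lower the evaluation score of the segmentation it started from.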

Line Counter = Line Fixer
- Using this process, we are able to completely fix what was broken.
- The resultant image's line segmentation has 100% greenness!

Evaluation Methods
- Option 1: Score against human-vetted lines. Doable, but costly to evaluate.
- Option 2: Evaluate by greenness. Very inexpensive, but cheating a bit.
- Option 3: Evaluate by recognition. This is ultimately our end goal, so it is a good evaluation.

Evaluation
Option 2: Evaluate by Greenness

| | PreDNN | PostDNN |
|---|--------|---------|
| # of 95+% green | 69 | 92 |
| # of 90-94% green | 47 | 58 |
| # of 80-89% green | 52 | 46 |
| # of 70-79% green | 31 | 14 |
| # Below 70% green | 20 | 9 |
| Average Greenness | 86.88% | 91.30% |

Evaluation
Option 3: Evaluate by Recognition
- We built a test collection of 565 handwritten, prose-style US legal documents with 137K test words.
- We also trained a handwriting recognition system using comparable but different training documents.
- We then ran recognition using both the PreDNN and PostDNN systems:

| | PreDNN | PostDNN |
|---|--------|---------|
| HWR Word Accuracy | 83.9% | 85.1% |

- Line segmentation costs 19% more, but recognition costs 11% less because there are fewer lines.
- So for fairly comparable costs, we get a 1.2% absolute gain.
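As a worked check of the gain, the arithmetic below uses only the two word accuracies reported on this slide. The relative error reduction is a derived figure, not one stated in the talk.

```latex
\begin{aligned}
\text{absolute gain} &= 85.1\% - 83.9\% = 1.2 \text{ points} \\
\text{word error rate (PreDNN)} &= 100\% - 83.9\% = 16.1\% \\
\text{word error rate (PostDNN)} &= 100\% - 85.1\% = 14.9\% \\
\text{relative error reduction} &= \frac{16.1 - 14.9}{16.1} \approx 7.5\%
\end{aligned}
```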

Synopsis
- We can detect line count at the cellular level.
- Greenness allows one to detect areas of potential problems.
- The neural approach improves the segmentation of an already-good system.
- We expect it to be applicable with other systems.
- The final segmentation actually results in HWR improvements.
ANY QUESTIONS?