Shape Context Matching For Efficient OCR

Sudeep Pillai


Shape Context Matching For Efficient OCR. May 14, 2012

Table of contents
1. Motivation (Background)
2. What is a Shape Context?; Matching Shape Contexts; Similarity Measure
3. Matching Shape Contexts via Pyramid Matching

Motivation
- Automatic translation/transcription of handwritten and printed text.
- Printed text has several geometric constraints that can be exploited for improved performance.
- There has been a significant push for accuracy, but not much for optimization.

Optical Character Recognition: MNIST database performance
- Digits are size-normalized and centered in a fixed-size image.
- 60,000 training examples, 10,000 test examples.

  Classifier                         Preprocessing              Test error rate (%)
  Linear classifiers:
    Linear classifier (1-layer NN)   None                       12.0
    Pairwise linear classifier       Deskewing                   7.6
  K-nearest neighbors:
    K-NN, Euclidean (L2)             None                        3.09
    K-NN, Euclidean (L3)             Deskewing, noise removal    1.22
    K-NN, shape context matching     Shape context extraction    0.63

Optical Character Recognition: MNIST database performance (continued)

  Classifier                                   Preprocessing   Test error rate (%)
  SVMs:
    SVM, Gaussian kernel                       None            1.4
    Virtual SVM, deg-9 poly, 2-pixel jittered  None            0.56
  Neural nets:
    Deep convex net, unsup. pre-training       None            0.83
  Convolutional nets:
    Committee of 35 conv. nets                 Normalization   0.23

Figure: A few digits from the MNIST database.


What is a Shape Context?

Definition (Shape): A shape is represented as a sequence of boundary points P = {p_1, ..., p_n}, with p_i in R^2.

Definition (Shape context): The shape context of an interest point p_i is a histogram h_i(k) = #{p_j : j != i, (x_j - x_i) in bin(k)}, in which the bins are uniformly divided in log-polar space.
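The histogram above can be sketched in a few lines of NumPy. This is an illustrative implementation, not the one from the talk; the 5 x 12 binning (giving the 60-dimensional descriptor used later) and the radial range are assumed parameters.

```python
import numpy as np

def shape_context(points, n_r=5, n_theta=12, r_min=0.125, r_max=2.0):
    """Log-polar shape context histogram for each boundary point.

    points: (n, 2) array of boundary samples. Returns an (n, n_r * n_theta)
    integer array (one 60-dimensional descriptor per point by default).
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    diff = points[None, :, :] - points[:, None, :]   # diff[i, j] = p_j - p_i
    dist = np.linalg.norm(diff, axis=-1)
    # Normalize by the mean pairwise distance for scale invariance.
    dist = dist / dist[~np.eye(n, dtype=bool)].mean()
    angle = np.arctan2(diff[..., 1], diff[..., 0])

    # Radial bin edges uniform in log space; angular bins uniform in angle.
    r_edges = np.logspace(np.log10(r_min), np.log10(r_max), n_r + 1)
    r_bin = np.digitize(dist, r_edges) - 1
    t_bin = ((angle + np.pi) / (2 * np.pi) * n_theta).astype(int) % n_theta

    hists = np.zeros((n, n_r * n_theta), dtype=int)
    for i in range(n):
        for j in range(n):
            if j != i and 0 <= r_bin[i, j] < n_r:
                hists[i, r_bin[i, j] * n_theta + t_bin[i, j]] += 1
    return hists
```

Points falling outside the outermost radial bin are simply not counted, which is the usual treatment of the log-polar diagram's bounded support.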

Representation
Figure: Graphical representation of shape context bins.

Histogram
Figure: Graphical representation of shape context histograms (in R^60).

Matching Shape Contexts
The cost of matching point p_i on the first shape to point q_j on the second shape is the chi-square distance between their histograms:

    C_ij = (1/2) * sum_{k=1..K} [h_i(k) - h_j(k)]^2 / (h_i(k) + h_j(k))

Minimize the total matching cost sum_i C(p_i, q_pi(i)) over permutations pi.

Optimal matching: one technique for solving this assignment problem is the Hungarian method, which runs in O(n^3) time.
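The cost matrix and the optimal assignment can be sketched as follows, assuming SciPy's `linear_sum_assignment` as the Hungarian-style solver; the small epsilon is an added guard against empty bins.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def chi2_cost_matrix(H1, H2, eps=1e-10):
    """C[i, j] = 0.5 * sum_k (h_i(k) - h_j(k))^2 / (h_i(k) + h_j(k))."""
    H1 = np.asarray(H1, dtype=float)[:, None, :]
    H2 = np.asarray(H2, dtype=float)[None, :, :]
    return 0.5 * ((H1 - H2) ** 2 / (H1 + H2 + eps)).sum(axis=-1)

def match_shapes(H1, H2):
    """Optimal one-to-one matching of shape context descriptors."""
    C = chi2_cost_matrix(H1, H2)
    rows, cols = linear_sum_assignment(C)   # Hungarian-style O(n^3) solver
    return cols, C[rows, cols].sum()        # permutation pi and total cost
```

For two shapes whose descriptors are permutations of each other, the recovered assignment undoes the permutation at zero total cost.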

Properties of shape contexts
- Invariant to translation and scale (distances are normalized by the mean distance over the n^2 point pairs).
- Can be made invariant to rotation (measure angles relative to the local tangent orientation).
- Tolerant to small affine distortion (log-polar binning blurs spatially in proportion to r).

Similarity Measure
Definition: After applying a cubic spline transformation T, the similarity of the two shapes can be measured as a weighted sum

    D = a*D_ac + D_sc + b*D_be

where D_sc is the shape context distance, D_ac the appearance cost, and D_be the bending energy (transformation cost).

Matching Shape Contexts via Pyramid Matching
- Approximate matching is possible with the full shape context feature, but a low-dimensional feature descriptor is desirable for performance.
- With a uniform-bin approximation, matching accuracy declines as the feature dimension d grows.
- Multiple modalities are representable even in a reduced subspace.
- Use Principal Component Analysis (PCA) to determine the bases that define this shape context subspace.
- Approximate matching can be performed faster once all R^60 vectors are projected onto R^3.

Figure: Projecting histograms of contour points onto the shape context subspace. The points on the human figure on the right are colored according to their 3-D shape context subspace feature values.

Figure: Visualization of the feature subspace constructed from shape context histograms for two different data sets. The RGB channels of each point on the contours are colored according to its histogram's 3-D PCA coefficient values. Set matching in this feature space means that contour points of similar color have a low matching cost, while highly contrasting colors incur a high matching cost.

Tradeoffs
The larger d is:
- the smaller the PCA reconstruction error,
- the larger the distortion induced by the L1 embedding,
- the larger the complexity of computing the embedding.

Do we really need an R^60 feature vector to represent a shape?
- Shapes are almost never exactly similar, so approximate measures make more sense.
- Extract only the most discriminating dimensions as the descriptor.

Pyramid Matching
X and Y are two sets of vectors in an R^d feature space. The goal is to find an approximate correspondence between X and Y.

Pyramid Matching Overview

Pyramid Matching Kernels
- Construct a sequence of grids at resolutions 0, ..., L, where the grid at resolution l has D = 2^(dl) cells.
- Compute histograms H_X^l and H_Y^l of X and Y at resolution l; H_X^l(i) and H_Y^l(i) are the numbers of points of X and Y in the i-th cell.
- Compute the number of matches at each resolution with the histogram intersection:

    I(H_X^l, H_Y^l) = sum_{i=1..D} min(H_X^l(i), H_Y^l(i))

Pyramid Matching Kernels
Sum all the I^l, giving more importance to matches found at high resolution:

    K(X, Y) = I^L + sum_{l=0..L-1} (1/2^(L-l)) * (I^l - I^(l+1))
            = (1/2^L) * I^0 + sum_{l=1..L} (1/2^(L-l+1)) * I^l

where I^l - I^(l+1) is the number of new matches found at level l.
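The two formulas above can be sketched for point sets normalized to the unit box. This uses plain uniform grids (no vocabulary guidance) and dictionary-based sparse histograms; both choices are illustrative.

```python
import numpy as np

def grid_intersections(X, Y, L):
    """I^l for l = 0..L: histogram intersection on a grid with 2^l cells
    per dimension (points assumed to lie in [0, 1]^d)."""
    I = []
    for l in range(L + 1):
        cells = 2 ** l
        hx, hy = {}, {}
        for pts, h in ((X, hx), (Y, hy)):
            for p in pts:
                key = tuple(np.minimum((p * cells).astype(int), cells - 1))
                h[key] = h.get(key, 0) + 1
        I.append(sum(min(c, hy.get(k, 0)) for k, c in hx.items()))
    return I

def pyramid_match(X, Y, L=3):
    """K(X, Y) = I^L + sum_{l=0..L-1} (1/2^(L-l)) * (I^l - I^(l+1))."""
    I = grid_intersections(np.asarray(X), np.asarray(Y), L)
    return I[L] + sum((I[l] - I[l + 1]) / 2 ** (L - l) for l in range(L))
```

Identical sets match fully at every level, so K equals the set size; two distant points match only in the single cell of the coarsest grid and contribute just the heavily discounted 1/2^L.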

Pyramid Matching (l = 0)

Pyramid Matching (l = 1)

Pyramid Matching (l = 2)

Pyramid Matching

Comparison with Optimal Matching

Vocabulary-guided Matching
Figure: The bins are concentrated on decomposing the space where features cluster, which matters particularly for high-dimensional features (shown here in R^2). Features are small red points, bin centers are larger black points, and blue lines denote bin boundaries. The vocabulary-guided bins are irregularly shaped Voronoi cells.

Performance
Cost of computing a partial matching, for sets with O(m) features in R^d and pyramids with L levels:
- Earth Mover's Distance: O(d m^3 log m)
- Hungarian method: O(d m^3)
- Greedy matching: O(d m^2 log m)
- Pyramid match: O(d m L)

Affine Constraints - RANSAC
Figure: Interest points computed on image 1.
Figure: Interest points computed on image 2.

Affine Constraints - RANSAC
Figure: Find correspondences between interest points.

Affine Constraints - RANSAC
Figure: Outlier removal via RANSAC (Random Sample Consensus).
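A minimal sketch of this RANSAC step with plain NumPy: repeatedly fit an affine transform to a minimal 3-point sample of the correspondences, keep the hypothesis with the most inliers, then refit on those inliers. The iteration count and inlier threshold are illustrative choices, not values from the talk.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine fit: dst ~ src @ A.T + t."""
    M = np.hstack([src, np.ones((len(src), 1))])
    sol, *_ = np.linalg.lstsq(M, dst, rcond=None)   # (3, 2) parameters
    return sol[:2].T, sol[2]

def ransac_affine(src, dst, n_iter=200, thresh=0.05, seed=0):
    """Robustly estimate A, t from correspondences src[i] -> dst[i]."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(src), 3, replace=False)  # minimal sample
        A, t = fit_affine(src[idx], dst[idx])
        err = np.linalg.norm(src @ A.T + t - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best.sum():
            best = inliers
    A, t = fit_affine(src[best], dst[best])           # refit on consensus set
    return A, t, best
```

With mostly correct correspondences and a handful of gross outliers, the consensus set excludes the outliers and the refit recovers the underlying transform.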

Additional improvements
- RANSAC gives an initial estimate of the affine transformation between the canonical set of points and the query points.
- Utilize the affine transformation estimate to perform vocabulary- or geometry-guided searching and matching.
- MLESAC or PROSAC could be used to perform probabilistic searching.
- Constraints can be added to the pyramid matching scheme to reduce query time and improve robustness to partial matching.

Conclusions
- Investigated and implemented a shape descriptor invariant to rotation and scale.
- Integrated an approximate matching scheme with linear time complexity.
- The scheme scales well as the database of descriptors grows.
- Significant improvement in speed with little tradeoff in accuracy.
- Source code available soon.

Thanks!