Shape Context Matching For Efficient OCR
May 14, 2012

Table of contents
1 Motivation
  Background
2 What is a Shape Context?
  Matching Shape Contexts
  Similarity Measure
3 Matching Shape Contexts via Pyramid Matching

Motivation
Background
- Automatic translation/transcription of handwritten/printed text
- Printed text has several geometric constraints that can be exploited for improved performance
- Significant push for accuracy so far, with comparatively little attention to optimization

Optical Character Recognition
MNIST database performance
- Digits are size-normalized and centered in a fixed-size image
- 60,000 training examples, 10,000 test examples

Classifier                          Preprocessing               Test error rate (%)
Linear classifiers
  Linear classifier (1-layer NN)    None                        12.0
  Pairwise linear classifier        Deskewing                   7.6
K-nearest neighbors
  K-NN, Euclidean (L2)              None                        3.09
  K-NN, L3                          Deskewing, noise removal    1.22
  K-NN, shape context matching      Shape context extraction    0.63

Optical Character Recognition
MNIST database performance

Classifier                                   Preprocessing    Test error rate (%)
SVMs
  SVM, Gaussian kernel                       None             1.4
  Virtual SVM, deg-9 poly, 2-pixel jittered  None             0.56
Neural nets
  Deep convex net, unsup. pre-training       None             0.83
Convolutional nets
  Committee of 35 conv. nets                 Normalization    0.23

Optical Character Recognition
Figure: A few digits from the MNIST database

What is a Shape Context?
Definition (Shape)
A shape is represented as a sequence of boundary points: $P = \{p_1, \dots, p_n\}$, $p_i \in \mathbb{R}^2$
Definition (Shape Context)
A shape context is a descriptor of an interest point $p_i$, i.e. a histogram
$h_i(k) = \#\{p_j : j \neq i,\ x_j - x_i \in \mathrm{bin}(k)\}$,
where $x_j - x_i$ is the offset from $p_i$ to $p_j$ and the bins are uniformly divided in log-polar space

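The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `shape_context`, the 5 x 12 bin layout (chosen only because it yields 60 bins) and the radial limits `r_inner` and `r_outer` are assumptions. Distances are divided by the mean pairwise distance before binning.

```python
import math

def shape_context(points, i, n_r=5, n_theta=12, r_inner=0.125, r_outer=2.0):
    """Log-polar histogram of all other points as seen from points[i].

    n_r, n_theta, r_inner and r_outer are illustrative assumptions.
    """
    n = len(points)
    # Mean distance over all n^2 point pairs, used to normalise radii.
    mean_dist = sum(math.dist(a, b) for a in points for b in points) / (n * n)
    # Log-spaced radial edges from r_inner to r_outer.
    edges = [r_inner * (r_outer / r_inner) ** (k / (n_r - 1)) for k in range(n_r)]
    hist = [0] * (n_r * n_theta)
    xi, yi = points[i]
    for j, (xj, yj) in enumerate(points):
        if j == i:
            continue
        r = math.hypot(xj - xi, yj - yi) / mean_dist
        if r >= r_outer:
            continue  # points beyond the outermost ring are ignored
        r_bin = sum(1 for e in edges[:-1] if r >= e)
        theta = math.atan2(yj - yi, xj - xi) % (2 * math.pi)
        t_bin = min(int(theta / (2 * math.pi) * n_theta), n_theta - 1)
        hist[r_bin * n_theta + t_bin] += 1
    return hist

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
h = shape_context(square, 0)
```

Each point of a shape gets one such histogram, so a shape with n points is described by n vectors of length 60 under these assumed settings.
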
Representation
Figure: Graphical representation of shape context bins

Histogram
Figure: Graphical representation of shape context histograms ($\mathbb{R}^{60}$)

Matching Shape Contexts
- The cost of matching point $p_i$ on the first shape to point $q_j$ on the second shape is the chi-square distance between their histograms:
  $C_{ij} = \frac{1}{2} \sum_{k=1}^{K} \frac{[h_i(k) - h_j(k)]^2}{h_i(k) + h_j(k)}$
- Minimize the total matching cost $\sum_i C(p_i, q_{\pi(i)})$ over permutations $\pi$
- Optimal matching: one possible technique for solving this problem is the Hungarian method, with $O(n^3)$ time complexity

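As a rough illustration of the cost and the minimization, the sketch below computes the chi-square cost matrix and then finds the best permutation by brute force. Brute force is swapped in for the Hungarian method purely to keep the example short; it is only usable for a handful of points. The names `chi2_cost` and `best_matching` are hypothetical, and both shapes are assumed to have the same number of points.

```python
from itertools import permutations

def chi2_cost(h_i, h_j):
    """Chi-square distance between two histograms (empty bins are skipped)."""
    total = 0.0
    for a, b in zip(h_i, h_j):
        if a + b > 0:  # avoid 0/0 for bins empty in both histograms
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total

def best_matching(hists_p, hists_q):
    """Exhaustive search for the permutation with the lowest total cost.

    Stand-in for the Hungarian method; exponential in the number of points.
    """
    n = len(hists_p)
    cost = [[chi2_cost(hp, hq) for hq in hists_q] for hp in hists_p]
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total

hists_p = [[4, 0, 0], [0, 4, 0], [0, 0, 4]]
hists_q = [[0, 0, 4], [4, 0, 0], [0, 4, 0]]
perm, total = best_matching(hists_p, hists_q)
```

Here `perm[i]` is the index of the point on the second shape assigned to point `i` of the first.
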
Properties of shape contexts
- Invariant to translation and scale (distances are normalized by the mean distance over the $n^2$ point pairs)
- Can be made invariant to rotation (by measuring angles relative to the local tangent orientation)
- Tolerant to small affine distortions (log-polar binning gives a spatial blur proportional to $r$)

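The translation and scale invariance can be checked numerically. The sketch below, with the hypothetical helper `normalized_distances`, divides every pairwise distance by the mean over all $n^2$ pairs and compares a shape against a shifted and scaled copy. It covers only the radial part of the descriptor.

```python
import math

def normalized_distances(points):
    """Pairwise distance matrix divided by the mean over all n^2 pairs."""
    n = len(points)
    d = [[math.dist(a, b) for b in points] for a in points]
    mean = sum(map(sum, d)) / (n * n)
    return [[v / mean for v in row] for row in d]

shape = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0)]
# Same shape, scaled by 4 and shifted by (3, -1).
moved = [(4.0 * x + 3.0, 4.0 * y - 1.0) for x, y in shape]

original = normalized_distances(shape)
transformed = normalized_distances(moved)
max_diff = max(abs(u - v)
               for row_a, row_b in zip(original, transformed)
               for u, v in zip(row_a, row_b))
```

`max_diff` stays at floating-point noise level, because the shift cancels in the differences and the scale factor cancels in the ratio.
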
Similarity Measure
Definition
After applying a cubic spline transformation $T$, the similarity of the two shapes can be measured via a weighted sum
$D = a D_{ac} + D_{sc} + b D_{be}$
- $D_{sc}$: shape context distance
- $D_{ac}$: appearance cost
- $D_{be}$: bending energy, or transformation cost

Matching Shape Contexts via Pyramid Matching
- Approximate matching is possible with the full shape context feature
- A low-dimensional feature descriptor is desirable for performance purposes
- With uniform bins, matching accuracy declines as the feature dimension grows ($d > 2$)
- Multiple modalities remain representable even in a reduced subspace
- Use Principal Component Analysis to determine the bases that define this shape context subspace
- Approximate matching can be performed faster once all $\mathbb{R}^{60}$ vectors are projected onto $\mathbb{R}^3$

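The projection step can be sketched without external libraries. The example below estimates the leading principal directions by power iteration with deflation, a simple stand-in for a library PCA routine, and projects a vector onto them. The names `pca_basis` and `project`, the iteration count and the toy 4-D data are assumptions; in the setting above the inputs would be the 60-D histograms and `k` would be 3.

```python
import math

def pca_basis(vectors, k=3, iters=200):
    """Return (mean, basis): data mean and top-k principal directions.

    Uses power iteration with deflation on the covariance matrix.
    """
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[c] for v in vectors) / n for c in range(d)]
    centered = [[v[c] - mean[c] for c in range(d)] for v in vectors]
    cov = [[sum(r[a] * r[b] for r in centered) / n for b in range(d)]
           for a in range(d)]
    basis = []
    for _ in range(k):
        # Fixed, non-symmetric start vector keeps the sketch deterministic.
        start_norm = math.sqrt(sum((c + 1) ** 2 for c in range(d)))
        w = [(c + 1) / start_norm for c in range(d)]
        for _ in range(iters):
            w_next = [sum(cov[a][b] * w[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w_next))
            if norm == 0.0:
                break  # no variance left in the deflated matrix
            w = [x / norm for x in w_next]
        eigval = sum(w[a] * cov[a][b] * w[b] for a in range(d) for b in range(d))
        basis.append(w)
        # Deflation: remove the component just found.
        cov = [[cov[a][b] - eigval * w[a] * w[b] for b in range(d)]
               for a in range(d)]
    return mean, basis

def project(vector, mean, basis):
    """Coordinates of a vector in the subspace spanned by the basis."""
    return [sum((vector[c] - mean[c]) * b[c] for c in range(len(mean)))
            for b in basis]

# Toy data: almost all variance lies along the first axis.
data = [[10.0 * t, float(t % 2), 0.0, 0.0] for t in range(10)]
mean, basis = pca_basis(data, k=2)
coords = project(data[0], mean, basis)
```
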
Figure: Projecting histograms of contour points onto the shape context subspace. The points on the human figure on the right are colored according to their 3-D shape context subspace feature values.

Figure: Visualization of the feature subspace constructed from shape context histograms for two different data sets. The RGB channels of each contour point are set according to its histogram's 3-D PCA coefficient values. Set matching in this feature space means that contour points of similar color have a low matching cost, while highly contrasting colors incur a high matching cost.

Tradeoffs
A larger $d$ means:
- Smaller PCA reconstruction error
- Larger distortion induced by the L1 embedding
- Higher complexity of computing the embedding
Do we really need an $\mathbb{R}^{60}$ feature vector to represent a shape?
- Shapes are almost never similar
- Approximate measures make more sense
- Extract only the most discriminating dimensions as the descriptor

Pyramid Matching
- $X$ and $Y$ are two sets of vectors in a feature space $\mathbb{R}^d$
- Find an approximate correspondence between $X$ and $Y$

Pyramid Matching Overview

Pyramid Matching Kernels
- Construct a sequence of grids at resolutions $0, \dots, L$, where the grid at resolution $l$ has $D = 2^{dl}$ cells
- Compute the histograms $H_X^l$ and $H_Y^l$ of $X$ and $Y$ at resolution $l$; $H_X^l(i)$ and $H_Y^l(i)$ are the numbers of points of $X$ and $Y$ in the $i$th cell
- Compute the number of matches at each resolution using the histogram intersection:
  $I(H_X^l, H_Y^l) = \sum_{i=1}^{D} \min\left(H_X^l(i), H_Y^l(i)\right)$

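A small sketch of these steps, assuming the features have been rescaled to lie in the unit cube $[0, 1)^d$ (an assumption made here so that the grid at resolution $l$ splits each axis into $2^l$ intervals). The function names are hypothetical. Sparse counters stand in for the dense histograms, since most of the $2^{dl}$ cells are empty.

```python
from collections import Counter

def grid_histogram(points, level):
    """Histogram of points over a grid with 2**level cells per axis."""
    cells_per_axis = 2 ** level
    hist = Counter()
    for p in points:
        cell = tuple(min(int(c * cells_per_axis), cells_per_axis - 1) for c in p)
        hist[cell] += 1
    return hist

def intersection(hist_x, hist_y):
    """Histogram intersection: sum over cells of the smaller count."""
    return sum(min(count, hist_y[cell]) for cell, count in hist_x.items())

X = [(0.1, 0.1), (0.6, 0.6), (0.9, 0.2)]
Y = [(0.2, 0.2), (0.8, 0.6)]
matches = [intersection(grid_histogram(X, l), grid_histogram(Y, l))
           for l in range(3)]
```

For this toy input the match counts per resolution are `[2, 2, 1]`: both points of `Y` find a partner at the two coarser resolutions, and only one pair still shares a cell at the finest one.
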
Pyramid Matching Kernels
- Sum all the $I^l = I(H_X^l, H_Y^l)$, giving more importance to the higher resolutions:
  $K(X, Y) = I^L + \sum_{l=0}^{L-1} \frac{1}{2^{L-l}} \left(I^l - I^{l+1}\right) = \frac{1}{2^L} I^0 + \sum_{l=1}^{L} \frac{1}{2^{L-l+1}} I^l$
  where $I^l - I^{l+1}$ is the number of new matches found at resolution $l$

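The two forms of the kernel can be checked against each other numerically. In the sketch below, `I` is a list of intersection counts indexed by resolution, with `I[0]` the coarsest and `I[L]` the finest; the function names and the sample counts are illustrative assumptions.

```python
def kernel_weighted_sum(I):
    """K = I^0 / 2^L + sum_{l=1..L} I^l / 2^(L-l+1)."""
    L = len(I) - 1
    return I[0] / 2 ** L + sum(I[l] / 2 ** (L - l + 1) for l in range(1, L + 1))

def kernel_new_matches(I):
    """K = I^L + sum_{l=0..L-1} (I^l - I^(l+1)) / 2^(L-l)."""
    L = len(I) - 1
    return I[L] + sum((I[l] - I[l + 1]) / 2 ** (L - l) for l in range(L))

I = [2, 2, 1]  # example counts for resolutions 0, 1, 2
k_a = kernel_weighted_sum(I)
k_b = kernel_new_matches(I)
```

Both expressions give 1.5 here: the match found at the finest resolution counts fully, while the one that only appears at resolution 1 is weighted by 1/2.
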
Pyramid Matching (l = 0)

Pyramid Matching (l = 1)

Pyramid Matching (l = 2)

Pyramid Matching

Comparison with Optimal Matching

Vocabulary-guided Matching
Figure: The bins are concentrated on decomposing the space where features cluster, particularly for high-dimensional features (in this figure, $\mathbb{R}^2$). Features are small red points, bin centers are larger black points, and blue lines denote bin boundaries. The vocabulary-guided bins are irregularly shaped Voronoi cells.

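Since the bins are Voronoi cells, a feature falls into the bin whose center is nearest to it. The sketch below builds a histogram that way. The function name and the toy centers are hypothetical; how the centers are obtained (for instance by clustering training features) is outside this example.

```python
import math
from collections import Counter

def voronoi_histogram(features, centers):
    """Count features per Voronoi cell, i.e. per nearest bin center."""
    hist = Counter()
    for f in features:
        nearest = min(range(len(centers)),
                      key=lambda c: math.dist(f, centers[c]))
        hist[nearest] += 1
    return hist

centers = [(0.0, 0.0), (10.0, 10.0)]
features = [(1.0, 1.0), (0.0, 2.0), (9.0, 9.0)]
hist = voronoi_histogram(features, centers)
```
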
Performance
Computing a partial matching:
- Earth Mover's Distance: $O(dm^3 \log m)$
- Hungarian method: $O(dm^3)$
- Greedy matching: $O(dm^2 \log m)$
- Pyramid match: $O(dmL)$
for sets with $O(m)$ features in $\mathbb{R}^d$ and pyramids with $L$ levels

Affine Constraints - RANSAC
Figure: Interest points computed on image 1
Figure: Interest points computed on image 2

Affine Constraints - RANSAC
Figure: Find correspondences between interest points

Affine Constraints - RANSAC
Figure: Outlier removal via RANSAC (RANdom SAmple Consensus)

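A minimal RANSAC sketch for an affine model follows. Each iteration fits an affine map to three random correspondences and keeps the model with the most inliers. All function names, the iteration count, the inlier tolerance and the fixed seed are assumptions made for illustration, and the toy correspondences are synthetic.

```python
import math
import random

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(A, b):
    """Solve a 3x3 linear system by Cramer's rule; None if singular."""
    d = det3(A)
    if abs(d) < 1e-12:
        return None
    solution = []
    for col in range(3):
        M = [row[:] for row in A]
        for r in range(3):
            M[r][col] = b[r]
        solution.append(det3(M) / d)
    return solution

def fit_affine(src, dst):
    """Affine map ((a, b, tx), (c, d, ty)) through three correspondences."""
    A = [[x, y, 1.0] for x, y in src]
    row_x = solve3(A, [p[0] for p in dst])
    row_y = solve3(A, [p[1] for p in dst])
    return None if row_x is None else (row_x, row_y)

def apply_affine(T, p):
    (a, b, tx), (c, d, ty) = T
    return (a * p[0] + b * p[1] + tx, c * p[0] + d * p[1] + ty)

def ransac_affine(src, dst, iters=200, tol=0.5, seed=0):
    rng = random.Random(seed)
    indices = range(len(src))
    best_T, best_inliers = None, []
    for _ in range(iters):
        sample = rng.sample(indices, 3)
        T = fit_affine([src[i] for i in sample], [dst[i] for i in sample])
        if T is None:
            continue  # collinear sample, no unique affine map
        inliers = [i for i in indices
                   if math.dist(apply_affine(T, src[i]), dst[i]) <= tol]
        if len(inliers) > len(best_inliers):
            best_T, best_inliers = T, inliers
    return best_T, best_inliers

src = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 1.0), (1.0, 1.0),
       (2.0, 1.0), (0.0, 2.0), (1.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
# First eight follow x' = 2x + 1, y' = 2y - 1; the last two are outliers.
dst = [(2.0 * x + 1.0, 2.0 * y - 1.0) for x, y in src[:8]]
dst += [(-20.0, 15.0), (30.0, -25.0)]
T, inliers = ransac_affine(src, dst)
```

The correspondences that are not in `inliers` are the ones discarded as outliers.
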
Additional improvements
- RANSAC gives an initial estimate of the affine transformation between the canonical set of points and the query points
- Use the affine transformation estimate to perform vocabulary- or geometry-guided searching/matching
- MLESAC/PROSAC could be used to perform probabilistic searching
- Constraints can be added to the pyramid matching scheme to reduce query time and improve robustness to partial matching

Conclusions
- Investigated and implemented a shape descriptor invariant to rotation and scale
- Integrated an approximate matching scheme with linear time complexity
- The scheme scales well as the database of descriptors grows
- Significant improvement in speed with little loss of accuracy
- Source code available soon

Thanks!