Identifying and Reading Visual Code Markers


O. Feinstein, EE368 Digital Image Processing Final Report

Identifying and Reading Visual Code Markers

Oren Feinstein, Electrical Engineering Department, Stanford University

Abstract — A visual code marker is an 11x11-element pattern with two fixed guide bars and three fixed cornerstones, used to uniquely label everyday objects such as CDs, DVDs, and books. Each marker carries an 83-bit data code that can be captured using any digital camera and transmitted over the Internet to a central database that stores more information about the labeled object. This paper describes a method for finding the locations of visual code markers in a 24-bit RGB image and reading the 83-bit data embedded in each marker. Alternatives at critical stages of the algorithm are discussed, and results on many different images demonstrate the robustness of the method.

Index Terms — digital image processing, visual code markers, adaptive thresholding, rotation and scale invariance.

Manuscript received June 02, 2006. O. Feinstein is with the Electrical Engineering Department at Stanford University, Stanford, CA 94305 USA (email: feinstei@stanford.edu).

I. INTRODUCTION

The goal of this project is to find all of the visual code markers in a 24-bit RGB image, regardless of orientation, using MATLAB. Upon identifying a visual code marker, its origin is reported along with the 83-bit data embedded in the marker. After explaining the method used in this study, the results are shown and discussed.

II. METHODOLOGY

A. Definition of Visual Code Marker

A visual code marker is defined as an 11x11 square pattern with two guide bars and three cornerstones, as shown in Figure 1. The 83 data bits are enclosed in red and are read column by column, from top to bottom. A black square represents a data bit of 1, and a white square a data bit of 0. The vertical guide bar is seven elements long and the horizontal guide bar is five. The origin (0,0) is defined as the upper left cornerstone, while the upper right and lower left cornerstones are (10,0) and (0,10), respectively.

Fig. 1. A visual code marker example.

B. Overview of Algorithm

The algorithm can be described as a five-stage process (see Figure 2). The first stage pre-processes the input image to produce a cleaner image for the next stage. The second stage creates a binary image using a row-by-row adaptive threshold. The third stage labels all of the unique regions and extracts useful information such as eccentricity, orientation, and area. The fourth stage searches for a candidate vertical guide bar and, if one is found, searches for the horizontal guide bar and the three cornerstones. Using the three cornerstone pixel coordinates as well as the centroid of the horizontal guide bar, the fifth stage defines the 11x11 element plane and outputs the 83 data bits for each marker found.

Fig. 2. Five-stage algorithm with outputs of each phase in red.

C. Pre-processing

Since color is not needed to identify the visual code markers, the first step in the algorithm is to convert the 24-bit RGB input image to a gray-scale image. However, instead of using the MATLAB function rgb2gray, [1] suggests weighting the gray-scale image as half red and half green, because the blue component of the image has poor sharpness and contrast. This choice is also less computationally intensive and saves a fraction of a second in the overall run-time.

Occasionally the input image is noisy, so further pre-processing is needed. For this application, the most frequent degradation is blurring, due either to motion of the camera while acquiring the image or to a low-quality camera, as is common in cellular telephones. With that knowledge, a high-pass filter is used to counteract the low-pass characteristics of the image. Even if there is no noise in the image, the high-pass filter further emphasizes the transitions between black and white that are prevalent throughout the visual code marker. A 3x3 kernel is constructed to convolve with the gray-scale image:

    (1/2) * [  0  -5   0
              -5  21  -5
               0  -5   0 ]

But since this high-pass filter strongly amplifies high-frequency components, a simple 3x3 average filter is applied afterwards to maintain the fidelity of the image. The combination of the high-pass filter followed by the average filter preserves the quality of the image and simplifies the task of the adaptive threshold, since the visual code marker is boosted in the image. Figure 3 shows the results of pre-processing for a given input image.

Fig. 3. The pre-processing stage. a) 24-bit RGB input image. b) gray = (red + green) / 2. c) Gray image after high-pass filter. d) Gray image after high-pass filter followed by average filter.

An alternative to the high-pass and average filter combination was to apply the scale-by-max color balancing technique to the gray-scale image. The idea was to boost the white components of the visual code marker while leaving the black components relatively unaffected. However, the results were poor at best: the thresholding stage did not define the transitions between black and white nearly as well as the high-pass and average filter combination.

D. Adaptive Thresholding

After pre-processing the image, the next step is to create a binary image as shown in Figure 4, where dark pixels are represented by the value 1 and light pixels by 0. According to [1], an adaptive threshold with a zigzag traversal of the scan lines works best. However, upon implementation, undesirable effects occurred on even scan lines, so a reasonable adaptation of [1] is to traverse row-by-row without zigzagging. Specifically, a running sum gs(n) is kept while passing through the image:

    gs(n) = gs(n-1) * (1 - 1/s) + pn

where pn is the gray value of the current pixel, s = (1/8) * width of the image, and gs(0) = (1/2) * 255 * s. The binary pixel value binary_img(n) is selected by:

    binary_img(n) = 1, if pn < (gs(n)/s) * (100 - t)/100
                    0, otherwise

with t = 28. Using a constant threshold does not produce results comparable to the adaptive threshold: edges are not defined in the right places and more false positives are detected as dark pixels. No morphological operations are used to clean the binary image, because erosion may remove the cornerstones, which are innately small, and dilation can connect the guide bars; furthermore, both erosion and dilation destroy the data in the code marker.

Fig. 4. Binary image created by adaptive threshold stage.

E. Label and Extract Key Information

The binary image is then labeled uniquely according to 8-connected white regions. The MATLAB function bwlabel is used for labeling, which takes two passes through the image. The first pass traverses row-by-row or column-by-column and assigns a preliminary label to each region found. Since two regions with different preliminary labels may turn out to be the same region, a second pass resolves equivalences by assigning final labels. After labeling the binary image, key information is extracted for each region: area, orientation, eccentricity, centroid, major axis length, and minor axis length. MATLAB's regionprops function is used for this step. This key information is critical in defining metrics to search for candidate guide bars and cornerstones.
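The pre-processing and thresholding stages described above can be sketched in NumPy. This is an illustrative port, not the author's MATLAB code: the high-pass kernel follows the 3x3 kernel given in the text, the average filter is assumed to be a uniform 3x3 kernel (the paper does not print its values), and all function and variable names are hypothetical.

```python
import numpy as np

def convolve2(img, kernel):
    """'Same'-size 2-D convolution with zero padding."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img.astype(float), ((ph, ph), (pw, pw)))
    out = np.zeros(img.shape, dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            # flipped kernel indices -> true convolution
            out += kernel[kh - 1 - dy, kw - 1 - dx] * \
                   padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def preprocess(rgb):
    """Stage 1: gray = (R + G)/2, then high-pass followed by 3x3 average."""
    gray = 0.5 * rgb[:, :, 0] + 0.5 * rgb[:, :, 1]
    highpass = np.array([[0, -5, 0],
                         [-5, 21, -5],
                         [0, -5, 0]]) / 2.0
    average = np.ones((3, 3)) / 9.0       # assumed uniform average filter
    return convolve2(convolve2(gray, highpass), average)

def adaptive_threshold(gray, t=28):
    """Stage 2: row-by-row running-sum threshold; dark pixels become 1."""
    h, w = gray.shape
    s = w / 8.0                           # s = (1/8) * image width
    gs = 0.5 * 255 * s                    # gs(0) = (1/2) * 255 * s
    binary = np.zeros((h, w), dtype=np.uint8)
    for r in range(h):                    # plain row-by-row, no zigzag
        for c in range(w):
            p = float(gray[r, c])
            gs = gs * (1.0 - 1.0 / s) + p
            if p < (gs / s) * (100 - t) / 100.0:
                binary[r, c] = 1
    return binary
```

On a uniform input the filter pair settles to a gain of 1/2 away from the borders, since the high-pass kernel sums to 1/2; the threshold then flags a pixel as dark when it falls more than t percent below the local moving average gs(n)/s.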

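The two-pass labeling procedure described for bwlabel can be sketched as follows. This is a minimal stand-in, not the MATLAB implementation: pass one assigns preliminary labels and records equivalences in a union-find table, and pass two resolves them to final labels.

```python
import numpy as np

def bwlabel8(binary):
    """Two-pass 8-connected labeling of a 0/1 image.

    Returns (label_image, number_of_regions)."""
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=int)
    parent = [0]                          # union-find table; parent[i] of label i

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    next_label = 1
    for y in range(h):
        for x in range(w):
            if not binary[y, x]:
                continue
            # previously visited 8-connected neighbours in raster order
            nbrs = [labels[y + dy, x + dx]
                    for dy, dx in ((0, -1), (-1, -1), (-1, 0), (-1, 1))
                    if 0 <= y + dy and 0 <= x + dx < w and labels[y + dy, x + dx]]
            if not nbrs:
                parent.append(next_label)  # brand-new preliminary label
                labels[y, x] = next_label
                next_label += 1
            else:
                root = min(find(n) for n in nbrs)
                labels[y, x] = root
                for n in nbrs:             # record equivalences
                    parent[find(n)] = root
    # second pass: replace preliminary labels with compact final labels
    final = {}
    for y in range(h):
        for x in range(w):
            if labels[y, x]:
                root = find(labels[y, x])
                labels[y, x] = final.setdefault(root, len(final) + 1)
    return labels, len(final)
```

Given the label image, the regionprops-style measurements (area from a pixel count per label, centroid from mean pixel coordinates, and orientation/eccentricity from second central moments) follow directly.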
F. Identify Visual Code Markers

To find all of the visual code markers in the labeled image, the first step is to iterate through all of the unique labels and find potential vertical guide bars. This is accomplished by filtering out any regions that do not satisfy eccentricity and area conditions. For instance, the eccentricity as defined by MATLAB should be close to 1 for a guide bar. Also, if a human cannot see and read the visual code marker, then the computer likely cannot either, so there must be a constraint on size. Since the vertical guide bar is seven elements long, a good size threshold is typically 25 pixels, or roughly 4 pixels per element.

After finding a potential vertical guide bar, its orientation is used to locate a horizontal guide bar and three cornerstones. Only when both guide bars and all three cornerstones are found is a visual code marker successfully classified. The process used to find the horizontal guide bar and three cornerstones is illustrated in Figure 5. The technique is essentially to project a line, based on the orientation angle of the guide bars, past the expected location of the target object. The algorithm then searches a window for the target object. If the object is found, the algorithm searches for the next target object; otherwise, it moves the window in the direction closest to the target object.

The algorithm first searches for the horizontal guide bar by projecting a line from the centroid of the vertical guide bar. Then, if the horizontal guide bar is found, it projects a line from the centroid of the vertical guide bar in the opposite direction to find the upper right cornerstone. After finding the upper right cornerstone, a line is projected from the centroid of the horizontal guide bar to search for the lower left cornerstone. Finally, a decision must be made on how to search for the upper left cornerstone: the algorithm can project a line either from the lower left cornerstone or from the upper right cornerstone. Due to perspective in some images, the choice is made by taking the minimum of the absolute values of the orientation angles of the two guide bars, to minimize the error. Therefore, the algorithm projects the line from the lower left cornerstone if the absolute value of the orientation angle of the vertical guide bar is less than that of the horizontal guide bar; otherwise, it projects from the upper right cornerstone.

G. Read Code

After obtaining the two guide bars and the three cornerstones, four point correspondences define the visual code marker plane. As in [1], the four points are the centroid of the horizontal guide bar and the centers of the three cornerstones:

    Element                  Image Coordinate   Code Coordinate
    upper left cornerstone   (x0, y0)           (0, 0)
    upper right cornerstone  (x1, y1)           (10, 0)
    horizontal guide bar     (x2, y2)           (8, 10)
    lower left cornerstone   (x3, y3)           (0, 10)

A code coordinate (u,v) can then be mapped to an image coordinate (x,y) by

    x = (a*u + b*v + 10*c) / (g*u + h*v + 10)
    y = (d*u + e*v + 10*f) / (g*u + h*v + 10)

where the parameters a through h are calculated from the four point correspondences as follows:

    dx1 = x1 - x2,   dy1 = y1 - y2
    dx2 = x3 - x2,   dy2 = y3 - y2
    sig_x = 0.8*x0 - 0.8*x1 + x2 - x3
    sig_y = 0.8*y0 - 0.8*y1 + y2 - y3
    g = (sig_x*dy2 - sig_y*dx2) / (dx1*dy2 - dy1*dx2)
    h = (sig_y*dx1 - sig_x*dy1) / (dx1*dy2 - dy1*dx2)
    a = x1 - x0 + g*x1       d = y1 - y0 + g*y1
    b = x3 - x0 + h*x3       e = y3 - y0 + h*y3
    c = x0                   f = y0

To read the code, the algorithm traverses code coordinates column-by-column from top to bottom, starting with code coordinate (0,2). Since the visual code marker has built-in white space surrounding the cornerstones and guide bars, not all of the 11x11 elements in the marker are read. Specifically, the algorithm only reads the data code coordinates.
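The parameter computation and mapping above translate directly into code. The sketch below (illustrative Python; function and variable names are hypothetical, not the author's) solves for a through h from the four correspondences and maps a code coordinate (u,v) to an image coordinate:

```python
def solve_marker_params(p0, p1, p2, p3):
    """p0..p3: image coordinates of the upper-left cornerstone (0,0),
    upper-right cornerstone (10,0), horizontal guide bar centroid (8,10),
    and lower-left cornerstone (0,10), respectively."""
    x0, y0 = p0; x1, y1 = p1; x2, y2 = p2; x3, y3 = p3
    dx1, dy1 = x1 - x2, y1 - y2
    dx2, dy2 = x3 - x2, y3 - y2
    sig_x = 0.8 * x0 - 0.8 * x1 + x2 - x3
    sig_y = 0.8 * y0 - 0.8 * y1 + y2 - y3
    den = dx1 * dy2 - dy1 * dx2
    g = (sig_x * dy2 - sig_y * dx2) / den
    h = (sig_y * dx1 - sig_x * dy1) / den
    a = x1 - x0 + g * x1
    d = y1 - y0 + g * y1
    b = x3 - x0 + h * x3
    e = y3 - y0 + h * y3
    c, f = x0, y0
    return a, b, c, d, e, f, g, h

def code_to_image(u, v, params):
    """Map code coordinate (u, v) to image coordinate (x, y)."""
    a, b, c, d, e, f, g, h = params
    w = g * u + h * v + 10.0
    return (a * u + b * v + 10 * c) / w, (d * u + e * v + 10 * f) / w
```

By construction, the three cornerstone correspondences are reproduced exactly; when the marker is axis-aligned in the image, the mapping reduces to a translation plus scale.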
For example, the left-most two columns have 7 rows of valid data, the next three columns have 11, and the next four columns have 9, totaling 83 data bits.

Fig. 5. Algorithm used to search for the horizontal guide bar and cornerstones.

III. RESULTS

To prove the robustness of the algorithm, a set of 12 training images was provided. However, this training set could not possibly cover the many possible code marker and camera orientations, so 37 more images were added to the training set. The algorithm ran in between a fraction of a second and three seconds per image, and detected markers with an accuracy of nearly 96%. The two images in which the algorithm failed to detect the visual code marker were extreme cases where the camera was either too close, with severe perspective, or at a perspective that

skewed the cornerstones and made the horizontal guide bar appear much larger than the vertical guide bar. Figures 6-16 show pairs of images on which the algorithm succeeded. The left image of each pair is the 24-bit RGB input image, and the right image is a binary image that highlights the vertical guide bar of each visual code marker found.

Fig. 6. training_8.jpg
Fig. 7. training_9.jpg
Fig. 8. training_10.jpg
Fig. 9. training_11.jpg
Fig. 10. training_2.jpg
Fig. 11. training_29.jpg
Fig. 12. training_32.jpg
Fig. 13. training_45.jpg
Fig. 14. training_46.jpg
Fig. 15. training_49.jpg
Fig. 16. training_5.jpg

IV. CONCLUSION

An algorithm to identify visual code markers in 24-bit RGB images has been presented. Employing a five-stage process, the algorithm identifies the origin of each visual code marker in the image and reads the 83-bit data associated with it. Various alternatives at certain stages were discussed, and several results were shown to demonstrate the robustness of the algorithm. On average, the program did not run longer than 4 or 5 seconds, making it near real-time for this particular application, and its accuracy was very near perfect. Further work could include optimizing the code to run faster, producing an even cleaner binary image from the adaptive thresholding stage, and exploring alternatives for identifying visual code markers that are severely skewed.

ACKNOWLEDGEMENTS

Many thanks to Professor Bernd Girod, Chuo-Ling, and Aditya for conducting an excellent image processing class.

REFERENCES

[1] M. Rohs, "Real-World Interaction with Camera-Phones," 2nd International Symposium on Ubiquitous Computing Systems (UCS 2004), Tokyo, Japan, November 2004.
[2] M. Rohs, "Visual Code Widgets for Marker-Based Interaction," IWSAWC '05: Proceedings of the 25th IEEE International Conference on Distributed Computing Systems Workshops (ICDCS 2005 Workshops), Columbus, Ohio, USA, June 6-10, 2005.