A Computer Vision System for Graphical Pattern Recognition and Semantic Object Detection

Tudor Barbu, Institute of Computer Science, Iaşi, Romania

Abstract

We focus on a set of problems from the image pattern recognition and interpretation domain. In the introduction we describe the general aspects of this computer vision task. We then provide an approach for identifying image objects, together with a structure for these objects. A shape classification and a structure-based classification of the obtained image objects are then performed. Finally, after the extraction and recognition steps are completed, an object interpretation (semantic extraction) method is proposed. The paper ends with a conclusions section.

Keywords: pattern recognition, shape classification, graphical object structure, image interpretation, semantic extraction, human interaction, image retrieval

1. Introduction

Our main task is to implement a computer vision system that detects, recognizes and interprets image objects. Any color or grayscale image contains a finite number of uniform and/or textured regions, and we consider a graphical object to be a connected set of such image regions. Our proposed computer vision system completes the following steps:

1. A preprocessing stage
2. A region-based image segmentation
3. Graphical (image) object identification
4. Image object recognition
5. A semantic-based classification of the recognized objects

We do not dwell on the first two steps here. Depending on the state of the image, several enhancement filtering operations are applied in the preprocessing stage: image smoothing, edge enhancement or contrast adjustment ([1],[2]). On the resulting enhanced image we then perform a region-based segmentation, using uniformity (or texture) similarity as the similarity criterion ([1],[4],[5]).

In the third step we provide a structure for the graphical objects, based on the image segments obtained in the second step, and propose a recursive extraction method for image objects. At the fourth level we provide a moment-based shape feature extraction, supervised and unsupervised shape classification methods, and finally a structure-based clustering. In the last step we give an image interpretation algorithm. We have implemented the described approaches in Matlab and obtained a set of satisfactory results.

2. A recursive image object detection approach

In this section we present a structuring method and an extraction approach for graphical objects. The clusters obtained from the preceding image segmentation are analysed and labeled as background or foreground. We propose the following rule for foreground extraction: each cluster is analysed, and if the area of the smallest convex polygon enclosing it is equal to or greater than 3/4 of the total image area, the cluster is considered background; otherwise it is a foreground segment (a minimal Matlab sketch of this rule is given at the end of this section).

The computed foreground clusters become the primitives, our basic graphical units. A graphical primitive is no longer characterized by texture or homogeneity; its shape is its only characteristic. Each image object is modeled as an n-order complex object, which represents a set of connected primitives. A first-order object is called a simple object. A complex object is characterized not only by its shape but also by its structure, which is related to the types of its components.
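Assuming the segmentation is available as a Matlab label matrix L (one label per cluster) and that the Image Processing Toolbox is at hand, a minimal sketch of the background/foreground rule could look as follows; the variable names are illustrative only:

% Label each segmented cluster as background or foreground by the
% convex-hull area rule (L: label matrix assumed to come from step 2).
stats = regionprops(L, 'ConvexArea');        % area of the smallest convex polygon
imageArea = numel(L);                        % total number of image pixels
isBackground = [stats.ConvexArea] >= 0.75 * imageArea;
foregroundLabels = find(~isBackground);      % these clusters become the primitives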
Our image recognition approach is based on the following property: two objects are considered similar if they have similar shapes and similar structures. Uniformity (texture) similarity not being compulsory, we do not consider texture (uniformity) as a primitive feature.

Let $p_1, p_2, \ldots, p_N$ be the image primitives. An n-order object x of the input image I can be modeled as $x = \{p_{i_1}, p_{i_2}, \ldots, p_{i_n}\}$, where $i_1, i_2, \ldots, i_n \in \{1, \ldots, N\}$. An m-order subobject of x is a complex object $y = \{p_{j_1}, \ldots, p_{j_m}\}$ with $j_1, \ldots, j_m \in \{i_1, \ldots, i_n\}$. For x we define its shape, Shape(x), as the set of coordinates (X, Y) of its pixels, so that:

$Shape(x) = Shape(p_{i_1}) \cup \ldots \cup Shape(p_{i_n})$   (1)

We also express the structure of x, Struct(x), as the pair:

$Struct(x) = (S(x), R(x))$   (2)

where $S(x) = \{i_1, i_2, \ldots, i_n\}$ and $R(x) = R(p_{i_1}, \ldots, p_{i_n})$ represents a relationship between primitives, related to the way they are combined in the body of x. There are many ways to express R, each of them describing the neighborhood relations within the set $\{p_{i_1}, \ldots, p_{i_n}\}$. The basic form of R is an $n \times n$ neighborhood matrix, $neigh_x^n$, given by:

$neigh_x^n(a, b) = \begin{cases} 1, & \text{if } p_{i_a}, p_{i_b} \text{ are connected} \\ 0, & \text{if } p_{i_a}, p_{i_b} \text{ are unconnected} \end{cases}$   (3)

To obtain easier implementations of the matrices $neigh_x^n$, we consider the general neighborhood matrix of I, defined as:

$neigh_I(i, j) = \begin{cases} 1, & \text{if } p_i \text{ and } p_j \text{ are connected} \\ 0, & \text{otherwise} \end{cases}$   (4)

This matrix is computed by considering each pair of different simple objects $(p_i, p_j)$, $i \neq j$. Two primitives are neighbors if there is at least one pair of connected pixels such that one pixel belongs to $p_i$ and the other to $p_j$; we use 4-neighborhoods in this work. For each $(p_i, p_j)$ with $i < j$ and $i, j \in \{1, \ldots, N\}$, the coordinate elements of $Shape(p_i)$ and $Shape(p_j)$ are searched until a pair $(C_1, C_2)$, $C_1 \in Shape(p_i)$, $C_2 \in Shape(p_j)$, is found such that $d(C_1, C_2) = 1$ (d being the Euclidean distance between vectors). In that case $p_i$ and $p_j$ are neighbors; otherwise they are not.

Extracting an object x from the image I means computing Shape(x), S(x) and R(x). The following recursive algorithm computes S(x) only, the other two measures being obtained from (1) and (3). Our method first extracts the (n-1)-order objects, then adds to each of them the primitives from its neighborhood, obtaining the n-order objects this way:

1. If n = 1, the n-order objects are $x_i = \{p_i\}$, with $S(x_i) = \{i\}$, for $i = 1, \ldots, N$. Else (n > 1):
2. Find the (n-1)-order objects recursively, resulting in $x_i$ and $S(x_i)$, $i = 1, \ldots, K$, K < N. Then, for i = 1 to K, for each $p_s$ with $s \notin S(x_i)$: if there exists $t \in S(x_i)$ such that $neigh_I(s, t) = 1$, then
3. an n-order object x has been found, given by $S(x) = S(x_i) \cup \{s\}$.

All the complex objects of I can be identified by applying this procedure for each order n; a Matlab sketch of the neighborhood matrix and of the growth step closes this section. Having all the graphical objects, we can now focus on their recognition. Every complex object x is given by the pair $(Shape(x), Struct(x)) = (Shape(x), (S(x), R(x)))$, so the classification task has to be divided into two problems, related to Shape(x) and to (S(x), R(x)), respectively. Obviously, objects of different orders cannot belong to the same class, even if they have similar shapes, because of their structures. Since classification makes sense only for objects of the same order, we can consider each n as determining a category of image objects, the class of n-order objects (call it the n-class). Each n-class will then be divided into subclasses on the principle of shape similarity and structure similarity.
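A Matlab sketch of Eq. (4) and of the recursive growth step might look as follows; the label matrix L (one positive label per primitive, 0 for background) is an assumption carried over from the segmentation step, and S_prev stands for the cell array of index sets S(x_i) of the (n-1)-order objects:

function A = neighI(L)
% General neighborhood matrix of Eq. (4): A(i,j) = 1 iff primitives
% p_i and p_j contain at least one pair of 4-connected pixels.
N = max(L(:));
A = zeros(N);
pairs = [reshape(L(:,1:end-1),[],1), reshape(L(:,2:end),[],1); ...  % horizontal neighbors
         reshape(L(1:end-1,:),[],1), reshape(L(2:end,:),[],1)];     % vertical neighbors
pairs = pairs(all(pairs > 0, 2) & pairs(:,1) ~= pairs(:,2), :);     % distinct primitives only
A(sub2ind([N N], pairs(:,1), pairs(:,2))) = 1;
A = max(A, A');                                                     % make it symmetric
end

function S_n = growObjects(S_prev, A)
% One recursion step of the extraction algorithm: extend every
% (n-1)-order object S(x_i) by each primitive s connected to it.
S_n = {};
for i = 1:numel(S_prev)
    for s = find(any(A(S_prev{i}, :), 1))            % primitives adjacent to x_i
        if ~ismember(s, S_prev{i})
            cand = sort([S_prev{i}, s]);             % S(x) = S(x_i) U {s}
            if ~any(cellfun(@(c) isequal(c, cand), S_n))
                S_n{end+1} = cand;                   % keep each S(x) only once
            end
        end
    end
end
end

Starting from S1 = num2cell(1:N), repeated calls S2 = growObjects(S1, A), S3 = growObjects(S2, A), and so on would enumerate the S(x) sets order by order.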
3. A shape recognition approach

We formulate our shape recognition task as a pattern recognition problem with two steps: feature extraction and pattern classification.

Considering a given shape as a pattern, a similar new shape can be obtained through geometric transforms, such as translation, rotation, scaling and reflection, or through smooth deformation transforms, such as stretching, squeezing and dilation. For featuring a shape we therefore need a measure invariant to these transformations. There are many approaches in this direction, the best known being Fourier descriptors ([6]), moment-based methods ([1],[3]), Zernike polynomials and scalar methods.

We use a moment-based approach for shape feature extraction. Testing first area moments, then perimeter moments, for shape description, we obtained similar feature vectors; we therefore describe here only the area moments case, in which all the pixels of a pattern are used in the computation. For an object o we consider all the elements of Shape(o), the (x, y) coordinate pairs corresponding to the pixels of o, and the image function

$f(x, y) = \begin{cases} 1, & (x, y) \in Shape(o) \\ 0, & (x, y) \notin Shape(o) \end{cases}$   (5)
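For concreteness, the indicator image of Eq. (5) can be materialized from a coordinate list; here coords (an n-by-2 array of (x, y) pairs, i.e. Shape(o)) and the image size H-by-W are assumed inputs:

% Build the indicator image f of Eq. (5) from the coordinate list Shape(o).
f = false(H, W);                                      % f(x,y) = 0 outside the object
f(sub2ind([H, W], coords(:,2), coords(:,1))) = true;  % f(x,y) = 1 on the object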

For this f, the (p+q)-order moments of o become:

$m_{pq} = \sum_x \sum_y x^p y^q f(x, y) = \sum_{(x,y) \in Shape(o)} x^p y^q$   (6)

The center of gravity (centroid) of the object o has the coordinate pair $(\bar{x}, \bar{y})$ given by:

$\bar{x} = \frac{m_{10}}{m_{00}}, \quad \bar{y} = \frac{m_{01}}{m_{00}}$   (7)

The central moments of o are obtained by computing the pq moments with respect to the centroid:

$\mu_{pq} = \sum_x \sum_y (x - \bar{x})^p (y - \bar{y})^q f(x, y)$   (8)

The obtained shape feature $\mu_{pq}$ is a translation-invariant moment. Using the standard deviations in the x and y directions, given by

$\sigma_x = \sqrt{\frac{\mu_{20}}{m_{00}}}, \quad \sigma_y = \sqrt{\frac{\mu_{02}}{m_{00}}}$,   (9)

a stretching, squeezing and dilation invariant moment is computed by:

$\tilde{\mu}_{pq} = \sum_x \sum_y \left( \frac{x - \bar{x}}{\sigma_x} \right)^p \left( \frac{y - \bar{y}}{\sigma_y} \right)^q f(x, y)$   (10)

To make it scaling invariant, this new central moment must be normalized as follows:

$\eta_{pq} = \frac{\tilde{\mu}_{pq}}{\mu_{00}^{\gamma}}, \quad \gamma = \frac{p + q}{2} + 1$   (11)

Using the $\eta_{pq}$ moments, we propose a feature vector $v(o) = (f_1, f_2)$, where

$f_1 = (\eta_{30} + \eta_{12})^2 + (\eta_{03} + \eta_{21})^2$,   (12)

a Hu moment ([3]) used in expressing the object's eccentricity, invariant to rotation and reflection, and

$f_2 = \eta_{24} + \eta_{42}$,   (13)

a moment-based feature which is also rotation and reflection invariant. We tested both $v(o) = (f_1, f_2)$ and the scalar form $v(o) = f_1 + f_2$, and obtained similar results in the classification stage following the feature detection step.

For a set of shapes, supervised or unsupervised learning approaches can be applied, depending on the knowledge we have about the classes. Our supervised classification approach is based on a VQ (Vector Quantization) method ([1],[7]), implementing a variant of the k-means algorithm. The number of classes, K, is set by human interaction after displaying the shapes on the screen. To obtain a training set, the user selects, by right-clicking on the screen, K appropriately different shapes, their feature vectors becoming the training vectors. We propose the following k-means algorithm:

For i = 1 to K do
1. $C_i := \{t_i\}$, where $t_1, \ldots, t_K$ are the training vectors.
Repeat
   For i = 1 to N do ($v_1, \ldots, v_N$ being all the feature vectors)
2.    Find the j for which $\min_j d(v_i, mean(C_j))$ is attained;
      If $v_i \notin C_j$ then (a position change):
3.       $C_j := C_j \cup \{v_i\}$;
4.       find $C_l$, $l \neq j$, such that $v_i \in C_l$;
5.       $C_l := C_l \setminus \{v_i\}$;
Until no more position changes (the optimization criterion)

Our optimization criterion requires that all feature vectors be in the right place. The algorithm performs a supervised classification, the sets $C_i$ being the resulting classes of vectors.

Unsupervised learning consists of a natural grouping in the feature space, no training sets being used ([1]). We propose a region-growing algorithm for unsupervised classification. In the following pseudocode, K, the number of classes, is supposed known (it can also be set interactively):

For i = 1 to N do
1. $C_i := \{v(x_i)\}$;
2. $C := \{\}$; index := 1; (C is a helper set)
Repeat
      m := the current number of sets $C_i$ (initially N);
      For i = 1 to m-1, for j = i+1 to m:
3.    If $d(mean(C_i), mean(C_j)) = d_{min}$ (the smallest overall distance) then $C(index) := C_i \cup C_j$ (region merging);
4.    increment index;
      For i = 1 to index-1:
5.    $C_i := C(i)$;
Until m = K (the optimization criterion)

At each loop the two closest regions are unified into a new region. The process continues until the optimization criterion is satisfied: the number of regions equals the number of classes. We could obtain a more natural grouping by eliminating the knowledge of K and hence the human interaction. If the number of clusters is unknown, at step 3 we introduce one more condition for merging two classes: $C_i$ and $C_j$ are unified only if $d(mean(C_i), mean(C_j)) < T$, where T is a threshold depending on the values from $C_i$ and $C_j$; the optimization criterion becomes: repeat ... until $d_{min} \geq T$. We have tested as threshold the value $T = \frac{1}{\alpha} \min(mean(C_i), mean(C_j))$. The K classes of shapes being obtained, the shape recognition task is solved; Matlab sketches of the feature computation and of the merging-based clustering follow below.
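A Matlab sketch of the feature computation of Eqs. (6)-(13), taking the indicator image f as input:

function v = shapeFeatures(f)
% Moment-based shape feature vector v(o) = (f1, f2) of Eqs. (6)-(13);
% f is the logical indicator image of the object, as in Eq. (5).
[y, x] = find(f);                      % coordinates of the object's pixels
m00  = numel(x);                       % Eq. (6) with p = q = 0
xbar = sum(x) / m00;                   % Eq. (7): centroid
ybar = sum(y) / m00;
sx = sqrt(sum((x - xbar).^2) / m00);   % Eq. (9): standard deviations
sy = sqrt(sum((y - ybar).^2) / m00);
% Eqs. (10)-(11): stretch-invariant central moments, normalized for scale
eta = @(p, q) sum(((x - xbar) / sx).^p .* ((y - ybar) / sy).^q) ...
      / m00^((p + q) / 2 + 1);
f1 = (eta(3,0) + eta(1,2))^2 + (eta(0,3) + eta(2,1))^2;    % Eq. (12)
f2 = eta(2,4) + eta(4,2);                                  % Eq. (13)
v = [f1, f2];
end

The unsupervised grouping can be sketched in the same spirit; this version merges the two clusters with the closest mean vectors until K remain, as in the region-growing pseudocode above (V is the assumed N-by-2 matrix of feature vectors, one row per shape):

function C = mergeClustering(V, K)
% Merge-based unsupervised classification: start from singletons and
% repeatedly unify the two clusters whose mean vectors are closest.
C = num2cell(1:size(V, 1));                    % clusters as sets of shape indices
while numel(C) > K
    best = [1 2]; dmin = inf;
    for i = 1:numel(C)-1
        for j = i+1:numel(C)
            d = norm(mean(V(C{i}, :), 1) - mean(V(C{j}, :), 1));
            if d < dmin, dmin = d; best = [i j]; end
        end
    end
    C{best(1)} = [C{best(1)}, C{best(2)}];     % region merging
    C(best(2)) = [];
end
end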

4. Structure-based object classifications

Let us now present the complex object recognition solution. We consider the following situations:

1. Perform the order classification of the objects, computing the n-classes of the image I. For each n-class, do the following steps.

2. If n = 1, only a shape classification is performed, as presented in the previous section. Fig. 1 shows the shape classification results for a segmented image: five classes, containing 3, 2 or 1 simple objects, distinguished by different colors (the background is represented in black). Then each primitive $p_i$ is associated its class value: Class1(i) = the number, between 1 and K, of the class of $p_i$.

Fig. 1.

3. If n = 2, the 2-order image objects are computed and classified by shape first. Then, for each shape class, an S-classification is performed over it. For each x belonging to a class and given by $S(x) = \{i_1, i_2\}$, the unordered set $\{Class1(i_1), Class1(i_2)\}$ is considered. If no object processed before x is described by this set, x is inserted into a new S-class; otherwise it is included in the S-class whose objects have that set. The obtained S-classes being final classes, a class value is associated to each object: Class2(i) = the class number of the 2-order object $x_i$. The results for our particular case are shown in Fig. 2: each image a)-h), representing a complex 2-order object $x_i$, is labeled with the pair (i, Class2(i)).

Fig. 2: a) (1,1) b) (2,2) c) (3,3) d) (4,4) e) (5,5) f) (6,1) g) (7,2) h) (8,3).

4. If n ≥ 3, a shape classification is done first. Then, for each shape class, an S-classification is performed over it. Finally, every S-class is divided into subclasses determined by R similarity.

The case n ≥ 3 thus implies a total structure classification (using both S and R), while the case n = 2 implies only a partial structure classification (using S only). The reason is that for 2-order objects the neighborhood notion has no relevance: if x is a 2-order object and $p_i$, $p_j$ are its two primitives, then $p_i$ and $p_j$ are obviously connected. But if x is an n-order object with n ≥ 3, $p_i$ may or may not be connected with $p_j$, so the neighborhood relation given by R(x) becomes very important. The examples in Fig. 3 give a graphical explanation of this difference: the two 3-order objects displayed there are not similar, although they have similar shapes and primitives, because of their different R relationships. We may use $R(x) = neigh_x^n$ for n ≥ 3, or we can obtain new forms of R by transforming the neighborhood matrix of x. Determining the 2-order subobjects of x and then classifying them is a good approach for the structure classification of x (a sketch of the n = 2 grouping is given below).

Fig. 3.
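Assuming Class1 is stored as a vector indexed by primitive number, and the 2-order objects as an M-by-2 index matrix S2 (row k holding $S(x_k) = \{i_1, i_2\}$), the n = 2 S-classification reduces to grouping by unordered class pairs; a hypothetical sketch:

% Group 2-order objects by the unordered set {Class1(i1), Class1(i2)}.
pairKeys = sort([Class1(S2(:,1)), Class1(S2(:,2))], 2);  % order inside a pair is irrelevant
[~, ~, Class2] = unique(pairKeys, 'rows');               % identical pairs share an S-class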

5. A semantic extraction method

From relation (3) results the equivalence between $neigh_x^n$ and the set $\{(p_{i_a}, p_{i_b}) \mid a, b \in \{1, \ldots, n\}, a \neq b, p_{i_a}, p_{i_b} \text{ neighbors}\}$. This set is equivalent to $\{(i_a, i_b) \mid p_{i_a} \text{ connected with } p_{i_b}, i_a, i_b \in S(x)\} = \{S(x_2) \mid x_2 \subset x, x_2 \text{ a 2-order object}\}$, the 2-order subobjects being identified this way. Two complex objects are considered similar if they have similar shapes and also similar 2-order subobjects. Therefore, for each object x of a shape class, we consider the set of indices of its 2-order subobjects, $S_2(x) = \{j_1, \ldots, j_t\}$, and compute the set $\{Class2(j_1), \ldots, Class2(j_t)\}$ as a feature vector for x. We then classify these vectors in the S-classification manner previously described, obtaining the final classes. The results for the case n = 4 are shown in Fig. 4, each complex graphical object $x_i$ being placed in an image labeled with the pair (i, index of the class of $x_i$).

Fig. 4: a) (1,1) b) (2,2) c) (3,3) d) (4,4) e) (5,5) f) (6,6) g) (7,7) h) (8,1).

The last processing operation implemented by our computer vision system is graphical object interpretation (understanding). As just described, a number of classes are obtained as the object recognition result. Each resulting class clearly has the following property: all its members are either semantic entities or non-semantic ones. We therefore have to handle an interpretation-based clustering task: separating the semantic classes from the meaningless ones. Of course, the term semantic is a relative one, depending on which graphical objects are of interest to us: these become semantic objects, and the others remain merely syntactic. We can associate to each semantic class a text description of the meaning of its objects, and to the non-semantic ones the label "meaningless". For example, in Fig. 2 we could label the first class with the text "House", because it contains two objects representing houses, and the 4th class with "meaningless".

We propose the following method for automatic interpretation-based classification: a graphical object recognition is performed for an input image, resulting in a number of classes; a set of text-labeled image objects representing entities of interest (houses, trees, cars, for example) is created and their orders are determined; each such labeled object is then compared with the classes having the same order, and if it can be inserted into such a class, that class becomes semantic and is labeled with the object's description; finally, all labeled classes are considered semantic and the unlabeled ones become non-semantic, being labeled "meaningless". The semantic extraction task is thereby solved.
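A hypothetical sketch of this label propagation step; classMeans (a cell array of per-class mean feature vectors for one order), the exemplars struct array (fields feature and text) and the tolerance tol are all assumptions, since the paper does not fix a particular insertion test:

% Propagate exemplar labels to classes of the same order: a class becomes
% semantic when a labeled exemplar can be inserted into it (here: its
% feature vector falls within tol of the class mean).
labels = repmat("meaningless", numel(classMeans), 1);
for e = 1:numel(exemplars)
    d = cellfun(@(m) norm(exemplars(e).feature - m), classMeans);
    [dmin, k] = min(d);
    if dmin < tol
        labels(k) = exemplars(e).text;   % class labeled with the object's description
    end
end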

6. Conclusions

Graphical recognition and interpretation approaches have been proposed in this paper; they have been implemented and their results presented. These results can be successfully applied in the image retrieval domain, more precisely in solving graphical object retrieval tasks. For example, considering an n-order object as an image query, we have to solve the task of finding it in an image database; the output must consist of all the images containing similar graphical objects. Using our image understanding approach, the image query can be replaced with a text query related to the object's interpretation: given a text label as input, the database is searched and the images containing graphical objects with a close description are retrieved. To solve this task we should extract, classify and text-label the complex objects of each image in the analyzed database, and compare the obtained labels with the input query.

References

[1] Anil K. Jain, Fundamentals of Digital Image Processing, Prentice Hall International, Inc., 1989.
[2] Ian T. Young, Jan J. Gerbrands, Lucas J. van Vliet, Fundamentals of Image Processing, Delft University of Technology, 1998.
[3] M. K. Hu, Visual pattern recognition by moment invariants, IRE Transactions on Information Theory, vol. IT-8, February 1962, pp. 179-187.
[4] B. S. Manjunath and W. Y. Ma, Texture Features for Browsing and Retrieval of Image Data, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, 1996.
[5] Andrew Laine and Jian Fan, An Adaptive Approach for Texture Segmentation by Multi-channel Wavelet Frames, Center for Computer Vision and Visualization, August 1993.
[6] Dengsheng Zhang and Guojun Lu, Enhanced generic Fourier descriptors for object-based image retrieval, International Conference on Acoustics, Speech and Signal Processing, 2002.
[7] Richard O. Duda, Pattern Recognition for HCI, Department of Electrical Engineering, San Jose State University, 1996-1997.