Lecture 6: Multimedia Information Retrieval
Dr. Jian Zhang, NICTA & CSE UNSW
COMP9314 Advanced Database, S1 2007
jzhang@cse.unsw.edu.au

Reference Papers and Resources

Papers:
- Colour spaces - perceptual, historical and applicational background: an overview of colour spaces used in image processing.
- Colour indexing: using Histogram Intersection for object identification and Histogram Backprojection for object location.
- Comparing Images Using Color Coherence Vectors: the original paper for CCV.
- Using Perceptually Weighted Histograms for Colour-based Image Retrieval: the original paper for PWH.
- The QBIC Project - Querying Images By Content Using Color, Texture, and Shape: the original paper for the IBM QBIC project.

Useful resources:
- MPEG-7 homepage: http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm
- IBM QBIC system homepage: http://wwwqbic.almaden.ibm.com/
- UIUC CBIR system homepage: http://www.ifp.uiuc.edu/~qitian/mars.html

6.1 Image Retrieval based on Texture

Introduction to the texture feature
- The concept of texture is intuitively obvious but has no precise definition.
- Texture can be described by its tone and structure:
  - Tone is based on pixel intensity properties.
  - Structure describes the spatial relationships of primitives.

MPEG-7 standard: the homogeneous texture descriptor (HTD)
- Two components are computed for every channel during extraction: the mean energy and the energy deviation.
- The 2-D frequency plane is partitioned into 30 frequency channels.
- The syntax is $\mathrm{HTD} = [f_{DC}, f_{SD}, e_1, e_2, \ldots, e_{30}, d_1, d_2, \ldots, d_{30}]$, where $f_{DC}$ and $f_{SD}$ are the mean and standard deviation of the image, and $e_i$ and $d_i$ are the nonlinearly scaled and quantized mean energy and energy deviation of the $i$-th channel.

Frequency plane partitioning
- The partitioning of the frequency plane is uniform along the angular direction but not uniform along the radial direction (in MPEG-7, the 30 channels are 6 angular divisions by 5 radial octave bands).

Gabor model of each channel
- Each channel is modelled with a Gabor function. Let a channel be indexed by $(s, r)$, where $s$ is the radial index and $r$ is the angular index. The $(s, r)$-th channel in the frequency domain is
  $$G_{s,r}(\omega, \theta) = \exp\left[\frac{-(\omega - \omega_s)^2}{2\sigma_s^2}\right] \cdot \exp\left[\frac{-(\theta - \theta_r)^2}{2\tau_r^2}\right]$$
  where $\sigma_s$ and $\tau_r$ are the standard deviations of the Gaussian in the radial direction and the angular direction, respectively.

6.2 Image Retrieval based on Texture

Channel energy
- The energy of each channel is defined as the log-scaled sum of the squares of the Gabor-filtered Fourier transform coefficients of an image:
  $$e_i = \log_{10}[1 + p_i]$$
  where
  $$p_i = \sum_{\omega = 0^+}^{1} \; \sum_{\theta = (0^\circ)^+}^{360^\circ} \left[ G_{s,r}(\omega, \theta) \cdot \omega \cdot P(\omega, \theta) \right]^2$$

Polar-domain Fourier transform and energy deviation
- $P(\omega, \theta)$ is the Fourier transform of an image represented in the polar frequency domain, $P(\omega, \theta) = F(\omega\cos\theta, \omega\sin\theta)$, where $F(u, v)$ is the Fourier transform in the Cartesian coordinate system:
  $$F(u, v) = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, e^{-j2\pi(ux/M + vy/N)}$$
- The energy deviation of each feature channel is defined as the log-scaled standard deviation of the squares of the Gabor-filtered Fourier transform coefficients of an image:
  $$d_i = \log_{10}[1 + q_i]$$
  where
  $$q_i = \sqrt{ \sum_{\omega = 0^+}^{1} \; \sum_{\theta = (0^\circ)^+}^{360^\circ} \left\{ \left[ G_{s,r}(\omega, \theta) \cdot \omega \cdot P(\omega, \theta) \right]^2 - p_i \right\}^2 }$$
- The HTD consists of the mean and standard deviation of the image intensity, plus the energy $e_i$ and the energy deviation $d_i$ of each feature channel.

Statistical texture features
- Texture [4] can also be defined as a function of the spatial variation in pixel intensities. One example is to use statistical properties of the spatial distribution of the grey levels of an image.
- Two types of statistical properties can be used: (1) first-order statistics and (2) second-order statistics.
- First-order statistics depend only on the individual pixel grey levels. Define:
  - $L$: the number of distinct grey levels;
  - $z$: the random variable denoting the grey level;
  - $p(z_i)$: the probability of grey level $z_i$ occurring in the image.

First-order statistics
With $L$ the number of distinct grey levels and $p(z_i)$ the probability of grey level $z_i$, the first-order statistics are:
- Overall mean: $m = \sum_{i=0}^{L-1} z_i\, p(z_i)$
- Overall standard deviation: $\sigma = \sqrt{\sum_{i=0}^{L-1} (z_i - m)^2\, p(z_i)}$
- Skewness: $\mu_3(z) = \sum_{i=0}^{L-1} (z_i - m)^3\, p(z_i)$
- R-inverse variance: $R = 1 - \dfrac{1}{1 + \sigma^2(z)}$
- Overall uniformity: $U = \sum_{i=0}^{L-1} p^2(z_i)$
- Overall entropy: $e = -\sum_{i=0}^{L-1} p(z_i) \log_{10} p(z_i)$
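The first-order statistics can be sketched in a few lines of Python. This is only an illustration: it assumes the grey levels are the integers 0 to L-1 (so $z_i = i$), and the function name `first_order_stats` and the small test image are made up for the example.

```python
import math
from collections import Counter

def first_order_stats(image, levels):
    """First-order texture statistics of a grey-level image given as a list of rows.

    Assumes grey levels are the integers 0 .. levels-1.
    """
    pixels = [z for row in image for z in row]
    counts = Counter(pixels)
    # p[z] is the probability of grey level z occurring in the image.
    p = [counts.get(z, 0) / len(pixels) for z in range(levels)]
    mean = sum(z * p[z] for z in range(levels))
    var = sum((z - mean) ** 2 * p[z] for z in range(levels))
    return {
        "mean": mean,
        "std": math.sqrt(var),
        "skewness": sum((z - mean) ** 3 * p[z] for z in range(levels)),
        "R": 1 - 1 / (1 + var),
        "uniformity": sum(q * q for q in p),
        # Empty grey levels contribute nothing to the entropy.
        "entropy": -sum(q * math.log10(q) for q in p if q > 0),
    }

image = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 2, 2],
         [0, 0, 2, 2]]
stats = first_order_stats(image, levels=3)
```

Because these measures use only the grey-level histogram, two images with the same histogram but different spatial layouts get identical values.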

Second-order statistics: the Grey-level Co-occurrence Matrix (GLCM)
- Second-order statistics take into account the relationship between a pixel and its neighbours.
- The Grey-level Co-occurrence Matrix (GLCM) is used to calculate the second-order statistics.
- Consider the following 4x4 pixel image with 3 distinct grey levels:

    1 1 0 0
    1 1 0 0
    0 0 2 2
    0 0 2 2

- The position operator d = (dx, dy) = (1, 0) means that, for each pixel, the co-occurrence with the pixel to its left is counted.

GLCM of the example image
- The 3x3 co-occurrence matrix is defined as follows.
[Table: 3x3 GLCM of the 4x4 example image]
- From the table, element [0,0] of the GLCM is 4. That is the number of pixels with grey level 0 that have a pixel with grey level 0 immediately to their left.
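The counting rule can be checked with a short Python sketch. The indexing convention is an assumption: entry [i][j] counts pixels of grey level i whose neighbour has grey level j, and for d = (dx, dy) the neighbour is taken dx columns to the left and dy rows up. The function name `glcm` is illustrative.

```python
def glcm(image, levels, dx, dy):
    """Count co-occurrences for the position operator d = (dx, dy).

    counts[i][j] is the number of pixels with grey level i whose neighbour
    (dx columns to the left, dy rows up; an assumed convention) has level j.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            nx, ny = x - dx, y - dy
            # Pixels whose neighbour falls outside the image are skipped.
            if 0 <= nx < cols and 0 <= ny < rows:
                counts[image[y][x]][image[ny][nx]] += 1
    return counts

image = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 2, 2],
         [0, 0, 2, 2]]
counts = glcm(image, levels=3, dx=1, dy=0)
# counts == [[4, 2, 0], [0, 2, 0], [2, 0, 2]]
```

Under this convention the 12 horizontally adjacent pixel pairs of the example image are distributed over five non-zero cells, with counts[0][0] equal to 4 as stated on the slide.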

Symmetric and normalized GLCM
- A symmetric GLCM is obtained by adding the GLCM to its transpose, which is equivalent to also counting co-occurrences with the opposite position operator, e.g. (-1,0).
- The GLCM is then normalized by dividing each element by the total count in the matrix, giving the co-occurrence probabilities.
- Computing the GLCM over all 256 grey levels is very expensive, and it does not give a good statistical approximation because many cells are zero.
- 16 linearly scaled grey levels are commonly used in CBIR applications.
- The position operators used in a CBIR system can be (1,0), (0,1), (1,1) and (-1,0).
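A minimal sketch of the symmetrization, normalization and grey-level reduction steps follows. The count matrix is a small 3-level example, and the helper names `symmetric_normalised` and `quantise` are hypothetical; the linear scaling from 256 to 16 levels by integer division is one possible choice, not necessarily the one used in any particular CBIR system.

```python
def symmetric_normalised(counts):
    """Add the GLCM to its transpose, then divide by the total count."""
    n = len(counts)
    sym = [[counts[i][j] + counts[j][i] for j in range(n)] for i in range(n)]
    total = sum(map(sum, sym))
    return [[v / total for v in row] for row in sym]

def quantise(value, levels=16, max_value=255):
    """Linearly scale a grey value in 0..max_value down to 0..levels-1."""
    return value * levels // (max_value + 1)

counts = [[4, 2, 0],
          [0, 2, 0],
          [2, 0, 2]]
probs = symmetric_normalised(counts)
```

With 16 levels the matrix has 256 cells instead of 65536, so each cell collects far more counts from the same image.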

Second-order statistics from the GLCM
Based on the GLCM with elements $c_{ij}$, the second-order statistics are computed as follows:
- Angular Second Moment (Energy) $A$ measures the homogeneity of the image:
  $$A = \sum_i \sum_j c_{ij}^2$$
- Entropy $\delta$ has the same meaning as the first-order entropy but uses the GLCM instead:
  $$\delta = -\sum_i \sum_j c_{ij} \log_2 c_{ij}$$
- Inverse Difference Moment (Homogeneity) $I$ is another measure of homogeneity, sometimes called local homogeneity:
  $$I = \sum_i \sum_j \frac{c_{ij}}{1 + (i - j)^2}$$

- Contrast (Inertia) $C$ measures how inhomogeneous the image is:
  $$C = \sum_i \sum_j (i - j)^2 c_{ij}$$
- Correlation $cor$ measures the linear dependency between the pairs of pixels:
  $$cor = \frac{\sum_i \sum_j (i - \mu_x)(j - \mu_y)\, c_{ij}}{\sigma_x \sigma_y}$$
  where
  $$\mu_x = \sum_i \Big[ i \sum_j c_{ij} \Big], \qquad \mu_y = \sum_j \Big[ j \sum_i c_{ij} \Big]$$
  $$\sigma_x^2 = \sum_i \Big[ (i - \mu_x)^2 \sum_j c_{ij} \Big], \qquad \sigma_y^2 = \sum_j \Big[ (j - \mu_y)^2 \sum_i c_{ij} \Big]$$
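The second-order statistics can be evaluated directly from a normalized GLCM. The sketch below assumes the matrix entries sum to 1 and that both marginal variances are non-zero (otherwise the correlation is undefined); the function name `glcm_features` and the example matrix are illustrative.

```python
import math

def glcm_features(c):
    """Second-order statistics of a normalized GLCM c (entries sum to 1)."""
    n = len(c)
    cells = [(i, j, c[i][j]) for i in range(n) for j in range(n)]
    mu_x = sum(i * v for i, j, v in cells)
    mu_y = sum(j * v for i, j, v in cells)
    var_x = sum((i - mu_x) ** 2 * v for i, j, v in cells)
    var_y = sum((j - mu_y) ** 2 * v for i, j, v in cells)
    cov = sum((i - mu_x) * (j - mu_y) * v for i, j, v in cells)
    return {
        "asm": sum(v * v for i, j, v in cells),
        # Zero cells contribute nothing to the entropy.
        "entropy": -sum(v * math.log2(v) for i, j, v in cells if v > 0),
        "idm": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        # Assumes var_x and var_y are both non-zero.
        "correlation": cov / math.sqrt(var_x * var_y),
    }

# A symmetric, normalized 3-level GLCM used only as an example.
c = [[8 / 24, 2 / 24, 2 / 24],
     [2 / 24, 4 / 24, 0.0],
     [2 / 24, 0.0, 4 / 24]]
features = glcm_features(c)
```

A matrix with all its mass on the diagonal gives zero contrast and an inverse difference moment of 1, which matches the reading of both as homogeneity measures.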

Local Edge Histograms
- The edge histogram descriptor (EHD) defined in MPEG-7 represents the local edge distribution in the image.
- Specifically, the image is first divided into sub-images. The local edge distribution of each sub-image is represented by a histogram.
- To generate the histogram, the edges in each sub-image are categorized into five types: vertical, horizontal, 45-degree diagonal, 135-degree diagonal and non-directional. The counts are computed for each sub-image.
- Since there are 16 sub-images, a total of 5x16 = 80 histogram bins are required.

[Figure: an example of dividing a 384x256 image into 4x4 sub-images, indexed (0,0) to (3,3) and each 96x64 pixels, and into 8x8 image blocks. Each image block is split into four 4x4 sub-blocks with average intensities a0 (top left), a1 (top right), a2 (bottom left) and a3 (bottom right).]

Local Edge Histograms: EHD extraction
- Each sub-image is first converted to grey scale.
- The EHD calculation is based on image blocks, e.g. of 8x8 pixels.
- A 384x256 image is divided into 16 sub-images, and each sub-image is further divided into 8x8 image blocks. Each image block is split into four sub-blocks whose average intensities are denoted a0, a1, a2 and a3.
- The edge direction of a block is determined by calculating the edge magnitudes.

EHD extraction: choosing the edge direction
- The direction with the largest edge magnitude is chosen as the edge direction of the block, provided that this magnitude is larger than the threshold.
- If the magnitude is smaller than the threshold, the block is considered to contain no edge; it is discarded and not counted in the histograms.
[Figure: the direction of the edge, showing m_0 (horizontal), m_45 (45 degrees), m_90 (vertical) and m_135 (135 degrees).]

EHD extraction: edge magnitudes
- The edge magnitudes can be calculated (by digital filtering) as follows:
  $$m_{90} = |a_0 - a_1 + a_2 - a_3|, \qquad m_0 = |a_0 + a_1 - a_2 - a_3|$$
  $$m_{45} = |\sqrt{2}\,a_0 - \sqrt{2}\,a_3|, \qquad m_{135} = |\sqrt{2}\,a_1 - \sqrt{2}\,a_2|$$
  $$m_{\text{non-directional}} = |2a_0 - 2a_1 - 2a_2 + 2a_3|$$
- After the edge magnitudes have been calculated for every image block, the 5 histogram bins of the sub-image are computed.
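The block classification and the 5-bin histogram of one sub-image can be sketched as follows. The layout a0 = top left, a1 = top right, a2 = bottom left, a3 = bottom right, the sqrt(2) weights of the diagonal filters, and the threshold value are assumptions made for this sketch; all function names are illustrative.

```python
import math

EDGE_TYPES = ("vertical", "horizontal", "diagonal_45",
              "diagonal_135", "non_directional")

def edge_magnitudes(a0, a1, a2, a3):
    """Edge magnitudes of one image block from its four sub-block averages."""
    r2 = math.sqrt(2)
    return {
        "vertical": abs(a0 - a1 + a2 - a3),
        "horizontal": abs(a0 + a1 - a2 - a3),
        "diagonal_45": abs(r2 * a0 - r2 * a3),
        "diagonal_135": abs(r2 * a1 - r2 * a2),
        "non_directional": abs(2 * a0 - 2 * a1 - 2 * a2 + 2 * a3),
    }

def classify_block(a0, a1, a2, a3, threshold):
    """Return the dominant edge type, or None if the block has no edge."""
    mags = edge_magnitudes(a0, a1, a2, a3)
    best = max(mags, key=mags.get)
    return best if mags[best] > threshold else None

def sub_image_histogram(blocks, threshold):
    """Count edge types over the image blocks of one sub-image."""
    hist = dict.fromkeys(EDGE_TYPES, 0)
    for a0, a1, a2, a3 in blocks:
        label = classify_block(a0, a1, a2, a3, threshold)
        if label is not None:  # no-edge blocks are discarded
            hist[label] += 1
    return hist

blocks = [(255, 0, 255, 0),    # bright left, dark right
          (255, 255, 0, 0),    # bright top, dark bottom
          (10, 10, 10, 10)]    # flat block, below any positive threshold
hist = sub_image_histogram(blocks, threshold=11)
```

Repeating this for all 16 sub-images and concatenating the 5-bin histograms gives the 80 bins of the descriptor.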

6.3 Image Indexing and Retrieval based on Shape

Basic concepts of shape
- The shape of an object or region refers to its profile and physical structure.
- Shape is a low-level feature of the objects within images.
- For retrieval based on shape, images must be segmented into individual objects.
- Because robust and accurate image segmentation is difficult, the use of shape features for image retrieval has been limited to special applications where objects or regions are readily available.

Basic concepts of shape: requirements
A good shape representation and similarity measurement for recognition and retrieval purposes should have the following two important properties:
- Each shape should have a unique representation, invariant to translation, rotation and scale.
- Similar shapes should have similar representations, so that retrieval can be based on the distance between shape representations.

Shape representation
- Boundary-based methods: chain codes, fitting line segments, Fourier descriptors
- Region-based methods: moments, orientation
- Geometry-based methods: perimeter measurement, area attribute
- Structure-based methods: medial axis transform (MAT), skeleton and thinning algorithms

Boundary-based methods -- Chain Code
- Chain codes represent a boundary by a connected sequence of straight-line segments of specified length and direction.
- Typically, this representation is based on 4- or 8-connectivity of the segments.
- The direction of each segment is coded using a numbering scheme.
[Figure: direction numbers for the 4-directional chain code and for the 8-directional chain code.]
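A chain code can be generated from an ordered list of boundary pixels with a lookup table. The numbering used below (0 = east, increasing counter-clockwise in 45-degree steps, y axis pointing up) is a common convention but an assumption here, since the numbering figure is not reproduced; the names `DIRECTIONS_8` and `chain_code` are illustrative.

```python
# Assumed 8-direction numbering: 0 = east, counter-clockwise, y axis up.
DIRECTIONS_8 = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}

def chain_code(boundary):
    """8-directional chain code of a closed boundary.

    boundary is an ordered list of (x, y) pixels in which consecutive
    pixels (and the last and first) are 8-connected neighbours.
    """
    closed = boundary[1:] + boundary[:1]
    return [DIRECTIONS_8[(x1 - x0, y1 - y0)]
            for (x0, y0), (x1, y1) in zip(boundary, closed)]

# Boundary of a 3x3 square traversed counter-clockwise from (0, 0).
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
codes = chain_code(square)
# codes == [0, 0, 2, 2, 4, 4, 6, 6]
```

The code depends on the starting pixel and on the orientation of the shape, so it is usually normalized before shapes are compared.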

Boundary-based methods -- Fourier Descriptors (FDs)
- A shape is first represented by a feature function called a shape signature.
- A discrete Fourier transform is applied to the signature to obtain the FD of the shape (in the frequency domain):
  $$F_u = \frac{1}{N} \sum_{i=0}^{N-1} f(i) \exp\left(\frac{-j2\pi u i}{N}\right), \qquad u = 0, \ldots, N-1$$
  where $N$ is the number of samples of $f(i)$.
- Three commonly used signatures:
  - curvature-based
  - radius-based
  - boundary-coordinate-based

Fourier Descriptors: the radius-based signature
- The radius-based signature consists of a number of ordered distances from the shape centroid to boundary points (called radii). The radii are defined as
  $$r_i = \sqrt{(x_c - x_i)^2 + (y_c - y_i)^2}$$
  where $(x_c, y_c)$ are the coordinates of the centroid and $(x_i, y_i)$, for $i = 0$ to $63$, are the coordinates of the 64 sample points along the shape boundary. The number of pixels between each two neighbouring sample points is the same.
- A feature vector that is invariant to start point (p), rotation (r) and scale (s) should be calculated:
  $$x = \left[ \frac{|F_1|}{|F_0|}, \ldots, \frac{|F_{63}|}{|F_0|} \right]$$

Fourier Descriptors: shape distance
- The distance between shapes is calculated as the Euclidean distance between their feature vectors.
- FDs convert the sensitive radius lengths into the frequency domain, where the data is more robust to small changes and noise.
- The FDs capture the general features and form of the shape rather than each individual detail.
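A minimal sketch of the radius-based Fourier descriptor is given below. It assumes the boundary has already been sampled at N ordered points, approximates the centroid by the mean of those sample points, and uses the coefficient magnitudes normalized by the magnitude of the DC term as the feature vector; a direct O(N^2) DFT is used for clarity. The function names are illustrative.

```python
import cmath
import math

def radius_signature(points):
    """Distances from the centroid (mean of the sample points) to each point."""
    n = len(points)
    xc = sum(x for x, _ in points) / n
    yc = sum(y for _, y in points) / n
    return [math.hypot(xc - x, yc - y) for x, y in points]

def fourier_descriptor(signature):
    """Magnitudes of F_1 .. F_{N-1}, each divided by the magnitude of F_0."""
    n = len(signature)
    coeffs = [
        sum(f * cmath.exp(-2j * math.pi * u * i / n)
            for i, f in enumerate(signature)) / n
        for u in range(n)
    ]
    f0 = abs(coeffs[0])
    return [abs(c) / f0 for c in coeffs[1:]]

def fd_distance(a, b):
    """Euclidean distance between two FD feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# An ellipse sampled at 64 points, and a scaled, shifted copy of it
# whose sampling starts at a different boundary point.
n = 64
shape = [(3 * math.cos(2 * math.pi * k / n), 2 * math.sin(2 * math.pi * k / n))
         for k in range(n)]
moved = [(2.5 * x + 7, 2.5 * y - 4) for x, y in shape]
moved = moved[5:] + moved[:5]
dist = fd_distance(fourier_descriptor(radius_signature(shape)),
                   fourier_descriptor(radius_signature(moved)))
```

Taking magnitudes removes the phase change caused by a different start point, the radii themselves do not change under rotation or translation, and dividing by the DC magnitude removes the scale factor, so `dist` is numerically zero for the two shapes above.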

Region-based shape representation and similarity measure
- In general, shape similarity measurements based on shape representations do not conform to human perception.
- The following similarity measurements do not match human similarity judgment well:
  - Algebraic
  - Spline curve distance
  - Cumulative turning angle
  - Sign of curvature
  - Hausdorff distance

Basic idea of region-based shape representation
- A grid is overlaid on the shape. A 1 is assigned to each cell with at least 15% of its pixels covered by the shape, and a 0 to each of the other cells.
- The finer the grid, the more accurate the shape representation.
- A binary sequence is created by scanning the cells from left to right and top to bottom, e.g. 11100000 11111000 01111110 01111111.
[Figure: generation of the binary sequence for a shape.]
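The grid-based binary sequence can be sketched as follows, assuming the shape is available as a binary pixel mask and that the grid cells are squares of `cell` pixels. The function name `grid_sequence` and the tiny mask are made up for the example.

```python
def grid_sequence(mask, cell, coverage=0.15):
    """Binary shape index of a 0/1 pixel mask, one bit string per grid row.

    A cell gets a 1 when at least `coverage` of its pixels belong to the shape.
    """
    rows, cols = len(mask), len(mask[0])
    sequence = []
    for top in range(0, rows, cell):          # scan top to bottom
        bits = ""
        for left in range(0, cols, cell):     # scan left to right
            block = [mask[y][x]
                     for y in range(top, min(top + cell, rows))
                     for x in range(left, min(left + cell, cols))]
            bits += "1" if sum(block) / len(block) >= coverage else "0"
        sequence.append(bits)
    return sequence

mask = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 0]]
index = grid_sequence(mask, cell=2)
# index == ["10", "10"]
```

The bottom-left cell is only 25% covered but still exceeds the 15% threshold, so it is marked 1; raising the threshold to 50% would turn it into 0.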

Rotation normalization
- Rotate the shape so that its major axis is parallel to the x-axis. This leaves two possibilities.
[Figure: two possible orientations with the major axis along the x direction.]
- Only one of the two binary sequences is saved. Both orientations are accounted for at retrieval time by representing the query shape with two binary sequences.

Scale normalization
- All shapes are scaled so that their major axes have the same fixed length.

Unique shape representation: the shape index
- After rotation and scale normalization and the selection of a grid cell size, a unique binary sequence is obtained for each shape, based on a unique major axis.
- This binary sequence is used as the index of the shape.
- Once the cell size is decided, the number of grid cells in the x direction is fixed (e.g. 8). The number of cells in the y direction depends on the eccentricity of the shape and can range from 1 to 8.

Similarity measure between two shapes based on their indexes
Based on the shape eccentricities, there are three cases for similarity measurement:
- The two normalized shapes have the same basic rectangle: compare the indexes bitwise and take the number of differing positions as the distance. For example, A and B have the same eccentricity of 4, with A = 11111111 11100000 and B = 11111111 11111100; the distance between A and B is 3.
- The two normalized shapes have very different basic rectangles: the two shapes can be assumed to be quite different (i.e. they differ in the minor axis).

- The two normalized shapes have slightly different basic rectangles: perceptual similarity is still possible. Append 0s to the index of the shape with the shorter minor axis so that it has the same length as the index of the other shape.
- Example: A = (2, 11111111 11110000) and B = (3, 11111111 11111000 11100000). The binary number of shape A is extended to the same length as that of B, giving A = (3, 11111111 11110000 00000000). The distance between A and B is 4.
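The three cases can be put into one small function. The cut-off `max_row_gap` that separates "slightly different" from "very different" basic rectangles is a hypothetical parameter introduced for this sketch, as is the function name `index_distance`.

```python
def index_distance(a, b, max_row_gap=1):
    """Distance between two shape indexes given as lists of bit-string rows.

    Returns the number of differing cells, or None when the numbers of rows
    differ by more than max_row_gap (shapes assumed to be quite different).
    """
    if abs(len(a) - len(b)) > max_row_gap:
        return None
    rows = max(len(a), len(b))
    width = len(a[0])

    def padded(index):
        # Extend the shorter index with all-zero rows.
        return index + ["0" * width] * (rows - len(index))

    return sum(x != y
               for row_a, row_b in zip(padded(a), padded(b))
               for x, y in zip(row_a, row_b))

a = ["11111111", "11110000"]
b = ["11111111", "11111000", "11100000"]
d = index_distance(a, b)
# d == 4
```

Padding with zero rows treats the missing rows of the flatter shape as empty cells, so every occupied cell in the extra rows of the other shape adds 1 to the distance.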

6.4 Data Structure for Efficient Multimedia Similarity Search

Introduction
- Retrieval is based on the similarity between the query vector and the stored feature vectors.
- If the feature dimension is high and the number of stored objects is huge, a linear search over all feature vectors is too slow.
- Techniques and data structures are required to reorganize the feature vectors and to provide fast search methods that locate the relevant features quickly.
- The main idea is to divide the high-dimensional feature vector space into many subspaces and to focus on one or a few subspaces for an effective search.

Three common queries
- Point query: the user's query is represented as a vector, and the feature vectors that exactly match it are retrieved.
- Range query: the user's query is represented as a feature vector and a distance range. The distance metric is, for example, L1 or L2 (Euclidean distance).
- k-nearest-neighbours query: the user's query is specified by a vector and an integer k. The k objects whose distances from the query are the smallest are retrieved.
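The three query types are easiest to see as linear scans over a small collection, which is exactly the slow baseline that an index structure tries to avoid. The dictionary `db` and the function names below are made up for the example.

```python
import math

def l1(a, b):
    """L1 (city-block) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))

def l2(a, b):
    """L2 (Euclidean) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def point_query(db, q):
    """Objects whose feature vector exactly matches q."""
    return [name for name, v in db.items() if v == q]

def range_query(db, q, radius, dist=l2):
    """Objects within the given distance of q."""
    return [name for name, v in db.items() if dist(v, q) <= radius]

def knn_query(db, q, k, dist=l2):
    """The k objects with the smallest distance to q."""
    return sorted(db, key=lambda name: dist(db[name], q))[:k]

db = {"a": (0, 0), "b": (3, 4), "c": (1, 1), "d": (10, 10)}
q = (0, 0)
exact = point_query(db, q)          # ["a"]
near = range_query(db, q, 2.0)      # ["a", "c"]
top2 = knn_query(db, q, 2)          # ["a", "c"]
```

Each of these scans touches every stored vector once, so the cost grows linearly with both the number of objects and the feature dimension.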

Filtering Process
- Query methods based on color histograms:
  - Use histograms with very few bins to select potential retrieval candidates.
  - Then use the full histograms to calculate the distance.
- As a special case, calculate the average RGB value $x = (R_{avg}, G_{avg}, B_{avg})^T$, where
  $$A_{avg} = \frac{1}{P} \sum_{p=1}^{P} A(p), \qquad A \in \{R, G, B\}$$
  and $P$ is the number of pixels.
- Given the average color vectors $x$ and $y$ of two images, the Euclidean distance is given by
  $$d_{avg}^2(x, y) = \sum_{i=1}^{3} (x_i - y_i)^2$$
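The filtering idea, using the average-colour special case as the cheap first stage, can be sketched as below. The database layout (name mapped to an average colour and a full histogram), the parameter `keep`, and the use of an L1 distance on the full histograms are assumptions for the illustration.

```python
def average_colour(pixels):
    """Average (R, G, B) of a list of RGB pixel tuples."""
    n = len(pixels)
    return tuple(sum(p[ch] for p in pixels) / n for ch in range(3))

def d_avg_sq(x, y):
    """Squared Euclidean distance between two average-colour vectors."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

def l1(h, k):
    return sum(abs(a - b) for a, b in zip(h, k))

def filter_then_rank(query, database, keep, full_distance=l1):
    """Two-stage search: cheap 3-D filter, then full histogram distance.

    query is (average_colour, histogram); database maps a name to such a pair.
    """
    q_avg, q_hist = query
    # Stage 1: keep only the `keep` candidates closest in average colour.
    candidates = sorted(database,
                        key=lambda name: d_avg_sq(q_avg, database[name][0]))[:keep]
    # Stage 2: rank the surviving candidates with the full histograms.
    return sorted(candidates,
                  key=lambda name: full_distance(q_hist, database[name][1]))

database = {
    "red": ((200, 10, 10), [1.0, 0.0, 0.0, 0.0]),
    "dark_red": ((120, 5, 5), [0.6, 0.4, 0.0, 0.0]),
    "blue": ((10, 10, 200), [0.0, 0.0, 1.0, 0.0]),
}
query = ((190, 10, 10), [0.7, 0.3, 0.0, 0.0])
ranking = filter_then_rank(query, database, keep=2)
```

Only the candidates that survive the 3-dimensional filter pay for the full histogram comparison, which is where the speed-up comes from.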

6.5 Data Structure for Efficient Multimedia Similarity Search

B+ Tree
- The goal is an efficient query process.
- The weakness of the traditional approach is that the similarity calculation over the feature vectors in the search space is sequential.
- A B+ tree is a hierarchical structure with a number of nodes that store the feature vectors.
[Figure: B+ tree whose leaf pointers lead to the records, e.g. "to record 10", "to record 20", "to record 60".]

Multidimensional B+ Tree
- Each feature vector has two dimensions.
- The entire feature space forms a large rectangle identified by its lower-left and top-right corners.
- Each key value is replaced with a rectangular region.
- The pointers of the leaf nodes point to lists of feature vectors within the corresponding rectangular regions.
[Figure: multidimensional B+ tree whose keys are the rectangular regions D_{i,j} and whose leaf pointers lead to the lists L_{0,0}, L_{0,1}, L_{1,0}, L_{1,1}, L_{1,2}, L_{2,0}, L_{2,1} and L_{3,0}.]

6.6 Similarity Comparison
- Given two feature vectors I and J, the distance is defined as D(I, J) = f(I, J).
- Typical similarity metrics:
  - L_p (Minkowski distance)
  - $\chi^2$ metric
  - KL (Kullback-Leibler divergence)
  - JD (Jeffrey divergence)
  - QF (Quadratic form)
  - EMD (Earth Mover's Distance)
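Four of the listed metrics are simple bin-by-bin sums and can be sketched directly. The slide does not give the formulas, so the forms below are common textbook definitions and should be read as assumptions: the chi-square statistic compares each histogram bin with the mean of the two bins, and the Jeffrey divergence is the symmetrized KL divergence measured against that mean. Histograms are assumed to be normalized, and `kl_divergence` assumes k is non-zero wherever h is non-zero.

```python
import math

def minkowski(h, k, p=2):
    """L_p distance; p=1 is city-block, p=2 is Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(h, k)) ** (1 / p)

def chi_square(h, k):
    """Chi-square statistic against the mean histogram m = (h + k) / 2."""
    total = 0.0
    for a, b in zip(h, k):
        m = (a + b) / 2
        if m > 0:  # bins empty in both histograms contribute nothing
            total += (a - m) ** 2 / m
    return total

def kl_divergence(h, k):
    """Kullback-Leibler divergence of h from k (not symmetric)."""
    return sum(a * math.log(a / b) for a, b in zip(h, k) if a > 0)

def jeffrey_divergence(h, k):
    """Symmetric, numerically stable variant of the KL divergence."""
    total = 0.0
    for a, b in zip(h, k):
        m = (a + b) / 2
        if a > 0:
            total += a * math.log(a / m)
        if b > 0:
            total += b * math.log(b / m)
    return total

h = [0.5, 0.3, 0.2]
k = [0.2, 0.3, 0.5]
distances = {
    "l1": minkowski(h, k, p=1),
    "l2": minkowski(h, k, p=2),
    "chi2": chi_square(h, k),
    "kl": kl_divergence(h, k),
    "jd": jeffrey_divergence(h, k),
}
```

The quadratic form and Earth Mover's Distance are left out because they also need a ground distance between bins, which the bin-by-bin measures above ignore.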