
1 MPEG-7

2 Motivation MPEG-7, formally named Multimedia Content Description Interface, is a standard for describing multimedia content data that supports some degree of interpretation of the information's meaning, which can be passed onto, or accessed by, a device or computer code. The goal of the MPEG-7 standard is to allow interoperable searching, indexing, filtering and access of audio-visual (AV) content by enabling interoperability among devices and applications that deal with AV content description. MPEG-7 is a set of standardized tools for describing multimedia content at different abstraction levels. MPEG-7 lacks explicit semantics: the flexibility in structuring the descriptions introduces ambiguities and leaves room for different interpretations.

3 MPEG-7 Framework. MPEG-7 complements the surrounding multimedia chain: compression and coding (MPEG-1, -2, -4); transmission, retrieval and streaming; management and filtering; acquisition, authoring and editing; searching, indexing, browsing and navigation. It provides rich multimedia content description: video segments, moving regions, shots, frames; audio-visual features: color, texture, shape; semantics: people, events, objects, scenes. (Framework diagram: reference regions and motion descriptions attached to video segments.)

4 Information flow

5 Scope of the Standard. (Diagram: description production/extraction, standard description, description consumption; only the standard description is the normative part of MPEG-7.) MPEG-7 describes specific features of AV content as well as information related to AV content management for a diversity of applications: multimedia, music/audio, graphics, video. Specifically it provides four types of normative elements: Descriptors, Description Schemes (DSs), the Description Definition Language (DDL), and coding schemes. MPEG-7 does not specify: how to extract descriptions, how to use descriptions, or how to measure similarity between contents.

6 Parts of the MPEG-7 standard. MPEG-7 is composed of: MPEG-7 Visual, the Description Tools dealing with visual descriptions; MPEG-7 Audio, the Description Tools dealing with audio descriptions; MPEG-7 Multimedia Description Schemes, the Description Tools dealing with generic features and multimedia descriptions; MPEG-7 Description Definition Language, defining the syntax of the MPEG-7 Description Tools and allowing new Description Schemes to be defined. MPEG-7 descriptions take two possible forms: a textual XML form suitable for editing, searching, and filtering, and a binary form suitable for storage, transmission, and streaming delivery.

7 Performance evaluation. NG(q): number of ground-truth images for a query q. NR(q): number of found items in the first K(q) retrievals, where K(q) = min(4*NG(q), 2*GTM) and GTM = max{NG(q)} over all queries q of the data set. MR(q) = NG(q) - NR(q): number of missed items. The measure is computed from the ranks Rank(k) of the found items, counting the rank of the first retrieved item as 1. A rank of 1.25*K(q) is assigned to each ground-truth image that is not in the first K(q) retrievals. From these ranks the normalized modified retrieval rank NMRR(q) is computed (always in the range [0.0, 1.0]).

8 Average Retrieval Rate (AVR) and ANMRR. Compute AVR(q) for query q as AVR(q) = (1/NG(q)) * sum_{k=1..NG(q)} Rank(k). Compute the modified retrieval rank as MRR(q) = AVR(q) - 0.5*(1 + NG(q)). The normalized MRR is NMRR(q) = MRR(q) / Norm(q), where Norm(q) = 1.25*K(q) - 0.5*(1 + NG(q)). Finally, ANMRR = (1/Q) * sum_{q=1..Q} NMRR(q), where Q is the number of queries.
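
A minimal Python sketch of this evaluation metric (not the MPEG-7 reference software; function and variable names are illustrative):

    def nmrr(ground_truth_ranks, NG, K):
        # ground_truth_ranks: rank (1-based) of each ground-truth item if it appears
        # within the first K retrievals, otherwise None (missed item)
        penalised = [r if r is not None and r <= K else 1.25 * K
                     for r in ground_truth_ranks]
        avr = sum(penalised) / NG
        mrr = avr - 0.5 * (1 + NG)
        norm = 1.25 * K - 0.5 * (1 + NG)
        return mrr / norm

    def anmrr(per_query_ranks, per_query_NG):
        # per_query_ranks: list (one entry per query) of ground-truth rank lists
        # per_query_NG:    list of NG(q) values
        GTM = max(per_query_NG)
        values = []
        for ranks, NG in zip(per_query_ranks, per_query_NG):
            K = min(4 * NG, 2 * GTM)
            values.append(nmrr(ranks, NG, K))
        return sum(values) / len(values)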

9 Application Areas of MPEG-7. Broadcast media selection (e.g., radio channel, TV channel). Cultural services (history museums, art galleries, etc.). Digital libraries (e.g., image catalogue, musical dictionary, film, video and radio archives). E-Commerce (e.g., personalised advertising, on-line catalogues). Education (e.g., repositories of multimedia courses, multimedia search for material). Multimedia directory services (e.g., yellow pages, tourist information, geographical information systems). Remote sensing (e.g., cartography, natural resources management). Surveillance and investigation services (e.g., human recognition, forensics, traffic control, surface transportation). MPEG-7 will also make the web as searchable for multimedia content as it is searchable for text today. This applies especially to large content archives, which are being made accessible to the public, as well as to multimedia catalogues enabling people to identify content for purchase.

10 MPEG-7 Visual Descriptors: Color Descriptors, Texture Descriptors, Shape Descriptors, Motion Descriptors for Video.

11 Color Descriptors: Dominant Color; Scalable Color (HSV space); Color Structure (HMMD space); Color Layout (YCbCr space); Group of Frames / Group of Pictures histogram. Supported color spaces: R, G, B; Y, Cr, Cb; H, S, V; monochrome; linear transformations of R, G, B; HMMD (hue-min-max-diff). Constrained color spaces: the Scalable Color Descriptor uses HSV, the Color Structure Descriptor uses HMMD.

12 Scalable Color Descriptor (SCD). The Scalable Color Descriptor is a color histogram in the HSV color space encoded using a Haar transform. The binary representation is scalable in terms of the number of bins used (from 16 to 256) and the number of bits per bin. In the case of 256 bins, SCD uniformly quantizes the H component of each pixel into 16 bins and the S and V components into 4 bins each. After all the pixels are processed, the histogram is calculated with the probability for each bin, truncated into an 11-bit value. These values are then non-uniformly quantized into 4-bit values according to the table provided in the ISO specification for more efficient encoding, giving higher significance to small values. The Haar transform is then applied across the histogram bins to the 4-bit values.

13 Scalable Color Descriptor Extraction. As SCD is encoded by a Haar transform, its binary representation is scalable in terms of the number of bins: representations can be stored at different resolutions, ranging from 256 down to 16 coefficients per histogram, and with bit-representation accuracy scalable over a broad range of data rates. (Table: number of coefficients and corresponding numbers of H, S and V bins, with bin values quantized from 11 bits/bin down to 4 bits/bin, and fewer bits/bin when fewer than 256 bins are used.)

14 Performance evaluation. Matching between SCD realizations can be performed using the Haar coefficients or the histogram bin values with an L1 norm. More accurate results are expected by reconstructing the histogram values. (Plot: ANMRR versus number of bits.) Results with different numbers of Haar coefficients (16-256) quantized at different numbers of bits; H-Rec signifies retrieval results after reconstruction of the histogram from the Haar coefficients at full bit resolution.

15 RECALL Haar transform

16 Discrete Wavelet Transform. In numerical analysis and functional analysis, the Discrete Wavelet Transform (DWT) refers to wavelet transforms for which the wavelets are discretely sampled. The first DWT was invented by the Hungarian mathematician Alfréd Haar. For an input represented by a list of 2^n numbers, the Haar wavelet transform may be considered to simply pair up input values, storing the difference and passing the sum. This process is repeated recursively, pairing up the sums to provide the next scale, finally resulting in 2^n - 1 differences and one final sum. The discrete wavelet transform has nice properties: it can be performed in O(n) operations; it captures not only some notion of the frequency content of the input, by examining it at different scales, but also the temporal content, i.e. the times at which these frequencies occur. Combined, these two properties make it an alternative to the conventional Fast Fourier Transform.

17 The Haar wavelet can be described as a step function: f(x) = 1 for 0 <= x < 1/2, f(x) = -1 for 1/2 <= x < 1, f(x) = 0 otherwise. One stage of the transform is given by the 2x2 matrix H = [[1, 1], [1, -1]] (up to normalization). Given a sequence (a_0, a_1, a_2, a_3, ..., a_{2n+1}) of even length, this can be rearranged into a sequence of two-component vectors ((a_0, a_1), ..., (a_{2n}, a_{2n+1})). Multiplying each vector by the matrix H gives the result ((s_0, d_0), ..., (s_n, d_n)) of one stage of the Haar wavelet transform (sum, difference). The two sequences s and d are separated and the process is repeated on the sequence (s_0, s_1, ..., s_n).
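
A short numpy sketch of this pairing procedure (illustrative, using the unnormalized sum/difference form described above):

    import numpy as np

    def haar_stage(a):
        # One stage: pair up values, return (sums, differences)
        a = np.asarray(a, dtype=float)
        return a[0::2] + a[1::2], a[0::2] - a[1::2]

    def haar_transform(a):
        # Full transform of a length-2**n sequence: recursively transform the sums
        a = np.asarray(a, dtype=float)
        coeffs = []
        while len(a) > 1:
            a, d = haar_stage(a)
            coeffs.append(d)          # differences at this scale
        coeffs.append(a)              # final sum
        return coeffs

    # Example: haar_transform([9, 7, 3, 5]) -> [array([ 2., -2.]), array([8.]), array([24.])]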

18 In the one-dimensional case, this is the same as breaking the signal into subbands by passing it through a low-pass filter and a high-pass filter and downsampling both subbands by 2. The DWT of a signal x is calculated by passing it through a series of filters. The samples are passed through a low-pass filter with impulse response g, resulting in a convolution of the two: y[n] = (x * g)[n] = sum_k x[k] g[n - k]. The signal is simultaneously passed through a high-pass filter h, and the decomposition is then repeated on the low-frequency subband for as many levels as desired. If the filters used satisfy certain properties, the original signal can be reconstructed by reversing the procedure. It is important that the two filters are related to each other; such a pair is known as a quadrature mirror filter.

19 The outputs give the detail coefficients (from the high-pass filter) and the approximation coefficients (from the low-pass filter). At each level, since half the frequencies of the signal are removed, half of the samples can be discarded according to Nyquist's rule, so the filter outputs are downsampled by 2: y_low[n] = sum_k x[k] g[2n - k], y_high[n] = sum_k x[k] h[2n - k]. The time resolution is therefore halved (only half of each filter output characterises the signal), while, since each output covers half the frequency band of the input, the frequency resolution is doubled. With the downsampling operator ↓2 the above summations can be written more concisely as y_low = (x * g) ↓ 2 and y_high = (x * h) ↓ 2. Due to the decomposition process, the length of the input signal must be a multiple of 2^n, where n is the number of levels.

20 1D Discrete Wavelet Transform. (Filter-bank diagram: the input x(n) is repeatedly split by a low-pass filter H0 and a high-pass filter H1, each followed by downsampling by 2, producing outputs y0, y1, y2, y3.) H0: low-pass digital filter; H1: high-pass digital filter; z^-1: delay; ↓2: downsample by 2. Recursive application of the wavelet transform in the spatial domain corresponds to a dyadic partition of the data in the frequency domain.
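
A small numpy sketch of one analysis level and its recursive cascade, assuming FIR filters g (low-pass) and h (high-pass) are given (illustrative only; border handling is ignored):

    import numpy as np

    def analysis_level(x, g, h):
        # Filter then downsample by 2: y_low = (x*g)↓2, y_high = (x*h)↓2
        return np.convolve(x, g)[::2], np.convolve(x, h)[::2]

    def dwt(x, g, h, levels):
        # Cascade: recursively decompose the low-pass (approximation) branch
        details = []
        approx = np.asarray(x, dtype=float)
        for _ in range(levels):
            approx, d = analysis_level(approx, g, h)
            details.append(d)
        return approx, details

    # Example with (unnormalized) Haar filters; the input length should be a multiple of 2**levels
    g = np.array([0.5, 0.5])    # low-pass (average)
    h = np.array([0.5, -0.5])   # high-pass (difference)
    approx, details = dwt(np.arange(16, dtype=float), g, h, levels=3)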

21 End RECALL

22 GoF/GoP Color Descriptor. Extends the Scalable Color Descriptor to a video segment or a group of pictures (the joint color histogram is then processed as in SCD, with Haar transform encoding). Histogram aggregation methods: Average - sensitive to outliers (lighting changes, occlusion, text overlays); Median - increased computational complexity due to sorting; Intersection - a "least common color traits" viewpoint. Applications: browsing a large collection of images to find similar images; using histogram intersection as a color similarity measure for clustering a collection of images and representing each cluster by its GoP descriptor.

23 Dominant Color Descriptor (DCD). DCD assumes that a given image is described in terms of a set of region labels and the associated color descriptors: each pixel has a unique region label; each region is characterized by a variable-bin color histogram; colors in a given region are clustered into a small number of representative colors. The descriptor consists of the representative colors, their percentages in the region, the spatial coherency of the color, and the color variance: F = { (c_i, p_i, v_i), s }, i = 1, 2, ..., N, where c_i is the i-th representative color, p_i its percentage in the region, v_i its color variance, and s the spatial coherency.

24 Similarity Distance Measure. Typically, when using DCD, similarity is evaluated by simply comparing the corresponding dominant color percentages and dominant color distances: D^2(F_1, F_2) = sum_{i=1..N1} p_{1i}^2 + sum_{j=1..N2} p_{2j}^2 - sum_{i=1..N1} sum_{j=1..N2} 2 a_{1i,2j} p_{1i} p_{2j}, where a_{k,l} is the similarity coefficient between two colors c_k and c_l: a_{k,l} = 1 - d_{k,l}/d_max if d_{k,l} <= T_d, and a_{k,l} = 0 if d_{k,l} > T_d, with d_{k,l} = ||c_k - c_l|| the Euclidean distance between the two colors, T_d the maximum distance for two colors to be considered similar, and d_max proportional to T_d (d_max = alpha * T_d). The measure is equivalent to the quadratic color histogram distance measure D^2(F_1, F_2) = (F_1 - F_2) A (F_1 - F_2)^T.
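
A hedged Python sketch of this distance (illustrative; the values of T_d and alpha below are assumptions, not the normative parameters):

    import numpy as np

    def dcd_distance(colors1, p1, colors2, p2, Td=20.0, alpha=1.2):
        # colors*: (N, 3) arrays of representative colors; p*: their percentages (summing to 1)
        d_max = alpha * Td
        dist2 = np.sum(np.asarray(p1) ** 2) + np.sum(np.asarray(p2) ** 2)
        for c1, w1 in zip(colors1, p1):
            for c2, w2 in zip(colors2, p2):
                d = np.linalg.norm(np.asarray(c1, float) - np.asarray(c2, float))
                a = 1.0 - d / d_max if d <= Td else 0.0   # similarity coefficient a_{k,l}
                dist2 -= 2.0 * a * w1 * w2
        return max(dist2, 0.0)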

25 Dominant Color Descriptor enhancements. The DCD variance is computed as the variance of each of the dominant colors. The DCD spatial coherency is computed as a single value by the weighted sum of per-dominant-color spatial coherencies, where the weight is proportional to the number of pixels corresponding to each dominant color. The spatial coherency per dominant color captures how coherent the pixels corresponding to that dominant color are and whether they appear as a solid color in the given image region; it gives an idea of the spatial homogeneity of the dominant colors of a region. The spatial coherency per dominant color is computed from the normalized average connectivity (8-connectedness) of the corresponding dominant-color pixels.

26 DCD is suitable for local (object or region) features, when a small number of colors is enough to characterize the color information. Before feature extraction, images must be segmented into regions. A maximum of 8 dominant colors can be used to represent a region (3 bits for the number of colors). Percentage values are quantized to 5 bits each, the variance to 3 bits per dominant color, and the spatial coherency to 5 bits. The color quantization depends on the color-space specification defined for the entire database and need not be specified with each descriptor; experiments used 6 bits per color index.

27 The dominant color representation is sufficiently accurate and compact compared to the traditional color histogram: color bins are quantized from each image region instead of being fixed over the whole color space, giving only a few representative colors per region (about 3 on average) instead of 256 or more histogram bins. It supports efficient database indexing and search: no high-dimensional indexing is needed; the complexity of the search depends only on the desired degree of similarity of the matching, not directly on the database size; insertion and deletion of database entries do not require rebuilding the index structure; retrieval results are accurate and fast compared to the traditional color histogram. It is not effective for smooth regions.

28 Color Structure Descriptor (CSD). Similar to a histogram, the Color Structure Descriptor represents an image by both the color distribution and its local spatial structure. Two images with the same global color distribution but different spatial layouts may not be distinguished by the Scalable Color Descriptor, but the CSD can distinguish them. CSD is obtained by scanning the image with an 8x8 structuring element in a sliding-window approach: with each shift of the structuring element, the number of times a particular color is contained in the structuring element is counted, and a color histogram is constructed. The HMMD color space is used.

29 CSD is characterized by a color structure histogram h(m) for M quantized colors c_m, where M is one of {256, 128, 64, 32}. The bin value h(m) is the number of structuring-element positions containing one or more pixels with color c_m. Let I be the set of quantized color indices of the image and S be the set of quantized color indices present inside the sub-image region covered by the structuring element. As the structuring element scans the image, the color histogram bins are accumulated: the final value of h(m) is determined by the number of positions at which the structuring element contains color c_m. (Figure: an 8x8 structuring element over the image; the bins of the colors present in the window, e.g. c_1 and c_3, are each incremented by 1.) The HMMD color space should be used with the CSD descriptor.
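
A simplified numpy sketch of this accumulation (illustrative; it assumes the image has already been quantized to M color indices and ignores the normative re-scaling of the structuring element for large images):

    import numpy as np

    def color_structure_histogram(indexed_image, M=64, window=8):
        # indexed_image: 2D array of quantized color indices in [0, M)
        H, W = indexed_image.shape
        hist = np.zeros(M, dtype=np.int64)
        for y in range(H - window + 1):
            for x in range(W - window + 1):
                patch = indexed_image[y:y + window, x:x + window]
                # each color present in the 8x8 window counts once for this position
                hist[np.unique(patch)] += 1
        return hist

    def csd_l1_distance(h_a, h_b):
        # L1 matching between two (normalized) color structure histograms
        return np.abs(np.asarray(h_a, float) - np.asarray(h_b, float)).sum()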

30 RECALL HSV and HMMD color spaces

31 HSV Color Space. The HMMD color space regards the colors adjacent to a given color in the color space as its neighboring colors; it is closely related to HSV. (Figure: the HSV cone, with hue running Red (0°), Yellow (60°), Green (120°), Cyan (180°), Blue (240°), Magenta (300°), value from black to white, and saturation increasing outward from the axis.)

32 RGB to HSV. HSV values can be derived from the RGB values:
Max = max(R, G, B); Min = min(R, G, B);
Value = Max;
if (Max = 0) then Saturation = 0; else Saturation = (Max - Min) / Max;
if (Max = Min) then Hue is undefined (achromatic color); otherwise:
if (Max = R && G >= B) Hue = 60 * (G - B) / (Max - Min)
else if (Max = R && G < B) Hue = 360 + 60 * (G - B) / (Max - Min)
else if (G = Max) Hue = 60 * (2.0 + (B - R) / (Max - Min))
else Hue = 60 * (4.0 + (R - G) / (Max - Min))

33 HMMD Color space. In HMMD the Hue is the same as in the HSV space (0-360 degrees); Max and Min are the maximum and minimum of the R, G and B values (indicating how much black and how much white are present, respectively); the Diff component is the difference between Max and Min (how close the color is to a pure color); Sum = (Max + Min) / 2 can also be defined and refers to brightness. Max in [0,1) is obtained with the same RGB transform as V in HSV but over a different subspace; Diff in [0,1) corresponds to S in HSV but over a different subspace. Only three of the four components are sufficient to describe the HMMD space: (H, Max, Min) or (H, Diff, Sum). The HMMD color space can be depicted using a double-cone structure. In the MPEG-7 core experiments for image retrieval, the HMMD color space was very effective and compared favorably with the HSV color space. Note that the HMMD color space is a slight twist on the HSI color space, where the diff component is scaled by the intensity value.
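
A small Python sketch of the RGB-to-HMMD conversion following the definitions above (illustrative; hue is computed with the standard HSV hue rules from the previous slide, with R, G, B assumed in [0, 1]):

    def rgb_to_hmmd(r, g, b):
        mx, mn = max(r, g, b), min(r, g, b)
        diff = mx - mn                 # closeness to a pure color
        s = (mx + mn) / 2.0            # "sum": brightness
        if diff == 0:
            hue = 0.0                  # achromatic: hue undefined, set to 0 by convention
        elif mx == r:
            hue = (60.0 * (g - b) / diff) % 360.0
        elif mx == g:
            hue = 60.0 * (2.0 + (b - r) / diff)
        else:
            hue = 60.0 * (4.0 + (r - g) / diff)
        return hue, mx, mn, diff, s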

34 End RECALL

35 HMMD subspace quantization. Four non-uniform quantizations are defined that partition the space into 256, 128, 64 or 32 cells. Each 3D quantization is defined via five subspaces (subspace 0 to subspace 4): the Diff axis is divided into the 5 subintervals [0,6), [6,20), [20,60), [60,110), [110,255). Within each subspace, Sum and Hue are allowed to take all values in their ranges and are partitioned into uniform intervals according to a table. (Figure: the 128-bin quantization of the HMMD color space, shown on the Hue-Sum double cone from black to white.) HMMD can accomplish a color quantization close to the change of color sensed by the human eye, and is thereby capable of enhancing the performance of a content-based image search system.

36 Matching is performed by computing the L1 distance between CSDs: dist(A, B) = sum_i |h_A(i) - h_B(i)|. CSD provides more accurate similarity retrieval because of the inclusion of spatial color information; this representation is more closely related to human perception and is thus more useful for indexing and retrieval. The color structure histogram describes the color feature very well and can give very high retrieval accuracy. Although the color structure histogram contributes to the higher retrieval accuracy of CSD with respect to DCD, the fixed color-space requirement of the histogram results in redundancy in the representation: for example, a DCD with 8 colors needs more bytes in its binary representation than a DCD with only 4 colors, whereas even the most compact CSD (32 bins) always uses 32 bytes per descriptor, noticeably larger than a DCD.

37 CSD: Experimental results. (Plot: ANMRR versus descriptor bit-length.)

38 Color Layout Descriptor (CLD). CLD is a very compact descriptor (63 bits per image) based on: a grid of dominant colors in the YCbCr color space (the dominant color may also be the average color of each grid cell); a DCT transformation of the 2D array of dominant colors; a final quantization step down to 63 bits. F = {CoefPattern, Y-DCcoef, Cb-DCcoef, Cr-DCcoef, Y-ACcoef, Cb-ACcoef, Cr-ACcoef}. RGB to YCbCr conversion: Y = 0.299*R + 0.587*G + 0.114*B; Cb = -0.169*R - 0.331*G + 0.500*B; Cr = 0.500*R - 0.419*G - 0.081*B.

39 Color Layout Descriptor extraction. The image is partitioned into 64 (8x8) blocks. A single representative color is selected from each block (the average of the pixel colors in the block is suggested as the representative color); the selection results in a tiny image of size 8x8. The derived average colors are transformed into a series of coefficients by performing an 8x8 DCT. A few low-frequency coefficients are selected using zigzag scanning and quantized to form the CLD (a large quantization step is used for the AC coefficients and a small quantization step for the DC coefficients). If the spatial data is smooth (little variation), the energy concentrates in the low-frequency coefficients and the high-frequency coefficients remain small.
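
A rough numpy sketch of this extraction for one channel (illustrative only: it uses an orthonormal DCT-II and a hypothetical zigzag helper, and omits the normative non-linear quantization; per the next slide, n_coeffs would be 6 for Y and 3 for Cb and Cr):

    import numpy as np

    def dct_2d(block):
        # Orthonormal 8x8 DCT-II applied to rows and columns
        N = block.shape[0]
        n = np.arange(N)
        C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
        C[0, :] = np.sqrt(1.0 / N)
        return C @ block @ C.T

    def zigzag_indices(N=8):
        # (row, col) positions ordered along anti-diagonals, as in JPEG zigzag scanning
        return sorted(((r, c) for r in range(N) for c in range(N)),
                      key=lambda rc: (rc[0] + rc[1],
                                      rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

    def cld_channel(channel_image, n_coeffs=6):
        # 1) average color of each of the 64 (8x8) blocks -> tiny 8x8 image
        H, W = channel_image.shape
        tiny = channel_image[:H - H % 8, :W - W % 8] \
                   .reshape(8, H // 8, 8, W // 8).mean(axis=(1, 3))
        # 2) 8x8 DCT of the tiny image, 3) keep the first few coefficients in zigzag order
        coeffs = dct_2d(tiny)
        return [coeffs[r, c] for r, c in zigzag_indices()[:n_coeffs]]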

40 Color Layout Descriptor Matching. CLD is efficient for sketch-based image retrieval and for content filtering using image indexing. The distance between two CLDs, CL = {Y0, ..., Y5, Cr0, Cr1, Cr2, Cb0, Cb1, Cb2} and CL', each with 12 coefficients (6 for Y, 3 each for Cb and Cr), is defined as a sum of weighted Euclidean distances computed per channel: D(CL, CL') = sqrt(sum_i w_yi (Y_i - Y'_i)^2) + sqrt(sum_i w_bi (Cb_i - Cb'_i)^2) + sqrt(sum_i w_ri (Cr_i - Cr'_i)^2), where the weights w emphasize the lower-frequency coefficients.
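
A direct Python transcription of that distance (the weights here are placeholders; the standard defines specific weight tables):

    import numpy as np

    def cld_distance(cld1, cld2, wy=None, wb=None, wr=None):
        # cld = (Y coefficients, Cb coefficients, Cr coefficients), e.g. 6 + 3 + 3 values
        y1, cb1, cr1 = (np.asarray(v, float) for v in cld1)
        y2, cb2, cr2 = (np.asarray(v, float) for v in cld2)
        wy = np.ones_like(y1) if wy is None else np.asarray(wy, float)
        wb = np.ones_like(cb1) if wb is None else np.asarray(wb, float)
        wr = np.ones_like(cr1) if wr is None else np.asarray(wr, float)
        return (np.sqrt(np.sum(wy * (y1 - y2) ** 2))
                + np.sqrt(np.sum(wb * (cb1 - cb2) ** 2))
                + np.sqrt(np.sum(wr * (cr1 - cr2) ** 2)))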

41 RECALL Discrete Cosine Transformation

42 DCT (Discrete Cosine Transformation). The DCT is applied to 8x8 image blocks. For each block, the DCT shifts the data from the spatial domain to the frequency domain: F(u,v) = (1/4) C(u) C(v) sum_{i=0..7} sum_{j=0..7} f(i,j) cos((2i+1)uπ/16) cos((2j+1)vπ/16), with C(0) = 1/sqrt(2) and C(k) = 1 otherwise, where f(i,j) is the value at position (i,j) of the 8x8 block of the original image and F(u,v) is the DCT coefficient at position (u,v) of the 8x8 matrix that encodes the transformed coefficients.

43 The 64 (8 x 8) DCT basis functions. (Figure: the 8x8 grid of cosine basis patterns; F[0,0], the DC basis function, is in the top-left corner.)

44 End RECALL

45 What applications. The Dominant Color descriptor is most suitable for representing local (object or image region) features where a small number of colors is enough to characterize the color information. A spatial coherency on the entire descriptor is also defined and used in similarity retrieval. The Scalable Color descriptor is useful for image-to-image matching and retrieval based on color features; retrieval accuracy increases with the number of bits used in the representation. The Color Layout descriptor allows image-to-image matching at very small computational cost and ultra-high-speed sequence-to-sequence matching, also at different resolutions; it is feasible to apply in mobile terminal applications where the available resources are strictly limited, and users can easily introduce the perceptual sensitivity of the human vision system into the similarity calculation. The Color Structure descriptor is suited to image-to-image matching, and its intended use is for still natural-image retrieval, where an image may consist of either a single rectangular frame or arbitrarily shaped, possibly disconnected, regions.

46 Texture Descriptors: Homogenous Texture Descriptor; Non-Homogenous Texture Descriptor (Edge Histogram).

47 Homogenous Texture Descriptor (HTD). Procedure for HTD extraction: partition the frequency domain into 30 channels (modeled by 2D Gabor functions); compute the energy and energy deviation for each channel; compute the mean and standard deviation of the image (f_DC, f_SD). The descriptor is F = {f_DC, f_SD, e_1, ..., e_30, d_1, ..., d_30}. With HTD one can perform rotation-invariant matching, intensity-invariant matching (f_DC removed from the feature vector), and scale-invariant matching.

48 RECALL GABOR function, energy function

49 1D Gabor Function. The Gabor filter is used to determine the sinusoidal frequency and phase content of local sections of a signal as it changes over time. The function to be transformed is first multiplied by a Gaussian function, which can be regarded as a window, and the resulting function is then transformed with a Fourier transform to derive the time-frequency analysis. The window function means that the signal near the time being analyzed is given higher weight. The Gabor transform of a signal x(t) is defined by G_x(τ, ω) = ∫ x(t) e^{-π(t-τ)^2} e^{-jωt} dt. Through time-frequency analysis with the Gabor transform, the occupied frequency bands of a signal can be identified, so that the remaining bandwidth can be used for other applications and bandwidth is saved.

50 2D Gabor Function. It is a Gaussian-weighted sinusoid. It is used to model the individual channels: each channel filters a specific type of texture.

51 Energy Function. The energy in the i-th channel is defined as e_i = log10(1 + p_i), where p_i = sum over (ω, θ) of [G_{s,r}(ω, θ) · P(ω, θ)]^2, P(ω, θ) being the Fourier transform of the image represented in the polar frequency domain and G_{s,r} the Gaussian channel function G_{s,r}(ω, θ) = exp(-(ω - ω_s)^2 / (2 σ_s^2)) · exp(-(θ - θ_r)^2 / (2 σ_r^2)).
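
A rough numpy sketch of computing these channel energies (an illustrative approximation, not the normative HTD extraction: the channel center frequencies, bandwidths and the frequency-plane sampling below are simplified assumptions; only the 5-scale by 6-orientation layout follows the 30-channel partition above):

    import numpy as np

    def channel_energies(image, n_scales=5, n_orient=6):
        # Fourier transform of the image, centered, expressed in polar frequency coordinates
        F = np.fft.fftshift(np.fft.fft2(image))
        h, w = image.shape
        fy, fx = np.meshgrid(np.fft.fftshift(np.fft.fftfreq(h)),
                             np.fft.fftshift(np.fft.fftfreq(w)), indexing='ij')
        omega = np.sqrt(fx ** 2 + fy ** 2)                 # radial frequency
        theta = np.degrees(np.arctan2(fy, fx)) % 180.0     # orientation in [0, 180)

        energies = []
        for s in range(n_scales):                  # octave-spaced radial centers (assumed values)
            w_s, sigma_s = 0.25 / 2 ** s, 0.1 / 2 ** s
            for r in range(n_orient):              # orientations spaced 30 degrees apart
                t_r, sigma_r = 30.0 * r, 15.0
                G = (np.exp(-(omega - w_s) ** 2 / (2 * sigma_s ** 2))
                     * np.exp(-(theta - t_r) ** 2 / (2 * sigma_r ** 2)))
                p = np.sum((G * np.abs(F)) ** 2)   # p_i: sum of squared weighted spectrum
                energies.append(np.log10(1.0 + p)) # e_i = log10(1 + p_i)
        return energies                            # 30 values: 5 scales x 6 orientations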

52 End RECALL

53 An efficient HTD implementation: a Radon transform followed by a 1D Fourier transform. (Diagram: 2D image f(x,y) -> Radon transform -> 1D projections p(R, θ) -> 1D Fourier transform F(p(R, θ)) -> resulting sampling grid in polar coordinates.)

54 RECALL RADON transform

55 Radon Transform. Transforms an image containing lines into a domain of possible line parameters: each line is transformed into a peak at the corresponding parameters in the resulting image.

56 Definition: let f(x) = f(x,y) be a continuous function vanishing outside some large disc in the Euclidean plane R^2. The Radon transform Rf is a function defined on the space of straight lines L in R^2 by the line integral along each such line: Rf(L) = ∫_L f(x) |dx|. In practice, any straight line L can be parametrized by (x(t), y(t)) = (t sin α + s cos α, -t cos α + s sin α), where s is the distance of L from the origin and α is the angle the normal vector to L makes with the x axis. The quantities (α, s) can be considered as coordinates on the space of all lines in R^2, and the Radon transform can be expressed in these coordinates by Rf(α, s) = ∫ f(t sin α + s cos α, -t cos α + s sin α) dt. The Hough transform, when written in a continuous form, is very similar, if not equivalent, to the Radon transform.
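
A compact discrete approximation in Python, assuming scipy is available (rotate the image and sum along columns; an illustrative sketch rather than an exact evaluation of the integral):

    import numpy as np
    from scipy import ndimage

    def radon_transform(image, angles_deg):
        # For each angle, rotate the image and integrate along one axis:
        # summing the columns of the rotated image approximates the line integrals
        sinogram = np.zeros((image.shape[1], len(angles_deg)))
        for k, angle in enumerate(angles_deg):
            rotated = ndimage.rotate(image, angle, reshape=False, order=1)
            sinogram[:, k] = rotated.sum(axis=0)
        return sinogram

    # Example usage: projections every 5 degrees
    # sinogram = radon_transform(img, np.arange(0, 180, 5))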

57 End RECALL

58 Texture Browsing Descriptor. Uses the same spatial filtering procedure as the HTD (scale- and orientation-selective band-pass filters) and characterizes a texture by regularity (periodic to random), coarseness (fine grain to coarse) and directionality (in steps of 30 degrees); e.g., one can look for textures that are very regular and oriented at 30 degrees. The texture browsing descriptor can be used to find a set of candidates with similar perceptual properties, after which the HTD can be used to get a precise similarity match list among the candidate images.

59 Non-Homogenous Texture Descriptor: Edge Histogram Descriptor (EHD). Represents the spatial distribution of five types of edges: vertical, horizontal, 45-degree, 135-degree, and non-directional. The image is divided into 16 (4x4) sub-images and a 5-bin histogram is generated for each, giving F = {BinCounts[k]}, k = 1, ..., 80. It is scale invariant, but cannot be used for object-based image retrieval. Strong edges are retained by thresholding (e.g., with a Canny edge operator); if the edge threshold Th_edge is set to 0, EHD also applies to binary edge images (sketch-based retrieval). The extended EHD achieves better results but does not exhibit the rotation-invariance property.

60 EHD extraction: basic (80 bins) and extended (150 bins). The extended descriptor adds global and semi-global histograms computed from the basic one (13 clusters of sub-images for the semi-global part). (Figure: edge-map image obtained with the Canny edge operator.)
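
A simplified Python sketch of the 80-bin extraction (illustrative assumptions: 2x2 edge filters with commonly cited coefficients, a hypothetical edge-strength threshold, each 2x2 pixel block standing in for an image-block, and no normalization or quantization of the bin counts):

    import numpy as np

    # 2x2 filter coefficients for the five edge types (vertical, horizontal,
    # 45-degree, 135-degree, non-directional) -- values as commonly reported for EHD
    EDGE_FILTERS = np.array([
        [1, -1, 1, -1],                        # vertical
        [1, 1, -1, -1],                        # horizontal
        [np.sqrt(2), 0, 0, -np.sqrt(2)],       # 45 degrees
        [0, np.sqrt(2), -np.sqrt(2), 0],       # 135 degrees
        [2, -2, -2, 2],                        # non-directional
    ])

    def edge_histogram(gray, blocks=4, cell=2, threshold=11.0):
        H, W = gray.shape
        bh, bw = H // blocks, W // blocks
        hist = np.zeros((blocks * blocks, 5))
        for by in range(blocks):
            for bx in range(blocks):
                sub = gray[by * bh:(by + 1) * bh, bx * bw:(bx + 1) * bw]
                for y in range(0, bh - cell + 1, cell):
                    for x in range(0, bw - cell + 1, cell):
                        v = sub[y:y + cell, x:x + cell].flatten().astype(float)
                        strengths = np.abs(EDGE_FILTERS @ v)
                        k = int(np.argmax(strengths))
                        if strengths[k] > threshold:    # keep only strong edges
                            hist[by * blocks + bx, k] += 1
        return hist.flatten()   # 16 sub-images x 5 edge types = 80 bins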

61 What applications. The Homogenous Texture descriptor is for searching and browsing through large collections of similar-looking patterns. An image can be considered as a mosaic of homogeneous textures, so the texture features associated with the regions can be used to index the image data. The Texture Browsing descriptor is useful for representing homogeneous texture in browsing-type applications and requires only 12 bits (maximum); it provides a perceptual characterization of texture, similar to a human characterization, in terms of regularity, coarseness and directionality. Since edges play an important role in image perception, the Edge Histogram descriptor can retrieve images with similar semantic meaning; it targets image-to-image matching (by example or by sketch), especially for natural images with non-uniform edge distribution. The image retrieval performance can be significantly improved if the edge histogram descriptor is combined with other descriptors such as the color histogram descriptor.

62 Shape Descriptors: Region-based Descriptor; Contour-based Shape Descriptor; 2D/3D Shape Descriptor; 3D Shape Descriptor.

63 A shape is the outline or characteristic surface configuration of a thing: a contour, a form. A shape cannot be described through text. Shape representation and matching is one of the major and oldest research topics of pattern recognition and computer vision. The invariance property of the representation - shape representations being left unaltered under a set of transformations - plays a very important role in recognizing the same object even in a translated, rotated, scaled or shrunk view.

64 Region-based Descriptor (RBD). Expresses the pixel distribution within a 2-D object region. Employs a complex 2D Angular Radial Transformation (ART): F_nm = <V_nm(ρ,θ), f(ρ,θ)> = ∫_0^{2π} ∫_0^1 V*_nm(ρ,θ) f(ρ,θ) ρ dρ dθ, with angular basis A_m(θ) = (1/2π) exp(jmθ) and radial basis R_n(ρ) = 1 for n = 0, R_n(ρ) = 2 cos(πnρ) for n ≠ 0. The descriptor is F = {MagnitudeOfART[k]}, k = 1, ..., n x m, using m = 12 angular (m = 0, ..., 11) and n = 3 radial (n = 0, 1, 2) basis functions.

65 ART is a 2-D complex transform defined on a unit disk in polar coordinates: F_nm = <V_nm(ρ,θ), f(ρ,θ)> = ∫_0^{2π} ∫_0^1 V*_nm(ρ,θ) f(ρ,θ) ρ dρ dθ, where f(ρ,θ) is the image function in polar coordinates and V_nm(ρ,θ) is the ART basis function. The ART basis functions are separable along the angular and radial directions, i.e., V_nm(ρ,θ) = A_m(θ) R_n(ρ). The angular and radial basis functions are defined as A_m(θ) = (1/2π) exp(jmθ) and R_n(ρ) = 1 for n = 0, R_n(ρ) = 2 cos(πnρ) for n ≠ 0.
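
An illustrative numpy sketch of computing ART coefficient magnitudes from a binary shape image (a naive direct evaluation over pixels inside the unit disk; the normalization and quantization details of the standard are omitted):

    import numpy as np

    def art_coefficients(shape_image, n_radial=3, n_angular=12):
        # shape_image: 2D array (e.g. a binary mask) centered on the object
        H, W = shape_image.shape
        y, x = np.mgrid[0:H, 0:W]
        cx, cy = (W - 1) / 2.0, (H - 1) / 2.0
        # map pixel coordinates onto the unit disk
        rho = np.sqrt(((x - cx) / cx) ** 2 + ((y - cy) / cy) ** 2)
        theta = np.arctan2(y - cy, x - cx)
        inside = rho <= 1.0

        f = shape_image.astype(float)[inside]
        rho, theta = rho[inside], theta[inside]

        coeffs = np.zeros((n_radial, n_angular))
        for n in range(n_radial):
            R = np.ones_like(rho) if n == 0 else 2.0 * np.cos(np.pi * n * rho)
            for m in range(n_angular):
                A = np.exp(1j * m * theta) / (2.0 * np.pi)
                V = A * R                                        # V_nm = A_m(theta) R_n(rho)
                coeffs[n, m] = np.abs(np.sum(np.conj(V) * f))    # |F_nm|, up to a constant
        return coeffs / coeffs[0, 0]   # magnitudes normalized by |F_00|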

66 RBD Applicability. (Refers to example shapes in a figure: applicable to shapes (a)-(e); distinguishes (i) from (g) and (h); (j), (k) and (l) are judged similar.) Advantages: describes complex shapes with disconnected regions; robust to segmentation noise; small size; fast extraction and matching.

67 Contour-Based Descriptor (CBD). The Contour-Based Descriptor is based on the Curvature Scale-Space representation: it finds the curvature zero-crossing points of the shape's contour (key points); it reduces the number of key points step by step by applying Gaussian smoothing; the positions of the key points are expressed relative to the length of the contour curve.

68 CBD Applicability. (Refers to example shapes in a figure: applicable to (a); distinguishes the differences in (b); finds the similarities in (c)-(e).) Advantages: captures the shape very well; robust to noise, scale and orientation changes; fast and compact.

69 How the Contour is calculated? N equidistant points are selected on the contour, starting from an arbitrary point on the contour and following the contour clockwise. The x-coordinates of the selected N points are grouped together and the y-coordinates are also grouped together into two series X, Y. The contour is then gradually smoothed by repetitive application of a low-pass filter with the kernel (0.25,0.5,0.25) to X and Y coordinates of the selected N contour points

70 GlobalCurvatureVector. This element specifies global parameters of the contour, namely the eccentricity and the circularity. Circularity = perimeter^2 / area; for a circle, circularity = (2πr)^2 / (πr^2) = 4π. Eccentricity is computed from the central moments of the contour points, i_02 = sum_k (y_k - y_c)^2, i_20 = sum_k (x_k - x_c)^2, i_11 = sum_k (x_k - x_c)(y_k - y_c), as eccentricity = sqrt( (i_20 + i_02 + sqrt((i_20 - i_02)^2 + 4 i_11^2)) / (i_20 + i_02 - sqrt((i_20 - i_02)^2 + 4 i_11^2)) ).
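
A direct Python transcription of these two global parameters for a sampled contour (illustrative; perimeter and area are computed with simple polygon formulas):

    import numpy as np

    def global_curvature(xs, ys):
        # xs, ys: coordinates of N contour points, ordered along the contour
        xs, ys = np.asarray(xs, float), np.asarray(ys, float)
        # perimeter: sum of segment lengths (closed contour); area: shoelace formula
        perimeter = np.sum(np.hypot(np.diff(np.r_[xs, xs[0]]), np.diff(np.r_[ys, ys[0]])))
        area = 0.5 * abs(np.sum(xs * np.roll(ys, -1) - ys * np.roll(xs, -1)))
        circularity = perimeter ** 2 / area

        xc, yc = xs.mean(), ys.mean()
        i20 = np.sum((xs - xc) ** 2)
        i02 = np.sum((ys - yc) ** 2)
        i11 = np.sum((xs - xc) * (ys - yc))
        root = np.sqrt((i20 - i02) ** 2 + 4 * i11 ** 2)
        eccentricity = np.sqrt((i20 + i02 + root) / (i20 + i02 - root))
        return circularity, eccentricity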

71 Comparison of the Region-Based and Contour-Based descriptors. (Figure: blue boxes mark shapes judged similar by the Region-Based descriptor, yellow boxes mark shapes judged similar by the Contour-Based descriptor.)

72 What applications. The Region Shape descriptor makes use of all pixels constituting the shape within a frame and can describe any shape. It is also characterized by a small size and fast extraction and matching: the data size for this representation is fixed at 17.5 bytes, and the feature extraction and matching processes have a low order of computational complexity, making them suitable for tracking shapes in video data processing. The Contour Shape descriptor captures perceptually meaningful features of the shape, enabling similarity-based retrieval. It is robust to non-rigid motion, to partial occlusion of the shape, and to perspective transformations, which result from changes of the camera parameters and are common in images and video.


EE795: Computer Vision and Intelligent Systems EE795: Computer Vision and Intelligent Systems Spring 2012 TTh 17:30-18:45 WRI C225 Lecture 04 130131 http://www.ee.unlv.edu/~b1morris/ecg795/ 2 Outline Review Histogram Equalization Image Filtering Linear

More information

(Refer Slide Time 00:17) Welcome to the course on Digital Image Processing. (Refer Slide Time 00:22)

(Refer Slide Time 00:17) Welcome to the course on Digital Image Processing. (Refer Slide Time 00:22) Digital Image Processing Prof. P. K. Biswas Department of Electronics and Electrical Communications Engineering Indian Institute of Technology, Kharagpur Module Number 01 Lecture Number 02 Application

More information

Schedule for Rest of Semester

Schedule for Rest of Semester Schedule for Rest of Semester Date Lecture Topic 11/20 24 Texture 11/27 25 Review of Statistics & Linear Algebra, Eigenvectors 11/29 26 Eigenvector expansions, Pattern Recognition 12/4 27 Cameras & calibration

More information

Content-Based Image Retrieval of Web Surface Defects with PicSOM

Content-Based Image Retrieval of Web Surface Defects with PicSOM Content-Based Image Retrieval of Web Surface Defects with PicSOM Rami Rautkorpi and Jukka Iivarinen Helsinki University of Technology Laboratory of Computer and Information Science P.O. Box 54, FIN-25

More information

Sampling and Reconstruction

Sampling and Reconstruction Sampling and Reconstruction Sampling and Reconstruction Sampling and Spatial Resolution Spatial Aliasing Problem: Spatial aliasing is insufficient sampling of data along the space axis, which occurs because

More information

Part 3: Image Processing

Part 3: Image Processing Part 3: Image Processing Image Filtering and Segmentation Georgy Gimel farb COMPSCI 373 Computer Graphics and Image Processing 1 / 60 1 Image filtering 2 Median filtering 3 Mean filtering 4 Image segmentation

More information

Computer vision: models, learning and inference. Chapter 13 Image preprocessing and feature extraction

Computer vision: models, learning and inference. Chapter 13 Image preprocessing and feature extraction Computer vision: models, learning and inference Chapter 13 Image preprocessing and feature extraction Preprocessing The goal of pre-processing is to try to reduce unwanted variation in image due to lighting,

More information

VIDEO AND IMAGE PROCESSING USING DSP AND PFGA. Chapter 3: Video Processing

VIDEO AND IMAGE PROCESSING USING DSP AND PFGA. Chapter 3: Video Processing ĐẠI HỌC QUỐC GIA TP.HỒ CHÍ MINH TRƯỜNG ĐẠI HỌC BÁCH KHOA KHOA ĐIỆN-ĐIỆN TỬ BỘ MÔN KỸ THUẬT ĐIỆN TỬ VIDEO AND IMAGE PROCESSING USING DSP AND PFGA Chapter 3: Video Processing 3.1 Video Formats 3.2 Video

More information

A Keypoint Descriptor Inspired by Retinal Computation

A Keypoint Descriptor Inspired by Retinal Computation A Keypoint Descriptor Inspired by Retinal Computation Bongsoo Suh, Sungjoon Choi, Han Lee Stanford University {bssuh,sungjoonchoi,hanlee}@stanford.edu Abstract. The main goal of our project is to implement

More information

CS 556: Computer Vision. Lecture 18

CS 556: Computer Vision. Lecture 18 CS 556: Computer Vision Lecture 18 Prof. Sinisa Todorovic sinisa@eecs.oregonstate.edu 1 Color 2 Perception of Color The sensation of color is caused by the brain Strongly affected by: Other nearby colors

More information

Automatic Video Caption Detection and Extraction in the DCT Compressed Domain

Automatic Video Caption Detection and Extraction in the DCT Compressed Domain Automatic Video Caption Detection and Extraction in the DCT Compressed Domain Chin-Fu Tsao 1, Yu-Hao Chen 1, Jin-Hau Kuo 1, Chia-wei Lin 1, and Ja-Ling Wu 1,2 1 Communication and Multimedia Laboratory,

More information

Computer Vision. Recap: Smoothing with a Gaussian. Recap: Effect of σ on derivatives. Computer Science Tripos Part II. Dr Christopher Town

Computer Vision. Recap: Smoothing with a Gaussian. Recap: Effect of σ on derivatives. Computer Science Tripos Part II. Dr Christopher Town Recap: Smoothing with a Gaussian Computer Vision Computer Science Tripos Part II Dr Christopher Town Recall: parameter σ is the scale / width / spread of the Gaussian kernel, and controls the amount of

More information

Introduction. Introduction. Related Research. SIFT method. SIFT method. Distinctive Image Features from Scale-Invariant. Scale.

Introduction. Introduction. Related Research. SIFT method. SIFT method. Distinctive Image Features from Scale-Invariant. Scale. Distinctive Image Features from Scale-Invariant Keypoints David G. Lowe presented by, Sudheendra Invariance Intensity Scale Rotation Affine View point Introduction Introduction SIFT (Scale Invariant Feature

More information

SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS

SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS Cognitive Robotics Original: David G. Lowe, 004 Summary: Coen van Leeuwen, s1460919 Abstract: This article presents a method to extract

More information