ALMA MATER STUDIORUM - UNIVERSITÀ DI BOLOGNA
Data Modelling and Multimedia Databases M
International Second cycle degree programme (LM) in Digital Humanities and Digital Knowledge (DHDK)
University of Bologna

Multimedia Information Retrieval, Part II

Home page: http://www-db.ds.unbo.t/courses/dmmmdb/
Electronic version: 2.02.MultmedaInformatonRetreval-II.pdf
Electronic version: 2.02.MultmedaInformatonRetreval-II-2p.pdf

Outline
- Description models for MM data retrieval
- Low-level features for MM data content representation
- Similarity measures for MM data content comparison
- Region-based image retrieval
- The Windsurf system

MM data retrieval
- From the previous lesson we know that features are a smarter way to represent MM data content than the original format
  - e.g., color and texture for an image
- Today we focus on the most suitable models for representing, interpreting, describing and comparing such features
  - e.g., color histograms for images, using the Euclidean distance as similarity measure
- The final goal is to retrieve from MM collections those objects which are most interesting for us!

Content-based search
- A first approach to searching for MM objects relies on standard text-based techniques, provided objects come with a precise textual description of what they represent/describe, i.e., of their semantics
- However, the annotation of MM objects is a subjective, time-consuming, and tedious process (completely manual!)
- A more convenient approach, suitable for managing large DBs, is to automatically extract from MM objects a set of relevant (low-level) numerical features that, at least partially, convey some of the semantics of the objects
- Clearly, which features are best to extract depends on the specific medium and on the application at hand (i.e., what we are looking for)

[Figure: "Look for cheetahs?" This is fine; but how to find it?]

Content-based similarity search
- Once we have feature values, we can use them to search for objects
- Assume a database (DB) with N MM objects (e.g., images), for each of which we have extracted the relevant features
  - e.g., we could extract some color information from images
- We can now search for objects whose feature values are similar (in some sense to be defined) to the feature values of our query [SWS+00, LSD+06, LZL+07, DJL+08]
- In general this approach, much as happens in text retrieval, cannot guarantee that all and only the relevant results are returned as the result of a query

[Figure: "Look for cheetahs?" Oops! Not really a cheetah ;-)]

The general scenario
In general, we have a 2-level scenario:
- The objects level: ultimately, we want to find relevant objects
- The features level (color, shape, texture): we extract features from the objects and use them for querying the DB; this level is needed to support automatic retrieval

The reference architecture
- For the text-based approach, the image querying problem can simply be transformed into a traditional information retrieval problem
  - as we will see when speaking about MM data annotation
- For content-based information retrieval (CBIR), more sophisticated query evaluation techniques are required

[Figure: reference architecture. GUI (query image, visualize results); feature extraction with optional image segmentation; query processor; query engine with index; image DB and feature DB]

Variables of the CBIR problem
How the set of relevant results is determined depends on:
- which low-level features are used to characterize the MM data content
- the similarity criterion (distance function) used to compare such features
- how DB objects are ranked with respect to the query
- whether the user is interested in the whole MM data query or only in a part of it
All these aspects strongly influence the query evaluation process!

Simplest case: each MM data object (i.e., image) is characterized using global low-level features, and the result of a query consists of the set of DB objects that best match the visual characteristics of the target object, according to a predefined similarity criterion, which is in turn based on such features
- This is also known as the Nearest Neighbors (NN) search problem
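
The simplest case above (global features plus a predefined distance) can be sketched as a sequential scan over the feature DB. This is only an illustration: the dictionary layout of `feature_db`, the function names and the toy 3-bin histograms are assumptions, and a real system would rely on an index rather than scanning every object.

```python
import math


def euclidean(a, b):
    """L2 distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_search(query, db, k=1, distance=euclidean):
    """Return the k (object_id, distance) pairs closest to the query.

    `db` maps an object id to its global feature vector (hypothetical layout).
    Every object is compared with the query: a plain sequential scan.
    """
    scored = [(obj_id, distance(query, feats)) for obj_id, feats in db.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Toy feature DB: three images described by 3-bin color histograms.
feature_db = {
    "img_a": [0.8, 0.1, 0.1],
    "img_b": [0.1, 0.8, 0.1],
    "img_c": [0.7, 0.2, 0.1],
}
result = knn_search([0.78, 0.12, 0.10], feature_db, k=2)
nearest_ids = [obj_id for obj_id, _ in result]  # ranked by increasing distance
```

Changing the `distance` argument changes which objects are considered "nearest", which is exactly why the choice of features and of similarity criterion influences the result of a query.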

Representing color
- In a digital image, the color space that encodes the color content of each pixel is necessarily discretized
- This depends on how many bits per pixel (bpp) are used
  - Example: if one represents images in the RGB space by using 8 × 3 = 24 bpp, the number of possible distinct colors is 2^24 = 16,777,216
  - With 8 bits per channel, we have 256 possible values on each channel
- Although discrete, the possible color values are still too many if one wants to compactly represent the color content of an image
- Reducing the number of colors also gives some robustness in the matching process (e.g., the two RGB values (123,078,226) and (121,080,230) are almost indistinguishable)
- In practice, a common approach to represent color is to make use of histograms

Color histograms
- A color histogram h is a D-dimensional vector, obtained by quantizing the color space into D distinct colors
  - Typical values of D are 32, 64, 256, 1024, ...
- Example: the HSV color space can be quantized into D = 32 colors
  - H is divided into 8 intervals, and S into 4
  - V = 0 (the value channel is ignored), which guarantees invariance to light intensity
- The i-th component (also called bin) of h stores the percentage (or number) of pixels in the image whose color is mapped to the i-th color
- Although conceptually simple, color histograms are widely used since they are relatively invariant to translation, rotation, scale changes and partial occlusions

[Figure: example color histogram with D = 64]
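
As a minimal sketch of the D = 32 quantization described above (8 hue intervals × 4 saturation intervals, V ignored), the following uses Python's standard `colorsys` module. The function name, the uniform interval boundaries and the bin ordering are assumptions made for illustration, not the exact scheme used in the course.

```python
import colorsys

H_BINS, S_BINS = 8, 4  # D = 8 * 4 = 32; the V channel is ignored


def hsv_histogram(rgb_pixels):
    """Normalized 32-bin histogram of (R, G, B) pixels with 8-bit channels."""
    hist = [0.0] * (H_BINS * S_BINS)
    for r, g, b in rgb_pixels:
        h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Uniform quantization (an assumption); clamp so that a value of
        # exactly 1.0 falls into the last interval.
        h_idx = min(int(h * H_BINS), H_BINS - 1)
        s_idx = min(int(s * S_BINS), S_BINS - 1)
        hist[h_idx * S_BINS + s_idx] += 1
    n = len(rgb_pixels)
    return [count / n for count in hist] if n else hist


# The two almost indistinguishable RGB values from the slide.
hist = hsv_histogram([(123, 78, 226), (121, 80, 230)])
occupied_bins = [i for i, value in enumerate(hist) if value > 0]
```

Both pixels end up in the same bin, which is the robustness effect mentioned above: small differences in RGB values disappear after quantization.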

Further examples
[Figure: two D = 64 color histograms]

Comparing color histograms
- Since histograms are vectors, we can use any Lp-norm to measure the distance (dissimilarity) of two color histograms
- However, doing so we do not take into account the correlation between colors
- Depending on the query and the dataset, we might therefore obtain low-quality results
  - Weighted Lp-norms and relevance feedback can partially alleviate the problem
- The problem is that Lp-norms just consider the difference of corresponding bins, i.e., they perform a 1-1 comparison
- With color histograms, our coordinates are not unrelated (cross-talk effect)
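
Recall on Lp-norms
- Given two D-dimensional vectors $\mathbf{p}$ and $\mathbf{q}$, their distance in the reference D-dimensional space based on the Lp-norm is:

$$L_p(\mathbf{p}, \mathbf{q}) = \left( \sum_{i=1}^{D} |p_i - q_i|^p \right)^{1/p}, \qquad 1 \le p < \infty$$

- A relevant example is the Euclidean distance (p = 2):

$$L_2(\mathbf{p}, \mathbf{q}) = \left( \sum_{i=1}^{D} (p_i - q_i)^2 \right)^{1/2}$$

- and its weighted version:

$$L_{2,W}(\mathbf{p}, \mathbf{q}; W) = \left( \sum_{i=1}^{D} w_i\,(p_i - q_i)^2 \right)^{1/2}$$

  where $W = (w_1, \dots, w_D)$ is the vector of weights that reflect the importance of each coordinate of the D-dimensional space

[Figure: shapes of L2 and weighted L2]

Sample queries based on color (1)
[Figure: query image and results with 32-D HSV histograms, Euclidean distance vs. weighted Euclidean distance]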
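
The Lp-norm definitions recalled above translate almost literally into code. The sketch below is only illustrative; the function names are assumptions, and the vectors are plain Python lists of equal length.

```python
import math


def lp_distance(p_vec, q_vec, p=2.0):
    """L_p distance between two equal-length vectors, for 1 <= p < infinity."""
    if p < 1:
        raise ValueError("p must be >= 1")
    return sum(abs(a - b) ** p for a, b in zip(p_vec, q_vec)) ** (1.0 / p)


def weighted_euclidean(p_vec, q_vec, weights):
    """Weighted L_2: each squared coordinate difference is scaled by w_i."""
    return math.sqrt(
        sum(w * (a - b) ** 2 for w, a, b in zip(weights, p_vec, q_vec))
    )


d_l1 = lp_distance([0.0, 0.0], [3.0, 4.0], p=1)              # Manhattan
d_l2 = lp_distance([0.0, 0.0], [3.0, 4.0], p=2)              # Euclidean
d_w = weighted_euclidean([0.0, 0.0], [3.0, 4.0], [1.0, 0.0])  # ignores 2nd coordinate
```

Setting a weight to zero removes the corresponding coordinate from the comparison, which is how weights express the importance of each histogram bin.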

Sample queries based on color (2)
[Figure: query image and results with 32-D HSV histograms, Euclidean distance vs. weighted Euclidean distance]

Quadratic distance
- Consider two histograms $\mathbf{h}$ and $\mathbf{q}$, both with D bins
- Their quadratic distance is defined as:

$$L_A(\mathbf{h}, \mathbf{q}; A) = \sqrt{\sum_{i=1}^{D} \sum_{j=1}^{D} a_{i,j}\,(h_i - q_i)(h_j - q_j)} = \sqrt{(\mathbf{h} - \mathbf{q})^T A\,(\mathbf{h} - \mathbf{q})}$$

  where $A = \{a_{i,j}\}$ is called the (color-)similarity matrix
- The value of $a_{i,j}$ is the similarity of the i-th and the j-th colors ($a_{i,i} = 1$)
- Note that:
  - when A is a diagonal matrix we are back to the weighted Euclidean distance
  - when A = I (the identity matrix) we obtain the L2 distance

[Figure: shape of L_A]
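
The quadratic distance and its two special cases can be checked numerically with a few lines of code. This is a minimal sketch: the function name, the 2-bin histograms and the similarity value 0.5 are illustrative assumptions.

```python
import math


def quadratic_distance(h, q, A):
    """sqrt((h - q)^T A (h - q)) for equal-length vectors and a D x D matrix A."""
    d = [a - b for a, b in zip(h, q)]
    total = sum(
        A[i][j] * d[i] * d[j] for i in range(len(d)) for j in range(len(d))
    )
    return math.sqrt(max(total, 0.0))  # guard against tiny negative round-off


h, q = [1.0, 0.0], [0.0, 1.0]

identity = [[1.0, 0.0], [0.0, 1.0]]
diagonal = [[4.0, 0.0], [0.0, 1.0]]    # diagonal entries act as weights w = (4, 1)
cross_talk = [[1.0, 0.5], [0.5, 1.0]]  # the two colors are 50% similar (assumed)

d_identity = quadratic_distance(h, q, identity)    # plain L2
d_diagonal = quadratic_distance(h, q, diagonal)    # weighted L2
d_cross = quadratic_distance(h, q, cross_talk)     # smaller than L2
```

With the off-diagonal similarity, the two single-color histograms are judged closer than under L2, because the mismatch on one bin is partly compensated by the related bin.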

Quadratic distance vs. Euclidean distance
- As a simple example, let D = 3, with colors red, orange, and blue
- Consider 3 pure-color images and the corresponding histograms: h1 = (1,0,0), h2 = (0,1,0), h3 = (0,0,1)
- Using L2, the distance between two different images is always $\sqrt{2}$
- On the other hand, let the color-similarity matrix be defined as:

$$A = \begin{pmatrix} 1 & 0.8 & 0 \\ 0.8 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

- Now we have $L_A(h_1, h_2) = \sqrt{0.4} \approx 0.63$, whereas $L_A(h_1, h_3) = L_A(h_2, h_3) = \sqrt{2}$: red and orange are now closer to each other than either is to blue

Representing texture (1)
- Unlike color, texture is not a property of the single pixel; rather, it is a collective property of a pixel and its suitably defined neighborhood
- Intuitively, texture provides information about the uniformity, granularity and regularity of the image surface
- It is usually computed by considering only the gray-scale values of pixels (i.e., the V channel in HSV)

[Figure: texture examples, "mosaic" effect and "blinds" effect]

Representing texture (2)
- Tamura features correspond to properties of a texture that are readily perceived, namely coarseness, contrast and directionality (3-D feature vector)
  - Coarseness (coarse vs. fine): provides information about the granularity of the pattern
  - Contrast (high vs. low contrast): measures the amount of local changes in brightness
  - Directionality (directional vs. non-directional): a global property of the image

Representing shape
- Once one has succeeded in extracting an object's contour, the next step is how to represent/encode it
- A common approach is to navigate the contour, which leads to an ordering of the pixels in the contour: { (x(t), y(t)) : t = 1, ..., M }
- A second step is to represent the resulting curve in a parametric form
  - For instance, a possibility is to resort to complex values, by setting z(t) = x(t) + j y(t)
- Thus, we now have vectors of complex values
- The problem is that each vector has a different length (i.e., M depends on the specific image)

Representative points
- The idea is to keep only the D most "interesting" points
- Some methods are:
  - Equally-spaced sampling (a)
  - Grid-based sampling (b)
  - Maximum curvature points (c)
  - Fourier-based methods, which first compute the DFT of the contour and then keep only the first D coefficients
- Working in the frequency domain has several advantages:
  - It can be proved that by properly modifying the Fourier coefficients one can achieve invariance to scale, translation and rotation
  - Further, by viewing shape as a signal, one can adopt distance measures that have been developed for the comparison of time series and that are somewhat insensitive to signal modifications

[Figure: sampling methods (a), (b), (c)]

Comparing shapes
- The most common way to measure the (dis-)similarity of two shape vectors of equal length D is based on the Euclidean distance (L2)
- However, with the Euclidean distance we have to face a basic problem: sensitivity to the alignment of values
- Intuitively, we would need a distance measure that is able to match a point of time series s even with surrounding points of time series q
  - Alternatively, we may view the time axis as a "stretchable" one
- A distance like this exists, and is called Dynamic Time Warping (DTW)
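
To make the alignment problem concrete, here is a minimal sketch of the classic dynamic-programming formulation of DTW for real-valued sequences. The squared difference as local cost, the final square root and the toy sequences are assumptions made for illustration; complex-valued contour vectors would need a different local cost.

```python
import math


def euclidean(s, q):
    """L2 distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s, q)))


def dtw_distance(s, q):
    """Dynamic Time Warping distance between two real-valued sequences."""
    n, m = len(s), len(q)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning s[:i] with q[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (s[i - 1] - q[j - 1]) ** 2
            cost[i][j] = local + min(
                cost[i - 1][j],      # s[i-1] also matched with q[j-1]: stretch q
                cost[i][j - 1],      # q[j-1] also matched with s[i-1]: stretch s
                cost[i - 1][j - 1],  # advance both sequences
            )
    return math.sqrt(cost[n][m])


# The same "bump", shifted by one position.
s = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
q = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
d_euclid = euclidean(s, q)
d_dtw = dtw_distance(s, q)
```

The Euclidean distance penalizes the shift, while DTW stretches the time axis and matches the two bumps point by point. The price is a cost proportional to the product of the two lengths instead of their common length.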

Sample queries based on shape [BCP02]
[Figure: query image and results over 1100 object contours; R = relevant (same type of fish)]

- This is not the whole story, of course: many other feature models (and corresponding distance functions) have been defined for MM data
- This was just a way to provide some concrete examples of features and of ways to compare them!
- Note that, besides generic features, any specific image domain/application needs to extract and manage specific features, which in general require much more sophisticated tools than the ones we have seen
  - e.g., face/fingerprint recognition
- Nonetheless, what is important to stress is that the problem of how to search in large image DBs remains (almost) the same!
- Let's now go into the details of what happens, and how things can become complex, in a real image retrieval system

The region-based image retrieval approach
- DB population time:
  - Preprocess images to segment them into regions
  - Represent regions as vectors of features
- Query time:
  - Compare query regions to DB regions
  - Assess similarity between images by combining similarity between regions

[Figure: region-based architecture. DB population: image, segmentation, feature extraction, features stored in the feature DB and images in the image DB. Querying: GUI (visualize results), segmentation, feature extraction, query processor and query engine over the image DB and feature DB]

Windsurf case study [ABP99, BCP00, BP00, BC03, Bar09a, BCP+09, BCP10]
http://www-db.ds.unbo.t/bartoln/publcatons.html
- Windsurf: Wavelet-Based Indexing of Images Using Regions Fragmentation
- Discrete Wavelet Transform (DWT): extracts a set of features representing the image in the color-texture space
- Clustering: fragments the image into a set of regions using wavelet coefficients
- Similarity features: used to compare regions

[Figure: pipeline DWT, Clustering, Similarity Features]
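
Discrete Wavelet Transform (DWT)
- Haar wavelet: simple and quick
- Each coefficient is defined by:
  - DWT level (l)
  - frequency sub-band (B)
  - color channels (H, S, V)

$$\mathbf{w}_j^{l;B} = \left( w_{0j}^{l;B},\, w_{1j}^{l;B},\, w_{2j}^{l;B} \right), \qquad B \in \{LL, LH, HL, HH\}$$

[Figure: Haar decomposition of an image. Averages (Avg) and differences (Diff) are computed along one direction and then along the other, giving the Avg/Avg, Avg/Diff, Diff/Avg and Diff/Diff sub-bands]

DWT: practical example
[Figure: DWT of a sample image]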
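
The averaging/differencing idea behind the Haar DWT can be sketched on a single color channel as follows. This is not the Windsurf implementation: the unnormalized (plain average) filters, the assignment of the names LH and HL to the two mixed sub-bands, and the function names are all assumptions, and the channel is a plain 2-D list whose sides must be divisible by 2 for every level applied.

```python
def haar_level(channel):
    """One Haar step on a 2-D list with even height and width.

    Each 2x2 block [[a, b], [c, d]] yields one coefficient per sub-band.
    """
    names = ("LL", "LH", "HL", "HH")
    bands = {name: [] for name in names}
    for r in range(0, len(channel), 2):
        row_out = {name: [] for name in names}
        for col in range(0, len(channel[0]), 2):
            a, b = channel[r][col], channel[r][col + 1]
            c, d = channel[r + 1][col], channel[r + 1][col + 1]
            row_out["LL"].append((a + b + c + d) / 4.0)  # Avg / Avg
            row_out["LH"].append((a + b - c - d) / 4.0)  # Avg along rows, Diff along columns
            row_out["HL"].append((a - b + c - d) / 4.0)  # Diff along rows, Avg along columns
            row_out["HH"].append((a - b - c + d) / 4.0)  # Diff / Diff
        for name in names:
            bands[name].append(row_out[name])
    return bands


def haar_dwt(channel, levels=3):
    """Apply `levels` Haar steps, each one on the LL band of the previous step."""
    pyramid, current = [], channel
    for _ in range(levels):
        bands = haar_level(current)
        pyramid.append(bands)
        current = bands["LL"]
    return pyramid


one_block = haar_level([[1.0, 3.0], [5.0, 7.0]])
flat_image = [[5.0] * 8 for _ in range(8)]
pyramid = haar_dwt(flat_image, levels=3)
```

On a perfectly uniform image all the detail coefficients are zero and only the LL band carries information, which matches the intuition that LL describes color while the other sub-bands describe local variations.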

Clustering (1)
- K-means algorithm (3rd level and low-frequency info)
  1. Choose k initial centroids
  2. Associate each point to its nearest centroid
  3. Recompute centroids and repeat the previous step
  4. Stop when the solution does not change
- Mahalanobis distance:

$$\delta\left(\mathbf{w}_i^{3;LL}, \mathbf{w}_j^{3;LL}\right)^2 = \left(\mathbf{w}_i^{3;LL} - \mathbf{w}_j^{3;LL}\right)^T \left(C^{3;LL}\right)^{-1} \left(\mathbf{w}_i^{3;LL} - \mathbf{w}_j^{3;LL}\right)$$

- Correlation between wavelet coefficients takes into account variations in color, i.e., texture

Clustering (2)
- Optimal value for k? Minimization of a validity function based on:
  - Intra-cluster distance
  - Cluster sizes
  - Inter-cluster distance

[Figure: input image and clusters for k = 2, k = 4 (optimal solution) and k = 10]

I. Bartolini
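
The four k-means steps listed above can be sketched as follows. Note the simplifications: the default distance is the squared Euclidean distance rather than the Mahalanobis distance used by Windsurf (a different function can be passed through the `distance` parameter), the initial centroids are supplied by the caller, and the 2-D points stand in for 3-D wavelet coefficients. All names are illustrative assumptions.

```python
def sq_euclidean(a, b):
    """Squared L2 distance; stands in for the Mahalanobis distance here."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, distance=sq_euclidean, max_iter=100):
    """Plain k-means; returns (centroids, assignment of each point)."""
    centroids = [list(c) for c in initial_centroids]   # step 1
    assignment = None
    for _ in range(max_iter):
        # Step 2: associate each point to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: distance(p, centroids[k]))
            for p in points
        ]
        # Step 4: stop when the solution does not change.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: recompute each centroid as the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids, assignment = kmeans(points, initial_centroids=[(0.0, 0.0), (10.0, 10.0)])
```

The number of clusters k is fixed by the number of initial centroids; choosing it automatically, as discussed in Clustering (2), would mean running the algorithm for several values of k and comparing a validity function.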
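
Similarity features
- Region similarity is computed with the Bhattacharyya distance
- Regions are ellipsoids in a 37-D feature space (all frequency info is used)
  - for each of the 4 sub-bands, a 3-D centroid + a 6-D covariance matrix, plus a 1-D region size: 4 × (3 + 6) + 1 = 37

$$d_B(R_i, R_j)^2 = \frac{1}{2} \ln \frac{\left| \frac{C_{R_i}^{3;B} + C_{R_j}^{3;B}}{2} \right|}{\left| C_{R_i}^{3;B} \right|^{1/2} \left| C_{R_j}^{3;B} \right|^{1/2}} + \frac{1}{8} \left( \mu_{R_i}^{B} - \mu_{R_j}^{B} \right)^T \left( \frac{C_{R_i}^{3;B} + C_{R_j}^{3;B}}{2} \right)^{-1} \left( \mu_{R_i}^{B} - \mu_{R_j}^{B} \right)$$

- The first term compares the covariance matrices (texture info)
- The second term is the distance between the regions' centroids (color info)

Image similarity
- Similarity between images is a function of the similarities among matched regions
- How regions are "matched" can therefore strongly influence the result of a query
- Example: one-to-one match (formulated as an Assignment Problem)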
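
As a small numerical illustration of the region distance, the sketch below evaluates the standard Bhattacharyya distance between two Gaussians under the simplifying assumption of diagonal covariance matrices, so that determinants and inverses reduce to products and divisions. Windsurf uses full covariance matrices; the function name and the sample values are assumptions.

```python
import math


def bhattacharyya_sq_diag(mu_i, var_i, mu_j, var_j):
    """Squared Bhattacharyya distance for diagonal covariances.

    mu_* are centroids, var_* the per-coordinate variances (all > 0).
    """
    log_term = 0.0   # ln(|C| / sqrt(|C_i| |C_j|)) with C = (C_i + C_j) / 2
    mean_term = 0.0  # (mu_i - mu_j)^T C^-1 (mu_i - mu_j)
    for m1, v1, m2, v2 in zip(mu_i, var_i, mu_j, var_j):
        v = (v1 + v2) / 2.0
        log_term += math.log(v) - 0.5 * (math.log(v1) + math.log(v2))
        mean_term += (m1 - m2) ** 2 / v
    return 0.5 * log_term + 0.125 * mean_term


same = bhattacharyya_sq_diag([1.0, 2.0, 3.0], [1.0, 1.0, 1.0],
                             [1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
color_only = bhattacharyya_sq_diag([0.0], [1.0], [2.0], [1.0])    # centroids differ
texture_only = bhattacharyya_sq_diag([0.0], [1.0], [0.0], [4.0])  # variances differ
```

The two non-zero cases show the two contributions separately: with equal covariances only the centroid term survives, and with equal centroids only the logarithmic covariance term does.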
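
Assignment problem
- Goal: find the optimal match where unit elements of fixed size are matched individually
- Implemented with the Hungarian algorithm, maximizing a function that is monotonic in the similarity scores (e.g., the average)

[Figure: image similarity computed through region matching]

Similarity scores between the query regions q1..q3 and the DB image regions r1..r5 (the slide shows this table twice, with two different matches):

        r1    r2    r3    r4    r5
  q1   .52   .17   .41   .16   .29
  q2   .27   .19   .81   .35   .49
  q3   1.0   .11   .27   .24   .29

- (.52 + .81 + 1.0)/3 ≈ .78: each query region takes its best-scoring region, but .52 and 1.0 both lie in column r1, so this is not a one-to-one match
- (.29 + .81 + 1.0)/3 = .7: one-to-one match (q1-r5, q2-r3, q3-r1)

Sample query
[Figure: sample query and results]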
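
The one-to-one match on the score table of the Assignment problem slide can be verified with a tiny exhaustive search. This is deliberately not the Hungarian algorithm: trying every possible assignment is only feasible for a handful of regions, and it is used here just to check the optimum. The function name is an assumption.

```python
from itertools import permutations

# Similarity scores: rows are query regions q1..q3, columns are regions r1..r5.
scores = [
    [0.52, 0.17, 0.41, 0.16, 0.29],
    [0.27, 0.19, 0.81, 0.35, 0.49],
    [1.00, 0.11, 0.27, 0.24, 0.29],
]


def best_one_to_one(scores):
    """Exhaustive search for the one-to-one match with the highest average.

    Assumes at least as many columns as rows. Returns (average, columns),
    where columns[i] is the 0-based column matched with row i.
    """
    n_rows, n_cols = len(scores), len(scores[0])
    best_avg, best_cols = -1.0, None
    for cols in permutations(range(n_cols), n_rows):
        avg = sum(scores[i][c] for i, c in enumerate(cols)) / n_rows
        if avg > best_avg:
            best_avg, best_cols = avg, cols
    return best_avg, best_cols


avg, match = best_one_to_one(scores)
# Best region for each query region taken independently (not one-to-one).
independent_avg = sum(max(row) for row in scores) / len(scores)
```

The independent choice gives a higher average than any valid one-to-one match because r1 would be used by both q1 and q3; the optimal one-to-one match pairs q1 with r5 instead.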

Effectiveness comparison example
[Figure: "Flowers" query; Windsurf clusters vs. Blobworld [CTB+99] clusters; results returned by Windsurf and by Blobworld]

Windsurf in a specialized context: handwritings
- To provide an example of the generality of the Windsurf framework, here we show a picture of the WritingSimilaritySearch system, built on top of the Windsurf system
- Let's instantiate the key points of Windsurf within the new context:
  - Regions correspond to local features (i.e., key points of SURF)
  - The region distance function is the Euclidean distance
  - The matching problem is solved by means of an approximation of the 1-1 matching: best-bin-first match

WritingSimilaritySearch: an example query
[Figure: example query and top-k results (k = 9)]

Free exercise 1.D
- Let's complete our Exercise 1 with its last part, D
- Starting from the definitions of the low-level features you selected for describing the content of the unstructured data involved in your MM applications, provide a concrete representation/comparison modality for them, with visual examples
- Among the feature possibilities: global features vs. local features (region-based approach)
- E.g., global color distribution for an image (definition) vs. color histograms compared with the weighted Euclidean distance as similarity measure (representation/comparison modality)
- In doing the exercise, let's keep in mind the final goal: retrieve relevant MM content!