Photo management applications


Technical Note PR-TN 7/698
Issued: /7
Photo management applications
M.A. Peters; P.M.F. Fonseca
Philips Research Europe

Authors' address
M.A. Peters WB 4 marc.a.peters@philips.com
P.M.F. Fonseca WB 4 pedro.fonseca@philips.com

KONINKLIJKE PHILIPS ELECTRONICS NV 7. All rights reserved. Reproduction or dissemination in whole or in part is prohibited without the prior written consent of the copyright holder.

Title: Photo management applications
Author(s): M.A. Peters; P.M.F. Fonseca
Reviewer(s): Bas Zoetekouw, IPS Facilities
Technical Note: PR-TN 7/698
Project: Content AnalySiS And Navigation for Digital consumer storage devices (CASSANDRA) (-6)
Keywords: digital photos, clustering, image similarity, simplicitylabs

Abstract: This report describes basic technology developed for higher-level photo management applications. Based on the visual comparison of photos, distance measures are defined that yield a numerical value indicating how similar (or dissimilar) photos or groups of photos are. This enables a series of mid-level applications such as photo clustering, ordering or search by example; based on these, higher-level applications have been built that exploit the basic functionality offered by this technology. The metrics defined in this report have been tested on large amounts of content. Real-world test sets comprising thousands of photos have been used and, for video content-analysis applications, this technology served as the basis for the analysis of dozens, if not hundreds, of hours of content from different genres, such as news or sports.


Contents

1. Introduction
1.1. Applications
1.2. Test set
1.3. Organization of this report
2. Color spaces
2.1. R'G'B' color space
2.2. Y'CbCr color space
2.3. HSV color space
2.4. LUV color space
2.5. HMMD color space
3. General distance measures
4. Low-level features
4.1. Luma
4.1.1. Distance measure
4.2. Color Hue
4.2.1. Distance measure
4.3. Dominant Color
4.3.1. Distance measure
4.4. Color Structure
4.4.1. Distance measure
4.5. Color Layout
4.5.1. Distance measure
4.6. Edges
4.6.1. Distance measure
5. Distance between photos and clusters of photos
5.1. Distance between photos
5.2. Distance between clusters of photos
5.2.1. Centroid for the dominant color descriptor
5.3. Notes on the geometrical properties of photo-distance measures
6. Mid-level applications
6.1. Extract features
6.2. Cluster photos
6.3. LikePhoto
6.4. Compare photos
7. Conclusions
References

1. Introduction

This report describes basic technology developed for higher-level photo management applications. The technology is based on the simple principle that, in order to manage photos, one should be able to compare them, or more precisely, to tell how similar (or dissimilar) two or more photos are.

1.1. Applications

By reducing the problem of comparing photos to a geometrical one (albeit with special characteristics, as will be discussed later), well-known algorithms for ranking, clustering and ordering can be used. These algorithms enable some mid-level applications, described in a later chapter of this report, which in turn can be used for higher-level photo content management applications such as:

- Search and retrieval: by setting search parameters such as color, photos in which that color is predominant can be retrieved automatically. Furthermore, one or more example photos can be given as the search parameter (known as query by example); in this case, photos that are similar to the examples are automatically retrieved and ranked according to their similarity. SimplicityLabs [1], a platform for web-based applications that Philips Research is building to test concepts, uses the search functionality described by the LikePhoto application.
- Organization of photo collections: similarity-based clustering can be performed to group together visually (very) similar photos and thus help in navigating, browsing and visualizing large photo collections. This is especially important for the new photo-capturing paradigm enabled by the large storage capacities of digital photo cameras: users capture multiple (sometimes dozens of) photos of the same object of interest, expecting that at least one has good quality. However, since their local storage devices have even larger capacities (typically the computer's hard drive, or the CDs or DVDs onto which they burn their pictures), they tend to keep all photographs of that object, even those of poor quality.
- Slideshow: by grouping visually very similar photos, a slideshow can be prepared consisting of a) photos belonging to one cluster, or b) photos from the entire collection, where only one photo is chosen per cluster. The latter is especially valuable for large photo collections in which multiple photos were captured of the same object; by choosing only one, the slideshow can be kept to the minimum size possible while still offering an overview of the entire collection. Furthermore, if photos can be compared, they can also be ordered: in a manner comparable to alphabetical or numerical ordering, a visual order can be computed such that a slideshow prepared automatically in this way offers a visually smooth flow over the entire collection.

Furthermore, the exact same algorithms proved valuable for applications in other modalities, namely in the area of video content analysis. Individual frames of a video sequence can be considered photos, and a group of frames from the same sequence can be considered a photo collection. Therefore, some of the technology developed for the initial purpose of analysing digital photo content was used directly in the following applications:

- HotNews: segmentation of news items in a news recording can be achieved primarily by detecting the shots in which the anchorperson is visible and introduces a new story segment. These shots are all very similar in nature, so an algorithm similar to that used for photo clustering was used to automatically determine the groups of frames (and the shots they belong to) that are similar and repeat themselves throughout the news sequence.
- Tennis-in-a-minute: the highlights of a tennis match are detected based on the knowledge that during play segments (where a point is being disputed) the camera angles used to film the event are more or less constant. An algorithm similar to that used for photo clustering was used to automatically determine the groups of frames (and the shots they belong to) that are similar and repeat themselves throughout the tennis sequence.

1.2. Test set

The test set on which the technology was evaluated and optimized consists of a typical photo collection of a holiday trip, with many similar photos of the same object and a wide variety of photo types: natural landscapes, cityscapes, animals, persons, etc. More concretely, the test set comprised 599 photos from a holiday trip of a colleague to the USA. These photos are very representative of this kind of collection and of the widespread amateur photo-capturing paradigm discussed earlier. Some photos are blurred, out of focus or badly lit, while many others are sharp, well balanced and of generally good quality.

1.3. Organization of this report

The next few chapters give an essential yet lightweight introduction to some image processing basics that are needed to fully understand the later chapters. First, an overview is given of the color spaces in which the features are extracted.

The report continues with an overview of some basic and general distance measures, typically used to compare descriptors of certain features (such as luma histograms). It then introduces in detail each of the features used for image comparison, including examples of the descriptors used to describe these features and an analytical description of the distance measure used for each descriptor. Afterwards, we discuss how these features can be combined into one single distance measure. Finally, before concluding the report, we briefly discuss some mid-level applications which, while not very useful to an end-user on their own, are the essential building blocks of more valuable and complex applications such as those described above.

2. Color spaces

In this section we explain the different color spaces that we use. Throughout the sequel we try to be consistent in using the prime symbol (') for components that are gamma corrected; the first section, on R'G'B', explains what that means. For details on gamma correction as well as on the different color spaces, we refer to [2].

2.1. R'G'B' color space

The RGB color space is the most commonly known one. It is an additive color space in which red, green, and blue are primary colors that can be combined to reproduce all the colors. Each of these components has values in the range [0, 255], and together they form a 3-dimensional cube representing all the colors.

For displaying images, for example on a computer screen, values are so-called gamma corrected. This originates from analogue displays, where a linear increase in a specific color value does not match a linear increase on the display. In order to fix this, the values are corrected using a power function with exponent gamma, where a commonly used value for gamma is 2.2 (= 1/0.45). This gives a (non-linear) gamma corrected value R':

    IF ( R ≤ 255*0.018 ) THEN
        R' = 4.5 R
    ELSE
        R' = 255 * ( 1.099 * (R/255)^0.45 - 0.099 )
    ENDIF

Similarly, it gives gamma corrected values G' and B'. A decoded image will have gamma corrected values, since a display expects gamma corrected values. Even modern LCD displays (which do not have problems with showing a linear increase) are built such that they expect gamma corrected values. So we deal with R'G'B', but keep in mind that if we need linear calculations we have to revert to RGB. For completeness we give the inverse formula as well, again for the case where gamma is 2.2:

    IF ( R' ≤ 255*0.081 ) THEN
        R = R' / 4.5
    ELSE
        R = 255 * ( (R' + 25.245) / 280.245 )^(1/0.45)
    ENDIF

2.2. Y'CbCr color space

The Y'CbCr color space defines a color space in terms of a luma component Y' and two chrominance components Cb and Cr. The R'G'B' and the Y'CbCr color spaces can easily be transformed into each other by linear equations, as expressed in the ITU-R BT.601 standard [3]:

    Y' =  0.257 R' + 0.504 G' + 0.098 B' + 16
    Cb = -0.148 R' - 0.291 G' + 0.439 B' + 128
    Cr =  0.439 R' - 0.368 G' - 0.071 B' + 128

    R' = 1.164 (Y' - 16) + 1.596 (Cr - 128)
    G' = 1.164 (Y' - 16) - 0.392 (Cb - 128) - 0.813 (Cr - 128)
    B' = 1.164 (Y' - 16) + 2.018 (Cb - 128)

Using the ranges for R'G'B' (all in [0, 255]), we get the following ranges: Y' is in [16, 235], and Cb and Cr are in [16, 240]. One will also often see the Y'UV color space, which is just a version of Y'CbCr scaled in such a way that all three components are in the range [0, 255].

Note that we do not use the prime (') for Cb and Cr. This is by convention: since Y'CbCr is always based on the non-linear components R'G'B', there should be no confusion.
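The linear R'G'B' to Y'CbCr mapping of this section can be sketched in a few lines of Python. This is a minimal illustration that assumes the commonly quoted three-decimal BT.601 coefficients; the function names `rgb_to_ycbcr` and `ycbcr_to_rgb` are hypothetical and not part of the report's software.

```python
def rgb_to_ycbcr(r, g, b):
    """Map gamma-corrected R'G'B' values in [0, 255] to (Y', Cb, Cr)."""
    y = 0.257 * r + 0.504 * g + 0.098 * b + 16
    cb = -0.148 * r - 0.291 * g + 0.439 * b + 128
    cr = 0.439 * r - 0.368 * g - 0.071 * b + 128
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Inverse mapping; exact only up to the rounding of the coefficients."""
    r = 1.164 * (y - 16) + 1.596 * (cr - 128)
    g = 1.164 * (y - 16) - 0.392 * (cb - 128) - 0.813 * (cr - 128)
    b = 1.164 * (y - 16) + 2.018 * (cb - 128)
    return r, g, b


if __name__ == "__main__":
    print(rgb_to_ycbcr(0, 0, 0))        # black sits at the bottom of the luma range
    print(rgb_to_ycbcr(255, 255, 255))  # white sits at the top of the luma range
```

Because the coefficients are rounded, a round trip reproduces the input only approximately (to well within one code value in the cases tried here).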

2.3. HSV color space

HSV specifies a nonlinear transformation. It consists of the Value, representing the lightness of the color, the Saturation, indicating the degree of colorfulness, and the Hue, representing the dominant spectral tone of the color, denoted by an angle from 0 to 360 degrees. Their values are derived from the normalized R'G'B' values (ranging from 0 to 1) as follows:

    V = max(R', G', B')
    IF ( V == 0 ) THEN
        S = 0
    ELSE
        S = (V - min(R', G', B')) / V
    ENDIF
    IF ( min(R', G', B') == V ) THEN
        H = 0
    ELSE IF ( V == R' ) THEN
        H = 60 (G' - B') / (V - min(R', G', B'))
    ELSE IF ( V == G' ) THEN
        H = 120 + 60 (B' - R') / (V - min(R', G', B'))
    ELSE
        H = 240 + 60 (R' - G') / (V - min(R', G', B'))
    ENDIF
    IF ( H < 0 ) THEN
        H = H + 360
    ENDIF

The Saturation component takes values in the range [0, 1], the Value component has values in the range [0, 1] and the Hue component has values in the range [0, 360).
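The HSV pseudo-code above translates almost line by line into Python. The sketch below assumes normalized R'G'B' inputs in [0, 1]; the function name `rgb_to_hsv_degrees` is a hypothetical choice made for this illustration.

```python
def rgb_to_hsv_degrees(r, g, b):
    """Return (H, S, V) with H in [0, 360) and S, V in [0, 1]."""
    v = max(r, g, b)
    lo = min(r, g, b)
    s = 0.0 if v == 0 else (v - lo) / v
    if lo == v:
        h = 0.0                      # grey: hue is undefined, use 0 by convention
    elif v == r:
        h = 60.0 * (g - b) / (v - lo)
    elif v == g:
        h = 120.0 + 60.0 * (b - r) / (v - lo)
    else:
        h = 240.0 + 60.0 * (r - g) / (v - lo)
    if h < 0:
        h += 360.0                   # wrap negative angles into [0, 360)
    return h, s, v


if __name__ == "__main__":
    for name, rgb in [("red", (1, 0, 0)), ("magenta", (1, 0, 1)), ("grey", (0.5, 0.5, 0.5))]:
        print(name, rgb_to_hsv_degrees(*rgb))
```

Python's standard `colorsys.rgb_to_hsv` computes the same quantities with the hue scaled to [0, 1), which gives a convenient cross-check.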

2.4. LUV color space

The CIE (Commission Internationale de l'Éclairage) standardized LUV as a reasonably perceptually uniform color space. This conversion is very useful for the extraction of other feature descriptors, for example the MPEG7 Dominant Color descriptor. The values are derived from RGB values as follows (note that the inputs are RGB values, not gamma corrected R'G'B' ones):

    r = R / 255
    g = G / 255
    b = B / 255
    X = 0.412453 r + 0.35758 g + 0.180423 b
    Y = 0.212671 r + 0.71516 g + 0.072169 b
    Z = 0.019334 r + 0.119193 g + 0.950227 b
    x = X / ( X + Y + Z )
    y = Y / ( X + Y + Z )
    u = 4x / (-2x + 12y + 3)
    v = 9y / (-2x + 12y + 3)
    IF ( Y > 0.008856 ) THEN
        L = 116 Y^(1/3) - 16
    ELSE
        L = 903.3 Y
    ENDIF
    u0 = .6
    v0 = .4686
    U = 13 Y ( u - u0 )
    V = 13 Y ( v - v0 )

2.5. HMMD color space

[Figure: the HMMD color space, drawn with the labels White Color, Black Color, Sum, Min, Diff, Hue and Max.]

HMMD is defined by a nonlinear, reversible transformation from the R'G'B' color space. There are five distinct attributes (components) in the HMMD color space; however, three of them (Hue, Max, Min, or Hue, Diff, Sum) are sufficient to define the color space. The five attributes can be characterized as follows:

- Hue: the same as in HSV.
- Max: indicates how much black color it has, giving a flavor of shade or blackness.
- Min: indicates how much white color it has, giving a flavor of tint or whiteness.
- Diff: indicates how much gray it contains and how close it is to the pure color, giving a flavor of tone or colorfulness.
- Sum: simulates the brightness of the color.

Max, Min and Hue are computed with the same equations as the maximum, minimum and Hue in the HSV color space. The transformations for Diff and Sum have the following form:

    Diff = Max - Min
    Sum = (Max + Min) / 2

The Max, Min, Sum and Diff components have values in the range [0, 1]. The Hue component takes values in the range [0, 360].
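As a small illustration of the five HMMD attributes, the sketch below derives them from normalized R'G'B' values in [0, 1]. It assumes that Max and Min are simply the largest and smallest of the three components and that Hue follows the HSV definition; the function name `rgb_to_hmmd` and the dictionary layout are hypothetical.

```python
def rgb_to_hmmd(r, g, b):
    """Return the HMMD attributes of a normalized R'G'B' color as a dict."""
    mx, mn = max(r, g, b), min(r, g, b)
    diff = mx - mn                 # tone / colorfulness
    total = (mx + mn) / 2.0        # brightness ("Sum" in the text)
    if diff == 0:
        hue = 0.0                  # grey: no dominant hue
    elif mx == r:
        hue = (60.0 * (g - b) / diff) % 360.0
    elif mx == g:
        hue = 120.0 + 60.0 * (b - r) / diff
    else:
        hue = 240.0 + 60.0 * (r - g) / diff
    return {"hue": hue, "max": mx, "min": mn, "diff": diff, "sum": total}


if __name__ == "__main__":
    print(rgb_to_hmmd(1.0, 0.0, 0.0))   # pure red: maximal Diff
    print(rgb_to_hmmd(0.5, 0.5, 0.5))   # mid grey: zero Diff
```

Either triple (hue, max, min) or (hue, diff, sum) from the returned dictionary is enough to recover the others, which mirrors the remark that three attributes suffice.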

3. General distance measures

Many features are described in terms of histograms, and many measures are known that define a distance between two histograms. It will not be a surprise that for a given feature one distance measure turns out to be better than another. Sometimes this can be explained mathematically; sometimes it is just the result of benchmarking. The following formulas apply to a histogram H with N bins, H[0], H[1], ..., H[N-1], and give distance measures between 0 and 1. The two histograms being compared are denoted H_1 and H_2.

The histogram intersection distance:

$$ d(H_1, H_2) = 1 - \sum_{i=0}^{N-1} \min(H_1[i], H_2[i]) $$

This measure just adds the similar parts per bin in order to define the similarity between two histograms. In some cases, a specific bin often holds a quite large share of the values. In those cases the distance can be adjusted to focus more on the other bins. Think for example of edges: there will almost always be many pixels in an image that do not belong to an edge. So, for example, if the first bin often has large values, we use:

The adjusted histogram intersection distance:

$$ d_{adj}(H_1, H_2) = \frac{1 - \sum_{i=0}^{N-1} \min(H_1[i], H_2[i])}{1 - \min(H_1[0], H_2[0], 0.5)} $$

To understand why this is a good measure, we distinguish between 3 cases:

- Both H_1[0] and H_2[0] are small. Although we assume that this does not happen frequently, we cannot exclude it. In this case the denominator is almost equal to 1, and the distance measure is almost equal to the standard histogram intersection distance. So it will hardly affect this distance.
- H_1[0] is small and H_2[0] is big (or vice versa). In this case the denominator becomes smaller, maybe even 0.5, and the distance measure can be approximately twice the histogram intersection distance. Note that the histogram intersection distance is already relatively large on its own, and it will stay large.
- Both H_1[0] and H_2[0] are big. The histogram intersection distance will be relatively small because of the contribution of the first bin. By using the adjusted distance measure, we put less weight on the first bin and more on the others. Therefore the details in the other bins will have a bigger influence on the distance measure.

The effect is that we can distinguish better between images that have a large overlap in the first bin, while the distance between other images will not get smaller. The threshold in the denominator (here 0.5) is used to avoid the situation where images are, for example, completely equal with all values in the first bin, except for 1 pixel. In such a case the distance would otherwise be 1.
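The two intersection-based distances discussed above can be sketched as follows. The adjusted form is an assumption made for this illustration: it divides the plain intersection distance by 1 - min(H1[0], H2[0], threshold), which reproduces the behaviour described for the one-pixel example but is not confirmed as the authors' exact formula. Histograms are assumed to be normalized (bins sum to 1), and both function names are hypothetical.

```python
def intersection_distance(h1, h2):
    """1 minus the overlap of two normalized histograms."""
    return 1.0 - sum(min(a, b) for a, b in zip(h1, h2))


def adjusted_intersection_distance(h1, h2, threshold=0.5):
    """Intersection distance with reduced weight on a dominant first bin.

    Assumed form: plain distance divided by 1 - min(h1[0], h2[0], threshold).
    """
    denominator = 1.0 - min(h1[0], h2[0], threshold)
    return intersection_distance(h1, h2) / denominator


if __name__ == "__main__":
    # Two edge-like histograms dominated by the first bin, differing elsewhere.
    a = [0.8, 0.2, 0.0]
    b = [0.8, 0.0, 0.2]
    print(intersection_distance(a, b), adjusted_intersection_distance(a, b))
    # Nearly identical images: all mass in the first bin except one "pixel".
    c = [1.0, 0.0, 0.0]
    d = [0.99, 0.01, 0.0]
    print(adjusted_intersection_distance(c, d))
```

With the threshold in place, the nearly identical pair stays close to 0; without it (threshold of 1.0), the same pair would be pushed to the maximum distance.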

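Next we discuss the correlation, which uses the average and the standard deviation of the histogram.

Average of a histogram:

$$ \bar{H} = \frac{1}{N} \sum_{i=0}^{N-1} H[i] $$

If the histogram is normalized, this equals 1/N.

The standard deviation:

$$ \sigma(H) = \sqrt{ \frac{1}{N} \sum_{i=0}^{N-1} \left( H[i] - \bar{H} \right)^2 } $$

Now, the correlation distance can be formulated as follows.

(Pearson's) correlation distance:

$$ d(H_1, H_2) = 1 - \frac{ \sum_{i=0}^{N-1} \left( H_1[i] - \bar{H}_1 \right) \left( H_2[i] - \bar{H}_2 \right) }{ N \, \sigma(H_1) \, \sigma(H_2) } $$

For completeness we mention the distance between two arrays, which will be used by the ColorLayout feature.

Euclidean distance between arrays x and y:

$$ d(x, y) = \sqrt{ \sum_{i=0}^{N-1} \left( x[i] - y[i] \right)^2 } $$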
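A minimal sketch of the correlation-based distance and the Euclidean distance, under two assumptions that are not confirmed by the text: the standard deviation is the population form (division by N), and the distance is taken as one minus the Pearson correlation coefficient. All function names are hypothetical.

```python
import math


def mean(h):
    return sum(h) / len(h)


def std(h):
    """Population standard deviation of the bin values (assumed 1/N form)."""
    m = mean(h)
    return math.sqrt(sum((x - m) ** 2 for x in h) / len(h))


def correlation_distance(h1, h2):
    """1 minus the Pearson correlation; undefined for a flat histogram."""
    m1, m2 = mean(h1), mean(h2)
    covariance = sum((a - m1) * (b - m2) for a, b in zip(h1, h2)) / len(h1)
    return 1.0 - covariance / (std(h1) * std(h2))


def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


if __name__ == "__main__":
    h = [0.1, 0.2, 0.3, 0.4]
    print(correlation_distance(h, h))                  # identical shapes
    print(correlation_distance(h, list(reversed(h))))  # opposite shapes
    print(euclidean_distance([0, 0], [3, 4]))
```

Under this assumed form the raw value can exceed 1 for anti-correlated histograms, so clipping is needed if the result has to stay within [0, 1].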

4. Low-level features

4.1. Luma

First note that luma is not the same as luminance (see also [2]). True (CIE) luminance is formed as a weighted sum of linear RGB components. Yet people often refer to Y' as luminance, which is not correct since it is defined in terms of the non-linear components R'G'B'. This weighted sum is referred to as luma.

The luma range [16, 235], given by the Y' component of the Y'CbCr color space, is divided into 16 uniform bins. The bins represent the normalized number of pixels that have their luma value in this range. This feature represents the brightness of the image.

$$ \mathrm{Luma\_bin}[k] = \# \left\{ (i, j) : k \cdot \tfrac{219}{16} \le Y'_{ij} - 16 < (k + 1) \cdot \tfrac{219}{16} \right\} \quad \text{for } k = 0, 1, \ldots, 15 $$

4.1.1. Distance measure

For luma, the histogram intersection distance works very well. For our test set of 599 photos, we have the following results:

Average distance: .46
Maximum distance: 1

The maximum distance is for example reached by the images

[Figure: two photos and their 16-bin luma histograms.]
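The luma histogram of section 4.1 can be sketched as below, assuming the luma values of all pixels are already available as a flat list in [16, 235]. Clamping the top value 235 into the last bin is an assumption of this sketch, and the function name `luma_histogram` is hypothetical.

```python
def luma_histogram(luma_values, bins=16, low=16, high=235):
    """Normalized histogram of luma values over `bins` uniform bins."""
    if not luma_values:
        raise ValueError("at least one pixel is required")
    width = (high - low) / bins            # 219 / 16 for the default range
    counts = [0] * bins
    for y in luma_values:
        k = int((y - low) / width)
        k = min(max(k, 0), bins - 1)       # clamp out-of-range values
        counts[k] += 1
    return [c / len(luma_values) for c in counts]


if __name__ == "__main__":
    print(luma_histogram([16, 16, 125, 235]))
```

The same routine applies to the hue histogram by calling it with `low=0` and `high=360` on the hue values.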

4.2. Color Hue

The hue range [0, 360), given by the H component in the HSV color space, is divided into 16 uniform bins. The bins represent the (normalized) number of pixels that have their hue value in this range. This feature represents the global distribution of color in the image.

$$ \mathrm{Hue\_bin}[k] = \# \left\{ (i, j) : k \cdot \tfrac{360}{16} \le H_{ij} < (k + 1) \cdot \tfrac{360}{16} \right\} \quad \text{for } k = 0, 1, \ldots, 15 $$

4.2.1. Distance measure

For the hue histogram, too, the intersection distance works well.

Average distance: .54
Maximum distance: 1

The maximum distance is for example reached by the images

[Figure: two photos and their 16-bin hue histograms.]

4.3. Dominant Color

The Dominant Color descriptor is defined by MPEG7 [4]. The pixels in the image are clustered into at most 8 colors, representing the dominant colors of the image. The dominant colors are extracted as a result of successive divisions and merging of the color clusters. The Generalized Lloyd Algorithm (GLA) is used between each division. For details of an algorithm that computes the dominant colors, we refer to the MPEG7 experimentation model [5].

The output consists of several elements:

- The number of dominant colors (at least 1, at most 8)
- For each dominant color:
  - The value of this color, in R'G'B' components
  - A boolean for the variance of each of the components, indicating whether the variance is big or not
  - The percentage of this dominant color in the image
- The quantized spatial coherency, which is a weighted sum of the per-dominant-color spatial coherency. The spatial coherency per dominant color captures whether or not the color is coherent and appears to be a solid color instead of being scattered across the image.

Although the values of the dominant colors are given in R'G'B', the computation of the dominant colors is done in LUV, since this color space models human perception of color more closely than either R'G'B' or Y'CbCr.

4.3.1. Distance measure

We use the distance measure as proposed in the MPEG7 experimentation model. This distance measure is based on a joint probability function for normal distributions.

Average distance: .75
Maximum distance: .794

The maximum distance is achieved by the images

[Figure: the two photos, each annotated with the percentages of its dominant colors; the first is dominated by a single color at 98.4%, the second by one at 88.9%.]

4.4. Color Structure

The Color Structure descriptor is also defined by MPEG7 [4]. This descriptor is a color-structure feature descriptor intended for still image retrieval. Its main functionality is image-to-image matching. It expresses local color structure in an image by the use of a structuring element that is comprised of several image samples. Instead of characterizing the relative frequency of single image samples with a particular color, this descriptor characterizes the relative frequency of structuring elements that contain an image sample with a particular color.

256 colors are defined based on a non-uniform distribution of the HMMD color space. A structuring element is used to count the frequencies of these colors throughout the image. It expresses the local spatial distribution of colors.

4.4.1. Distance measure

MPEG7 proposes to use the L1 distance (sum of absolute differences), but since normalization is difficult it is not easy to combine this measure with other descriptors. It turns out that the correlation distance works quite well. Experimental values give a maximum distance of 0.63, therefore we scale by a factor 1.5. Of course we clip to assure a maximum of 1.

Average distance: 0.313 (scaled 0.470)
Maximum distance: 0.63 (scaled 0.945)

The maximum distance is achieved by the images

[Figure: the two photos with the largest Color Structure distance.]

4.5. Color Layout

The Color Layout descriptor is also defined by MPEG7 [4]. The image is first characterized by an 8 x 8 image, just averaging the colors in each block in the RGB color space. Then a DCT image is created for each component Y', Cb, Cr. This DCT image is quantized, and the resulting 8 x 8 matrices are described in 64-element arrays using a zig-zag form, as shown below.

     0   1   5   6  14  15  27  28
     2   4   7  13  16  26  29  42
     3   8  12  17  25  30  41  43
     9  11  18  24  31  40  44  53
    10  19  23  32  39  45  52  54
    20  22  33  38  46  51  55  60
    21  34  37  47  50  56  59  61
    35  36  48  49  57  58  62  63

From each of the components we only use the first 6 values (highlighted in the original table). They are the most significant ones and are sufficient to distinguish between images. The color layout descriptor expresses the global spatial distribution of colors.
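The zig-zag ordering used for the Color Layout coefficients can be generated rather than typed in. The sketch below builds the scan-position matrix for an n x n block and reads a block out in that order; the function names `zigzag_positions` and `zigzag_scan` are hypothetical, and the number of coefficients kept is left as a parameter.

```python
def zigzag_positions(n=8):
    """Matrix whose entry [row][col] is the zig-zag scan position."""
    order = [[0] * n for _ in range(n)]
    position = 0
    for s in range(2 * n - 1):                  # s = row + col indexes an anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)               # even diagonals run upwards
        for row in rows:
            order[row][s - row] = position
            position += 1
    return order


def zigzag_scan(block, count=None):
    """Flatten a square block in zig-zag order, keeping the first `count` values."""
    n = len(block)
    order = zigzag_positions(n)
    flat = [0] * (n * n)
    for row in range(n):
        for col in range(n):
            flat[order[row][col]] = block[row][col]
    return flat if count is None else flat[:count]


if __name__ == "__main__":
    for line in zigzag_positions(8):
        print(" ".join(f"{v:2d}" for v in line))
```

Scanning the position matrix itself returns 0, 1, 2, ... in order, which is a quick sanity check that the two functions agree.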

4.5.1. Distance measure

The distance is computed as in the MPEG7 experimentation model, which is a weighted Euclidean distance of the 3 matrices from the DCT image, where A and B denote the two images being compared:

$$ d(A, B) = \left( \sum_{i=0}^{5} w_i \left( Y_A[i] - Y_B[i] \right)^2 + \sum_{i=0}^{5} w_i \left( Cb_A[i] - Cb_B[i] \right)^2 + \sum_{i=0}^{5} w_i \left( Cr_A[i] - Cr_B[i] \right)^2 \right)^{1/2} $$

Although the maximum distance is mathematically almost 4, experimental values give a maximum distance of .85. Therefore we scale by a factor .4 and clip to assure a maximum of 1.

Average distance: .654 (scaled .6)
Maximum distance: .85 (scaled .94)

The maximum distance is achieved by the images

[Figure: the two photos with the largest Color Layout distance; the bottom images show an enlargement of the 8 x 8 DCT image thumbnails.]

4.6. Edges

Using the Sobel operator, edges are determined in 4 directions: horizontal, vertical, diagonal, and anti-diagonal. The computations are done only in the luma plane, the Y' component of Y'CbCr. For each of the directions, the edges are divided over bins in the interval [,6). Edges with values above 6 are put in the last bin. The bins represent the number of pixels that have their edge strength (value) in this range.

The Sobel operator is described by a mask representing gradient operators. For edges in the horizontal direction the mask is defined by

[Mask: 3 x 3 Sobel mask for the horizontal direction]

where the boxed element denotes the origin. So for a pixel in the Y' plane, Y'_{ij}, the gradient used for detecting horizontal edges is calculated by

$$ \mathrm{edges0}_{ij} = -Y'_{i-1,j-1} + Y'_{i+1,j-1} - 2Y'_{i-1,j} + 2Y'_{i+1,j} - Y'_{i-1,j+1} + Y'_{i+1,j+1} $$

Similarly the mask for edges in the vertical direction is defined by

[Mask: 3 x 3 Sobel mask for the vertical direction]

The gradient for detecting edges in the vertical direction is calculated by

$$ \mathrm{edges90}_{ij} = -Y'_{i-1,j-1} + Y'_{i-1,j+1} - 2Y'_{i,j-1} + 2Y'_{i,j+1} - Y'_{i+1,j-1} + Y'_{i+1,j+1} $$

The diagonal and anti-diagonal masks are given by

[Masks: 3 x 3 Sobel masks for the diagonal and anti-diagonal directions]

with calculations

$$ \mathrm{edges45}_{ij} = -2Y'_{i-1,j-1} - Y'_{i-1,j} - Y'_{i,j-1} + Y'_{i,j+1} + Y'_{i+1,j} + 2Y'_{i+1,j+1} $$

$$ \mathrm{edges135}_{ij} = Y'_{i-1,j} + 2Y'_{i-1,j+1} - Y'_{i,j-1} + Y'_{i,j+1} - 2Y'_{i+1,j-1} - Y'_{i+1,j} $$

4.6.1. Distance measure

Edges are a typical example where we often have large values in the first bin. Therefore we use an adjusted histogram intersection distance for the edges.

Average distance: .36
Maximum distance: 1

An example of two images with a large distance (.973 for edges in the anti-diagonal direction) is given by the images

[Figure: two photos and their 16-bin edge histograms.]
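The four directional gradients of section 4.6 can be sketched with plain Python lists. The masks below are the textbook Sobel masks (center weights of 2), which is an assumption since the original mask figures are not available; the luma plane is indexed as `luma[row][col]`, and the names `MASKS`, `gradient` and `edge_strengths` are hypothetical.

```python
# Assumed textbook Sobel masks, indexed [row offset + 1][column offset + 1].
MASKS = {
    "edges0":   [[-1, -2, -1], [0, 0, 0], [1, 2, 1]],     # responds to horizontal edges
    "edges90":  [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],     # responds to vertical edges
    "edges45":  [[-2, -1, 0], [-1, 0, 1], [0, 1, 2]],     # diagonal
    "edges135": [[0, 1, 2], [-1, 0, 1], [-2, -1, 0]],     # anti-diagonal
}


def gradient(luma, row, col, mask):
    """Apply a 3 x 3 mask centered on an interior pixel of the luma plane."""
    return sum(
        mask[dr + 1][dc + 1] * luma[row + dr][col + dc]
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
    )


def edge_strengths(luma, row, col):
    """Absolute gradient per direction for one interior pixel."""
    return {name: abs(gradient(luma, row, col, mask)) for name, mask in MASKS.items()}


if __name__ == "__main__":
    vertical_step = [[0, 0, 100]] * 3          # dark on the left, bright on the right
    print(edge_strengths(vertical_step, 1, 1))
```

The per-pixel strengths of each direction would then be collected into one histogram per direction, to which the adjusted intersection distance is applied.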

5. Distance between photos and clusters of photos

5.1. Distance between photos

Having the distance defined for each of the features, we compute the distance between two photos as a linear combination of the distances between the features describing the photos. So

$$ d(\mathrm{photo}_1, \mathrm{photo}_2) = \sum_{i=0}^{n-1} w_i \, d\left( \mathrm{feature}_i(\mathrm{photo}_1), \mathrm{feature}_i(\mathrm{photo}_2) \right) $$

where n is the number of features and

$$ \sum_{i=0}^{n-1} w_i = 1 $$

Since the distance measure for each of the features is in the interval [0, 1], so is this distance.

In our application we store the settings per feature in an xml file, settings.xml, which makes it very easy to change and test the influence of the separate features. As an example we give the settings as we now commonly use them, based on our test photo set.

    LumaHistogram       .5
    ColorHueHistogram   .5
    Edges0Histogram     .5
    Edges45Histogram    .5
    Edges90Histogram    .5
    Edges135Histogram   .5
    DominantColors      .5
    ColorStructure      .5
    ColorLayout         .

With these settings we find for our test set:

Average distance: .37
Maximum distance: .838

where the maximum is achieved by the following images:

[Figure: the two photos with the largest combined distance.]
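The weighted combination of per-feature distances can be sketched as below. The weights in the example are purely hypothetical placeholders (they are not the report's settings), and the function name `photo_distance` is an assumption; the sketch only requires that the weights sum to 1, as stated above.

```python
def photo_distance(feature_distances, weights):
    """Weighted sum of per-feature distances; both arguments are dicts by feature name."""
    total_weight = sum(weights.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * feature_distances[name] for name in weights)


if __name__ == "__main__":
    # Hypothetical weights and distances, for illustration only.
    weights = {"LumaHistogram": 0.25, "ColorHueHistogram": 0.25, "ColorLayout": 0.5}
    distances = {"LumaHistogram": 0.2, "ColorHueHistogram": 0.6, "ColorLayout": 0.4}
    print(photo_distance(distances, weights))
```

Because every per-feature distance lies in [0, 1] and the weights sum to 1, the combined value stays in [0, 1] as well.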

See also section 6.4 for the details of the distance per feature between these images.

5.2. Distance between clusters of photos

Is it straightforward to define the distance between two clusters of images? Frequently one uses the distance between the centroids of the clusters. For a histogram (most features are described by histograms), the centroid is defined as the histogram that has in each bin the average of the contents of the same bin over all the photos in the cluster. So if H_i[k] is bin k of photo i, then the histogram for the centroid is given by

$$ H[k] = \frac{1}{n} \sum_{i=0}^{n-1} H_i[k] $$

where n is the number of photos in the cluster.

The distance between two clusters of photos, for the feature described by the histogram, can be defined as the distance between the centroids. An evaluation of all features shows that this can be done for each of the features, except for the dominant color descriptor. From the description of that descriptor it is clear that computing this average is neither straightforward nor trivial.

5.2.1. Centroid for the dominant color descriptor

Our proposed solution, which is filed as a patent application [6], is to create a new image from all the Dominant Color descriptors in a cluster of images, based on their components and the corresponding percentages.

Say we have n images, image_1, ..., image_n. Then we create a new image of n lines. Each line contains 100 pixels, based on the components of the Dominant Color descriptors of the images, as follows: for i = 1, ..., n, the i-th line contains the Dominant Colors of image_i, where the number of pixels for each color is determined by its percentage. Say image_i has m Dominant Colors c_1, c_2, ..., c_m, with percentages p_1, p_2, ..., p_m respectively. Then the i-th line of the new image contains p_1 pixels of color c_1, p_2 pixels of color c_2, ..., and p_m pixels of color c_m.
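Both centroid constructions described above can be sketched briefly. The histogram centroid is a per-bin average; the dominant-color construction builds one row of pixels per photo, with each color repeated in proportion to its percentage. The row width of 100, the padding rule for rounding leftovers, and the function names are assumptions of this sketch; the Dominant Color extraction that would then run on the synthetic image is not shown.

```python
def centroid_histogram(histograms):
    """Per-bin average of the histograms of all photos in a cluster."""
    n = len(histograms)
    return [sum(h[k] for h in histograms) / n for k in range(len(histograms[0]))]


def synthetic_dominant_color_image(descriptors, width=100):
    """One row per photo; each (color, percentage) pair fills its share of the row."""
    rows = []
    for colors in descriptors:
        row = []
        for color, percentage in colors:
            row.extend([color] * round(percentage * width / 100))
        # Rounding can leave the row slightly short or long: pad with the last color, then trim.
        row = (row + [colors[-1][0]] * width)[:width]
        rows.append(row)
    return rows


if __name__ == "__main__":
    print(centroid_histogram([[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]))
    image = synthetic_dominant_color_image(
        [[((200, 30, 30), 75), ((20, 20, 180), 25)], [((10, 160, 10), 100)]]
    )
    print(len(image), len(image[0]))
```

Running a Dominant Color extractor on the returned rows would give the cluster's centroid descriptor in the sense described in this section.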

Note: the shape of the new image does not matter; for example, we can also create a single long line of 100·n pixels as the new image.

Now, the average Dominant Color descriptor of the cluster of images simply corresponds to the Dominant Color descriptor of the new image. The (up to 8) dominant colors of the new image are representative of the average of the Dominant Colors in the cluster, and the percentages also represent how these Dominant Colors are distributed over the images in the cluster. Moreover, the variances of these colors also represent the variance of the color within the cluster. This new fake image represents the centroid for the dominant color descriptor. So, the distance between two clusters of photos can be defined as the distance between the two centroids representing the clusters.

5.3. Notes on the geometrical properties of photo-distance measures

Much care was put into normalizing the distance measures for each feature such that they yield the same range of values (between 0 and 1). However, one must point out that some of the feature spaces are very non-linear and very heterogeneous in nature (take dominant color or scalable color, for instance). Therefore, although the weighted average of the distances computed for each feature might make sense from an empirical point of view, it does not have any of the geometrical properties that one would expect from traditional distance measures computed on linear spaces (e.g. the Euclidean distance between two RGB color vectors).
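For reference, the weighted combination of section 5.1 can be sketched as follows. The sketch assumes that every feature is a normalized histogram and uses half the L1 difference as a stand-in per-feature distance; the actual per-feature measures are defined elsewhere in this report and differ per feature. The names `photo_distance` and `half_l1`, the feature keys and the weights are hypothetical illustration values, not the contents of settings.xml.

```python
from typing import Callable, Dict, Sequence

Histogram = Sequence[float]


def half_l1(a: Histogram, b: Histogram) -> float:
    """Stand-in feature distance: in [0, 1] for histograms that each sum to 1."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def photo_distance(
    photo_a: Dict[str, Histogram],
    photo_b: Dict[str, Histogram],
    weights: Dict[str, float],
    feature_distance: Callable[[Histogram, Histogram], float] = half_l1,
) -> float:
    """Weighted sum of per-feature distances; the weights must sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("feature weights must sum to 1")
    return sum(
        weight * feature_distance(photo_a[name], photo_b[name])
        for name, weight in weights.items()
    )


weights = {"luma": 0.25, "hue": 0.75}    # illustrative weights only
photo_1 = {"luma": [1.0, 0.0], "hue": [0.5, 0.5]}
photo_2 = {"luma": [0.0, 1.0], "hue": [0.5, 0.5]}
d = photo_distance(photo_1, photo_2, weights)
```

Because every per-feature distance lies in [0, 1] and the weights sum to 1, the result also lies in [0, 1]. As noted above, that normalization alone does not give the combined value the geometrical properties of a distance on a linear space.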

6. Mid-level applications

As introduced earlier, the basic technology described in this report enables first and foremost a number of mid-level applications. While some of these applications might not be valuable on their own as end-user applications, they are essential building blocks needed to enable higher-level and more valuable photo management applications. They are also valuable, as explained in the first chapter, to help other image- and video-content analysis technologies.

For each of these mid-level applications, example procedures were created, and small executables that demonstrate their functionality were made available. These comprise applications to cluster a given set of photos, and to find photos similar to a given input photo. For convenience we also have an application to compare the differences between a list of photos. None of these applications uses the photos themselves; they use the features that describe these photos. The features are computed by an additional procedure that extracts the features for the photos.

6.1. Extract features

First of all, we have to extract the low-level features for a set of photos. The resulting information (the features) is used by the other applications, so the extraction has to be done beforehand. Extracting the features is time consuming (on current PCs it takes a few seconds per image), but it only has to be done once per image.

extract

Syntax: extract dirname

Description: This application analyses all the images in the directory dirname, and extracts the low-level features for each of the images.

Output: For each jpg image we get:
- an xml file describing the features of the image
- a thumbnail of the image

For an image myimage.jpg, the features are stored in myimage.xml, and the thumbnail is stored in myimage_tn.jpg. The xml files are used by the other applications, whereas the thumbnail images are used in the html pages describing the results of those applications.

6.2. Cluster photos

The cluster application creates clusters of photos that are similar. There are many known algorithms for clustering. We chose to start by putting each photo in a cluster of its own, and then to merge clusters that are close to each other recursively. The result is a collection of clustered photos that are displayed together in an html page. The clustering continues until the remaining clusters are all more than a given distance apart. We provide two options for that. duplicates, which is the default setting, clusters images up to a distance of (out of ), while all clusters up to a distance of 5. Our experience is that duplicates finds most of the images that are taken at the same time at the same place, something that happens frequently with digital cameras. On the other hand, all also clusters images that are somewhat similar but not necessarily from the same scene.

Note that the clustering is based on low-level features, and therefore in general there will be many singleton clusters. For more general clustering the use of mid- and high-level features is needed.

The clustering application is an order N² application, which means that the time it takes to cluster a set of photos increases with the square of the total number of photos.

cluster

Syntax: cluster dirname [ duplicates (default) all ]

Description: This application clusters the photos in the directory dirname based on low-level features. Depending on the cluster type, it merges only photos that are very close (duplicates), or it also merges photos that are relatively similar.

Output: clusters.html. This html file shows the clusters of photos created during the procedure. For each cluster, one of the images has a border of a distinguishing color. This image is the one that is closest to the centroid.

The following image shows a screenshot of clusters.html for our test set. In each cluster we have highlighted one image, which is the one that is closest to the centroid.

If one would like to filter a set of photos (because the set is too large), one could cluster the photos and select one image for each cluster. This creates a summary of the whole photo set, which shows one image for each of the events that one took a photo of.
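The merge procedure and the selection of one image per cluster can be sketched as follows. This is a naive illustration, not the code of the cluster executable: it assumes a single normalized histogram per photo, uses half the L1 difference as a stand-in for the weighted photo distance, and recomputes all pairwise centroid distances on every pass, so it is slower than the order N² behaviour reported above. The function names and the threshold value are hypothetical.

```python
from typing import List, Sequence

Histogram = Sequence[float]


def distance(a: Histogram, b: Histogram) -> float:
    """Stand-in distance in [0, 1] for histograms that each sum to 1."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def centroid(histograms: List[Histogram]) -> List[float]:
    """Bin-wise average of the histograms in one cluster."""
    count = len(histograms)
    return [sum(h[k] for h in histograms) / count for k in range(len(histograms[0]))]


def cluster_photos(photos: List[Histogram], threshold: float) -> List[List[int]]:
    """Merge the closest pair of clusters until all are more than threshold apart."""
    clusters = [[index] for index in range(len(photos))]
    while len(clusters) > 1:
        centroids = [centroid([photos[p] for p in members]) for members in clusters]
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = distance(centroids[i], centroids[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # every remaining pair is further apart than the threshold
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def representative(photos: List[Histogram], members: List[int]) -> int:
    """Index of the photo closest to the centroid of its cluster."""
    center = centroid([photos[p] for p in members])
    return min(members, key=lambda p: distance(photos[p], center))


photos = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clusters = cluster_photos(photos, threshold=0.2)
summary = [representative(photos, members) for members in clusters]
```

With these four toy histograms the first two photos end up in one cluster and the last two in another; `summary` then holds one photo index per cluster, which is the filtering idea described above.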


6.3. LikePhoto

This application finds, given a photo, all the photos that are within a given distance. In other words, it finds similar photos, and it can be used for search and retrieval. In contrast to the clustering application, this method is linear in the number of photos and very fast.

likephoto

Syntax: likephoto filename [distance]

Description: This application finds images that are similar to the input image. By setting the distance (between and ) you find all the photos which have a distance smaller than this parameter. Default value is 5. The resulting photos are ordered, so the closest one is shown first. Moreover, we show the image that is furthest away from the input image.

Output: similar_photo.html
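A minimal sketch of this search, assuming the features of every photo have already been extracted and loaded into a dictionary keyed by file name. The function `like_photo`, its parameters and the toy one-number "features" are hypothetical; the real application reads the xml feature files and uses the weighted photo distance.

```python
from typing import Callable, Dict, List, Optional, Tuple, TypeVar

Features = TypeVar("Features")


def like_photo(
    query: Features,
    library: Dict[str, Features],
    max_distance: float,
    distance: Callable[[Features, Features], float],
) -> Tuple[List[Tuple[str, float]], Optional[Tuple[str, float]]]:
    """Return the photos closer than max_distance (closest first) and the furthest photo."""
    # One distance evaluation per photo in the library.
    scored = sorted((distance(query, features), name) for name, features in library.items())
    similar = [(name, d) for d, name in scored if d < max_distance]
    furthest = (scored[-1][1], scored[-1][0]) if scored else None
    return similar, furthest


# Toy example: each photo is described by a single number.
library = {"a.jpg": 0.1, "b.jpg": 0.5, "c.jpg": 0.9}
similar, furthest = like_photo(0.0, library, 0.6, lambda x, y: abs(x - y))
```

The number of distance evaluations grows linearly with the size of the library; the sort used here to order the results adds a logarithmic factor on top of that.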

In the screenshot of similar_photo.html, the highlighted image is the input image, and the other images are the results of the likephoto application. The image below the line is the one that is furthest away from the input image.

6.4. Compare photos

This is just a helper function; it is not meant to be an application in itself. However, since it is a very useful tool for analysis, we do mention it here.

compare

Syntax: compare filename 1 filename 2 [filename 3 ... filename n]

Description: This application shows the total weighted distance as well as the distance per component between all the given images.

Output: distances.html

For example, when comparing the photos in the test set that are furthest apart, we get:

7. Conclusions

This report described basic technology developed for higher-level photo management applications. Based on the visual comparison of photos, distance measures were defined to yield a numerical value indicating how similar (or dissimilar) photos or groups of photos are. This enabled a series of mid-level applications such as photo clustering, ordering or search by example; based on these, higher-level applications have been built, exploiting the basic functionality offered by this technology.

The metrics defined in this report have been tested on large amounts of content. Real-world test sets comprising thousands of photos have been used, and, for video content-analysis applications, this technology served as the basis for the analysis of dozens, if not hundreds, of hours of content from different genres, such as news or sports content.

This report will hopefully serve as a thorough reference not only for future image/video content analysis applications, but also for basic yet essential techniques, tools and definitions, namely those describing relevant color spaces and powerful image descriptors.

References

[1] http://www.simplicitylabs.net/ (Shoobox application)
[2] Digital Video and HDTV, Charles Poynton
[3] ITU-R BT.601 standard: http://www.itu.int/itu-r/
[4] Text of ISO/IEC 15938-3/FDIS, Information technology, Multimedia content description interface, Part 3: Visual
[5] Text of ISO/IEC 15938-8 PDTR (Extraction and Use of MPEG-7 Descriptions)
[6] Philips Patent filing PH644, Dominant color descriptors, M.A. Peters, P.M.F.S. Fonseca