Multimedia Information Retrieval. Norbert Fuhr. SIGIR 98.


1 Multimedia Information Retrieval
Norbert Fuhr, SIGIR 98

Chapter 1 Introduction
media types: text, images, audio, video

2 terminology:
  monomedia object/document: object containing data of a single medium
  multimedia object/document: object containing data of multiple media
  hypertext document: nonlinear text document (i.e. with links)
  hypermedia document: nonlinear multimedia document

document structures and attributes
[figure: the example document "IR in networks" by J. Doe shown with its content structure (IR, networks, heterogeneity, effectiveness, user friendliness), its logical structure (head with title and author, chapters with sections), its layout structure and its external attributes (author = J. Doe, crdate, ladate)]

3 characteristics of multimedia information systems:
  managing data of multiple media
  large storage and bandwidth requirements
  continuous delivery (to avoid jitter)
  synchronization between several channels (e.g. between audio and video)
  content- and similarity-based access
  user-friendly interfaces

tasks:
  representation and transformation
  storage and compression
  communication and synchronization
  authoring and cooperation
  presentation
  content analysis
  browsing
  retrieval and filtering
  user interfaces
  versioning

4 Course structure:
1. introduction
2. media
3. retrieval
4. indexing

Chapter 2 Media
  media types and dimensions
  views on media objects
  FERMI multimedia data model

5 2.1 Media types
text is a linear medium...
[figure: the media types image, audio and video placed along the axes x, y and t]

2.2 Views on media objects
here: images
physical view: pixel matrix
logical view:
  perceptive view: colour, brightness, texture
  symbolic view
  spatial view: spatial relations (depending on modelling space)
  structural view: set of image objects, structural relations between image objects (aggregation)

7 2.3 The FERMI Multimedia Document Model
Document structure and IR
impact of structure:
  on multimedia information: heterogeneity of multimedia data
  on semantic content: logical structure ≙ discourse structure
  on corpus:
    classic IR: document = atomic unit
    MMIR: retrieval of document components

8 2.3.2 Elements of the multimedia data model
logical structure
  hierarchy of structural objects
  leaves = single-media data
  implements explicit organization of discourse
  other data model elements refer to logical structure
attributes
  classical attributes (author, dates, ...)
  index expressions
navigational structure
  links

The logical structure
logical structure ≙ hierarchical aggregation of structural objects:
LS = (OS, str, seq, TYPEST, tst, typest, TYPEM, typem)
  OS: finite set of document structural objects (elements: os_i)
  str: aggregative relation between structural objects, defines hierarchical composition
  seq: defines a linear sequence on OS (corresponds to the standard, linear order in which components are accessed)
  TYPEST: set of types of structural objects; types correspond to abstraction levels
  tst: relation on structural object types defining the hierarchy of abstraction levels
  typest: total function assigning each structural object its structural type in TYPEST
  TYPEM: set of media types, TYPEM = {text, image, graphic, multimedia}
  typem: total function assigning to each structural object its media type in TYPEM
e.g. for books: TYPEST = {Document, Chapter, Section, Sub Section, Paragraph, Figure}

9 Attributes
A = (OS, NAMEA, VALUEA, namea, domaina, valuea, SM) where:
  OS: the set of structural objects in the document (elements: os_i)
  NAMEA: set of attribute names
  VALUEA: set of all possible attribute values (union of all the domain languages of all attributes)
  namea: partial function associating to structural objects a nonempty set of attribute names
  domaina: total function defining the domain of any attribute name (i.e. all the expressions of its associated language)
  valuea: partial function assigning to structural objects the value for a related attribute name (definition allows multi-valued attributes)

Content Attributes
single-media models involve up to five types of views:
  the physical view
  the structural view
  the symbolic view
  the spatial view (only in image and graphic models)
  the perceptive view (only in image and graphic models)
→ standard attribute names (called Content Attributes) for views: physical, structural, symbolic, spatial, perceptive

10 Attribute Classes
attribute ≙ properties assigned to elements of the logical structure
problem in structured documents: propagation of properties among related structural objects
example: author names propagated (and collected) bottom-up
type of propagation may depend on
  specific attribute (e.g. author vs. publication date)
  type of document (e.g. conference proceedings, encyclopediae etc.)
classes of attributes:
  dynamic attributes
    descending (e.g. publication date)
    ascending (e.g. author)
  static (e.g. title)

Indexing model
indexing: assign index expressions to document structural objects
retrieval of multimedia documents: retrieve smallest units that fulfill the query
→ index expressions assigned to parent object have to imply index expressions of its component objects
index objects: structural objects that are indexed (assigned a value of attribute symbolic)
index model of a document base: I = (OI, TYPEI, ind)
  OI: set of index objects oi_i, OI ⊆ OS
  TYPEI: set of index object types, TYPEI ⊆ TYPEST
  ind: relation representing structural dependency between index objects: ind ⊆ OI × OI

11 parts of the structural and semantic views of a document
[figure: Structural View with the objects Os1 to Os19 on the levels Document, Chapter, Section, Subsection, Paragraph; Semantic View with the nodes U1 to U7 and Osem2, Osem3, Osem8 to Osem11]

Example of an indexing structure
example of structure and index hierarchy types (index objects of type Chapter or Subsection only)
[figure: Structure Types (Document, Chapter, Section, Subsection, Paragraph) next to the Symbolic Types]

12 2.3.3 Document Model, Document Base and Hyperbase
Document Model
combines logical structure and attributes: D = (LS, A)

The Document Base
document base B: set of documents
document base as structure: B = (B, D)

The Hyperbase
document database does not allow browsing based on navigation links
→ support for browsing and querying: index objects + navigation links

13 Navigation structure
N = (ON, RNAV, TYPEL, cross, typel)
  ON: set of node objects (nodes) n_i of the hyperbase, ON ⊆ OS_B
  RNAV: relation defining navigation links on nodes (intra-document links / inter-document links), RNAV ⊆ ON × ON
  TYPEL: set of link types, e.g. Same Author, Similar Topic
  cross: standard access function related to navigation links, cross: RNAV → ON
    link (n_i, n_j): n_i source, n_j target
    ∀ (n_i, n_j) ∈ RNAV: cross(n_i, n_j) = n_j
  typel: total function assigning a type to each link, typel: RNAV → TYPEL

Hyperbase
H = (B, N, I)
  B: document base
  N: navigation structure
  I: index model

14 Chapter 3 Multimedia Retrieval
  the logical view on IR
  models based on predicate logic
  retrieval of structured documents

The logical view on IR
  IR as inference
  IR as uncertain inference
  Propositional vs. predicate logic

15 3.1.1 IR as inference
q: query
d: document
retrieval: search for documents which imply the query: d → q
example:
  d = {t1, t2, t3}
  q = {t1, t3}
logical view:
  d = t1 ∧ t2 ∧ t3
  q = t1 ∧ t3
  ⇒ d → q
advantage of inference-based approach: step from term-based to knowledge-based retrieval
e.g. easy incorporation of additional knowledge
example:
  d: squares
  q: rectangles
  thesaurus: squares → rectangles
  ⇒ d → q
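The inference view above can be tried out with a small sketch. It assumes that documents and queries are plain sets of terms and that the thesaurus is a dictionary from a term to the terms it implies; the function names closure and implies are illustrative and not part of the tutorial.

```python
# Hypothetical sketch: d -> q is checked as "every query term follows
# from the document terms, possibly via thesaurus implications".

def closure(terms, thesaurus):
    """Return the given terms plus every term derivable via the thesaurus."""
    derived = set(terms)
    frontier = list(terms)
    while frontier:
        term = frontier.pop()
        for implied in thesaurus.get(term, ()):
            if implied not in derived:
                derived.add(implied)
                frontier.append(implied)
    return derived


def implies(document, query, thesaurus=None):
    """True when every query term is contained in the closure of the document."""
    return set(query) <= closure(document, thesaurus or {})


print(implies({"t1", "t2", "t3"}, {"t1", "t3"}))                          # True
print(implies({"squares"}, {"rectangles"}))                              # False
print(implies({"squares"}, {"rectangles"}, {"squares": ["rectangles"]}))  # True
```

Without the thesaurus the term-based match fails for squares and rectangles; adding the single implication is enough to retrieve the document, which is the step from term-based to knowledge-based retrieval.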

16 3.1.2 IR as uncertain inference
d: quadrangles
q: rectangles
uncertain knowledge required ⇒ quadrangles →(0.3) rectangles
[Rijsbergen 86]: IR as uncertain inference
Retrieval ≙ estimate probability P(d → q) = P(q|d)
[figure: document d and query q as overlapping sets over the terms t1 to t6]

Propositional vs. predicate logic
limitations of propositional logic:
document attributes
  query: documents published after 1990?
  ?- pubyear(D,Y) & Y > 1990
conventional indexing (based on propositional logic):
  d = {tree, house}
  query: Is there a picture with a tree on the left of the house?
  query cannot be expressed in propositional logic
⇒ predicate logic:
  d: tree(t1). house(h1). left(h1,t1).
  ?- tree(X) & house(Y) & left(X,Y).
multimedia retrieval

17 3.2 Models based on predicate logic
  Terminological logic
  Datalog
  Probabilistic Datalog

Terminological logic
Thesaurus
[figure: thesaurus hierarchy with polygon at the top, then regular polygon, triangle, quadrangle, ..., and below them rectangle, regular triangle, square]
thesaurus knowledge: can be expressed in propositional logic
  square = quadrangle ∧ regular-polygon
terminological logic
  based on semantic networks
  more expressive than thesauri
  instances of concepts
  roles between (instances of) concepts

18 Elements of terminological logic
concepts: monadic predicates (person, document)
roles: dyadic predicates (author, refers-to)
terminological axioms: describe relationships between concepts and roles
  connotations: necessary conditions
    man < person
  definitions: necessary and sufficient conditions
    square = rectangle and regular-polygon
assertions: define instances of concepts and roles
  document[d123]. person[smith]. author[d123,smith].

MIRTL: Multimedia Information Retrieval Terminological Logic
terminological module: concepts, roles, definitions, connotations
assertional module: assertions

19 regular-triangle = (and triangle regular-polygon)
german-paper = (and paper (c-some author german))
student-paper = (and paper (all author student))
non-german = (and person (a-not german))
unido = (sing univ-dortmund)
multilingual = (and person (atleast 2 speaks-lang))
chinese-parent = (and chinese (atmost 1 child))

MIRTL Syntax
⟨concept⟩ ::= ⟨monadic predicate symbol⟩
            | (top)
            | (bottom)
            | (a-not ⟨monadic predicate symbol⟩)
            | (sing ⟨individual constant⟩)
            | (and ⟨concept⟩+)
            | (all ⟨role⟩ ⟨concept⟩)
            | (c-some ⟨role⟩ ⟨concept⟩)
            | (atleast ⟨natural number⟩ ⟨role⟩)
            | (atmost ⟨natural number⟩ ⟨role⟩)
⟨role⟩ ::= ⟨dyadic predicate symbol⟩
         | (inv ⟨role⟩)

abbreviations:
(exactly n R) ≙ (and (atleast n R) (atmost n R))
(func R C) ≙ (and (all R C) (exactly 1 R))
(no R) ≙ (atmost 0 R)

20 student = (and person (atleast 1 enrolled) (atmost 1 enrolled) (all enrolled university))
        = (and person (exactly 1 enrolled) (all enrolled university))
        = (and person (func enrolled university))
bachelor = (and man (no spouse))

Retrieval with terminological logic
modelling of documents:
  external attributes
  logical structure
  layout structure
  content structure
queries: MIRTL as query language
terminological knowledge (thesaurus)

21 (and paper
     (func appears-in (sing SIGIR93))
     (all author (func affiliation (sing IEI-CNR)))
     (c-some author (sing Carlo-Meghini))
     (c-some author (sing Fabrizio-Sebastiani))
     (c-some author (sing Umberto-Straccia))
     (c-some author (sing Costantino-Thanos))
     (exactly 4 author)) [paper666]
(and (func typeset-with (sing LaTeX))
     (func format (sing double-column))
     (no figure)
     (no running-header)
     (no running-footer)) [paper666]
(and (exactly 1 abstract)
     (exactly 5 section)
     (exactly 1 bibliography)) [paper666]
bibliography [paper666,bib666]
(and (func typeset-with (sing BibTeX))
     (func style (sing plain))
     (exactly 22 reference)) [bib666]
(and (c-some dw (sing Mirtl))
     (c-some dw (sing syn666))
     (c-some dw (sing sem666))
     (c-some dw (sing alg666))
     (c-some dw (sing terminological-logic (c-some modeling-tool (sing IR))))) [paper666]
terminological-logic [Mirtl]
syntax [Mirtl,syn666]
semantics [Mirtl,sem666]
inferential-algorithm [Mirtl,alg666]

Papers by Thanos about terminological logic?
(and paper
     (c-some author (sing Costantino-Thanos))
     (c-some dw (sing terminological-logic)))
Papers by Thanos on semantics of terminological logic?
(and paper
     (c-some author (sing Costantino-Thanos))
     (c-some dw (c-some (inv semantics) terminological-logic)))

22 italian[Carlo-Meghini]
italian < european
Papers by European author?
(and paper (c-some author european))

Modelling IR in Datalog
Introduction
Datalog: Horn predicate logic (most IR models based on propositional logic)
  no functions
  restricted forms of negation allowed
  sound and complete evaluation algorithms

23 ground facts:
docterm(d1,ir). docterm(d2,ir).
docterm(d1,db). docterm(d2,oop).
rules:
irdoc(D) :- docterm(D,ir).
iranddb(D) :- docterm(D,ir) & docterm(D,db).
irnotdb(D) :- docterm(D,ir) & not(docterm(D,db)).
recursive rules:
link(d1,d2). link(d2,d3). link(d3,d1).
linked(X,Y) :- link(X,Y).
linked(X,Y) :- linked(X,Z) & link(Z,Y).

Hypertext structure
docterm(d1,ir). docterm(d1,db).
link(d1,d2). link(d2,d3). link(d3,d1).
about(D,T) :- link(D,D1) & about(D1,T).
?- about(D,ir)
d1 d3 d2
[figure: the documents d1, d2, d3 connected by link edges; d1 connected to the terms ir and db by docterm edges]
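The rules above can be evaluated bottom-up. The following is a minimal sketch of naive fixpoint evaluation in Python, assuming that facts are stored as sets of tuples; it is not a Datalog engine, and the function names simply mirror the rule heads.

```python
# Ground facts from the slide, stored as sets of tuples.
DOCTERM = {("d1", "ir"), ("d2", "ir"), ("d1", "db"), ("d2", "oop")}
LINK = {("d1", "d2"), ("d2", "d3"), ("d3", "d1")}


def irnotdb(docterm):
    """irnotdb(D) :- docterm(D,ir) & not(docterm(D,db))."""
    ir = {d for d, t in docterm if t == "ir"}
    db = {d for d, t in docterm if t == "db"}
    return ir - db


def linked(link):
    """linked(X,Y) :- link(X,Y).  linked(X,Y) :- linked(X,Z) & link(Z,Y)."""
    result = set(link)
    while True:
        # Apply the recursive rule once to everything derived so far.
        derived = {(x, y) for (x, z) in result for (z2, y) in link if z == z2}
        if derived <= result:  # fixpoint reached, nothing new
            return result
        result |= derived


print(sorted(irnotdb(DOCTERM)))  # ['d2']
print(len(linked(LINK)))         # 9, the link cycle makes every pair reachable
```

Because the three links form a cycle, the recursive rule derives all nine pairs, including linked(d1,d1); the loop stops as soon as an iteration adds no new tuple.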

24 Image retrieval
output of IRIS image indexing: probabilistic facts
imgobj(O,I,N,L,R,B,T)
  O: object id
  I: image id
  N: concept (water, sand, forest, stone, ...)
  L,R,B,T: coordinates of the MBR (minimum bounding rectangle)
images with stones in front of a forest:
?- imgobj(OA,I,stone,L1,R1,B1,T1) & imgobj(OB,I,forest,L2,R2,B2,T2) & B1 <= B2

25 3.2.3 Probabilistic Datalog
Syntax
ground facts with probabilistic weights
0.9 docterm(d1,ir). 0.5 docterm(d1,db).
0.8 docterm(d2,ir). 0.3 docterm(d2,oop).
?- docterm(D,ir). gives
  d1 0.9
  d2 0.8
?- docterm(D,ir) & docterm(D,db). gives
  d1 0.45 (= 0.9 · 0.5 under the independence assumption)

Semantics
0.6 docterm(d1,ir). 0.5 docterm(d1,db).
→ independence assumptions
P(W1) = 0.3: {docterm(d1,ir)}
P(W2) = 0.3: {docterm(d1,ir), docterm(d1,db)}
P(W3) = 0.2: {docterm(d1,db)}
P(W4) = 0.2: {}
?- docterm(d1,ir) & docterm(d1,db) gives 0.3
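The possible-worlds semantics with independent facts can be reproduced by brute-force enumeration. This is a sketch for illustration only: facts are represented as strings, and the helper names worlds and probability are assumptions, not part of any Probabilistic Datalog system.

```python
from itertools import product


def worlds(facts):
    """Yield (world, probability) pairs for independent probabilistic facts."""
    names = list(facts)
    for truth in product([True, False], repeat=len(names)):
        probability_of_world = 1.0
        world = set()
        for name, holds in zip(names, truth):
            probability_of_world *= facts[name] if holds else 1.0 - facts[name]
            if holds:
                world.add(name)
        yield frozenset(world), probability_of_world


def probability(facts, query):
    """Sum the probabilities of all worlds in which every query fact holds."""
    return sum(p for world, p in worlds(facts) if set(query) <= world)


facts = {"docterm(d1,ir)": 0.6, "docterm(d1,db)": 0.5}
for world, p in worlds(facts):
    print(round(p, 2), sorted(world))
print(round(probability(facts, ["docterm(d1,ir)", "docterm(d1,db)"]), 2))
```

The loop prints the four worlds with probabilities 0.3, 0.3, 0.2 and 0.2, and the conjunctive query collects only the world that contains both facts. Enumeration is exponential in the number of facts, so this is useful only for checking small examples by hand.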

26 Disjoint events
example: imprecise attribute values
# py(dk,av).
0.2 py(d3,89). 0.7 py(d3,90). 0.1 py(d3,91).
interpretation:
P(W1) = 0.2: {py(d3,89)}
P(W2) = 0.7: {py(d3,90)}
P(W3) = 0.1: {py(d3,91)}
b89(X) :- py(X,Y) & Y > 89.
?- b89(X) gives d3 [py(d3,90) | py(d3,91)] 0.7 + 0.1 = 0.8

Vague predicates
phrase search:
?- doc(D), phrase(D, "information retrieval")
documents:
  ...information retrieval systems...
  ...information storage and retrieval...
  ...retrieval of information...
  ...information is retrieved...
phrase as vague predicate, yields probabilistic weight (similar to Boolean builtin predicates)
applications of vague predicates:
  variants of text search: compound words, proper nouns
  vague fact conditions (e.g. price ≲ 1000)
  multimedia IR (e.g. audio retrieval, image retrieval)

27 Probabilistic rules
generating probabilistic events from deterministic facts:
0.5 related(D,D1) :- link(D,D1).
about(D,T) :- docterm(D,T).
about(D,T) :- related(D,D1), about(D1,T).
semantics:
# sex(dk,av).
0.7 l-s(X) :- sex(X,male).
0.4 l-s(X) :- sex(X,female).
0.5 sex(X,male) :- human(X).
0.5 sex(X,female) :- human(X).
human(peter).
?- l-s(X) gives peter 0.55
interpretation:
P(W1) = 0.35: {sex(peter,male), l-s(peter)}
P(W2) = 0.15: {sex(peter,male)}
P(W3) = 0.20: {sex(peter,female), l-s(peter)}
P(W4) = 0.30: {sex(peter,female)}

3.3 Retrieval of structured documents: POOL
goals:
  retrieval of structured documents
  hierarchical logical structure → abstraction from node types
  contexts as untyped nodes
  multimedia retrieval → expressiveness of restricted predicate logic
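The four worlds of the l-s example can be derived mechanically. The sketch below assumes that the two sex values are disjoint events and that each rule weight is an event independent of them; the dictionaries SEX and RULE and both function names are illustrative.

```python
# Disjoint events: exactly one sex value holds for a person.
SEX = {"male": 0.5, "female": 0.5}
# Weight of the rule l-s(X) :- sex(X, value).
RULE = {"male": 0.7, "female": 0.4}


def worlds(name):
    """List (probability, world) pairs for one person."""
    result = []
    for value, p_value in SEX.items():
        p_rule = RULE[value]
        sex_fact = f"sex({name},{value})"
        result.append((p_value * p_rule, {sex_fact, f"l-s({name})"}))
        result.append((p_value * (1.0 - p_rule), {sex_fact}))
    return result


def probability(fact, name):
    """Add up the worlds in which the fact holds."""
    return sum(p for p, world in worlds(name) if fact in world)


for p, world in worlds("peter"):
    print(round(p, 2), sorted(world))
print(round(probability("l-s(peter)", "peter"), 2))  # 0.55
```

The result 0.55 is the sum 0.5 · 0.7 + 0.5 · 0.4 of the two worlds that contain l-s(peter), matching P(W1) + P(W3) in the interpretation.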

28 3.3.1 Structure of POOL programs
object: identifier + content
context: object with nonempty content (a1, s11, s12)
program: set of clauses
clause: context / proposition / rule
proposition:
  term (image, presentation)
  classification (article(a1), section(s11))
  attribute (s11.author(smith), a1.pubyear(1997))
example:
a1[ s11[ image 0.6 retrieval presentation ]
    s12[ ss121[ audio indexing ]
         ss122[ video not presentation ] ] ]
s11.author(smith) ss121.author(miller) ss122.author(jones)
a1.pubyear(1997)
article(a1) section(s11) section(s12)
subsection(ss121) subsection(ss122)
rule: head :- body
  head: proposition / context containing a proposition
  body: conjunction of subgoals (propositions or contexts)
docnode(D) :- article(D)
docnode(D) :- section(D)
docnode(D) :- subsection(D)
mm-ir-doc(D) :- docnode(D) & D[audio & retrieval]
german-paper(D) :- D.author.country(germany)
query: ?- body
?- D[audio & indexing]

29 3.3.2 Augmentation
Contexts and augmentation
clauses only hold for context where stated
augmentation: propagation of propositions to surrounding contexts
a1[ s11[ image 0.6 retrieval presentation ]
    s12[ ss121[ audio indexing ]
         ss122[ video not presentation ] ] ]
?- D[audio & video]
  s12 ; a1 ;
augmentation with uncertainty:
?- audio
  1.00 ss121 ; 0.60 s12 ; 0.36 a1 ;
?- D[audio & video]
  0.36 s12 ; 0.22 a1 ;
augmentation with uncertainty prefers most specific context!

Augmentation and inconsistencies
d1[ s1[ audio indexing ]
    s2[ s21[ image retrieval ]
        s22[ video not retrieval ] ] ]
?- D[audio & indexing]
  s1 ; d1 ;
?- D[video & image]
  s2 ; d1 ;
?- D[video & retrieval]
  (retrieval is inconsistent in s2)
four-valued logic
truth values: unknown, true, false, inconsistent
s22: video ↦ true, image ↦ unknown, retrieval ↦ false
s2: image ↦ true, video ↦ true, retrieval ↦ inconsistent
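The weights 1.00, 0.60, 0.36 and 0.22 can be reproduced with a small sketch. It rests on two assumptions that the slide does not state explicitly: every step from a sub-context to its parent is an independent access event with probability 0.6, and a conjunction multiplies each distinct event only once. The tables PARENT and STATED_IN and the function names are illustrative, and the sketch handles only terms stated in a single context.

```python
# Context hierarchy of the example: child -> parent.
PARENT = {"s11": "a1", "s12": "a1", "ss121": "s12", "ss122": "s12"}
# Context in which each (certain) term is stated.
STATED_IN = {"audio": "ss121", "indexing": "ss121", "video": "ss122"}
# Assumed access probability per augmentation step.
ACCESS = 0.6


def access_events(source, target):
    """Return the edges walked from source up to target, or None if unreachable."""
    events = set()
    node = source
    while node != target:
        if node not in PARENT:
            return None
        events.add((node, PARENT[node]))
        node = PARENT[node]
    return events


def weight(target, terms):
    """Probability that all terms hold in target after augmentation."""
    events = set()
    for term in terms:
        path = access_events(STATED_IN[term], target)
        if path is None:
            return 0.0
        events |= path  # shared edges are counted once
    return ACCESS ** len(events)


for context in ("ss121", "s12", "a1"):
    print(context, round(weight(context, ["audio"]), 2))
for context in ("s12", "a1"):
    print(context, round(weight(context, ["audio", "video"]), 2))
```

For audio & video in a1 the edge from s12 to a1 is shared by both terms, so the weight is 0.6 to the power of 3, roughly 0.22, rather than 0.36 · 0.36. The more specific context s12 gets the higher weight 0.36, which is the preference for the most specific context noted above.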

30 Chapter 4 Multimedia Indexing
  audio
  images
  video

Audio
Sound retrieval
E. Wold et al.: Content-based classification, search and retrieval of audio. IEEE Multimedia 3(3), pp.
Levels of audio retrieval
1. exact match of sound samples
2. inexact match of sounds, irrespective of sample rate, quantization, compression, ...
3. inexact match of acoustic features / perceptual properties of sound
4. content-based match (for speech, musical content)
here: inexact match of sounds

31 Acoustic features
aspects of sound considered:
  loudness: root-mean-square of audio signal (in decibels)
  pitch: greatest common divisor of peaks in Fourier spectra
  brightness: centroid of short-time Fourier magnitude spectra (higher frequency content of signal)
  bandwidth: magnitude-weighted average of differences between spectral components and the centroid (variation of frequencies, e.g. sine wave vs. white noise)
  harmonicity: deviation of the sound's spectrum from a harmonic spectrum (i.e. harmonic spectra vs. inharmonic spectra vs. noise)
variation of aspects over time:
1. compute aspect values at certain time intervals
2. derive features from sequences:
   average value
   variance
   autocorrelation
   (feature values weighted by amplitude)
sound example
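Three of the aspects listed above (loudness, brightness, bandwidth) can be computed for a single analysis window as follows. This is a simplified sketch and not the implementation by Wold et al.: it uses a plain discrete Fourier transform over the whole window, a reference level of 1.0 for the decibel value, and no windowing function; all function names are illustrative.

```python
import cmath
import math


def loudness_db(signal):
    """Root-mean-square level in decibels relative to amplitude 1.0 (assumed reference)."""
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    return 20 * math.log10(rms)  # fails for an all-zero signal


def magnitude_spectrum(signal):
    """Magnitudes of the DFT bins 0 .. n/2 (naive O(n^2) transform)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


def brightness_and_bandwidth(signal, sample_rate):
    """Spectral centroid and magnitude-weighted mean distance from it, in Hz."""
    magnitudes = magnitude_spectrum(signal)
    step = sample_rate / len(signal)  # frequency distance between two bins
    total = sum(magnitudes)
    centroid = sum(k * step * m for k, m in enumerate(magnitudes)) / total
    spread = sum(abs(k * step - centroid) * m for k, m in enumerate(magnitudes)) / total
    return centroid, spread


# An 8 Hz sine sampled at 64 Hz for one second.
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]
centroid, spread = brightness_and_bandwidth(tone, 64)
print(round(loudness_db(tone), 2), round(centroid, 3), round(spread, 3))
```

For the pure tone the centroid sits at the tone frequency and the bandwidth is close to zero; white noise would give a centroid near the middle of the spectrum and a large bandwidth, which is the contrast mentioned in the bandwidth definition.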

32 [table: the properties Loudness, Pitch, Brightness and Bandwidth by Mean, Variance and Autocorrelation; the numeric values are not preserved]

Indexing and retrieval
Indexing of a sound: compute and store feature vector a (mean, variance and autocorrelation for loudness, pitch, brightness, bandwidth and harmonicity)
Retrieval:
1. conditions w.r.t. feature values
2. similarity of sounds: weighted Euclidean distance
mean: $\mu = \frac{1}{M} \sum_{j=1}^{M} a_j$
covariance: $R = \frac{1}{M} \sum_{j=1}^{M} (a_j - \mu)(a_j - \mu)^T$
distance: $D = \sqrt{(a - b)^T R^{-1} (a - b)}$
M: number of sounds considered

33 Property-based training and classification
training:
  based on set of training sounds for a property (e.g. scratchiness)
  compute property-specific mean and covariance
  importance of feature: mean divided by standard deviation
classification:
  compute distances to means of all classes, select class with minimum distance
  likelihood: $L = \exp(-D^2 / 2)$
Example: classification of laughter sounds
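Training and classification can be sketched in a few lines. To keep the example free of matrix inversion it assumes a diagonal covariance, so the distance becomes a Euclidean distance weighted by inverse per-feature variances; this is a simplification of the full covariance mentioned above. The feature vectors and the class names are invented for illustration.

```python
import math


def train(samples):
    """Return (mean, variance) per feature for the training vectors of one class."""
    count = len(samples)
    dims = range(len(samples[0]))
    mean = [sum(s[i] for s in samples) / count for i in dims]
    variance = [sum((s[i] - mean[i]) ** 2 for s in samples) / count for i in dims]
    return mean, variance


def distance(a, model):
    """Weighted Euclidean distance of vector a to the class mean."""
    mean, variance = model
    return math.sqrt(sum((x - m) ** 2 / v for x, m, v in zip(a, mean, variance)))


def likelihood(a, model):
    """L = exp(-D^2 / 2)."""
    return math.exp(-distance(a, model) ** 2 / 2)


def importance(model):
    """|mean| / sqrt(variance) for each feature."""
    mean, variance = model
    return [abs(m) / math.sqrt(v) for m, v in zip(mean, variance)]


def classify(a, models):
    """Select the class whose mean is closest to a."""
    return min(models, key=lambda name: distance(a, models[name]))


# Hypothetical two-dimensional feature vectors.
models = {
    "laughter": train([[1.0, 2.0], [3.0, 2.5], [2.0, 1.5]]),
    "speech": train([[8.0, 9.0], [10.0, 11.0], [9.0, 10.0]]),
}
sound = [2.5, 2.0]
print(classify(sound, models), round(likelihood(sound, models["laughter"]), 3))
```

A feature with zero variance in the training set would cause a division by zero, so a real system needs a floor on the variances or more training sounds.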

34 Example: class model for laughter
[table: the features Duration and, for each of Loudness, Brightness, Bandwidth and Pitch, Mean, Variance and Autocorrelation, with the columns Mean, Variance and Importance; the numeric values are not preserved]
importance = $|mean| / \sqrt{variance}$

Speech retrieval
1. speech recognition → uncertain term identification
2. application of text retrieval methods on recognized terms
→ TREC speech retrieval track

Music retrieval
McNab et al.: The New Zealand Digital Library MELody index. D-Lib Magazine, May
1. melody transcription
2. approximate string matching

35 4.2 Images
Introduction
Semantic vs. syntactic indexing and retrieval
syntactic image features:
  color
  texture
  contour
semantic image features:
  objects (humans, animals, buildings, art works)
  topics (pollution, demonstration, political visit)
most image indexing methods support syntactic features only

Aboutness vs. ofness
ofness: objects shown in the image
aboutness: topic which is illustrated by the image
aboutness is very much user-dependent
e.g. image showing water pollution

36 4.2.2 QBIC
tool for querying image and video databases
  example images
  user-constructed sketches and drawings
  selected color and texture patterns
  camera and object motion

System overview
main components:
database population:
1. processing of images and videos to extract syntactical features: colors, textures, shape, camera motion, object motion
2. storing features in database
database querying:
1. user composes query graphically
2. generate features from graphical query
3. search for database objects with similar features


37 Data model
basic elements:
  still images/scenes: contain objects
  video shots: sets of contiguous frames, contain motion objects
still images:
  scene: image or video frame
  object: part of a scene
videos:
1. break into clips (shots)
2. generate representative frame for each shot, treated as still image
3. generate motion objects from shots

38 querying:
  on objects: images with a red, round object
  on scenes: images with 30 % red and 20 % blue
  on shots: shots panning from left to right
  on combinations: images with 30 % red containing a blue object

Feature Calculation
color
  color models: RGB, HSV, YUV, MTM
  average coordinates in color space
  k-element histogram (typically k = 64, 256)

39 texture
  coarseness: scale of texture
  contrast: vividness of a pattern (function of variance of grey-level histogram)
  directionality: peakedness of distribution of gradient directions in image (favoured direction (e.g. grass) vs. isotropic (e.g. sand))
shape
  area: number of pixels set in binary image
  circularity: perimeter^2 / area
  major axis orientation:
    1. compute 2nd order covariance matrix from boundary pixels
    2. major axis orientation = direction of the eigenvector with the largest eigenvalue
  eccentricity = (largest eigenvalue) / (smallest eigenvalue)
  algebraic moment invariants:
    compute first m central moments as eigenvalues of predefined matrices
    consider 18 features invariant to affine transformations
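The simpler shape features (area, circularity, major axis orientation, eccentricity) can be computed from a binary mask as sketched below. Two details are assumptions made for this example and are not taken from QBIC: the perimeter is the number of pixel sides that face the background, and a boundary pixel is a set pixel with at least one such side. The function name shape_features is illustrative.

```python
import math

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def shape_features(mask):
    """Area, circularity, orientation (radians) and eccentricity of a binary mask."""
    pixels = {(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v}

    def exposed_sides(x, y):
        # Number of the four sides of pixel (x, y) that face the background.
        return sum((x + dx, y + dy) not in pixels for dx, dy in NEIGHBOURS)

    area = len(pixels)
    perimeter = sum(exposed_sides(x, y) for x, y in pixels)
    boundary = [(x, y) for x, y in pixels if exposed_sides(x, y) > 0]

    # 2nd order covariance matrix of the boundary pixel coordinates.
    n = len(boundary)
    mx = sum(x for x, _ in boundary) / n
    my = sum(y for _, y in boundary) / n
    cxx = sum((x - mx) ** 2 for x, _ in boundary) / n
    cyy = sum((y - my) ** 2 for _, y in boundary) / n
    cxy = sum((x - mx) * (y - my) for x, y in boundary) / n

    # Closed-form eigenvalues of the symmetric 2x2 matrix [[cxx, cxy], [cxy, cyy]].
    root = math.sqrt((cxx - cyy) ** 2 + 4 * cxy ** 2)
    largest = (cxx + cyy + root) / 2
    smallest = (cxx + cyy - root) / 2
    return {
        "area": area,
        "circularity": perimeter ** 2 / area,
        "orientation": 0.5 * math.atan2(2 * cxy, cxx - cyy),
        "eccentricity": largest / smallest if smallest > 0 else math.inf,
    }


bar = [[1] * 8, [1] * 8]  # an 8 x 2 rectangle
print(shape_features(bar))
```

For the 8 x 2 bar the sketch yields area 16, circularity 25 (perimeter 20), orientation 0 along the x axis and eccentricity 21; a 4 x 4 square with the same area has the lower circularity 16.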

40 sketch
based on reduced resolution edge map:
1. convert color image to single band luminance
2. compute binary edge image
3. reduce edge image to thin reduced image

Sample queries
average color queries
  search for images/objects with similar color
  computed as weighted Euclidean distance in color space
histogram color queries
  search for images with specified color distribution
  based on 256-element histogram:
  Q: query histogram
  D: image histogram
  Z: element difference histogram, $Z = Q - D$
  A: symmetric color similarity matrix, $a(i,j) = 1 - d(c_i, c_j) / d_{max}$
  $c_k$: kth color in histogram
  $d(c_i, c_j)$: MTM color distance
  $d_{max}$: maximum distance between any two colors
  similarity: $Z^T A Z$
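The histogram comparison with a color similarity matrix can be sketched as follows. The MTM color distance is not reproduced here; as a loudly labelled stand-in the example uses the Euclidean distance between RGB triples, and it works on a three-color histogram instead of 256 elements. All names are illustrative.

```python
import math


def rgb_distance(c1, c2):
    """Stand-in for the MTM color distance: Euclidean distance in RGB (assumption)."""
    return math.dist(c1, c2)


def similarity_matrix(colors, dist=rgb_distance):
    """a(i,j) = 1 - d(c_i, c_j) / d_max."""
    d_max = max(dist(a, b) for a in colors for b in colors)
    return [[1 - dist(a, b) / d_max for b in colors] for a in colors]


def histogram_distance(q, d, a):
    """Z^T A Z with Z = Q - D; 0 for identical histograms."""
    z = [qi - di for qi, di in zip(q, d)]
    size = range(len(z))
    return sum(z[i] * a[i][j] * z[j] for i in size for j in size)


colors = [(255, 0, 0), (255, 128, 0), (0, 0, 255)]  # red, orange, blue
A = similarity_matrix(colors)
query = [1.0, 0.0, 0.0]        # all red
orange_image = [0.0, 1.0, 0.0]
blue_image = [0.0, 0.0, 1.0]
print(histogram_distance(query, orange_image, A) < histogram_distance(query, blue_image, A))  # True
```

A plain bin-by-bin comparison would rate the orange and the blue image as equally far from the red query; the matrix A makes the orange image the closer one because red and orange are similar colors.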


42 texture queries
  user selects texture from a sampler
  compute weighted Euclidean distance in 3D texture space (coarseness, contrast, directionality)
object shape
  user draws shape
  shape features: area, circularity, eccentricity, major-axis direction, object moments, tangent angles around object perimeter
  compute weighted Euclidean distance, weights are inverse variances of features
query by sketch
  user draws dominant lines and edges
  1. reduce user sketch to ...
  2. for each db image, correlate sketch with user sketch, based on edge/no edge comparison
  3. compute correlation scores

43 4.2.3 IRIS
semantic indexing of images
1. image analysis
   color
   contour
   texture
2. object recognition
   (a) basic objects: clouds, snow, water, sky, forest, grass, sand, stone
   (b) high-level objects: forestscene, skyscene, mountainscene, landscapescene, ...


44 Image Analysis
Color
IRIS subdivides color space into about 20 different colors
1. subdivide image into nonoverlapping tiles
2. compute color histogram for each tile
3. most frequent color =: color of tile
4. join tiles with similar colors and compute circumscribing rectangle
5. compute attributes of color rectangles:
   position
   size
   color
   color density (# tiles with color / # tiles in rectangle)
[figure: original image and color evidence]
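Steps 1 to 5 above can be sketched on a toy image. The example assumes that every pixel has already been mapped to one of the quantized color names, and it simplifies step 4 by putting all tiles of one color into a single circumscribing rectangle instead of joining only neighbouring tiles. The function names are illustrative and this is not the IRIS code.

```python
from collections import Counter


def tile_colors(image, tile):
    """Steps 1-3: dominant color of each nonoverlapping tile x tile block."""
    height, width = len(image), len(image[0])
    grid = []
    for top in range(0, height, tile):
        row = []
        for left in range(0, width, tile):
            histogram = Counter(
                image[y][x]
                for y in range(top, min(top + tile, height))
                for x in range(left, min(left + tile, width))
            )
            row.append(histogram.most_common(1)[0][0])
        grid.append(row)
    return grid


def color_rectangle(grid, color):
    """Steps 4-5: bounding rectangle (in tile coordinates) and color density."""
    cells = [(r, c) for r, row in enumerate(grid) for c, v in enumerate(row) if v == color]
    top = min(r for r, _ in cells)
    bottom = max(r for r, _ in cells)
    left = min(c for _, c in cells)
    right = max(c for _, c in cells)
    tiles_in_rectangle = (bottom - top + 1) * (right - left + 1)
    return (top, left, bottom, right), len(cells) / tiles_in_rectangle


image = [
    ["blue", "blue", "green", "green"],
    ["blue", "white", "green", "green"],
    ["blue", "blue", "blue", "blue"],
    ["blue", "blue", "blue", "green"],
]
grid = tile_colors(image, 2)
print(grid)                           # [['blue', 'green'], ['blue', 'blue']]
print(color_rectangle(grid, "blue"))  # ((0, 0, 1, 1), 0.75)
```

The blue rectangle covers all four tiles but only three of them are blue, which gives the density 3/4. color_rectangle raises a ValueError when the requested color does not occur in the grid.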

45 color-based segmentation:
...
colour2 HOR=mid,VER=up,SIZ=XL,SHP=Rect,COL=BLUE, UL=0 1,LR=44 11,DEN=
colour3 HOR=mid,VER=mid,SIZ=M,SHP=Rect,COL=BLUE, UL=15 10,LR=44 17,DEN=
colour4 HOR=left,VER=mid,SIZ=XS,SHP=Quad,COL=BLUE, UL=1 11,LR=1 11,DEN=1 1
colour5 HOR=left,VER=mid,SIZ=XS,SHP=Rect,COL=BLUE, UL=3 11,LR=14 12,DEN=

Texture
consider local distribution and variation of grey values
1. compute normalized co-occurrence matrix p for 4 directions: 0°, 45°, 90°, 135°
2. for each of the four directions, compute the following features from the co-occurrence matrix:
   angular second moment
   contrast (local variations)
   correlation (linear relationship between pixel values)
   variance (deviation from the average)
   entropy
3. for each of the five parameters, compute the average from the values for the 4 directions (→ invariance against rotation)
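The texture steps above can be sketched for small grey-level images. The example computes only three of the five features (angular second moment, contrast, entropy), counts each pixel pair in one direction only, and uses a neighbour distance of one pixel; these choices and all names are assumptions made for illustration.

```python
import math

# Pixel offsets (dx, dy) for the four directions, with y growing downwards.
DIRECTIONS = {0: (1, 0), 45: (1, -1), 90: (0, -1), 135: (-1, -1)}


def cooccurrence(image, dx, dy):
    """Normalized co-occurrence matrix as a dict {(grey_i, grey_j): probability}."""
    height, width = len(image), len(image[0])
    counts = {}
    for y in range(height):
        for x in range(width):
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                pair = (image[y][x], image[ny][nx])
                counts[pair] = counts.get(pair, 0) + 1
    total = sum(counts.values())  # requires an image of at least 2 x 2 pixels
    return {pair: n / total for pair, n in counts.items()}


def features(p):
    """Features of one co-occurrence matrix."""
    return {
        "asm": sum(v * v for v in p.values()),
        "contrast": sum((i - j) ** 2 * v for (i, j), v in p.items()),
        "entropy": -sum(v * math.log(v) for v in p.values()),
    }


def texture_features(image):
    """Average each feature over the four directions."""
    per_direction = [features(cooccurrence(image, dx, dy)) for dx, dy in DIRECTIONS.values()]
    return {
        name: sum(f[name] for f in per_direction) / len(per_direction)
        for name in per_direction[0]
    }


checkerboard = [[(x + y) % 2 for x in range(4)] for y in range(4)]
print(texture_features(checkerboard)["contrast"])  # 0.5
```

On the checkerboard the horizontal and vertical neighbours always differ (contrast 1) while the diagonal neighbours never do (contrast 0), so the average over the four directions is 0.5. A uniform image gives angular second moment 1, contrast 0 and entropy 0.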


46 4. feed average values into neural network
[figure: neural network with an input layer (contrast, asm, variance, correlation, entropy), two hidden layers and an output layer (forest, grass, sand, water, stone, sky, clouds, ice)]
5. NN yields texture for each tile
6. join tiles with identical textures and compute circumscribing rectangles
7. compute attributes of texture rectangles:
   position
   size
   texture
   texture density (# tiles with texture / # tiles in rectangle)
...
texture3 HOR=mid,VER=mid,SIZ=L,SHP=Rect,TEX=ice, UL=2 2,LR=10 3,DEN=11 18
texture4 HOR=left,VER=mid,SIZ=S,SHP=Path,TEX=clouds, UL=0 3,LR=3 3,DEN=4 4
texture5 HOR=left,VER=mid,SIZ=S,SHP=Quad,TEX=stone, UL=4 3,LR=5 4,DEN=3 4
texture6 HOR=mid,VER=mid,SIZ=S,SHP=Rect,TEX=clouds, UL=5 3,LR=8 4,DEN=

47 Contour
based on grey-level image
1. gradient-based edge detection
2. determination of object contours
3. shape analysis: compute
   position of centroid
   size of region
   bound coordinates of region

48 Object Recognition
1. step from syntactical to semantical features: identification of primitive objects
2. derivation of higher-level semantical features

identification of primitive objects
basis: color, texture and contour features
for each feature, consider corresponding region
form graph describing topological relationships between feature regions:
  node = feature
  edge = topological relationship: overlaps, meets, contains
[figure: graph with nodes labelled CL, CT and T connected by overlaps, meets and contains edges]
formulate graph grammar rules for detecting primitive objects
[figure: Mountainlake composed of Clouds, Sky, Mountain, Lake and Forest, each built from a Texture Segment, a Color Segment and a Contour Segment]
Conditions of "Clouds":
predicate((valcompeq(*self(2,"colorseg","col"),"blue") || valcompeq(*self(2,"colorseg","col"),"white")) && valcompeq(*self(2,"colorseg","ver"),"up"));
predicate(nrkind(*self(1,"contourseg"),"contains",*self(1,"colorseg")) && nrkind(*self(1,"contourseg"),"contains",*self(1,"textureseg")));

49 4.2.4 Photobook
developed at MIT Media Lab
goal: semantic retrieval of images based on semantics-preserving image compression
types of descriptions:
  appearance (faces)
  shape
  texture

Appearance
based on eigenimage representations
Training: Building Eigenrepresentations
1. preprocessing of input images: normalize w.r.t. position, scale, orientation
2. computation of eigenvectors of normalized image covariance for
   training images (faces)
   subregions of training images (eyes, nose, mouth)

50 [figure: mean and first few eigenvectors]
Retrieval
Γ: new image (region)
1. transform Γ into face space
2. retrieval based on similarity measure


52 Shape
representation based on modelling of physical deformations
finite element method → stiffness matrix → eigenvectors
Retrieval
compute amount of energy needed to align object


53 Texture
representation based on Wold decomposition for regular stochastic processes in 2D
≙ sum of three orthogonal components:
1. harmonic field
2. generalized-evanescent field
3. purely-indeterministic field
retrieval
1. derive parameters of Wold decompositions
2. compute similarity of parameter vectors

55 4.3 Video
QBIC
Representation of video data
1. shot detection
2. creation of representative frame
3. identify moving structures/objects

Shot detection
set of frames grouped into shots because they
  depict same scene
  signify single camera operation
  contain distinct event/action
shots are chosen as single indexable unit
Automatic shot detection
1. cuts: high pulses in the histogram, detected by single threshold
2. gradual transitions over a sequence of frames
   (a) low threshold for detecting possible transitions
   (b) compute accumulated differences of successive frames
   (c) shot boundary, if sum exceeds second threshold
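The two detection rules above can be sketched on a list of frame-to-frame difference values. The example assumes that the differences are sums of absolute histogram differences between successive frames, and the three threshold parameters and function names are illustrative choices, not values or interfaces from QBIC.

```python
def histogram_difference(h1, h2):
    """Sum of absolute bin differences between two frame histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def detect_shot_boundaries(diffs, cut_threshold, low_threshold, sum_threshold):
    """Return (kind, first_index, last_index) tuples for cuts and gradual transitions."""
    boundaries = []
    start = None        # index where a candidate gradual transition began
    accumulated = 0.0

    def close_candidate(end):
        # Step (c): accept the candidate if the accumulated difference is large enough.
        if start is not None and accumulated >= sum_threshold:
            boundaries.append(("gradual", start, end))

    for i, d in enumerate(diffs):
        if d >= cut_threshold:           # rule 1: a single high pulse
            boundaries.append(("cut", i, i))
            start, accumulated = None, 0.0
        elif d >= low_threshold:         # step (a): possible transition
            if start is None:
                start, accumulated = i, 0.0
            accumulated += d             # step (b): accumulate differences
        else:
            close_candidate(i - 1)
            start, accumulated = None, 0.0
    close_candidate(len(diffs) - 1)
    return boundaries


diffs = [0.1, 0.1, 0.9, 0.1, 0.3, 0.3, 0.3, 0.1, 0.3, 0.1]
print(detect_shot_boundaries(diffs, cut_threshold=0.8, low_threshold=0.2, sum_threshold=0.8))
```

In the sample sequence the pulse at index 2 is reported as a cut, the run at indices 4 to 6 accumulates about 0.9 and is reported as a gradual transition, and the isolated value at index 8 is discarded because its sum stays below the second threshold.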

56 representative frame generation
representative frames
  treated as still images in database population
  in retrieval returned as answer representing shot
representative frame generation methods:
  random frame from a shot
  synthesized r-frames
    mosaicking all frames in a panning shot
    remove moving objects
layered representation
  different layers used for identifying significant objects in the scene
  algorithm divides a shot into a number of layers, each with its own
    2D affine motion parameters
    region of support in each frame


57 4.3.2 ISS-NUS
Video Browsing
1. micons: icons for video content
2. hierarchical video magnifier
3. clipmaps

Micons: icons for video content
  representative frame for each clip
  micon = 3D display of frame sequence
  horizontal / vertical slices of micons

Hierarchical video magnifier
  for browsing entire videos
  ≙ scroll bar
  division into segments of equal length
  segment represented by frame at its midpoint
  selection of frame yields further division of corresponding segment
  segment can be viewed via micon
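The division scheme of the hierarchical video magnifier can be sketched with frame indices alone. The function name magnify, the number of segments per level and the frame counts are illustrative assumptions.

```python
def magnify(first, last, parts):
    """Split frames first..last into equal segments; return (start, end, midpoint) triples."""
    length = last - first + 1
    segments = []
    for i in range(parts):
        start = first + i * length // parts
        end = first + (i + 1) * length // parts - 1
        segments.append((start, end, (start + end) // 2))
    return segments


# Top level: a video of 1000 frames shown as 5 midpoint frames.
top = magnify(0, 999, 5)
print(top)
# Selecting the second frame divides the corresponding segment further.
start, end, _ = top[1]
print(magnify(start, end, 5))
```

The first call yields the segments 0 to 199, 200 to 399 and so on, each represented by its midpoint frame (99, 299, ...); the second call zooms into frames 200 to 399 with segments of 40 frames.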

58 Case study: news videos
  spatial structure
  temporal structure

4.4 Summary: media indexing and matching
1. exact match
2. inexact media match (irrespective of digitization parameters)
3. inexact media feature match
4. content-based match

59 Chapter 5 Conclusions
Issues in MMIR
  syntactic (signal-based) vs. semantic (symbolic) retrieval of MM objects
  dealing with document structure
  IR models based on predicate logic
