9 Video Retrieval
Multimedia Databases
Wolf-Tilo Balke, Silviu Homoceanu
Institut für Informationssysteme, Technische Universität Braunschweig
http://www.ifis.cs.tu-bs.de

9 Video Retrieval
  9.1 Hidden Markov Models (continued from last lecture)
  9.2 Introduction into Video Retrieval

9.1 Hidden Markov Model
In addition to the underlying Markov process, an HMM has time-invariant observation probabilities at any time
An HMM consists of:
  A homogeneous Markov process $(X_t)$ with state set $Q = \{q_1, \dots, q_N\}$
  Transition probabilities $a_{ij} = P(X_{t+1} = q_j \mid X_t = q_i)$
  Start distribution $\pi_i = P(X_1 = q_i)$
  A stochastic process of observations $(O_t)$ with basic set $\{o_1, \dots, o_M\}$
  Observation probabilities of observation $o_k$ in state $q_j$: $b_j(k) = P(O_t = o_k \mid X_t = q_j)$

9.1 HMM Example
[Figure: example HMM with edge labels 0.8 and 0.2, observation probabilities 0.8, 0.2, 0.6, 0.4, 0.1, 0.9, and labels Type 1 to Type 4]

Observation probability
  Given the observation sequence $O = (O_1, \dots, O_T)$ and a fixed HMM $\lambda$
  How high is the probability that $\lambda$ has generated the observation sequence? $P(O \mid \lambda) = ?$
  Important for selecting between different models
Let $X = (X_1, \dots, X_T)$ be a state sequence. Then:
  $P(O \mid X, \lambda) = \prod_{t=1}^{T} b_{X_t}(O_t)$
Furthermore,
  $P(X \mid \lambda) = \pi_{X_1} \prod_{t=1}^{T-1} a_{X_t X_{t+1}}$
is also valid, and
  $P(O, X \mid \lambda) = P(O \mid X, \lambda) \, P(X \mid \lambda)$
is valid for every state sequence $X$.
(Here $b_{X_t}(O_t)$ is the probability of observing $O_t$ in state $X_t$, $a_{X_t X_{t+1}}$ is the transition probability from $X_t$ to $X_{t+1}$, and $\pi_{X_1}$ is the start probability of $X_1$.)

Thus the total probability for the observation sequence $O$ is:
  $P(O \mid \lambda) = \sum_{X} P(O \mid X, \lambda) \, P(X \mid \lambda)$
Substituting our previous results we obtain:
  $P(O \mid \lambda) = \sum_{X} \pi_{X_1} \, b_{X_1}(O_1) \prod_{t=1}^{T-1} a_{X_t X_{t+1}} \, b_{X_{t+1}}(O_{t+1})$

Most probable state sequence
  Given the observation sequence $O$ and a fixed HMM $\lambda$
  Which state sequence generates the observation sequence $O$ with the highest probability?
  Maximum likelihood estimator: maximize $P(X \mid O, \lambda)$

Because we know that
  $P(X \mid O, \lambda) = P(X, O \mid \lambda) / P(O \mid \lambda)$
and that $P(O \mid \lambda)$ is constant for a fixed sequence of observations, instead of maximizing $P(X \mid O, \lambda)$ we can also maximize $P(X, O \mid \lambda)$.

Definition:
  $\delta_t(i) = \max_{X_1, \dots, X_{t-1}} P(X_1, \dots, X_{t-1}, X_t = q_i, O_1, \dots, O_t \mid \lambda)$
The maximum is attained for the most probable path leading to state $q_i$ at time $t$.
Therefore $\max_i \delta_T(i)$ corresponds to the state sequence under which the occurrence of the observation sequence $O$ is most likely.
Such a path can be constructed in steps by means of dynamic programming, via
  $\delta_{t+1}(j) = \max_{1 \le i \le N} [\delta_t(i) \, a_{ij}] \, b_j(O_{t+1})$
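The total probability $P(O \mid \lambda)$, summed over all state sequences, can be checked numerically on a small model. The following Python sketch enumerates every state sequence and adds up the joint probabilities. The two-state model and its numbers are illustrative assumptions (they are not the lecture's example), states and symbols are encoded as 0-based indices, and the function name is hypothetical. The enumeration is exponential in T and is only meant to mirror the formula.

```python
from itertools import product

def sequence_probability(A, B, pi, obs):
    """P(O | lambda) as the sum of P(O, X | lambda) over all N**T state sequences X."""
    if not obs:
        return 1.0
    n_states = len(pi)
    total = 0.0
    for states in product(range(n_states), repeat=len(obs)):
        # start probability times probability of the first observation
        p = pi[states[0]] * B[states[0]][obs[0]]
        # one transition and one observation probability per further time step
        for t in range(1, len(obs)):
            p *= A[states[t - 1]][states[t]] * B[states[t]][obs[t]]
        total += p
    return total

# Illustrative model (assumed values): 2 states, 3 observation symbols
pi = [0.6, 0.4]
A = [[0.7, 0.3],
     [0.4, 0.6]]
B = [[0.5, 0.4, 0.1],
     [0.1, 0.3, 0.6]]

p_obs = sequence_probability(A, B, pi, [0, 1, 2])
print(p_obs)
```

For practical sequence lengths the same value is usually computed with a forward recursion, which is polynomial in the number of states and linear in T.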
The corresponding algorithm is the Viterbi algorithm (Viterbi, 1967)
Here $\delta_t(j)$ is the probability of the most probable path ending in state $q_j$ at time $t$, and $\psi_t(j)$ stores the predecessor state on that path.
Initial step: for $1 \le i \le N$ set
  $\delta_1(i) = \pi_i \, b_i(O_1)$, $\psi_1(i) = 0$
For $t = 2, \dots, T$ and $1 \le j \le N$ inductively set
  $\delta_t(j) = \max_{1 \le i \le N} [\delta_{t-1}(i) \, a_{ij}] \, b_j(O_t)$
  $\psi_t(j) = \arg\max_{1 \le i \le N} [\delta_{t-1}(i) \, a_{ij}]$
Termination:
  $p^* = \max_{1 \le i \le N} \delta_T(i)$, $i^*_T = \arg\max_{1 \le i \le N} \delta_T(i)$
Recursive path identification for the probability $p^*$: for $t \in [1 : T-1]$, in descending order, set
  $i^*_t = \psi_{t+1}(i^*_{t+1})$

Given a fixed HMM, for each sequence of observations the Viterbi algorithm provides the sequence of states which has most probably caused the observations (maximum likelihood estimator)

Problem: transition, observation and start probabilities are often unknown
Idea: train the parameters of the HMM $\lambda$
  Given an observation sequence $O$ (training sequence)
  Task: determine the model parameters $\lambda = (A, B, \pi)$ that maximize $P(O \mid \lambda)$

The training sequence should not be too short
The maximization of the probability leads to a high-dimensional optimization problem
  Solved, e.g., through the Baum-Welch algorithm, which calculates a local optimum (Baum and others, 1970)

Baum-Welch algorithm:
  Begin with an initial estimate of the parameters: either arbitrary or based on additional knowledge
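The four steps of the Viterbi algorithm (initial step, induction, termination, path identification) can be sketched in a few lines of Python. This is a minimal sketch, not the lecture's implementation: the model values are illustrative assumptions, states and symbols are 0-based indices, and probabilities are multiplied directly, so long sequences would need log-probabilities to avoid underflow.

```python
def viterbi(A, B, pi, obs):
    """Return (most probable state path, its joint probability with obs)."""
    n = len(pi)
    # initial step: delta_1(i) = pi_i * b_i(O_1)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    psi = []  # psi[k][j] = best predecessor of state j at the (k+2)-th time step
    # induction over the remaining observations
    for o in obs[1:]:
        pointers, new_delta = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] * A[i][j])
            pointers.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][o])
        psi.append(pointers)
        delta = new_delta
    # termination: best final state and its probability
    last = max(range(n), key=lambda i: delta[i])
    # path identification: follow the back pointers from T down to 1
    path = [last]
    for pointers in reversed(psi):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, delta[last]

# Illustrative model (assumed values): 2 states, 3 observation symbols
pi = [0.6, 0.4]
A = [[0.7, 0.3],
     [0.4, 0.6]]
B = [[0.5, 0.4, 0.1],
     [0.1, 0.3, 0.6]]

best_path, best_prob = viterbi(A, B, pi, [0, 1, 2])
print(best_path, best_prob)
```

The run time is proportional to T times the square of the number of states, since each time step inspects every pair of states once.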
The expected state occupancies and transitions under the current estimate can be used for an iterative re-estimation of the parameters

Define forward variables:
  $\alpha_t(i) = P(O_1, \dots, O_t, X_t = q_i \mid \lambda)$
Then:
  $\alpha_1(i) = \pi_i \, b_i(O_1)$ and $\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i) \, a_{ij} \right] b_j(O_{t+1})$
And backward variables:
  $\beta_t(i) = P(O_{t+1}, \dots, O_T \mid X_t = q_i, \lambda)$
For $1 \le i \le N$, $\beta_T(i) = 1$, and
  $\beta_t(i) = \sum_{j=1}^{N} a_{ij} \, b_j(O_{t+1}) \, \beta_{t+1}(j)$
is valid for $t = T-1, \dots, 1$.

Then the probability of being in $q_i$ at time $t$, given that $O$ has been observed, is (conditional probability):
  $\gamma_t(i) = \alpha_t(i) \, \beta_t(i) / P(O \mid \lambda)$
And the probability of being in state $q_i$ at time $t$ and in state $q_j$ at time $t+1$ is (conditional probability):
  $\xi_t(i, j) = \alpha_t(i) \, a_{ij} \, b_j(O_{t+1}) \, \beta_{t+1}(j) / P(O \mid \lambda)$

Then the expected number of times state $q_i$ was left is:
  $\sum_{t=1}^{T-1} \gamma_t(i)$
And the expected number of transitions from $q_i$ to $q_j$ is given by:
  $\sum_{t=1}^{T-1} \xi_t(i, j)$

In the $r$-th iteration ($r \ge 0$) the Baum-Welch algorithm computes $\gamma_t(i)$ and $\xi_t(i, j)$ with the model $\lambda^{(r)}$ defined by $\pi^{(r)}, A^{(r)}, B^{(r)}$; $\pi^{(0)}, A^{(0)}, B^{(0)}$ are the initial values
The new estimated values are defined by:
  $\pi_i^{(r+1)} = \gamma_1(i)$
  $a_{ij}^{(r+1)} = \sum_{t=1}^{T-1} \xi_t(i, j) / \sum_{t=1}^{T-1} \gamma_t(i)$ = (#expected transitions from $q_i$ to $q_j$) / (#expected transitions from $q_i$)
  $b_j^{(r+1)}(k) = \sum_{t=1}^{T} \gamma_t(j) \, \chi_t(k) / \sum_{t=1}^{T} \gamma_t(j)$
where $\chi_t(k)$ has the value 1 if symbol $o_k$ was observed at time $t$ in the training sequence, and the value 0 otherwise
If there are several training sequences, the estimate is given by the corresponding relative frequencies

If we now build the HMM $\lambda^{(r)}$ for each parameter re-estimation, then it can be shown that:
  $P(O \mid \lambda^{(r+1)}) \ge P(O \mid \lambda^{(r)})$
Thus, models with newer estimates get better until an (at least local) maximum is reached

9.1 Applications of HMMs
Back to music recognition
  Feature extraction tries to convert a signal into a string
Encoding acoustic events
  Music signals are sequences of acoustic events
  Segment the audio file, and determine the acoustic events for each segment
  The realization of any acoustic event can be described by a state sequence of an HMM

9.1 Back to Music Recognition
If there are several models for acoustic events, we can use a maximum likelihood estimator to identify the model which generates the observation sequence with the highest overall probability

9.1 Idea
Train H HMMs
  Training by manual (small H) or automatic (large H) mapping between acoustic events and segments (observation sequences) of a signal
  Each HMM represents a specific acoustic event
Then determine the most probable producer and attribute the corresponding event to each segment
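One Baum-Welch re-estimation step, as described earlier in this section, can be sketched in plain Python. The sketch computes forward and backward variables, the state and transition posteriors, and from them new values for the start, transition and observation probabilities. The model, the training sequence and all function names are illustrative assumptions. The code handles a single training sequence and does no scaling, so it is only suitable for short sequences.

```python
def forward(A, B, pi, obs):
    """alpha[t][i]: probability of the first t+1 observations and state i at that time."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha

def backward(A, B, obs):
    """beta[t][i]: probability of the observations after time t, given state i at time t."""
    n = len(A)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta

def baum_welch_step(A, B, pi, obs):
    """One re-estimation step; returns (new_A, new_B, new_pi, P(obs | old model))."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
    p_obs = sum(alpha[-1])
    # gamma[t][i]: probability of state i at time t given the observations
    gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(n)] for t in range(T)]
    # xi[t][i][j]: probability of state i at time t and state j at time t+1
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p_obs
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T))
              for k in range(m)] for j in range(n)]
    return new_A, new_B, new_pi, p_obs

# Illustrative initial estimate and training sequence (assumed values)
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
train = [0, 1, 2, 2, 1, 0, 0, 2]

new_A, new_B, new_pi, p_old = baum_welch_step(A, B, pi, train)
p_new = sum(forward(new_A, new_B, new_pi, train)[-1])
print(p_old, p_new)
```

Repeating the step until the probability no longer improves noticeably gives the iterative training loop; the result is a local optimum that depends on the initial estimate.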
9.1 Training example
Extracted segments are used as training examples for the HMMs of the corresponding events
[Figure: assignment of two segments to events A and B]

9.1 Training example
If the following feature sequence belongs to A, then after appropriate quantization this observation sequence will be used:
[Figure: feature sequence and the resulting observation sequence]

9.1 Often used HMMs
Complete HMM (ergodic model); e.g., with 3 states:
[Figure: ergodic model with 3 states]

9.1 Often used HMMs
Left-right model (Bakis model); e.g., with 3 states:
[Figure: left-right model with 3 states]

9.1 Feature Extraction with HMMs
Given: a feature sequence of observations of length T: $o_1, \dots, o_T$
Goal: find a sequence of pairs $(h_1, t_1), \dots, (h_n, t_n)$ so that the feature subsequence $o_{t_i}, \dots, o_{t_{i+1}-1}$ is associated with HMM $\lambda_{h_i}$
Example: given $o_1, o_2, o_3, o_4, o_5$ and 3 HMMs $\lambda_1, \lambda_2, \lambda_3$, the sequence (2, 1), (1, 3), (3, 4) associates:
  $\lambda_2$ with $(o_1, o_2)$
  $\lambda_1$ with $(o_3)$
  $\lambda_3$ with $(o_4, o_5)$

9.1 Realization
Combine all H HMM graphs completely (with equal probability)
  Embedded in a macro-HMM
  We can start with any model and migrate to any other model
  Any sequence of acoustic events can be represented in the macro-HMM
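The pair notation in the feature extraction example can be made concrete with a few lines of Python. The sketch reads each pair as (model index, start time), which is what the example (2, 1), (1, 3), (3, 4) suggests, and expands it into segments with 1-based, inclusive time bounds. The function name and this reading of the pairs are assumptions.

```python
def segments_from_assignment(assignment, T):
    """Turn pairs (h_i, t_i) into triples (h_i, first_time, last_time)."""
    result = []
    for idx, (model, start) in enumerate(assignment):
        # a segment ends right before the next one starts; the last one ends at T
        if idx + 1 < len(assignment):
            end = assignment[idx + 1][1] - 1
        else:
            end = T
        result.append((model, start, end))
    return result

example = segments_from_assignment([(2, 1), (1, 3), (3, 4)], T=5)
for model, first, last in example:
    print(f"lambda_{model} covers o_{first} .. o_{last}")
```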
9.1 Realization
The macro-HMM behaves like a normal HMM
  All possible events are equally represented in the macro-HMM
  With the Viterbi algorithm we can establish the most probable state sequence

9.1 Illustration
Macro-HMM in which each sub-graph corresponds to one HMM
[Figure: example for H = 4]

9.1 Illustration
Of course, in the macro graph we only need to completely connect those states of each single HMM graph that may occur as start or end states

9.1 Problems in Real Applications
The data stream can be an infinite feature sequence, and it is not clear exactly when an event begins
  Build a sequence of sub-sequences and apply the Viterbi algorithm to each sub-sequence
  Unfortunately, this is computationally expensive

9.1 Problems in Real Applications
How do we choose the best window size w for the sub-sequences?
  Choose w sufficiently large so that a larger number of HMM graphs can be traversed
  If a sufficiently large probability value occurs while passing through the various paths of the macro-HMM with the Viterbi algorithm, then stop the computation and return the HMMs traversed so far and the corresponding time points

9.1 HMM by example
HMM with Matlab
  HMM generation
  Most probable state sequence: Viterbi algorithm
  Training: Baum-Welch algorithm
[Figure: example HMM with edge labels 0.8 and 0.2, observations O1 to O4 with probabilities 0.2, 0.6, 0.8, 0.4, 0.1, 0.9]
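Splitting a possibly infinite feature stream into sub-sequences of window size w, as discussed under "Problems in Real Applications", can be sketched with a Python generator. The sketch assumes consecutive, non-overlapping windows, which the slides do not specify. The names `windows` and `decode_stream` are hypothetical, and `decode` stands for any per-window decoder, for example a Viterbi pass over the macro-HMM.

```python
from itertools import count, islice

def windows(stream, w):
    """Yield consecutive, non-overlapping windows of at most w items from stream."""
    it = iter(stream)
    while True:
        chunk = list(islice(it, w))
        if not chunk:
            return
        yield chunk

def decode_stream(stream, w, decode):
    """Apply a per-window decoder lazily to every window of the stream."""
    for chunk in windows(stream, w):
        yield decode(chunk)

# Works on an infinite stream because windows are produced on demand
first_two = list(islice(windows(count(), 3), 2))
print(first_two)
```

Because the generator is lazy, processing can stop as soon as a window yields a sufficiently probable decoding, without reading the rest of the stream.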
9.1 HMM by example
Classify pig coughs as morbid or healthy by using HMMs

9.1 HMM by example
Extract features, e.g., the spectrogram of a pig cough
[Figure: spectrogram of a pig cough]

9.1 HMM by example
Find an HMM which represents a pig cough as well as possible

9.1 HMM by example
Train the HMM (Baum-Welch algorithm) for different cough types (various diseases), e.g., Pasteurella disease

9.1 HMM by example
When a pig coughs, use the Viterbi algorithm to establish whether the pig is ill or not!

9.2 Video Retrieval
Video data
  Increasingly important for exchanging information
    Illustrative clips (simulations, animations, etc.)
    Presentations, lectures
    Video conferencing...
  Increasingly common on the Internet
    E.g., YouTube, video on demand,...
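The pig cough example amounts to scoring an observation sequence against one trained HMM per class and picking the best-scoring class. The Python sketch below uses the total probability $P(O \mid \lambda)$ from a forward recursion as the score; the slide names the Viterbi algorithm, whose best-path probability could be used as the score instead. Both models, their parameters and the two-symbol quantization are hypothetical values chosen for illustration, not trained from real cough data.

```python
def likelihood(model, obs):
    """P(obs | model) via the forward recursion; model = (A, B, pi)."""
    A, B, pi = model
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)

def classify(models, obs):
    """Return the name of the model that gives obs the highest probability."""
    return max(models, key=lambda name: likelihood(models[name], obs))

# Hypothetical models over two quantized feature symbols (0 and 1)
models = {
    "healthy": ([[0.9, 0.1], [0.5, 0.5]],   # A
                [[0.8, 0.2], [0.3, 0.7]],   # B
                [0.8, 0.2]),                # pi
    "morbid": ([[0.5, 0.5], [0.1, 0.9]],
               [[0.7, 0.3], [0.1, 0.9]],
               [0.2, 0.8]),
}

print(classify(models, [0, 0, 0]))
print(classify(models, [1, 1, 1]))
```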
9.2 Video Data
Cautious estimates: approximately 6-7 million hours (about 800 years) of video were already available on the Internet in 2006
In 2008 a YouTube search returned more than 80 million videos and 3 million user channels
So it is safe to say that the total number is currently well above 100 million videos
By 2010, video data will represent approximately 50% of the stored digital data volume
Management of video data is, among other things, a problem of scalability

9.2 Video Data
Regarding video data, it is necessary to efficiently:
  Store it
  Make it accessible
  And be able to recover it
Today's databases
  Blobs, smart blobs
  Retrieval on metadata
  Splitting into key-frames

Example: IBM AIV Extenders for IBM DB2 UDB (development now discontinued)
  Incorporating the QBIC prototype into a commercial database

Description on the IBM Web site:
  DB2 Video Extender adds the power of video retrieval to SQL queries. You can integrate video data and traditional business data in a single query. For example, you can query a news database for video news clips about a specific subject, and list the playing time of each video clip. Then use the Video Extender to play the video clips.
  IBM (http://www.ibm.com)

Example usage:
  For example, an advertising agency stores information about its campaigns in a DB2 database and uses DB2 AIV Extenders to store its print ads, broadcast and video ads. One SQL query can retrieve multimedia data, such as print or broadcast ads for a particular year or client, as well as related business data in the database.
  Using DB2 Video Extender, you can define new data types and functions for video data using DB2 Universal Database's built-in support for user-defined types and user-defined functions.
  IBM (http://www.ibm.com)

  Secure and recover video data. Video clips and their attributes that you store in a DB2 database are afforded the same security and recovery protection as traditional business data.
  IBM (http://www.ibm.com)
Goal: allows you to store and query video data as easily as you can traditional data
  Import and export video clips and their attributes into and out of a database. When you import a video clip, the DB2 Video Extender stores and maintains video attributes such as frame rate, compression format, and number of video tracks.
  IBM (http://www.ibm.com)

  Query video clips based on related business data or by video attributes. You can search for video clips based on data that you maintain, such as a name, number, or description; or by data that the DB2 Video Extender maintains, such as the format of the video or the date and time that it was last updated.
  IBM (http://www.ibm.com)

  Play video clips. You can use the DB2 Video Extender to retrieve a video clip. You can then use the DB2 Video Extender to invoke your favorite video browser to play the video clip. The DB2 Video Extender supports a variety of video file formats, and can work with different file-based video servers.
  IBM (http://www.ibm.com)

9.2 Video Retrieval Topics
Main problems: continuous medium
  Composed of several streams
    Image stream with visual information (often different views / camera angles of a scene)
    Audio stream (usually more than one, e.g., synchronous tracks on DVDs)
    Stream of text (subtitles, news flash...)

9.2 Video Retrieval Topics
Main problems: organization
  Video is a medium structured in time and space
  Videos cannot be seen as a set of individual frames, but as a document
  Video abstraction decomposes video clips into structured parts (visual table of contents)

9.2 Video Retrieval Topics
How do queries work in video retrieval?
  Specification of certain features in SQL (as in the IBM extender)
  Specification of semantic content?
    E.g., via keywords: news about politics
  Query by example?
    One possibility is sample images: find all the movies with this actor
9.2 Video Retrieval Topics
Retrieval technology: comprises all the other problem groups
  Image retrieval for the description of independent images, key frames, etc.
  Audio retrieval for the evaluation of the sound track, voice recognition, etc.
  Text retrieval for the search in any subtitles, summaries, transcriptions of the audio track, etc.

9.2 Retrieval Technology
However, these techniques can also be combined in the case of video data
  Person recognition using segmentation and detection of subtitles
  Assignment based on the actor's voice
  Classify objects by shape and speech information
  Detect exciting sports scenes by the audience's applause, etc.

9.2 Other Features
Other features, which don't occur in image, audio and text retrieval, concern the detection of the temporal behavior of objects in space
  Movement of objects (direction, speed), instead of simple recognition as in image retrieval
  E.g., a car moves slowly from left to right, or two people walk together

9.2 Other Features
Recognition of movement is normally done through the comparison of shapes in a sequence of images
  Detect shapes in one frame, e.g., by edge detection
  Transform shapes in successive images using translation, rotation and scaling
  If successful, the type of transformation provides information about the parameters of motion

9.2 Other Features
The extraction of moving objects is supported in MPEG-4 encoded streams
  Separate compression of the background and the (moving) foreground objects
  Since foreground and background elements change only a little, we only need to detect shifts and possibly changes in the camera angle

9.2 Other Features
Detection of camera movement
  Changes in camera angle (zooming, fade in/out,...)
  Movement of the camera itself (e.g., through background analysis)
  Recognition through various models for the individual effects
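The idea of reading motion parameters from the transformation between shapes in successive frames can be illustrated with a least-squares fit of translation, rotation and scaling. The Python sketch below assumes that corresponding points of the shape (for example matched edge points) are already known in both frames, which is a simplification not covered by the slides. Points are treated as complex numbers so that rotation and scaling become a single complex factor. The function name and the sample points are hypothetical.

```python
import cmath

def estimate_similarity(src, dst):
    """Least-squares fit of dst ~ s * src + t with complex s (scale, rotation) and t.

    src, dst: equally long lists of (x, y) points in corresponding order.
    Returns (scale, rotation angle in radians, (tx, ty)).
    """
    n = len(src)
    p = [complex(x, y) for x, y in src]
    q = [complex(x, y) for x, y in dst]
    p_mean, q_mean = sum(p) / n, sum(q) / n
    # complex linear regression on the centred point sets
    num = sum((qk - q_mean) * (pk - p_mean).conjugate() for pk, qk in zip(p, q))
    den = sum(abs(pk - p_mean) ** 2 for pk in p)
    s = num / den
    t = q_mean - s * p_mean
    return abs(s), cmath.phase(s), (t.real, t.imag)

# Unit square in frame 1; in frame 2 it is scaled by 2, rotated by 90 degrees
# and shifted by (3, 1) (hypothetical sample data)
frame1 = [(0, 0), (1, 0), (1, 1), (0, 1)]
frame2 = [(3, 1), (3, 3), (1, 3), (1, 1)]

scale, angle, shift = estimate_similarity(frame1, frame2)
print(scale, angle, shift)
```

A scale factor different from 1 hints at zooming or at motion towards or away from the camera, while the shift divided by the frame interval gives an estimate of speed and direction. In image coordinates with the y axis pointing down, a positive angle appears as a clockwise rotation on screen.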
9.2 Other Features
Time/place relationships
  Object motion results in trajectories in time and space
  Intersection of trajectories (e.g., car accidents at the observed crossings)
  Comparisons between different media
    E.g., all the videos with a movement from the upper right corner to the lower left corner

9.2 Result Presentation
Retrieval of the best videos is very expensive, and the quality is difficult to evaluate
Retrieval result as a set of video abstractions
  Summary sequences provide an overview of the content, usually with annotated key frames
  Highlights are scene cuts of certain video passages (trailer)

9.2 Application of Video Retrieval
Personal news (all clips on interesting topics)
Entertainment
  Automatic recognition of film genres (love story, action movie, comedy,...)
  Detection of advertising on TV
  Automatic recording of material from television

Next lecture
  Video Abstraction
  Shot Detection
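A trajectory query such as "movement from the upper right corner to the lower left corner", mentioned under time/place relationships, can be sketched as a test on the first and last point of an object trajectory. The Python sketch assumes image coordinates with the origin in the top-left corner and the y axis pointing down, and it simply divides the frame into four quadrants. The function names, the quadrant rule and the sample trajectory are assumptions made for illustration.

```python
def region(point, width, height):
    """Name the frame quadrant that contains point (origin top-left, y downwards)."""
    x, y = point
    horizontal = "left" if x < width / 2 else "right"
    vertical = "upper" if y < height / 2 else "lower"
    return f"{vertical} {horizontal}"

def matches_motion(trajectory, width, height, start, end):
    """True if the trajectory starts in quadrant `start` and ends in quadrant `end`."""
    return (region(trajectory[0], width, height) == start
            and region(trajectory[-1], width, height) == end)

# Hypothetical object trajectory in a 640 x 480 frame
track = [(600, 50), (400, 200), (100, 420)]
print(matches_motion(track, 640, 480, "upper right", "lower left"))
```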