Temporal structure analysis of broadcast tennis video using hidden Markov models

Ewa Kijak a,b, Lionel Oisel a, Patrick Gros b
a THOMSON multimedia S.A., Cesson-Sevigne, France
b IRISA-CNRS, Campus de Beaulieu, Rennes, France

ABSTRACT

This work aims at recovering the temporal structure of a broadcast tennis video from an analysis of the raw footage. Our method relies on a statistical model of the interleaving of shots, in order to group shots into predefined classes representing structural elements of a tennis video. This stochastic modeling is performed in the global framework of Hidden Markov Models (HMMs). The fundamental units are shots and transitions. In a first step, color and motion attributes of segmented shots are used to map shots into two classes: game (view of the full tennis court) and not game (medium views, close-up views, and commercials). In a second step, a trained HMM is used to analyze the temporal interleaving of shots. This analysis identifies more complex structures, such as first missed services, short rallies that could be aces or services, long rallies, breaks that indicate the end of a game, and replays that highlight interesting points. These higher-level structural units can be used either to create summaries or to allow nonlinear browsing of the video.

Keywords: sport video analysis, structure analysis, Hidden Markov Model, macro-segmentation, highlights detection, video content analysis

1. INTRODUCTION

Video classification and segmentation are fundamental steps for efficiently searching and browsing video content. Low-level visual features are widely used for indexing generic video content, but they are not sufficient to provide the semantic information that is meaningful to an end-user. When the indexing of videos is restricted to a given category, domain-specific knowledge about the processed content facilitates the recovery of higher-level indexing information.
One domain-specific application is the detection and recognition of highlights in sport videos. Sport video analysis is motivated by the growing amount of archived sport video material. Broadcasters need detailed annotation of video content to select relevant excerpts to be edited for summaries or magazines. At present, this logging task is performed manually by librarians.

Domain-specific video indexing can be divided into three main research areas: genre classification, content analysis, and structure analysis. The goal of genre classification is to automatically classify TV broadcasts into predetermined genres such as commercials, news, sport, etc. For this general video classification, Hidden Markov Models (HMMs) are widely used [1, 2]. Content analysis usually aims at detecting specific events in a video. Domain knowledge and properties of low-level features are exploited for mapping low-level information extracted from video data to high-level concepts. Finally, structure analysis aims at recovering the table of contents of videos within a given genre. The table of contents is obtained by finding the semantic discontinuities in the video. This involves detecting the temporal boundaries of the coherent segments and labeling every segment of the video according to predefined semantic categories. As not all of the content of a video is of interest, separating structure parsing from event detection may improve the indexing process: the interesting segments are extracted first, and content analysis is then applied to them only.

The temporal structure of a video varies from one video type to another, and prior knowledge of some general structure for the class of video under study is obviously useful in video structure parsing. In particular, not all video documents are highly structured. For example, in the movie category, it is generally admitted that the structure follows a hierarchical model similar to that of theatre plays. Structure analysis comes down to a segmentation into scenes, obtained by grouping shots with similar content. Most scene segmentation approaches attempt to merge similar and consecutive shots into scenes. These time-constrained methods rely on visual similarities between shots in a scene. News programs are much more structured, as they can be defined by an interleaving of anchorperson shots and news shots. A model-based approach based on color histograms and the spatial layouts of frames makes it possible to parse a newscast video into anchorperson and news story scenes [3]. In the domain of sport, a finite number of identified scenes, related to game phases, occur throughout the video. In baseball, a scene can be defined as a pitching-batting cycle [4]. Such a scene is made up of very different shots, which makes the time-constrained scene segmentation approaches used for movies unsuitable.

Within the category of sport videos, a distinction should be made between time-constrained sports such as soccer, and score-constrained sports such as tennis or volleyball. Time-constrained sports have a relatively loose structure. The game can be decomposed into equal periods, and during a period the content flow is quite unpredictable. In score-constrained sports, however, the content exhibits a strong hierarchical structure. For example, a tennis match can be broken down into sets, games and points.

In this paper, we take advantage of the well-defined structure of tennis broadcasts to parse the structure of tennis videos. Our goal is to use the available domain-specific knowledge of tennis videos to analyze these videos up to the level in the semantic scale where the structure can reliably be recovered. The structure identification follows a model-based approach.

Correspondence: kijake@thmulti.com
Storage and Retrieval for Media Databases 2003, Minerva M. Yeung, Rainer W. Lienhart, Chung-Sheng Li, Editors, Proceedings of SPIE IS&T Electronic Imaging, SPIE Vol. 5021 (2003) 2003 SPIE IS&T X/03/$
The outline of our paper is as follows. Section 2 provides an overview of previous work in the domain of sport video indexing. Section 3 describes the elements of the syntax of tennis videos. Section 4 presents our feature extraction method, followed in section 5 by a presentation of our tennis video structure analysis system. Experimental results on real broadcast tennis videos are given in section 6 to demonstrate the effectiveness of the proposed method.

2. RELATED WORK

Sports video analysis is an area of research where domain-specific knowledge can significantly enhance the performance of indexing. Most of the existing work in this domain is related to content analysis and focuses on the detection of interesting play events in sports video. A common approach to event detection consists in combining the extraction of low-level features with heuristic rules. Various low-level features such as color, edge, motion and audio features are used to extract mid-level information such as court lines, goal posts, and player and ball positions. This information is combined with heuristic rules to infer predetermined highlights. For example, after detecting tennis shots, Sudhir et al. [5] classify them into semantic categories such as baseline rallies, passing shots, net games, and serve-and-volley games, using a reasoning module that interprets the players' positions relative to the court lines. Using audio features, Rui et al. [6] employ speech endpoint detection and baseball hit detection, and detect the excitement in the reporter's speech, to infer important events in baseball video. However, no semantic analysis of the relevant events is carried out. Nepal [7] develops simple temporal models from heuristic rules, using crowd cheer, scoreboard and change-of-camera-direction detections, to find goal segments in a basketball video.
All these works are more related to event detection than to structure analysis, because they attempt to detect and identify segments of interest in a video without analyzing the temporal relations between them. Structure analysis is achieved when all the shots of a video are identified according to predetermined classes. This classification can be relative to play location. Gong et al. [8] classify each shot of a soccer video into one of nine positions of play, such as in the midfield, around the left penalty area, in the top-right corner area, etc. First, the line mark patterns are identified, and camera motion, ball, and players are detected. Then, the classification is performed according to the physical location in the field or the presence or absence of the ball. Zhou et al. [9] encapsulate basketball domain knowledge in an inductive decision tree. A rule-based classifier takes color, edge, and motion features as input and categorizes basketball video into left or right fast-break, left or right dunk, and close-up shots. In these works, no particular events are detected, but each shot is identified by the location of play within the court or the field. Classification into different playing locations is suited to sports in which play action cannot be recorded from a single point of view.

We do not attempt to analyze the content of a shot of interest to identify particular actions, as Sudhir et al. [5] do on tennis. Our aim is to exploit the temporal relations between all the shots in order to identify a global structure of the video. The input video is segmented and classified into predetermined game phases, such as first missed serve, rally, replay and break. The novelty of our approach lies in the use of statistical models, rather than heuristic inference engines, to describe domain-specific rules. In a recent work, Xie et al. [10] investigate a similar approach to segment a soccer video into play and out-of-play categories. Taking advantage of the well-defined structure of tennis games, we aim at segmenting the video up to a much higher level in the semantic scale.

3. TENNIS VIDEO SYNTAX

Tennis video is characterized by a typical production style, which we call the tennis video syntax. Tennis games are recorded by a fixed number of cameras, and the point of view that gives the most relevant information is selected for broadcast. For example, during a rally, the content provided by the camera filming the whole court is selected (shots of this kind, called global views, are thus of particular interest), and the player who has just carried out an action of interest is captured with a close-up. As close-up views never appear during a rally but only right before or after it, a global view generally indicates a rally. Because of the presence of typical scenes and the finite number of views, the tennis video has a predictable temporal syntax. A game is usually followed by a long succession of close-up views or commercials. A first serve is a short global view closely followed by another global view, whereas an ace is characterized by a short global view followed by a series of close-ups. Replays are signaled to the viewers by the insertion of special transitions.
A closer observation of tennis video reveals that only a few main types of shots occur repeatedly throughout the footage. In addition, each shot has a different meaning according to its context. For example, in tennis videos, a global view that appears after a dissolve transition is probably a replay, whereas a global view that appears shortly after a previous global view is probably a winning rally. These two observations motivated the use of an HMM for modeling a tennis video sequence. We integrated a priori information by deriving basic syntactical elements from the tennis video syntax and modeling each of these elements by an HMM. These models rely on the type of view of each shot, on the shot duration, and on the temporal relationships between shots. The next section describes the method used to identify the type of view of the shots. It is based on color and motion attributes extracted from the raw video. The shot classification and the temporal analysis are performed independently, so that the temporal analysis does not depend on color attributes, which may change from one sequence to another.

4. SHOT CHARACTERIZATION

In this section, we present the different types of views present in a tennis video and the process used to classify each of them. These views can be divided into four principal classes: global, medium, close-up, and audience (see Figure 1).

Figure 1. Four types of view in tennis video (left-to-right: global, medium, close-up, audience)

Such a fine-grained classification is not necessary in our case. In a tennis video production, global views contain much of the pertinent information. The remaining information depends on the presence or absence of secondary views, but is independent of the type of these views. Our classification algorithm is thus a binary classifier, which labels the shots according to two classes: global views (GV) and non-global views (N).
In the following subsections, we present the whole process starting from the initial video sequence up to the list of classified shots. After the presentation of a preprocessing step, our classification method is detailed and compared to existing methods. The resulting classification is exploited in section 5.

4.1. Preliminary video data processing

To match broadcasters' usage, only MPEG videos are considered. In the preliminary video processing, the video is segmented into shots. Our segmentation process can be divided into several steps, briefly described below:
- Straight cut detection is performed by detecting rapid changes of the standard bin-wise difference between luminance histograms of DC pictures.
- Gradual shot transition detection is performed by the twin-comparison method [15], which uses a dual threshold and accumulates significant differences to detect gradual transitions.
- The content of a detected shot is represented by a single keyframe (one keyframe is sufficient to illustrate the whole content of a view). The keyframe is chosen as the I frame closest to the frame with the lowest activity (activity is defined as the average magnitude of all the MPEG motion vectors associated with a given frame).

4.2. Features description

The pre-processing is now complete. A list of shots has been identified and the associated keyframes have been extracted. Cuts and transitions are known. The next step consists of labeling the extracted shots as global views (GV) or non-global views (N). Identifying the different types of view is a necessary step in any sports content analysis, and many works deal with shot classification in the context of sport videos. Classification processes are divided into two parts: feature extraction and classification of these features. We now present the features usually used, the features we actually use, and our classification algorithm.

We can identify two main kinds of features. The first class relies on color-based features. In sport videos, a global view is characterized by a large region of homogeneous color (the color of the play field). Most works thus use this basic feature to identify a shot label.
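As an illustration of the straight-cut detection step of Section 4.1, the sketch below computes the bin-wise difference between luminance histograms of consecutive frames and flags a cut when that difference exceeds a fixed threshold. It is only a minimal sketch under stated assumptions: frames are flat lists of 8-bit luminance values rather than MPEG DC pictures, the function names `histogram`, `binwise_difference` and `detect_cuts` are hypothetical, and a fixed threshold stands in for the detection of "rapid changes", whose exact criterion the text does not specify.

```python
def histogram(frame, bins=16, max_value=256):
    """Luminance histogram of a frame given as a flat list of 0..255 values."""
    counts = [0] * bins
    width = max_value // bins
    for value in frame:
        counts[min(value // width, bins - 1)] += 1
    return counts


def binwise_difference(h1, h2):
    """Sum of absolute bin-to-bin differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def detect_cuts(frames, threshold):
    """Return every index i such that a cut is declared between frame i-1 and i.

    Assumption: a plain fixed threshold on the histogram difference.
    """
    hists = [histogram(f) for f in frames]
    return [
        i
        for i in range(1, len(hists))
        if binwise_difference(hists[i - 1], hists[i]) > threshold
    ]


# Toy sequence: two dark frames followed by two bright frames.
toy_frames = [[10] * 64, [12] * 64, [200] * 64, [205] * 64]
cut_positions = detect_cuts(toy_frames, threshold=64)
```

In the toy sequence the only large histogram change occurs between the second and third frames, so that is where the cut is reported.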
In a baseball video, a pitching scene is detected by computing the difference between a candidate keyframe and a representative pitching image [4]. The representative pitching image is manually extracted from the considered video. A color model can also be learned to further improve shot recognition. This can take the form of a single color model, or of several models that try to capture all kinds of tennis courts [5, 11]. Another article [9] proposes the extraction of edge features around the dominant color region, in association with a rule-based classifier, to classify keyframes into left court, right court, middle court and close-up.

The second class consists of motion-based descriptors. For example, the variation and persistence of the estimated camera motion, as well as the number of intra-coded macroblocks in an MPEG video of basketball, are used to classify wide-angle and close-up shots [12]. To capture the frame contents efficiently, some approaches mix several descriptors. In the context of soccer video, the grass pixel ratio and the motion intensity in a frame are relevant features to categorize each view [10].

In most of the methods that use color information, the game field color has to be estimated first, because it can vary widely from one video to another. Our approach tries to avoid the use of a predefined field color, in order to handle a wide variety of videos automatically. As presented in the introduction of this section, our goal is to separate global views from other types of view. Close-up and global views are characterized by homogeneous color content. Indeed, the dominant colors of a global view consist of the colors of the court and its surroundings, and the dominant colors of a close-up shot consist of the colors of the clothes and face of a player. Medium and audience views, however, are characterized by scattered color content. The color-based classification should be reinforced using motion-based features.
On the one hand, a global view must capture the main part of the court at all times. On the other hand, in close-up views, the camera is generally tracking the player. The first class can thus be characterized by small camera motion, while the second implies large camera translations. Based on these observations, we choose two features to identify game shots: activity, which reflects camera motion during a shot, and color. Rather than a color histogram, we use a global descriptor of dominant colors, which is more compact. In addition, dominant color vectors capture the most significant color information of a frame and are less sensitive to noise.

Let F be a vector of N dominant colors and p the percentage of each color with respect to the whole associated frame. The colors of the original images are quantized into N values using a k-means clustering algorithm. Neighboring dominant colors are merged when their distance is less than a predefined threshold $T_d$. The goal is to ensure that the N dominant colors are perceptually different. Following the MPEG-7 standard, the similarity between two dominant color features $F_1$ and $F_2$ can then be measured by the following simplified quadratic distance function $D(F_1, F_2)$:

\[ D^2(F_1, F_2) = \sum_{i=1}^{N} p_{1i}^2 + \sum_{j=1}^{N} p_{2j}^2 - \sum_{i=1}^{N} \sum_{j=1}^{N} 2\, a_{1i,2j}\, p_{1i}\, p_{2j} \qquad (1) \]

where $a_{k,l}$ is the similarity coefficient between two colors $c_k$ and $c_l$:

\[ a_{k,l} = \begin{cases} 1 - d_{k,l} / d_{\max} & \text{if } d_{k,l} \le T_d \\ 0 & \text{if } d_{k,l} > T_d \end{cases} \qquad (2) \]

$T_d$ is the maximum distance for two similar colors, $d_{\max} = \alpha T_d$, and $d_{k,l}$ is the Euclidean distance between two colors $c_k$ and $c_l$, defined as follows:

\[ d_{k,l} = \lVert c_k - c_l \rVert \qquad (3) \]

To take into account the spatial configuration of similar color pixels, a confidence measure CM is associated with each dominant color feature. A pixel of color $C_i$ is considered to be coherent if all the pixels in its neighborhood have the same color. The confidence measure CM for the dominant color feature F is then defined as:

\[ CM = \sum_{i=1}^{N} \frac{\text{number of coherent pixels of color } C_i}{\text{total number of pixels of color } C_i}\; p_i \qquad (4) \]

The spatial coherency of dominant colors, represented by the confidence measure, as well as the respective activities $A_1$ and $A_2$, are taken into account in the final distance function between two dominant color descriptors $D_1$ and $D_2$:

\[ \mathrm{Diff}(D_1, D_2) = W_1\, \lvert CM_1 - CM_2 \rvert + W_2\, D(F_1, F_2) + W_3\, \lvert A_1 - A_2 \rvert \qquad (5) \]

where $W_1$, $W_2$, and $W_3$ are three weighting coefficients. In our implementation, we use 4 dominant colors (N = 4) to characterize the most significant color information in the game field. The weighting coefficients are set as follows: $W_1 = 0.2$, $W_2 = 0.5$, and $W_3 = 0.3$. Colors are represented in the YCbCr color space.
In the simplified quadratic distance function, the luminance component is not taken into account, which effectively eliminates the influence of illumination variations.

Figure 2. An example of four dominant colors extraction
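To make equations (1) to (5) concrete, the following sketch evaluates the simplified quadratic distance and the combined descriptor distance on small hand-made descriptors. It is a minimal illustration, not the authors' implementation: the `Descriptor` tuple layout, the function names, and the values `t_d=20.0` and `alpha=1.2` are assumptions (the paper does not report $T_d$ or $\alpha$), colors are reduced to (Cb, Cr) pairs since luminance is ignored, and $D$ is taken as the square root of $D^2$ from equation (1).

```python
import math
from collections import namedtuple

# Hypothetical container: colors is a list of ((Cb, Cr), percentage) pairs,
# cm is the confidence measure of eq. (4), activity is the shot activity.
Descriptor = namedtuple("Descriptor", ["colors", "cm", "activity"])


def similarity(c1, c2, t_d, alpha):
    """Similarity coefficient a_{k,l} of eq. (2)."""
    d = math.dist(c1, c2)          # Euclidean distance, eq. (3)
    d_max = alpha * t_d
    return 1.0 - d / d_max if d <= t_d else 0.0


def quadratic_distance(f1, f2, t_d=20.0, alpha=1.2):
    """D(F1, F2): square root of the simplified quadratic form of eq. (1)."""
    d2 = sum(p * p for _, p in f1) + sum(p * p for _, p in f2)
    for c1, p1 in f1:
        for c2, p2 in f2:
            d2 -= 2.0 * similarity(c1, c2, t_d, alpha) * p1 * p2
    return math.sqrt(max(d2, 0.0))  # clamp tiny negative rounding errors


def descriptor_diff(d1, d2, weights=(0.2, 0.5, 0.3)):
    """Diff(D1, D2) of eq. (5), with the weights reported in the text."""
    w1, w2, w3 = weights
    return (
        w1 * abs(d1.cm - d2.cm)
        + w2 * quadratic_distance(d1.colors, d2.colors)
        + w3 * abs(d1.activity - d2.activity)
    )


court = Descriptor([((100, 120), 0.6), ((90, 140), 0.4)], cm=0.8, activity=0.1)
face = Descriptor([((200, 200), 1.0)], cm=0.8, activity=0.1)
same_view = descriptor_diff(court, court)
other_view = descriptor_diff(court, face)
```

A descriptor compared with itself gives a distance of zero (up to rounding), while two descriptors with no similar colors are separated mainly by the color term weighted by $W_2$.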

4.3. Game shot identification

Our goal is to identify global views among all extracted keyframes without making any assumption about the color of the playing area. Our method is thus independent of the type of tennis court (carpet, clay, hard or grass). The analysis of several hours of tennis video reveals that, in a video, GV keyframes represent only 20% to 30% of all extracted keyframes, commercials included. However, it is quite easy to distinguish global views from medium and audience views. To do so, we consider dominant color ratios. As previously noted, the color content of medium and audience views is more scattered than that of close-up and global views. Considering that a global view is mainly composed of the playing area, we assume that the percentage of its main dominant color is greater than 50%. We reduce the set of candidate keyframes by discarding keyframes whose highest dominant color percentage is less than 50%. In the resulting subset of K images, GV keyframes represent more than 50% of the data (most medium and audience views have been discarded). The main remaining problem is thus the distinction between close-up and global views. In other words, the problem is now reduced to the identification of inlier data points, i.e. GV keyframes, in the presence of many outliers (N keyframes).

First, we select a keyframe that is representative of a global view. From a random selection of p keyframes, we choose a keyframe by the least median of squares (LMS) method. The number p of samples is chosen so that the probability P of finding a representative GV keyframe is greater than 99%. The expression for p is given by [13]:

\[ p = \frac{\log(1 - P)}{\log\left(1 - (1 - \varepsilon)^q\right)} \qquad (6) \]

where $\varepsilon$ is the fraction of outlier data, and q the number of features in each sample. Once a GV keyframe is found, outliers are removed. The set of candidate keyframes is reduced again by keeping the keyframes whose distance is lower than the median distance previously found.
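The sample count of equation (6) and the least-median selection can be sketched as follows. This is a simplified illustration under assumptions: `num_samples` rounds equation (6) up to an integer, `lmeds_reference` picks, among randomly sampled candidates, the one whose median squared distance to all candidates is smallest, and the function names, the scalar toy data and the `seed` parameter are hypothetical. In the paper's setting the distance would be the `Diff` function of equation (5) applied to keyframe descriptors.

```python
import math
import random
import statistics


def num_samples(prob_success, outlier_fraction, q):
    """Number p of random samples required by eq. (6), rounded up."""
    inlier_draw = (1.0 - outlier_fraction) ** q
    return math.ceil(math.log(1.0 - prob_success) / math.log(1.0 - inlier_draw))


def lmeds_reference(items, dist, n_samples, seed=0):
    """Pick the sampled item with the least median of squared distances.

    Assumption: the median is taken over distances to all items,
    including the candidate itself.
    """
    rng = random.Random(seed)
    sample = rng.sample(items, min(n_samples, len(items)))
    return min(
        sample,
        key=lambda k: statistics.median(dist(k, other) ** 2 for other in items),
    )


# With 50% outliers and one feature per sample, 7 draws give P > 99%.
p = num_samples(0.99, 0.5, 1)

# Toy 1-D "descriptors": four inliers near 1.0 and two outliers.
values = [1.0, 1.1, 0.9, 1.05, 5.0, 9.0]
reference = lmeds_reference(values, lambda a, b: abs(a - b), n_samples=10)
```

The median makes the selection insensitive to the two distant values, which is the property the text relies on when up to half of the candidates are close-up views.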
The LMS is iterated on this new subset to select a reference keyframe $K_{ref}$, which should be one of the best representatives of GV keyframes. Assuming that the distribution of distances from all the keyframes to $K_{ref}$ can be modeled by a Gaussian function, a keyframe $K_i$ is labeled as a GV keyframe if:

\[ \mathrm{Diff}(D_{ref}, D_i) \le 1.96\, \tau \qquad (7) \]

5. STRUCTURE ANALYSIS

5.1. Hidden Markov models

An HMM is a Markov chain whose state sequence cannot be observed directly, but only through a sequence of observation vectors. Each observation vector corresponds to an underlying state with an associated probability distribution. A discrete hidden Markov model is defined by a set of states, a set of state transition probabilities, a set of output symbols, and a probability distribution of output symbols for each state. Formally, for an N-state discrete HMM with an alphabet of M symbols and an observation sequence of length T, we use the following notation:
- $S = \{S_1, S_2, \ldots, S_N\}$ denotes the individual states
- $V = \{V_1, V_2, \ldots, V_M\}$ denotes the distinct observation symbols in the observation space
- $Q = \{Q_1, Q_2, \ldots, Q_T\}$ is the state sequence
- $O = \{O_1, O_2, \ldots, O_T\}$ is the observation sequence

The state transition probability distribution is represented by a matrix $A = \{a_{ij}\}$, where $a_{ij} = \Pr(Q_{t+1} = S_j \mid Q_t = S_i)$, and the observation symbol probability distribution is represented by a matrix $B = \{b_j(k)\}$, where $b_j(k) = \Pr(O_t = V_k \mid Q_t = S_j)$ is the probability of observing $V_k$ when the current state is $S_j$. The initial state distribution $\pi = \{\pi_i\}$, with $\pi_i = \Pr(Q_1 = S_i)$, contains the probabilities of the model being in state $S_i$ at time t = 1. For convenience we use $\lambda = \{A, B, \pi\}$ to denote the model parameters.
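The notation $\lambda = \{A, B, \pi\}$ maps directly onto a small data structure. The sketch below is only illustrative: the class name `DiscreteHMM`, its methods, and the two-state left-right example are hypothetical and do not correspond to any of the models in the paper. It checks that every probability row sums to one and evaluates the joint probability of a given state path and symbol sequence.

```python
from dataclasses import dataclass


@dataclass
class DiscreteHMM:
    A: list   # A[i][j] = Pr(Q_{t+1} = S_j | Q_t = S_i)
    B: list   # B[j][k] = Pr(O_t = V_k | Q_t = S_j)
    pi: list  # pi[i]   = Pr(Q_1 = S_i)

    def is_valid(self, tol=1e-9):
        """Check that each row of A and B, and pi itself, sums to one."""
        rows = self.A + self.B + [self.pi]
        return all(abs(sum(row) - 1.0) <= tol for row in rows)

    def joint_probability(self, states, symbols):
        """Pr(O, Q | lambda) for one state path and its observed symbols."""
        prob = self.pi[states[0]] * self.B[states[0]][symbols[0]]
        for t in range(1, len(states)):
            prob *= self.A[states[t - 1]][states[t]]
            prob *= self.B[states[t]][symbols[t]]
        return prob


# Hypothetical two-state left-right model: state 0 may loop or move on,
# state 1 is absorbing; each state emits a single symbol with certainty.
toy = DiscreteHMM(
    A=[[0.7, 0.3], [0.0, 1.0]],
    B=[[1.0, 0.0], [0.0, 1.0]],
    pi=[1.0, 0.0],
)
path_probability = toy.joint_probability(states=[0, 0, 1], symbols=[0, 0, 1])
```

The zero entry below the diagonal of `A` is what makes the toy model left-right: once state 1 is reached, the chain cannot return to state 0.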

5.2. Model description

We define four structural elements in a tennis video game: first missed serve and rally, rally, replay and break. Structural elements are modeled by 2- to 5-state left-right HMMs. The construction of the HMMs takes domain knowledge derived from tennis syntax into account as follows:
- In a broadcast video, the producers notify the viewers that a replay is being displayed by inserting special transitions.
- A first missed serve is a global view of short duration, followed by close-up views that are also short (as the players do not have to change their positions), and then by another global view.
- A break is characterized by a long succession of close-up views, audience views and advertisements. This set of consecutive shots has a particularly long duration. It appears when players change ends, generally every two games.

The type of view, the shot duration and their temporal relations are of prime importance in discriminating between the structural elements. The activity of a shot is not a discriminatory feature: as it represents an average quantity of motion over a shot, it has nearly the same value for all shots of one type of view. Consequently, it cannot help to distinguish one global view from another.

Each state $S_i$ models either a segment of the video within a single shot, or a dissolve transition between shots. A cut transition is not considered as a state; it is implicitly taken into account in the shot state. Each state of the HMM has one observation symbol $V_k$, which is one label from an alphabet of 3 symbols {GV, N, D}. Each symbol represents the class of the shot: GV for global view, N for non-global view, and D for dissolve transition. In addition, a shot duration model is associated with each state. The shot duration is modeled by either a single Gaussian, a mixture of Gaussians, or a histogram. Figure 3 shows the HMMs corresponding to the four structural elements.
In the HMMs for a first missed serve and rally, and for a rally, the last states are two distinct GV states. One represents a rally containing only a serve (whether or not it is returned). The other characterizes a rally containing significant strokes (more than two exchanges). These two GV states are differentiated by their shot duration distributions. For the N states, the shot duration is accumulated over a group of consecutive N states. Indeed, the duration of a single non-global view is not a relevant feature, whereas the cumulative duration of consecutive non-global views indicates the time interval between two global views.

The observation sequence O consists of a sequence of shots labeled according to the previous classification step, together with their respective durations. Then, given an observation $O_t$ with associated label $L_t$ and duration $D_t$, and a state $S_j$ with observation symbol $V_k$, the probability of the observation $O_t$ in state $S_j$ at time t is:

\[ b_j(O_t) = \Pr(L_t \mid Q_t = S_j)\, \Pr(D_t \mid Q_t = S_j) \qquad (8) \]

where:
- $\Pr(D_t \mid Q_t = S_j)$ is given by the probability distribution of the shot duration in state $S_j$
- $\Pr(L_t \mid Q_t = S_j) = 1$ if $L_t = V_k$, and $\varepsilon$ otherwise
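Equation (8) can be illustrated with a single-Gaussian duration model, one of the three options mentioned for the shot duration. The sketch below is an assumption-laden example rather than the authors' code: the function names, the value `eps=1e-3`, and the duration parameters (mean 5 s, standard deviation 1 s) are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution, used here as the duration model."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def emission(label, duration, state_symbol, dur_mean, dur_std, eps=1e-3):
    """b_j(O_t) of eq. (8): label term times duration term.

    The label term is 1 when the shot label matches the symbol attached
    to the state, and a small constant eps otherwise.
    """
    label_prob = 1.0 if label == state_symbol else eps
    return label_prob * gaussian_pdf(duration, dur_mean, dur_std)


# A 5-second shot scored against a GV state whose durations centre on 5 s.
match = emission("GV", 5.0, "GV", dur_mean=5.0, dur_std=1.0)
mismatch = emission("N", 5.0, "GV", dur_mean=5.0, dur_std=1.0)
```

Because the label term is either 1 or $\varepsilon$, a mismatched label is penalized by a constant factor regardless of how well the duration fits.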

Figure 3. (a) HMM for a first missed serve and rally (b) HMM for a rally (c) HMM for a replay (d) HMM for a break. GV stands for Global View, N for Non-global view, and D for Dissolve transition

The HMM process is divided into two steps: training and classification. In the training step, the parameters of the HMM, namely the transition probabilities A and the probability distribution of the shot duration, are estimated. The observation parameters B are not estimated, since the observation symbol of each state is fixed manually according to a priori knowledge. As a result, the probability of observing $V_k$ in the state $S_j$ is essentially binary. This is a very hard constraint in the classification process. It comes from the fact that the classification of shots into global view, non-global view, and dissolve transition results from a previous step. An alternative approach consists in integrating this classification step into the HMM by adding color descriptors and activity models to each state. However, this would require a re-estimation of the observation probabilities B for each different play area.

In the classification step, the most likely sequence of states for a given sequence of observations is computed. In other words, we have to find the state sequence Q that maximizes $\Pr(Q \mid O, \lambda)$. Segmentation and classification of the whole observed sequence into the different structural elements are performed simultaneously. Segmentation of a video sequence into time objects more macroscopic than shots is also called macro-segmentation. Classification results are given by the likelihood of each model for every segment.

Macro-segmentation relies on the long-time correlation of structural elements. To take into account the long-term structure of a tennis game, the four HMMs are connected in a higher level HMM. This higher level HMM is obtained by the concatenation of the previous HMMs. It is represented in Figure 4.
This level reflects the structure of a tennis game in terms of points. A point corresponds to a winning rally, that is to say almost all rallies except first missed serves. Replays happen at the end of a point, and a break happens at the end of at least ten consecutive points. This last rule is represented by a low transition probability between point and break. The long-time correlation prevents points and breaks from being interleaved. It also prevents two breaks from being consecutive.
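Finding the most likely state sequence is commonly done with the Viterbi algorithm. The following sketch is a generic log-domain version for illustration only: the function names are hypothetical, and the two-state toy model is not one of the tennis HMMs. Per-time emission scores are passed in directly (as `log_B[t][j]`), which accommodates observation probabilities that combine a label term and a duration term.

```python
import math


def safe_log(p):
    """Natural log that maps zero probability to minus infinity."""
    return math.log(p) if p > 0.0 else float("-inf")


def viterbi(log_pi, log_A, log_B):
    """Most likely state path.

    log_pi[i]   : log initial probability of state i
    log_A[i][j] : log transition probability from state i to state j
    log_B[t][j] : log probability of the observation at time t in state j
    """
    n_states, n_steps = len(log_pi), len(log_B)
    delta = [log_pi[j] + log_B[0][j] for j in range(n_states)]
    backpointers = []
    for t in range(1, n_steps):
        prev, delta, pointers = delta, [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: prev[i] + log_A[i][j])
            pointers.append(best_i)
            delta.append(prev[best_i] + log_A[best_i][j] + log_B[t][j])
        backpointers.append(pointers)
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda j: delta[j])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def to_log(matrix):
    return [[safe_log(p) for p in row] for row in matrix]


# Toy model: sticky transitions, observations favour state 0 then state 1.
log_pi = [safe_log(0.5), safe_log(0.5)]
log_A = to_log([[0.9, 0.1], [0.1, 0.9]])
log_B = to_log([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]])
best_path = viterbi(log_pi, log_A, log_B)
```

Working with log probabilities avoids numerical underflow on long shot sequences, and zero-probability transitions (as in left-right models) simply become minus infinity.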

Figure 4. Higher level HMM for long-time correlation of sub-HMMs

5.3. Training and classification

The sequence of observation vectors O is extracted from the video shots. Training data consist of labeled shots computed for a collection of videos. Models are trained by manually determining the state alignment of the training data before re-estimating the parameters. Each model is trained separately, using only the observation vectors corresponding to the specific structural element the model should represent. Transition probabilities between the HMMs are estimated by counting. Once all the HMMs $\{\lambda_i,\ i = 1, \ldots, 5\}$ and the higher level HMM are trained, we use the Viterbi algorithm [14] to get the optimal class sequence for a given observation sequence $O = \{O_1, O_2, \ldots, O_T\}$.

6. EXPERIMENTAL RESULTS

Our experiments were performed on real broadcast tennis videos produced by different broadcasters: 3 videos of the Roland Garros tournament (RG1, RG2, and RG3) and videos of the US Open (US1, US2, and US3). These videos contain different editing styles. A test database with a significant variety of editing styles is essential to test the robustness of our system properly. Each MPEG-2 video is about 1 hour long. We use videos for the training set and the 3 others for evaluating our system.

Experimental results on shot classification, macro-segmentation and highlights classification are shown in Table 1. The macro-segmentation accuracy is defined as the number of shots correctly classified according to the basic structural elements over the total number of shots. Precision and recall are given for five identified tennis highlights: first missed serve, short rally, rally, replay and break. Precision is the ratio of the number of correctly identified shots to the total number of identified shots. Recall is the ratio of the number of correctly identified shots to the total number of relevant shots.
Table 1. Classification and macro-segmentation results. Columns: RG1, RG2, US1, US2. Rows: shot classification; macro-segmentation accuracy; recall and precision for each highlight type (missed serve, short rally, rally, replay, break). The numeric entries are not recoverable from this transcription (only the fragment ",85" survives, on the macro-segmentation accuracy row).
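The evaluation measures defined above can be computed from per-shot labels as in the sketch below. The function names and the four-shot example are hypothetical and do not reproduce any value from Table 1; returning 0.0 when a class is never predicted or never present is a convention chosen here, not something stated in the paper.

```python
def precision_recall(predicted, truth, label):
    """Per-class precision and recall over aligned per-shot label lists."""
    correct = sum(1 for p, t in zip(predicted, truth) if p == label and t == label)
    n_identified = sum(1 for p in predicted if p == label)
    n_relevant = sum(1 for t in truth if t == label)
    precision = correct / n_identified if n_identified else 0.0
    recall = correct / n_relevant if n_relevant else 0.0
    return precision, recall


def accuracy(predicted, truth):
    """Fraction of shots assigned to the correct structural element."""
    return sum(1 for p, t in zip(predicted, truth) if p == t) / len(truth)


# Made-up labels for four shots, for illustration only.
predicted = ["rally", "rally", "break", "replay"]
truth = ["rally", "break", "break", "replay"]
rally_scores = precision_recall(predicted, truth, "rally")
overall = accuracy(predicted, truth)
```

In this example one of the two shots labeled as rally is actually a break, so the rally class reaches full recall but only 50% precision.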

Our initial results are satisfactory: using the trained HMM to segment a new set of tennis videos into the four basic structural elements gives global accuracies from 77% to 90% (see Table 1 for details). This result demonstrates the robustness of our classification scheme to heterogeneous video content (i.e. different editing rules, different types of court, etc.).

Typical errors in highlights classification are due to model ambiguities. These ambiguities mainly stem from the fact that shot durations do not always reflect the semantic state of the shot. For example, the distinction between a short rally and a rally is based only on their respective shot duration distributions. In the learning set, a global view is considered as a short rally when it represents an ace or a serve plus a return of serve, regardless of the shot duration. An overlap thus appears in the shot duration distributions, because the duration of a short rally can be equivalent to the duration of a rally containing only three or four strokes. Such confusion could be eliminated by a further content analysis of the shots. Break states suffer from the same type of confusion. A set of consecutive close-up shots with a particularly long duration can appear, for example, when there is a discussion between a player and the umpire. This introduces confusion in the detection of break states and explains their low precision rate.

In some cases, the ambiguities due to the probability distributions can be removed. For example, first missed serve and short rally have almost the same shot duration distribution. They are discriminated by the context, i.e. the previous and the following states. The good precision rate for missed serves confirms the validity of our approach. Replay states benefit from a hard constraint on the presence of dissolve transitions, and they are correctly identified. False alarms occur, however, when a dissolve transition is employed for another purpose, for example to end a break.
Another source of errors is the violation of the assumption that a global view represents a rally. During a break, a global view may be displayed to show the state of the court. Because we consider the cumulated shot duration over successive non-global views, the durations of non-global views are then cumulated separately before and after such an occurrence of a global view. This leads to two groups of non-global views, separated by a global view, each with a cumulated shot duration that is not necessarily indicative of a break.

Finally, the last source of errors comes from less frequent events that are not explicitly taken into account in the model. We have tried more complex basic structural elements, which include more configurations (for example possible dissolve transitions in a break or repetition of let services). These models did not give significant improvement, because they introduce new ambiguous states.

7. CONCLUSION AND FUTURE WORK

In this paper, we have proposed a statistical approach for tennis video macro-segmentation and classification. Based on domain knowledge, we have defined four basic structural elements of a tennis game on which the structure analysis is based. Each element is modeled using an HMM, and the four resulting HMMs are connected in a higher-level HMM. These four elements are also of interest because they support the following inferences in a further event detection process: a replay happens just after a rally of interest has been played; a first missed serve is not of interest and should not be taken into account in further analysis; a short rally may include an ace; a break indicates that a game or a set has ended, when the players change ends.

The results reported in this preliminary work are promising for modeling the hierarchical structure of a score-constrained sport with HMMs. Future work will include a much more complex model that takes into account the entire hierarchical structure of a tennis game.
This should lead to a higher level of semantic analysis of the temporal structure. We are also currently investigating whether performance can be improved by adding complementary information provided by audio.

Since this paper was written, two papers have been published that use statistical approaches to classify sport highlights, for a soccer game 16 and a baseball game 17 respectively. HMMs are used for highlight classification; however, macro-segmentation was not performed. Nevertheless, these works confirm the interest of using HMMs in domain-specific applications.

98 Proc. of SPIE Vol. 501
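The idea of connecting the four element HMMs in a higher-level HMM can be sketched as a flattening of the sub-models into a single transition matrix. This is only a minimal illustration under stated assumptions, not the topology used in this work: the element names, the two-state sub-models, the uniform higher-level transitions, and the convention that each element is entered through its state 0 are all hypothetical.

```python
def compose_hmms(sub_models, top_trans):
    """Flatten sub-HMMs into one transition matrix.

    sub_models: {name: {"trans": n x n within-element transition probabilities,
                        "exit": length-n probabilities of leaving the element}}
    top_trans:  {name: {next_name: probability}} between elements.
    Assumption: every element is entered through its state 0.
    """
    states = [(name, i)
              for name, model in sub_models.items()
              for i in range(len(model["exit"]))]
    index = {s: k for k, s in enumerate(states)}
    matrix = [[0.0] * len(states) for _ in states]
    for name, i in states:
        row = matrix[index[(name, i)]]
        model = sub_models[name]
        # Transitions that stay inside the same structural element.
        for j, p in enumerate(model["trans"][i]):
            row[index[(name, j)]] += p
        # Leaving the element: exit probability times higher-level transition.
        for nxt, p in top_trans[name].items():
            row[index[(nxt, 0)]] += model["exit"][i] * p
    return states, matrix


# Placeholder element names and probabilities (assumed, for illustration only).
ELEMENTS = ["missed_serve", "rally", "replay", "break"]
TWO_STATE = {"trans": [[0.5, 0.5], [0.0, 0.4]], "exit": [0.0, 0.6]}
SUB_MODELS = {name: TWO_STATE for name in ELEMENTS}
TOP = {name: {other: 0.25 for other in ELEMENTS} for name in ELEMENTS}

states, A = compose_hmms(SUB_MODELS, TOP)
for state, row in zip(states, A):
    print(state, [round(p, 3) for p in row])
```

Each row of the flattened matrix still sums to one as long as every sub-model row plus its exit probability sums to one and every higher-level row sums to one, so the result can be decoded with a standard Viterbi pass over the flattened state space.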

ACKNOWLEDGMENTS

The authors would like to thank Guillaume Gravier, from IRISA-CNRS, for his help and advice about Hidden Markov Models.

REFERENCES

1. N. Dimitrova, L. Agnihotri, and G. Wei, "Video classification based on HMM using text and faces", European Conference on Signal Processing.
2. J. Huang, Z. Liu, and Y. Wang, "Joint video scene segmentation and classification based on hidden Markov model", Proc. of the IEEE Int'l Conference on Multimedia and Expo.
3. H.J. Zhang, S.Y. Tan, S.W. Smoliar, and G. Yihong, "Automatic parsing and indexing of news video", Multimedia Systems.
4. T. Kawashima, K. Tateyama, T. Iijima, and Y. Aoki, "Indexing of baseball telecast for content-based video retrieval", International Conference on Image Processing.
5. G. Sudhir, J.C.M. Lee, and A.K. Jain, "Automatic classification of tennis video for high-level content-based retrieval", Proc. of the Int'l Workshop on Content-Based Access of Image and Video Databases (CAIVD '98).
6. Y. Rui, A. Gupta, and A. Acero, "Automatically extracting highlights for TV baseball programs", Proc. of ACM Multimedia Conference.
7. S. Nepal, U. Srinivasan, and G. Reynolds, "Automatic detection of goal segments in basketball videos", Proc. of ACM Multimedia Conference.
8. Y. Gong, L.T. Sin, C.H. Chuan, H. Zhang, and M. Sakauchi, "Automatic parsing of TV soccer programs", Proc. of Int'l Conference on Multimedia Computing and Systems (ICMCS '95).
9. W. Zhou, A. Vellaikal, and C.-C. J. Kuo, "Rule-based video classification system for basketball video indexing", Proc. of ACM International Multimedia Conference.
10. L. Xie, S-F. Chang, A. Divakaran, and H. Sun, "Structure analysis of soccer video with hidden Markov models", IEEE Int'l Conference on Acoustics, Speech, and Signal Processing (ICASSP '0).
11. D. Zhong, and S.F. Chang, "Structure analysis of sports video using domain models", IEEE Conference on Multimedia and Expo.
12. Y.P. Tan, D.D. Saur, S.R. Kulkarni, and P.J. Ramadge, "Rapid estimation of camera motion from compressed video with application to video annotation", IEEE Trans. on Circuits and Systems for Video Technology, v10(1).
13. P.J. Rousseeuw, Robust regression and outlier detection, Wiley, New York.
14. L.R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition", Proc. of the IEEE, v77().
15. H.J. Zhang, A. Kankanhalli, S.W. Smoliar, "Automatic partitioning of full-motion video", Multimedia Systems, v1(1), pp 10-8.
16. J. Assfalg, M. Bertini, A. Del Bimbo, W. Nunziati, and P. Pala, "Soccer highlights detection and recognition using HMMs", IEEE Int'l Conference on Multimedia and Expo (ICME '0).
17. P. Chang, M. Han, and Y. Gong, "Extract highlights from baseball game video with hidden Markov models", Proc. of IEEE Int'l Conference on Image Processing (ICIP '0), 00.

Baseball Game Highlight & Event Detection

Baseball Game Highlight & Event Detection Baseball Game Highlight & Event Detection Student: Harry Chao Course Adviser: Winston Hu 1 Outline 1. Goal 2. Previous methods 3. My flowchart 4. My methods 5. Experimental result 6. Conclusion & Future

More information

Real-Time Content-Based Adaptive Streaming of Sports Videos

Real-Time Content-Based Adaptive Streaming of Sports Videos Real-Time Content-Based Adaptive Streaming of Sports Videos Shih-Fu Chang, Di Zhong, and Raj Kumar Digital Video and Multimedia Group ADVENT University/Industry Consortium Columbia University December

More information

Algorithms and System for High-Level Structure Analysis and Event Detection in Soccer Video

Algorithms and System for High-Level Structure Analysis and Event Detection in Soccer Video Algorithms and Sstem for High-Level Structure Analsis and Event Detection in Soccer Video Peng Xu, Shih-Fu Chang, Columbia Universit Aja Divakaran, Anthon Vetro, Huifang Sun, Mitsubishi Electric Advanced

More information

Motion analysis for broadcast tennis video considering mutual interaction of players

Motion analysis for broadcast tennis video considering mutual interaction of players 14-10 MVA2011 IAPR Conference on Machine Vision Applications, June 13-15, 2011, Nara, JAPAN analysis for broadcast tennis video considering mutual interaction of players Naoto Maruyama, Kazuhiro Fukui

More information

A Robust Wipe Detection Algorithm

A Robust Wipe Detection Algorithm A Robust Wipe Detection Algorithm C. W. Ngo, T. C. Pong & R. T. Chin Department of Computer Science The Hong Kong University of Science & Technology Clear Water Bay, Kowloon, Hong Kong Email: fcwngo, tcpong,

More information

A Unified Framework for Semantic Content Analysis in Sports Video

A Unified Framework for Semantic Content Analysis in Sports Video Proceedings of the nd International Conference on Information Technology for Application (ICITA 004) A Unified Framework for Semantic Content Analysis in Sports Video Chen Jianyun Li Yunhao Lao Songyang

More information

TEVI: Text Extraction for Video Indexing

TEVI: Text Extraction for Video Indexing TEVI: Text Extraction for Video Indexing Hichem KARRAY, Mohamed SALAH, Adel M. ALIMI REGIM: Research Group on Intelligent Machines, EIS, University of Sfax, Tunisia hichem.karray@ieee.org mohamed_salah@laposte.net

More information

Multi-level analysis of sports video sequences

Multi-level analysis of sports video sequences Multi-level analysis of sports video sequences Jungong Han a, Dirk Farin a and Peter H. N. de With a,b a University of Technology Eindhoven, 5600MB Eindhoven, The Netherlands b LogicaCMG, RTSE, PO Box

More information

Video Summarization Using MPEG-7 Motion Activity and Audio Descriptors

Video Summarization Using MPEG-7 Motion Activity and Audio Descriptors Video Summarization Using MPEG-7 Motion Activity and Audio Descriptors Ajay Divakaran, Kadir A. Peker, Regunathan Radhakrishnan, Ziyou Xiong and Romain Cabasson Presented by Giulia Fanti 1 Overview Motivation

More information

Automatic Video Caption Detection and Extraction in the DCT Compressed Domain

Automatic Video Caption Detection and Extraction in the DCT Compressed Domain Automatic Video Caption Detection and Extraction in the DCT Compressed Domain Chin-Fu Tsao 1, Yu-Hao Chen 1, Jin-Hau Kuo 1, Chia-wei Lin 1, and Ja-Ling Wu 1,2 1 Communication and Multimedia Laboratory,

More information

Title: Pyramidwise Structuring for Soccer Highlight Extraction. Authors: Ming Luo, Yu-Fei Ma, Hong-Jiang Zhang

Title: Pyramidwise Structuring for Soccer Highlight Extraction. Authors: Ming Luo, Yu-Fei Ma, Hong-Jiang Zhang Title: Pyramidwise Structuring for Soccer Highlight Extraction Authors: Ming Luo, Yu-Fei Ma, Hong-Jiang Zhang Mailing address: Microsoft Research Asia, 5F, Beijing Sigma Center, 49 Zhichun Road, Beijing

More information

Recall precision graph

Recall precision graph VIDEO SHOT BOUNDARY DETECTION USING SINGULAR VALUE DECOMPOSITION Λ Z.»CERNEKOVÁ, C. KOTROPOULOS AND I. PITAS Aristotle University of Thessaloniki Box 451, Thessaloniki 541 24, GREECE E-mail: (zuzana, costas,

More information

Searching Video Collections:Part I

Searching Video Collections:Part I Searching Video Collections:Part I Introduction to Multimedia Information Retrieval Multimedia Representation Visual Features (Still Images and Image Sequences) Color Texture Shape Edges Objects, Motion

More information

Multimedia Databases. Wolf-Tilo Balke Younès Ghammad Institut für Informationssysteme Technische Universität Braunschweig

Multimedia Databases. Wolf-Tilo Balke Younès Ghammad Institut für Informationssysteme Technische Universität Braunschweig Multimedia Databases Wolf-Tilo Balke Younès Ghammad Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de Previous Lecture Audio Retrieval - Query by Humming

More information

Multimedia Databases. 9 Video Retrieval. 9.1 Hidden Markov Model. 9.1 Hidden Markov Model. 9.1 Evaluation. 9.1 HMM Example 12/18/2009

Multimedia Databases. 9 Video Retrieval. 9.1 Hidden Markov Model. 9.1 Hidden Markov Model. 9.1 Evaluation. 9.1 HMM Example 12/18/2009 9 Video Retrieval Multimedia Databases 9 Video Retrieval 9.1 Hidden Markov Models (continued from last lecture) 9.2 Introduction into Video Retrieval Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme

More information

A Miniature-Based Image Retrieval System

A Miniature-Based Image Retrieval System A Miniature-Based Image Retrieval System Md. Saiful Islam 1 and Md. Haider Ali 2 Institute of Information Technology 1, Dept. of Computer Science and Engineering 2, University of Dhaka 1, 2, Dhaka-1000,

More information

Highlights Extraction from Unscripted Video

Highlights Extraction from Unscripted Video Highlights Extraction from Unscripted Video T 61.6030, Multimedia Retrieval Seminar presentation 04.04.2008 Harrison Mfula Helsinki University of Technology Department of Computer Science, Espoo, Finland

More information

An Automatic Video Classification System Based on a

An Automatic Video Classification System Based on a International Journal of Smart Engineering System Design, 2002 An Automatic Video Classification System Based on a Combination of Cheng Lu, Mark S. Drew, and James Au School of Computing Science Simon

More information

Video shot segmentation using late fusion technique

Video shot segmentation using late fusion technique Video shot segmentation using late fusion technique by C. Krishna Mohan, N. Dhananjaya, B.Yegnanarayana in Proc. Seventh International Conference on Machine Learning and Applications, 2008, San Diego,

More information

Trademark Matching and Retrieval in Sport Video Databases

Trademark Matching and Retrieval in Sport Video Databases Trademark Matching and Retrieval in Sport Video Databases Andrew D. Bagdanov, Lamberto Ballan, Marco Bertini and Alberto Del Bimbo {bagdanov, ballan, bertini, delbimbo}@dsi.unifi.it 9th ACM SIGMM International

More information

CONTENT analysis of video is to find meaningful structures

CONTENT analysis of video is to find meaningful structures 1576 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 18, NO. 11, NOVEMBER 2008 An ICA Mixture Hidden Markov Model for Video Content Analysis Jian Zhou, Member, IEEE, and Xiao-Ping

More information

Key-frame extraction using dominant-set clustering

Key-frame extraction using dominant-set clustering University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2008 Key-frame extraction using dominant-set clustering Xianglin Zeng

More information

Video Key-Frame Extraction using Entropy value as Global and Local Feature

Video Key-Frame Extraction using Entropy value as Global and Local Feature Video Key-Frame Extraction using Entropy value as Global and Local Feature Siddu. P Algur #1, Vivek. R *2 # Department of Information Science Engineering, B.V. Bhoomraddi College of Engineering and Technology

More information

Highlight Ranking for Broadcast Tennis Video Based on Multi-modality Analysis and Relevance Feedback

Highlight Ranking for Broadcast Tennis Video Based on Multi-modality Analysis and Relevance Feedback Highlight Ranking for Broadcast Tennis Video Based on Multi-modality Analysis and Relevance Feedback Guangyu Zhu 1, Qingming Huang 2, and Yihong Gong 3 1 Harbin Institute of Technology, Harbin, P.R. China

More information

MULTIMODAL BASED HIGHLIGHT DETECTION IN BROADCAST SOCCER VIDEO

MULTIMODAL BASED HIGHLIGHT DETECTION IN BROADCAST SOCCER VIDEO MULTIMODAL BASED HIGHLIGHT DETECTION IN BROADCAST SOCCER VIDEO YIFAN ZHANG, QINGSHAN LIU, JIAN CHENG, HANQING LU National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of

More information

Story Unit Segmentation with Friendly Acoustic Perception *

Story Unit Segmentation with Friendly Acoustic Perception * Story Unit Segmentation with Friendly Acoustic Perception * Longchuan Yan 1,3, Jun Du 2, Qingming Huang 3, and Shuqiang Jiang 1 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing,

More information

A Rapid Scheme for Slow-Motion Replay Segment Detection

A Rapid Scheme for Slow-Motion Replay Segment Detection A Rapid Scheme for Slow-Motion Replay Segment Detection Wei-Hong Chuang, Dun-Yu Hsiao, Soo-Chang Pei, and Homer Chen Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan 10617,

More information

Semantic Event Detection in Sports through Motion Understanding.

Semantic Event Detection in Sports through Motion Understanding. Semantic Event Detection in Sports through Motion Understanding. N. Rea, R. Dahyot and A. Kokaram Electronic and Electrical Engineering Department, University of Dublin, Trinity College Dublin, Ireland.

More information

PixSO: A System for Video Shot Detection

PixSO: A System for Video Shot Detection PixSO: A System for Video Shot Detection Chengcui Zhang 1, Shu-Ching Chen 1, Mei-Ling Shyu 2 1 School of Computer Science, Florida International University, Miami, FL 33199, USA 2 Department of Electrical

More information

Learning based face hallucination techniques: A survey

Learning based face hallucination techniques: A survey Vol. 3 (2014-15) pp. 37-45. : A survey Premitha Premnath K Department of Computer Science & Engineering Vidya Academy of Science & Technology Thrissur - 680501, Kerala, India (email: premithakpnath@gmail.com)

More information

Video Syntax Analysis

Video Syntax Analysis 1 Video Syntax Analysis Wei-Ta Chu 2008/10/9 Outline 2 Scene boundary detection Key frame selection 3 Announcement of HW #1 Shot Change Detection Goal: automatic shot change detection Requirements 1. Write

More information

An Introduction to Content Based Image Retrieval

An Introduction to Content Based Image Retrieval CHAPTER -1 An Introduction to Content Based Image Retrieval 1.1 Introduction With the advancement in internet and multimedia technologies, a huge amount of multimedia data in the form of audio, video and

More information

CORRELATION BASED CAR NUMBER PLATE EXTRACTION SYSTEM

CORRELATION BASED CAR NUMBER PLATE EXTRACTION SYSTEM CORRELATION BASED CAR NUMBER PLATE EXTRACTION SYSTEM 1 PHYO THET KHIN, 2 LAI LAI WIN KYI 1,2 Department of Information Technology, Mandalay Technological University The Republic of the Union of Myanmar

More information

Generation of Sports Highlights Using a Combination of Supervised & Unsupervised Learning in Audio Domain

Generation of Sports Highlights Using a Combination of Supervised & Unsupervised Learning in Audio Domain MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Generation of Sports Highlights Using a Combination of Supervised & Unsupervised Learning in Audio Domain Radhakrishan, R.; Xiong, Z.; Divakaran,

More information

Title: Automatic event detection for tennis broadcasting. Author: Javier Enebral González. Director: Francesc Tarrés Ruiz. Date: July 8 th, 2011

Title: Automatic event detection for tennis broadcasting. Author: Javier Enebral González. Director: Francesc Tarrés Ruiz. Date: July 8 th, 2011 MASTER THESIS TITLE: Automatic event detection for tennis broadcasting MASTER DEGREE: Master in Science in Telecommunication Engineering & Management AUTHOR: Javier Enebral González DIRECTOR: Francesc

More information

Video De-interlacing with Scene Change Detection Based on 3D Wavelet Transform

Video De-interlacing with Scene Change Detection Based on 3D Wavelet Transform Video De-interlacing with Scene Change Detection Based on 3D Wavelet Transform M. Nancy Regina 1, S. Caroline 2 PG Scholar, ECE, St. Xavier s Catholic College of Engineering, Nagercoil, India 1 Assistant

More information

The ToCAI Description Scheme for Indexing and Retrieval of Multimedia Documents 1

The ToCAI Description Scheme for Indexing and Retrieval of Multimedia Documents 1 The ToCAI Description Scheme for Indexing and Retrieval of Multimedia Documents 1 N. Adami, A. Bugatti, A. Corghi, R. Leonardi, P. Migliorati, Lorenzo A. Rossi, C. Saraceno 2 Department of Electronics

More information

A Geometrical Key-frame Selection Method exploiting Dominant Motion Estimation in Video

A Geometrical Key-frame Selection Method exploiting Dominant Motion Estimation in Video A Geometrical Key-frame Selection Method exploiting Dominant Motion Estimation in Video Brigitte Fauvet, Patrick Bouthemy, Patrick Gros 2 and Fabien Spindler IRISA/INRIA 2 IRISA/CNRS Campus Universitaire

More information

NOVEL APPROACH TO CONTENT-BASED VIDEO INDEXING AND RETRIEVAL BY USING A MEASURE OF STRUCTURAL SIMILARITY OF FRAMES. David Asatryan, Manuk Zakaryan

NOVEL APPROACH TO CONTENT-BASED VIDEO INDEXING AND RETRIEVAL BY USING A MEASURE OF STRUCTURAL SIMILARITY OF FRAMES. David Asatryan, Manuk Zakaryan International Journal "Information Content and Processing", Volume 2, Number 1, 2015 71 NOVEL APPROACH TO CONTENT-BASED VIDEO INDEXING AND RETRIEVAL BY USING A MEASURE OF STRUCTURAL SIMILARITY OF FRAMES

More information

Short Survey on Static Hand Gesture Recognition

Short Survey on Static Hand Gesture Recognition Short Survey on Static Hand Gesture Recognition Huu-Hung Huynh University of Science and Technology The University of Danang, Vietnam Duc-Hoang Vo University of Science and Technology The University of

More information

Scalable Hierarchical Summarization of News Using Fidelity in MPEG-7 Description Scheme

Scalable Hierarchical Summarization of News Using Fidelity in MPEG-7 Description Scheme Scalable Hierarchical Summarization of News Using Fidelity in MPEG-7 Description Scheme Jung-Rim Kim, Seong Soo Chun, Seok-jin Oh, and Sanghoon Sull School of Electrical Engineering, Korea University,

More information

Introduction to Medical Imaging (5XSA0) Module 5

Introduction to Medical Imaging (5XSA0) Module 5 Introduction to Medical Imaging (5XSA0) Module 5 Segmentation Jungong Han, Dirk Farin, Sveta Zinger ( s.zinger@tue.nl ) 1 Outline Introduction Color Segmentation region-growing region-merging watershed

More information

Action Classification in Soccer Videos with Long Short-Term Memory Recurrent Neural Networks

Action Classification in Soccer Videos with Long Short-Term Memory Recurrent Neural Networks Action Classification in Soccer Videos with Long Short-Term Memory Recurrent Neural Networks Moez Baccouche 1,2, Franck Mamalet 1, Christian Wolf 2, Christophe Garcia 1, and Atilla Baskurt 2 1 Orange Labs,

More information

Automatic Colorization of Grayscale Images

Automatic Colorization of Grayscale Images Automatic Colorization of Grayscale Images Austin Sousa Rasoul Kabirzadeh Patrick Blaes Department of Electrical Engineering, Stanford University 1 Introduction ere exists a wealth of photographic images,

More information

Clustering Methods for Video Browsing and Annotation

Clustering Methods for Video Browsing and Annotation Clustering Methods for Video Browsing and Annotation Di Zhong, HongJiang Zhang 2 and Shih-Fu Chang* Institute of System Science, National University of Singapore Kent Ridge, Singapore 05 *Center for Telecommunication

More information

Video Analysis for Browsing and Printing

Video Analysis for Browsing and Printing Video Analysis for Browsing and Printing Qian Lin, Tong Zhang, Mei Chen, Yining Deng, Brian Atkins HP Laboratories HPL-2008-215 Keyword(s): video mining, video printing, user intent, video panorama, video

More information

NeTra-V: Towards an Object-based Video Representation

NeTra-V: Towards an Object-based Video Representation Proc. of SPIE, Storage and Retrieval for Image and Video Databases VI, vol. 3312, pp 202-213, 1998 NeTra-V: Towards an Object-based Video Representation Yining Deng, Debargha Mukherjee and B. S. Manjunath

More information

Hypervideo Summaries

Hypervideo Summaries Hypervideo Summaries Andreas Girgensohn, Frank Shipman, Lynn Wilcox FX Palo Alto Laboratory, 3400 Hillview Avenue, Bldg. 4, Palo Alto, CA 94304 ABSTRACT Hypervideo is a form of interactive video that allows

More information

Cs : Computer Vision Final Project Report

Cs : Computer Vision Final Project Report Cs 600.461: Computer Vision Final Project Report Giancarlo Troni gtroni@jhu.edu Raphael Sznitman sznitman@jhu.edu Abstract Given a Youtube video of a busy street intersection, our task is to detect, track,

More information

Text Area Detection from Video Frames

Text Area Detection from Video Frames Text Area Detection from Video Frames 1 Text Area Detection from Video Frames Xiangrong Chen, Hongjiang Zhang Microsoft Research China chxr@yahoo.com, hjzhang@microsoft.com Abstract. Text area detection

More information

Audio-Visual Content Indexing, Filtering, and Adaptation

Audio-Visual Content Indexing, Filtering, and Adaptation Audio-Visual Content Indexing, Filtering, and Adaptation Shih-Fu Chang Digital Video and Multimedia Group ADVENT University-Industry Consortium Columbia University 10/12/2001 http://www.ee.columbia.edu/dvmm

More information

Region-based Segmentation

Region-based Segmentation Region-based Segmentation Image Segmentation Group similar components (such as, pixels in an image, image frames in a video) to obtain a compact representation. Applications: Finding tumors, veins, etc.

More information

Audio-Visual Content Indexing, Filtering, and Adaptation

Audio-Visual Content Indexing, Filtering, and Adaptation Audio-Visual Content Indexing, Filtering, and Adaptation Shih-Fu Chang Digital Video and Multimedia Group ADVENT University-Industry Consortium Columbia University 10/12/2001 http://www.ee.columbia.edu/dvmm

More information

Noise Reduction in Image Sequences using an Effective Fuzzy Algorithm

Noise Reduction in Image Sequences using an Effective Fuzzy Algorithm Noise Reduction in Image Sequences using an Effective Fuzzy Algorithm Mahmoud Saeid Khadijeh Saeid Mahmoud Khaleghi Abstract In this paper, we propose a novel spatiotemporal fuzzy based algorithm for noise

More information

Browsing News and TAlk Video on a Consumer Electronics Platform Using face Detection

Browsing News and TAlk Video on a Consumer Electronics Platform Using face Detection MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Browsing News and TAlk Video on a Consumer Electronics Platform Using face Detection Kadir A. Peker, Ajay Divakaran, Tom Lanning TR2005-155

More information

Rushes Video Segmentation Using Semantic Features

Rushes Video Segmentation Using Semantic Features Rushes Video Segmentation Using Semantic Features Athina Pappa, Vasileios Chasanis, and Antonis Ioannidis Department of Computer Science and Engineering, University of Ioannina, GR 45110, Ioannina, Greece

More information

Analysis of Image and Video Using Color, Texture and Shape Features for Object Identification

Analysis of Image and Video Using Color, Texture and Shape Features for Object Identification IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 16, Issue 6, Ver. VI (Nov Dec. 2014), PP 29-33 Analysis of Image and Video Using Color, Texture and Shape Features

More information

SOUND EVENT DETECTION AND CONTEXT RECOGNITION 1 INTRODUCTION. Toni Heittola 1, Annamaria Mesaros 1, Tuomas Virtanen 1, Antti Eronen 2

SOUND EVENT DETECTION AND CONTEXT RECOGNITION 1 INTRODUCTION. Toni Heittola 1, Annamaria Mesaros 1, Tuomas Virtanen 1, Antti Eronen 2 Toni Heittola 1, Annamaria Mesaros 1, Tuomas Virtanen 1, Antti Eronen 2 1 Department of Signal Processing, Tampere University of Technology Korkeakoulunkatu 1, 33720, Tampere, Finland toni.heittola@tut.fi,

More information

Range Imaging Through Triangulation. Range Imaging Through Triangulation. Range Imaging Through Triangulation. Range Imaging Through Triangulation

Range Imaging Through Triangulation. Range Imaging Through Triangulation. Range Imaging Through Triangulation. Range Imaging Through Triangulation Obviously, this is a very slow process and not suitable for dynamic scenes. To speed things up, we can use a laser that projects a vertical line of light onto the scene. This laser rotates around its vertical

More information

Structure Analysis of Soccer Video with Domain Knowledge and Hidden Markov Models

Structure Analysis of Soccer Video with Domain Knowledge and Hidden Markov Models MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Structure Analysis of Soccer Video with Domain Knowledge and Hidden Markov Models Lexing Xie, Peng Xu, Shih-Fu Chang, Ajay Divakaran, Huifang

More information

Real-time Monitoring System for TV Commercials Using Video Features

Real-time Monitoring System for TV Commercials Using Video Features Real-time Monitoring System for TV Commercials Using Video Features Sung Hwan Lee, Won Young Yoo, and Young-Suk Yoon Electronics and Telecommunications Research Institute (ETRI), 11 Gajeong-dong, Yuseong-gu,

More information

Binju Bentex *1, Shandry K. K 2. PG Student, Department of Computer Science, College Of Engineering, Kidangoor, Kottayam, Kerala, India

Binju Bentex *1, Shandry K. K 2. PG Student, Department of Computer Science, College Of Engineering, Kidangoor, Kottayam, Kerala, India International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISSN : 2456-3307 Survey on Summarization of Multiple User-Generated

More information

Invariant Recognition of Hand-Drawn Pictograms Using HMMs with a Rotating Feature Extraction

Invariant Recognition of Hand-Drawn Pictograms Using HMMs with a Rotating Feature Extraction Invariant Recognition of Hand-Drawn Pictograms Using HMMs with a Rotating Feature Extraction Stefan Müller, Gerhard Rigoll, Andreas Kosmala and Denis Mazurenok Department of Computer Science, Faculty of

More information

HIERARCHICAL VISUAL DESCRIPTION SCHEMES FOR STILL IMAGES AND VIDEO SEQUENCES

HIERARCHICAL VISUAL DESCRIPTION SCHEMES FOR STILL IMAGES AND VIDEO SEQUENCES HIERARCHICAL VISUAL DESCRIPTION SCHEMES FOR STILL IMAGES AND VIDEO SEQUENCES Universitat Politècnica de Catalunya Barcelona, SPAIN philippe@gps.tsc.upc.es P. Salembier, N. O Connor 2, P. Correia 3 and

More information

CHAPTER 3 SHOT DETECTION AND KEY FRAME EXTRACTION

CHAPTER 3 SHOT DETECTION AND KEY FRAME EXTRACTION 33 CHAPTER 3 SHOT DETECTION AND KEY FRAME EXTRACTION 3.1 INTRODUCTION The twenty-first century is an age of information explosion. We are witnessing a huge growth in digital data. The trend of increasing

More information

ABSTRACT 1. INTRODUCTION

ABSTRACT 1. INTRODUCTION ABSTRACT A Framework for Multi-Agent Multimedia Indexing Bernard Merialdo Multimedia Communications Department Institut Eurecom BP 193, 06904 Sophia-Antipolis, France merialdo@eurecom.fr March 31st, 1995

More information

If we want widespread use and access to

If we want widespread use and access to Content-Based Multimedia Indexing and Retrieval Semantic Indexing of Multimedia Documents We propose two approaches for semantic indexing of audio visual documents, based on bottom-up and top-down strategies.

More information

Image retrieval based on bag of images

Image retrieval based on bag of images University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2009 Image retrieval based on bag of images Jun Zhang University of Wollongong

More information

Latest development in image feature representation and extraction

Latest development in image feature representation and extraction International Journal of Advanced Research and Development ISSN: 2455-4030, Impact Factor: RJIF 5.24 www.advancedjournal.com Volume 2; Issue 1; January 2017; Page No. 05-09 Latest development in image

More information
