Content-Based Multimedia Indexing and Retrieval

Semantic Indexing of Multimedia Documents

We propose two approaches for semantic indexing of audio-visual documents, based on bottom-up and top-down strategies. We base the first approach on a finite-state machine using low-level motion indices extracted from an MPEG compressed bitstream. The second approach innovatively performs semantic indexing through Hidden Markov Models.

Riccardo Leonardi and Pierangelo Migliorati, University of Brescia

If we want widespread use of and access to richer and novel information sources, we'll need effective navigation through multimedia documents. In this context, the design of efficient indexing techniques that facilitate the retrieval of relevant information is an important issue. Allowing for automatic procedures to semantically index audio-visual material represents an important challenge. Ideally, we could design such methods to create suitable indices of the audio-visual material that characterize the temporal structure of a multimedia document from a semantic point of view.[1]

Traditionally, the most common approach to creating an index of an audio-visual document is based on the automatic detection of changes between camera records (shots) and of the types of editing effects involved. This kind of approach generally demonstrates satisfactory performance and leads to a good low-level temporal characterization of the visual content. However, the semantic characterization remains poor, because the description is fragmented given the high number of shot transitions occurring in typical audio-visual programs. Alternatively, recent research efforts base the analysis of audio-visual documents on joint audio and video processing to provide a higher-level organization of information.[2,3] Saraceno and Leonardi[3] considered these two information sources for identifying the simple scenes that compose an audio-visual program.
Here we propose and compare the performance of two different classes of approaches for semantic indexing of audio-visual documents. In the first, we tackle the problem in a top-down fashion to identify a specific event in a certain program. In the second class, we first identify structuring elements from the data, then group them to form new patterns that we can further combine into a hierarchy. More precisely, we apply the top-down approach to identify relevant situations in soccer video sequences. In the complementary bottom-up approach, we combine audio and visual descriptors associated with individual shots and the associated audio segments to extract higher-level semantic entities.

Many researchers have studied automatic detection of semantic events in sport games. Generally, the goal is to identify certain spatiotemporal segments corresponding to semantically significant events. Tovinkere et al.,[4] for example, presented a method that tries to detect the complete set of semantic events that might happen in a soccer game. This method uses the players' and ball's position information during the game as input. As a result, the approach requires a complex and accurate tracking system to obtain this information. In our approach, we consider only the motion information associated with an MPEG-2 bitstream. We addressed the problem by trying to identify a correlation between semantic events and the low-level motion indices associated with a video sequence. In particular, we considered three low-level indices that represent the following characteristics: lack of motion, camera operations (represented by pan and zoom parameters), and the presence of shot cuts. We then studied the correlation between these indices and the semantic events, demonstrating their usefulness.[5,6] To exploit this correlation, we propose an algorithm based on finite-state machines that can detect the presence of goals and other relevant events in soccer games.
As we mentioned earlier, in the complementary bottom-up approach we combine audio and visual descriptors to extract higher-level semantic entities such as scenes or even individual program items. In particular, we perform the indexing through Hidden Markov Models (HMMs) used in an innovative framework. Our approach considers the input signal as a nonstationary stochastic process, modeled by an HMM in which each state stands for a different signal class.

Soccer video indexing using motion information

As mentioned previously, semantic video indexing will prove useful in the field of efficient navigation and retrieval from multimedia databases. This task, which seems simple for humans, isn't easy to implement in an automatic manner, because automatic systems require two steps. In the first step, they must extract some low-level indices to represent the low-level information in a compact way. In the second step, decision-making algorithms extract a semantic index from the low-level indices. In our work, we're attempting to semantically index a video sequence starting from some low-level descriptors of the motion field extracted from it. Note that in a top-down approach, the choice of the low-level descriptors and of their combination to reach a proper decision depends on the content and the targeted recognition task.

Low-level motion indices

Typically, the motion vectors associated with a frame represent the apparent motion of objects in the sequence. In our work, we directly extract this motion information from the compressed MPEG-2 domain, where a motion vector is provided for each moving macroblock, depending on the macroblock type. The macroblock type relates to the motion vectors as follows: if a macroblock is intracoded, it carries no motion vector; if a macroblock is no-motion-coded, the motion vector is null; otherwise, a non-null motion vector is transmitted.[8] We can represent the motion field with various descriptor types, such as the temporal and spatial mean and standard deviation of the phase and magnitude of the vector field, phase or magnitude histograms, and camera motion parameters.[9] We should combine these compact representations with other indices suitable to state the reliability of their estimation or to add other useful information.[10,11] Here we'll limit the analysis to the use of three low-level indices that identify lack of motion, camera operations (represented here by only pan and zoom parameters), and the presence of shot cuts.
We detected lack of motion by thresholding the mean value of the motion-vector magnitude µ, given for each P-frame by

µ = (1 / (MN − I)) Σ_{i=0}^{M−1} Σ_{j=0}^{N−1} ( |v_x(i,j)| + |v_y(i,j)| )   (1)

µ < S_no-motion  ⟹  no motion   (2)

where M and N are the frame dimensions (in macroblock units), I is the number of intracoded macroblocks, v_x and v_y are the horizontal and vertical components of the motion vectors, and S_no-motion is the threshold value, which we typically set to 4.

We evaluated the camera motion parameters, represented by the horizontal pan and zoom factors, using a least-mean-square method applied to the P-frame motion fields. We did this using the algorithm proposed by Migliorati and Tubaro.[12] We detected fast horizontal pan (or fast zoom) by thresholding the pan value (or the zoom factor), using the threshold value S_pan (or S_zoom). We estimated the threshold values empirically, and the proposed algorithm shows a good intrinsic robustness with respect to these values.[6]

Shot-cut detection

The shot-cut information used in the recognition process can likewise be detected on the basis of motion information alone. In particular, we used the sharp variation of the low-level motion indices and of the number of intracoded macroblocks of P-frames, as proposed by Deng and Manjunath.[13] To evaluate the sharp variation of the motion field, we estimated the difference between the average values of the motion-vector magnitudes of two adjacent P-frames. We call this difference

∆µ(k) = µ(k) − µ(k−1)   (3)

where µ(k) is the average value of the motion-vector magnitudes of P-frame k, given by Equation 1. This parameter assumes significantly high values in the presence of a shot cut, which is likely to be characterized by an abrupt change in the motion field between the two shots. Information regarding this sharp change in the average behavior of the motion field is then combined with the number of intracoded macroblocks of the current P-frame, as follows:

Cut(k) = Intra(k) + β ∆µ(k)   (4)

where Intra(k) is the number of intracoded macroblocks of the current P-frame and β is a proper weighting factor. When Cut(k) is greater than a prefixed threshold value S_cut, the algorithm declares a shot cut. As presented by Bonzanini et al.,[6] the proposed shot-cut algorithm gives good results and is quite robust with respect to the threshold value (typically, β is set to 10 while S_cut is set to 400).

[Figure 1. The proposed goal-finding algorithm: states SI, S1, S2, and SF, with transitions on fast pan or zoom, lack of motion, shot cut, and timeouts.]

[Figure 2. The proposed algorithm for detecting corner and free kicks: states SI, S1, S2, and SF, with transitions on scene cut, no motion, fast pan or zoom, and timeouts.]

The goal-finding algorithm

From the experimental results described by Bonzanini et al.,[5] we can see that the low-level indices are individually insufficient to reach satisfactory results. To find particular events, such as goals, we tried to exploit the temporal evolution of the motion indices in the vicinity of such events. We noticed that in correspondence with goals we find a fast pan or zoom followed by lack of motion and a shot cut. We can support this experimental observation by arguing that in conjunction with a goal, one of the two teams starts to move rapidly toward one side of the soccer field. The cameraman will track this fast motion of the team players, at times zooming in closely on the player holding the ball. Once the attacking team converges toward the other team's goal, the camera often remains still to capture the ball entering the net. If there's a score, there will then be a shot cut to present the goal from a different camera viewpoint, track the player who scored in a close view, or simply provide a replay of the whole scene. We exploit this concatenation of the proposed low-level indices by using the finite-state algorithm shown in Figure 1.
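The low-level indices of Equations 1 through 4 and the state machine of Figure 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function and flag names are ours, the timeout value is only indicative of "about 20 seconds of P-frames," the absolute value of ∆µ(k) is an assumption, and the input is assumed to already encode the "at least three consecutive P-frames" persistence requirement.

```python
import numpy as np

# Threshold values from the text: S_no-motion = 4, S_cut = 400, beta = 10.
S_NO_MOTION, S_CUT, BETA = 4.0, 400.0, 10.0

def mean_motion(vx, vy, n_intra):
    """Equation 1: mean motion-vector magnitude of one P-frame.
    vx, vy are M x N arrays of macroblock motion components;
    n_intra is the number of intracoded macroblocks (they carry no vector)."""
    m, n = vx.shape
    return (np.abs(vx) + np.abs(vy)).sum() / (m * n - n_intra)

def is_no_motion(mu):
    """Equation 2: declare lack of motion when mu falls below S_no-motion."""
    return mu < S_NO_MOTION

def cut_score(mu_k, mu_prev, n_intra_k):
    """Equations 3-4: Cut(k) = Intra(k) + beta * |delta mu(k)| (sketch)."""
    return n_intra_k + BETA * abs(mu_k - mu_prev)

def is_cut(score):
    """A shot cut is declared when Cut(k) exceeds S_cut."""
    return score > S_CUT

def find_goals(frames, timeout=250):
    """Figure 1 state machine (sketch). Each frame is a dict of booleans
    'fast_pan_zoom', 'no_motion', 'shot_cut'; the pan/zoom and no-motion
    flags are assumed to already require >= 3 consecutive P-frames.
    Returns the frame indices at which a goal is declared (state SF)."""
    state, age, goals = "SI", 0, []
    for k, f in enumerate(frames):
        age += 1
        if state == "SI" and f["fast_pan_zoom"]:
            state, age = "S1", 0
        elif state == "S1":
            if f["shot_cut"]:
                goals.append(k)          # reached SF: goal declared
                state = "SI"
            elif f["no_motion"]:
                state, age = "S2", 0
            elif age > timeout:
                state = "SI"             # timeout: back to the initial state
        elif state == "S2":
            if f["shot_cut"]:
                goals.append(k)          # reached SF: goal declared
                state = "SI"
            elif f["fast_pan_zoom"]:
                state, age = "S1", 0     # the game action probably continues
            elif age > timeout:
                state = "SI"
    return goals
```

A fast pan followed by a still interval and a shot cut thus lands in SF, matching the goal signature described above.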
From the initial state (SI), the system moves into state 1 (S1) if it detects a fast pan or a fast zoom for at least three consecutive P-frames. From S1, the machine goes into the final state (SF), where it declares a goal, if a shot cut is present; it goes into S2 if it detects lack of motion for at least three consecutive P-frames. From S2, the machine goes into SF if it detects a shot cut, while it returns to S1 if it detects fast pan or zoom for at least three consecutive P-frames (in this case, the game action is probably continuing). Two timeouts return the machine to the initial state SI from S1 and S2 in case nothing happens for a certain number of P-frames (corresponding to about a 20-second interval). As the Top-down approach results section shows, this algorithm provides satisfactory results. It detects almost all live goals, and it can also detect some shots toward the goal. We proposed a similar algorithm (see Figure 2) to detect other interesting events, such as corner kicks and free kicks. In this case, we detected fewer relevant events, and the performance wasn't as satisfactory as in the previous case.

Content-based indexing using HMM

Here we focus on using a bottom-up approach to provide tools for analyzing both the audio and visual streams, translating signal samples into sequences of semantic labels. We can decompose the whole processing system into the following steps, each of which extracts information at a defined level of semantic abstraction. First, we divide the input stream into its two main components, audio and video. An independent segmentation and classification of these two components represents the next analysis step. This step segments the audio stream into clips and extracts a feature vector from the low-level acoustic properties of each clip (such as the Mel-cepstrum coefficients, zero-crossing rate, and so on).
This step also calculates a feature vector by comparing each pair of adjacent video frames in terms of luminance histograms, motion vectors, and pixel-to-pixel differences.
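A minimal sketch of such a frame-pair feature vector follows, covering only the luminance-histogram and pixel-to-pixel terms (the motion-vector statistics are omitted, and all names are illustrative, not the authors' implementation):

```python
import numpy as np

def frame_pair_features(prev, curr, bins=64):
    """Feature vector for one pair of adjacent frames (sketch).
    prev and curr are 2-D uint8 luminance arrays of equal shape.
    Returns the normalized L1 histogram difference (0 for identical
    histograms, 2 for disjoint ones) and the mean absolute
    pixel-to-pixel difference scaled to [0, 1]."""
    h1, _ = np.histogram(prev, bins=bins, range=(0, 256))
    h2, _ = np.histogram(curr, bins=bins, range=(0, 256))
    hist_diff = np.abs(h1 - h2).sum() / prev.size
    pix_diff = np.abs(curr.astype(int) - prev.astype(int)).mean() / 255.0
    return np.array([hist_diff, pix_diff])
```

Large values of either component indicate a likely shot transition between the two frames.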

We then classify each sequence of feature vectors extracted from the two streams by an HMM[14,15] used in an innovative way. We consider the input signal as a nonstationary stochastic process, modeled by an HMM in which each state stands for a different signal class. After training each HMM, given a sequence of unsupervised feature vectors, we can generate the corresponding most likely sequence of labels identifying particular signal classes using the Viterbi algorithm.[7]

For audio classification we considered four classes, namely music, silence, speech, and background noise. The result of the audio classification is the association of one of these classes with each previously extracted feature vector. In other words, at the end of this analysis we get a temporal separation of the audio signal into segments of a single one of these classes, with a resolution of 0.5 second (this resolution is determined by the minimum shift in time between consecutive audio segments). To reach this segmentation result, the audio signal is split into equal-length frames, partially overlapped to reduce the spectral distortion due to windowing. The duration of each frame is N samples (typically N is set to represent a 30-to-40-millisecond interval), and each frame overlaps the next frame for two-thirds of its duration. For each frame, the algorithm extracts Mel-frequency cepstral coefficient features. These define the observations produced by an ergodic HMM. By using the HMM, the algorithm can estimate the optimal sequence of hidden states representing the different audio classes associated with the different temporal frames. Finally, consecutive frames marked by the same class define the various audio segments.

In the video analysis, the system segments the video signal into elementary units, which form the individual video shots.
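The framing with two-thirds overlap and the final merging of equally labeled frames into segments can be sketched as follows (a toy illustration with hypothetical helper names; the MFCC extraction and HMM/Viterbi decoding themselves are omitted, as they would require a trained model):

```python
import numpy as np

def split_frames(signal, frame_len, hop=None):
    """Split an audio signal into equal-length frames that overlap by
    two-thirds of their duration (hop = frame_len // 3)."""
    hop = hop or frame_len // 3
    n = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n)])

def labels_to_segments(labels):
    """Merge consecutive frames carrying the same class label into
    segments, returned as (class, start_frame, end_frame_exclusive)."""
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((labels[start], start, i))
            start = i
    return segments
```

In the real system, the per-frame labels fed to `labels_to_segments` would be the hidden states decoded by the Viterbi algorithm.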
To segment the video stream, we train a two-state HMM classifier in which state S1 corresponds to a detected shot transition and state S0 to the absence of a shot transition. The system can then identify both abrupt transitions (cuts) and slow transitions (fades) between consecutive shots. Again, we obtain these by applying the Viterbi algorithm to the sequence of feature vectors: whenever the optimal state sequence reaches S1, the system declares a shot transition. We use the same idea to establish a correlation between nonadjacent shots, so as to identify the recurrence of a given visual content among nonconsecutive shots.

The next task extracts content from the segmented video shots and indexes them according to the initial audio and video classification. We define a semantic entity called a scene, composed of a group of consecutive shots. Scenes represent a level of semantic abstraction at which we jointly consider the audio and visual signals to reach some meaningful information about the associated data stream. The adopted approach to scene identification requires us to define four different types of scenes:

Dialogues. The audio signal is mostly speech, and the associated visual information alternates (for example, ABAB...).

Stories. The audio signal is mostly speech, while the associated visual information exhibits the repetition of a given visual content, creating a shot pattern of the type ABCADEFAG...

Actions. The audio signal belongs mostly to one class (which is nonspeech), and the visual information exhibits a progressive pattern of shots with contrasting visual contents, of the type ABCDEF...

Generic. The audio of consecutive shots belongs mostly to one class, but the visual content doesn't match the other patterns.[16]

We can identify these kinds of scenes by using the descriptor sequence obtained from the previous classification steps.[16]

Top-down approach results

Here we provide and discuss the simulation results for indexing soccer game sequences with our top-down approach.
We tested the proposed algorithms' performance on two hours of MPEG-2 sequences, obtaining the results reported in Tables 1 and 2. Table 1 details the events associated with goals, free kicks, and shots toward the goal that the proposed goal-finding algorithm detected. The goal-finding algorithm detected almost every live goal, together with some shots toward the goal, but it obtained poor results on free kicks. Similarly, Table 2 details the events associated with corner kicks, free kicks, and penalties detected by the proposed kick-finding algorithm. The algorithm detected only a few of these events, and we had expected a better performance. We attribute this discrepancy to the multitude of scenarios that can lead to these events.

[Table 1. Performance of the proposed goal-finding algorithm: present versus detected events (live, replay, and total) for goals, shots toward the goal, and free kicks, plus false detections.]

[Table 2. Performance of the proposed corner- and free-kick-finding algorithm: present versus detected events (live, replay, and total) for free kicks, penalties, and corners, plus false detections.]

[Table 3. Recognition percentages for the music, silence, speech, and noise classes, with the completeness and purity of each class.]

Bottom-up approach results

Here we provide and discuss the results of the content-based indexing we obtained by using HMMs.

Audio classification results

We based this analysis on 60 minutes of audio and compared the results of the classification process with a ground truth. Overall, we performed the simulation on 20 runs of three minutes each. We summarized the classifier's performance using a pair of indices evaluated for each class. We call these indices purity and completeness and define them as follows:

Purity = N_c / (N_c + N_f)
Completeness = N_c / (N_c + N_m)   (5)

where N_c is the number of correct detections, N_m is the number of missed detections, and N_f is the number of wrong detections. Both indices range between 0 and 1.

Table 3 shows the classification performance for each class (music, silence, speech, and noise). It's clear from Table 3 that the algorithm shows the best performance for noise and silence, while the results for music and speech detection are poorer. We attribute this to the high level of misclassification between music and voice. These errors probably derive from the following considerations: the number of data in the training set is too low, the audio features used may be insufficient to reach a correct classification, and some audio segments may not be uniquely classifiable when multiple audio sources are superimposed simultaneously.
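Equation 5 amounts to the precision and recall of each class; a minimal sketch (function names are ours):

```python
def purity(n_correct, n_false):
    """Equation 5: fraction of the detections that are correct,
    N_c / (N_c + N_f)."""
    return n_correct / (n_correct + n_false)

def completeness(n_correct, n_missed):
    """Equation 5: fraction of the true events that were detected,
    N_c / (N_c + N_m)."""
    return n_correct / (n_correct + n_missed)
```

For example, a class with 8 correct detections, 2 false detections, and 2 misses has purity 0.8 and completeness 0.8.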
Video classification results

We based the video classification's performance analysis on a video stream with a duration of 60 minutes. As in the audio case, we carried out each simulation on video segments of three minutes' duration. We combined all the results to obtain the overall classification performance. We performed the video classification in two steps: shot segmentation by transition identification, and correlation between shots to obtain the correlation among noncontiguous shots. If we consider the state S1 (the detected shot transition), we obtain a completeness of 98.9 percent, whereas considering S0 (the nondetected shot transition) we obtain a completeness of 95 percent. Two possible sources of errors exist: S1 is recognized as S0, and vice versa. It's more probable that the system reveals false shot changes than that it misses one of them. The value of purity associated with S1 (the detected shot transition) is reduced because of a relatively high number of false shot detections. These false detections may be caused by fast camera motion, luminance changes, and the motion of large objects in the scene.

Scene identification results

Using the results of the audio and video classification, we can evaluate the performance of the scene identification process. The first step is to align the audio and video descriptors, creating a descriptor shot sequence. With this sequence we can search for the already-defined scene categories (dialogues, actions, stories, and generic scenes). For each kind of scene, we've calculated four different performance indices: completeness, purity, completeness cover, and purity cover. We defined the first two (completeness and purity) as in the audio and video classification simulations, while we introduced the latter two (completeness cover and purity cover) to account for situations when the system recognizes some shots of a scene correctly and others improperly. We decided that a scene would be declared correctly recognized if the system correctly identifies at least one of its shots (that is, the shot belongs to the right kind of scene). Moreover, when the system recognizes two consecutive scenes as a unique scene, we consider only one of them correct (if it has been correctly classified) while we declare the other as missed. We introduced the second pair of performance indices because sometimes the identified scene only partially overlaps with the real scene (some shots belonging to the identified scene don't actually belong to the real scene). Let N_c be the number of shots belonging to a correctly identified scene of one kind, N_a the total number of shots belonging to this kind of scene, and N_r the whole number of shots belonging to the identified scenes of this kind. Then, we define

Completeness_cover = N_c / N_a
Purity_cover = N_c / N_r   (6)

Table 4 shows the performance indices for each type of scene. We based the scene identification procedure on deterministic rules rather than on a stochastic classifier, and it provides limited results when used to understand the audio-visual assembly modality that the director used to create the scenes.
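The cover indices of Equation 6 reduce to two shot-count ratios; a minimal sketch (function names are ours):

```python
def completeness_cover(n_c, n_a):
    """Equation 6: correctly covered shots over all ground-truth shots
    of this scene type, N_c / N_a."""
    return n_c / n_a

def purity_cover(n_c, n_r):
    """Equation 6: correctly covered shots over all shots assigned to
    the identified scenes of this type, N_c / N_r."""
    return n_c / n_r
```

For instance, if 6 of 10 ground-truth dialogue shots fall inside identified dialogue scenes that span 8 shots overall, the completeness cover is 0.6 and the purity cover is 0.75.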
The errors could result from inaccuracies in the a priori rules and from errors in the previous classification steps.

[Table 4. Values of each index (completeness, purity, completeness cover, and purity cover) for each kind of scene (dialogue, action, story, and generic scene), evaluated using the results of the previous steps of audio and video classification.]

Figures 3 through 5 show the classification results for three TV programs, representing three minutes each of a talk show, a music program, and a scientific documentary. In the video stream diagrams, the system associates a shot label with the visual content of the corresponding shot, in such a way that the same label denotes shots with similar visual content. (Note that in the figures, No stands for noise, Sp for speech, Si for silence, and Mu for music.)

[Figure 3. (a) Video and (b) audio classification results of three minutes of a talk show.]

[Figure 4. (a) Video and (b) audio classification results of three minutes of a music program.]

[Figure 5. (a) Video and (b) audio classification results of three minutes of a scientific documentary.]

We can effectively use the different statistical evolutions of the audio and video indices to infer aspects of the semantic content of the underlying signals. Considering the example shown in Figure 3, it's easy to notice that the visual content alternates between two or three patterns while the audio signal remains mainly speech, separated by short music and clapping intervals. On the other hand, Figure 4 outlines the different stages of a concert. After the music starts, we observe clapping, followed by comments from the presenter, instrument fine-tuning, and then a recess of the music. The visual counterpart clearly exhibits a different pattern with respect to the talk show, where we can clearly recognize an alternation in the visual content. Finally, Figure 5 exhibits a continuous evolution of the visual content, in which a pleasant musical background is temporarily replaced by the presenter's comments.

From these studies, we can conclude that semantic characterization at the highest level can rarely be achieved unless we use a top-down approach. Even then, it requires predefining, in a specific application context, the high-level semantic instances of the events of interest (such as goals in a soccer game). Otherwise, only an intermediary semantic characterization is obtainable, identifying scenes that define dialogue, story, or action situations. What appears quite attractive instead is to use low-level descriptors to provide direct feedback on the content of the described audio-visual program. The experiments have demonstrated that, with adequate visualization or presentation, low-level features instantly carry semantic information about the program content (given a certain program category) that might help the viewer use such low-level information for navigation or retrieval of relevant events.

Conclusion

We've presented two different semantic indexing algorithms, based respectively on top-down and bottom-up approaches. Regarding the top-down approach, we obtained very interesting results: the algorithm detects almost all live goals and can detect some shots toward the goal as well.
Considering the bottom-up approach, we analyzed several samples from the MPEG-7 content set using the proposed classification schemes, demonstrating the ability of the overall approach to provide insights into the content of the audio-visual material. Moreover, what appears quite attractive is using low-level descriptors to provide feedback on the content of the described audio-visual program. We're devoting our current research to extending the top-down approach to detect salient semantic events in other categories of audio-visual programs. Further research is also needed to assess the robustness of the classification procedure proposed in the bottom-up approach.

References

1. N. Adami et al., "The ToCAI Description Scheme for Indexing and Retrieval of Multimedia Documents," Multimedia Tools and Applications J., Kluwer Academic Publishers, Dordrecht, The Netherlands, vol. 14, no. 2, June 2001.
2. Y. Wang, Z. Liu, and J. Huang, "Multimedia Content Analysis Using Audio and Visual Information," IEEE Signal Processing, vol. 17, no. 6, Nov. 2000.
3. C. Saraceno and R. Leonardi, "Indexing Audio Visual Databases Through a Joint Audio and Video Processing," Int'l J. Imaging Systems and Technology, vol. 9, no. 5, Oct. 1998.
4. V. Tovinkere and R.J. Qian, "Detecting Semantic Events in Soccer Games: Toward a Complete Solution," Proc. Int'l Conf. Multimedia and Expo (ICME 2001), IEEE CS Press, Los Alamitos, Calif., 2001.
5. A. Bonzanini, R. Leonardi, and P. Migliorati, "Semantic Video Indexing Using MPEG Motion Vectors," Proc. European Signal Processing Conf. (Eusipco 2000), Tampere, Finland, 2000.
6. A. Bonzanini, R. Leonardi, and P. Migliorati, "Event Recognition in Sport Programs Using Low-Level Motion Indices," Proc. Int'l Conf. Multimedia and Expo (ICME 2001), IEEE CS Press, Los Alamitos, Calif., 2001.
7. F. Oppini and R. Leonardi, "Audiovisual Pattern Recognition Using HMM for Content-Based Multimedia Indexing," Proc. Packet Video 2000.
8. T. Sikora, "MPEG Digital Video-Coding Standards," IEEE Signal Processing, vol. 14, no. 5, Sept. 1997.
9. E. Ardizzone and M. La Cascia, "Video Indexing Using Optical Flow Field," Proc. Int'l Conf. Image Processing (ICIP 96), IEEE CS Press, Los Alamitos, Calif., 1996.
10. W.A.C. Fernando, C.N. Canagarajah, and D.R. Bull, "Video Segmentation and Classification for Content Based Storage and Retrieval Using Motion Vectors," Proc. SPIE Conf. Storage and Retrieval for Image and Video Databases VII, SPIE Press, Bellingham, Wash., Jan. 1999.
11. Y. Deng and B.S. Manjunath, "Content-Based Search of Video Using Color, Texture, and Motion," Proc. Int'l Conf. Image Processing (ICIP 97), IEEE CS Press, Los Alamitos, Calif., 1997.
12. P. Migliorati and S. Tubaro, "Multistage Motion Estimation for Image Interpolation," Eurasip Signal Processing: Image Comm., no. 7, 1995.
13. Y. Deng and B.S. Manjunath, "Content-Based Search of Video Using Color, Texture, and Motion," Proc. Int'l Conf. Image Processing (ICIP 97), IEEE CS Press, Los Alamitos, Calif., 1997.
14. L.R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proc. IEEE, vol. 77, no. 2, Feb. 1989.
15. R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, New York, 1973.
16. C. Saraceno and R. Leonardi, "Identification of Story Units in Audio Visual Sequences by Joint Audio and Video Processing," Proc. Int'l Conf. Image Processing (ICIP 98), IEEE CS Press, Los Alamitos, Calif., 1998.

Riccardo Leonardi is a telecommunications researcher and professor at the University of Brescia. His main research interests are digital signal processing applications, with a focus on visual communications and content-based analysis of audio-visual information. He received his diploma and PhD degrees in electrical engineering from the Swiss Federal Institute of Technology in Lausanne in 1984 and 1987, respectively. He has published more than 50 papers on these topics and acts as a national scientific coordinator of research programs in visual communications. Currently, he is also an evaluator and auditor for the European Commission on Research, Technology, and Development (RTD) programs.

Pierangelo Migliorati is a telecommunications assistant professor at the University of Brescia. His main research interests include digital signal processing and transmission systems, with specific expertise in visual communication and content-based analysis of audio-visual information. He's also involved in activities related to channel equalization of nonlinear channels.
He received a laurea (cum laude) in electronic engineering from the Politecnico di Milano in 1988, and an MS in information technology from the CEFRIEL Research Centre, Milan. He's a member of the IEEE.

Readers may reach the authors at the Department of Electronics for Automation, University of Brescia, Via Branze, 38, 25123, Brescia, Italy, {leon, pier}@ing.unibs.it.


More information

CORRELATION BASED CAR NUMBER PLATE EXTRACTION SYSTEM

CORRELATION BASED CAR NUMBER PLATE EXTRACTION SYSTEM CORRELATION BASED CAR NUMBER PLATE EXTRACTION SYSTEM 1 PHYO THET KHIN, 2 LAI LAI WIN KYI 1,2 Department of Information Technology, Mandalay Technological University The Republic of the Union of Myanmar

More information

Searching Video Collections:Part I

Searching Video Collections:Part I Searching Video Collections:Part I Introduction to Multimedia Information Retrieval Multimedia Representation Visual Features (Still Images and Image Sequences) Color Texture Shape Edges Objects, Motion

More information

Scalable Hierarchical Summarization of News Using Fidelity in MPEG-7 Description Scheme

Scalable Hierarchical Summarization of News Using Fidelity in MPEG-7 Description Scheme Scalable Hierarchical Summarization of News Using Fidelity in MPEG-7 Description Scheme Jung-Rim Kim, Seong Soo Chun, Seok-jin Oh, and Sanghoon Sull School of Electrical Engineering, Korea University,

More information

Analysis of Image and Video Using Color, Texture and Shape Features for Object Identification

Analysis of Image and Video Using Color, Texture and Shape Features for Object Identification IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 16, Issue 6, Ver. VI (Nov Dec. 2014), PP 29-33 Analysis of Image and Video Using Color, Texture and Shape Features

More information

Multimedia Databases. Wolf-Tilo Balke Younès Ghammad Institut für Informationssysteme Technische Universität Braunschweig

Multimedia Databases. Wolf-Tilo Balke Younès Ghammad Institut für Informationssysteme Technische Universität Braunschweig Multimedia Databases Wolf-Tilo Balke Younès Ghammad Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de Previous Lecture Audio Retrieval - Query by Humming

More information

Multimedia Databases. 9 Video Retrieval. 9.1 Hidden Markov Model. 9.1 Hidden Markov Model. 9.1 Evaluation. 9.1 HMM Example 12/18/2009

Multimedia Databases. 9 Video Retrieval. 9.1 Hidden Markov Model. 9.1 Hidden Markov Model. 9.1 Evaluation. 9.1 HMM Example 12/18/2009 9 Video Retrieval Multimedia Databases 9 Video Retrieval 9.1 Hidden Markov Models (continued from last lecture) 9.2 Introduction into Video Retrieval Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme

More information

Chapter 3 Image Registration. Chapter 3 Image Registration

Chapter 3 Image Registration. Chapter 3 Image Registration Chapter 3 Image Registration Distributed Algorithms for Introduction (1) Definition: Image Registration Input: 2 images of the same scene but taken from different perspectives Goal: Identify transformation

More information

A Rapid Scheme for Slow-Motion Replay Segment Detection

A Rapid Scheme for Slow-Motion Replay Segment Detection A Rapid Scheme for Slow-Motion Replay Segment Detection Wei-Hong Chuang, Dun-Yu Hsiao, Soo-Chang Pei, and Homer Chen Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan 10617,

More information

Clustering Methods for Video Browsing and Annotation

Clustering Methods for Video Browsing and Annotation Clustering Methods for Video Browsing and Annotation Di Zhong, HongJiang Zhang 2 and Shih-Fu Chang* Institute of System Science, National University of Singapore Kent Ridge, Singapore 05 *Center for Telecommunication

More information

Baseball Game Highlight & Event Detection

Baseball Game Highlight & Event Detection Baseball Game Highlight & Event Detection Student: Harry Chao Course Adviser: Winston Hu 1 Outline 1. Goal 2. Previous methods 3. My flowchart 4. My methods 5. Experimental result 6. Conclusion & Future

More information

Redundancy and Correlation: Temporal

Redundancy and Correlation: Temporal Redundancy and Correlation: Temporal Mother and Daughter CIF 352 x 288 Frame 60 Frame 61 Time Copyright 2007 by Lina J. Karam 1 Motion Estimation and Compensation Video is a sequence of frames (images)

More information

Video Analysis for Browsing and Printing

Video Analysis for Browsing and Printing Video Analysis for Browsing and Printing Qian Lin, Tong Zhang, Mei Chen, Yining Deng, Brian Atkins HP Laboratories HPL-2008-215 Keyword(s): video mining, video printing, user intent, video panorama, video

More information

NeTra-V: Towards an Object-based Video Representation

NeTra-V: Towards an Object-based Video Representation Proc. of SPIE, Storage and Retrieval for Image and Video Databases VI, vol. 3312, pp 202-213, 1998 NeTra-V: Towards an Object-based Video Representation Yining Deng, Debargha Mukherjee and B. S. Manjunath

More information

Image Classification Using Wavelet Coefficients in Low-pass Bands

Image Classification Using Wavelet Coefficients in Low-pass Bands Proceedings of International Joint Conference on Neural Networks, Orlando, Florida, USA, August -7, 007 Image Classification Using Wavelet Coefficients in Low-pass Bands Weibao Zou, Member, IEEE, and Yan

More information

TEVI: Text Extraction for Video Indexing

TEVI: Text Extraction for Video Indexing TEVI: Text Extraction for Video Indexing Hichem KARRAY, Mohamed SALAH, Adel M. ALIMI REGIM: Research Group on Intelligent Machines, EIS, University of Sfax, Tunisia hichem.karray@ieee.org mohamed_salah@laposte.net

More information

Real-Time Content-Based Adaptive Streaming of Sports Videos

Real-Time Content-Based Adaptive Streaming of Sports Videos Real-Time Content-Based Adaptive Streaming of Sports Videos Shih-Fu Chang, Di Zhong, and Raj Kumar Digital Video and Multimedia Group ADVENT University/Industry Consortium Columbia University December

More information

Video shot segmentation using late fusion technique

Video shot segmentation using late fusion technique Video shot segmentation using late fusion technique by C. Krishna Mohan, N. Dhananjaya, B.Yegnanarayana in Proc. Seventh International Conference on Machine Learning and Applications, 2008, San Diego,

More information

AIIA shot boundary detection at TRECVID 2006

AIIA shot boundary detection at TRECVID 2006 AIIA shot boundary detection at TRECVID 6 Z. Černeková, N. Nikolaidis and I. Pitas Artificial Intelligence and Information Analysis Laboratory Department of Informatics Aristotle University of Thessaloniki

More information

Video Summarization Using MPEG-7 Motion Activity and Audio Descriptors

Video Summarization Using MPEG-7 Motion Activity and Audio Descriptors Video Summarization Using MPEG-7 Motion Activity and Audio Descriptors Ajay Divakaran, Kadir A. Peker, Regunathan Radhakrishnan, Ziyou Xiong and Romain Cabasson Presented by Giulia Fanti 1 Overview Motivation

More information

Region Feature Based Similarity Searching of Semantic Video Objects

Region Feature Based Similarity Searching of Semantic Video Objects Region Feature Based Similarity Searching of Semantic Video Objects Di Zhong and Shih-Fu hang Image and dvanced TV Lab, Department of Electrical Engineering olumbia University, New York, NY 10027, US {dzhong,

More information

HIERARCHICAL VIDEO SUMMARIES BY DENDROGRAM CLUSTER ANALYSIS

HIERARCHICAL VIDEO SUMMARIES BY DENDROGRAM CLUSTER ANALYSIS HIERARCHICAL VIDEO SUMMARIES BY DENDROGRAM CLUSTER ANALYSIS Sergio Benini, Aldo Bianchetti, Riccardo Leonardi, Pierangelo Migliorati DEA-SCL, University of Brescia, Via Branze 38, I-25123, Brescia, Italy

More information

Algorithms and System for High-Level Structure Analysis and Event Detection in Soccer Video

Algorithms and System for High-Level Structure Analysis and Event Detection in Soccer Video Algorithms and Sstem for High-Level Structure Analsis and Event Detection in Soccer Video Peng Xu, Shih-Fu Chang, Columbia Universit Aja Divakaran, Anthon Vetro, Huifang Sun, Mitsubishi Electric Advanced

More information

Offering Access to Personalized Interactive Video

Offering Access to Personalized Interactive Video Offering Access to Personalized Interactive Video 1 Offering Access to Personalized Interactive Video Giorgos Andreou, Phivos Mylonas, Manolis Wallace and Stefanos Kollias Image, Video and Multimedia Systems

More information

Detection of goal event in soccer videos

Detection of goal event in soccer videos Detection of goal event in soccer videos Hyoung-Gook Kim, Steffen Roeber, Amjad Samour, Thomas Sikora Department of Communication Systems, Technical University of Berlin, Einsteinufer 17, D-10587 Berlin,

More information

Production of Video Images by Computer Controlled Cameras and Its Application to TV Conference System

Production of Video Images by Computer Controlled Cameras and Its Application to TV Conference System Proc. of IEEE Conference on Computer Vision and Pattern Recognition, vol.2, II-131 II-137, Dec. 2001. Production of Video Images by Computer Controlled Cameras and Its Application to TV Conference System

More information

DUPLICATE DETECTION AND AUDIO THUMBNAILS WITH AUDIO FINGERPRINTING

DUPLICATE DETECTION AND AUDIO THUMBNAILS WITH AUDIO FINGERPRINTING DUPLICATE DETECTION AND AUDIO THUMBNAILS WITH AUDIO FINGERPRINTING Christopher Burges, Daniel Plastina, John Platt, Erin Renshaw, and Henrique Malvar March 24 Technical Report MSR-TR-24-19 Audio fingerprinting

More information

SOUND EVENT DETECTION AND CONTEXT RECOGNITION 1 INTRODUCTION. Toni Heittola 1, Annamaria Mesaros 1, Tuomas Virtanen 1, Antti Eronen 2

SOUND EVENT DETECTION AND CONTEXT RECOGNITION 1 INTRODUCTION. Toni Heittola 1, Annamaria Mesaros 1, Tuomas Virtanen 1, Antti Eronen 2 Toni Heittola 1, Annamaria Mesaros 1, Tuomas Virtanen 1, Antti Eronen 2 1 Department of Signal Processing, Tampere University of Technology Korkeakoulunkatu 1, 33720, Tampere, Finland toni.heittola@tut.fi,

More information

Binju Bentex *1, Shandry K. K 2. PG Student, Department of Computer Science, College Of Engineering, Kidangoor, Kottayam, Kerala, India

Binju Bentex *1, Shandry K. K 2. PG Student, Department of Computer Science, College Of Engineering, Kidangoor, Kottayam, Kerala, India International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISSN : 2456-3307 Survey on Summarization of Multiple User-Generated

More information

Semantic Video Indexing

Semantic Video Indexing Semantic Video Indexing T-61.6030 Multimedia Retrieval Stevan Keraudy stevan.keraudy@tkk.fi Helsinki University of Technology March 14, 2008 What is it? Query by keyword or tag is common Semantic Video

More information

An Introduction to Pattern Recognition

An Introduction to Pattern Recognition An Introduction to Pattern Recognition Speaker : Wei lun Chao Advisor : Prof. Jian-jiun Ding DISP Lab Graduate Institute of Communication Engineering 1 Abstract Not a new research field Wide range included

More information

PixSO: A System for Video Shot Detection

PixSO: A System for Video Shot Detection PixSO: A System for Video Shot Detection Chengcui Zhang 1, Shu-Ching Chen 1, Mei-Ling Shyu 2 1 School of Computer Science, Florida International University, Miami, FL 33199, USA 2 Department of Electrical

More information

Multi-level analysis of sports video sequences

Multi-level analysis of sports video sequences Multi-level analysis of sports video sequences Jungong Han a, Dirk Farin a and Peter H. N. de With a,b a University of Technology Eindhoven, 5600MB Eindhoven, The Netherlands b LogicaCMG, RTSE, PO Box

More information

Multimedia Database Systems. Retrieval by Content

Multimedia Database Systems. Retrieval by Content Multimedia Database Systems Retrieval by Content MIR Motivation Large volumes of data world-wide are not only based on text: Satellite images (oil spill), deep space images (NASA) Medical images (X-rays,

More information

Automatic Texture Segmentation for Texture-based Image Retrieval

Automatic Texture Segmentation for Texture-based Image Retrieval Automatic Texture Segmentation for Texture-based Image Retrieval Ying Liu, Xiaofang Zhou School of ITEE, The University of Queensland, Queensland, 4072, Australia liuy@itee.uq.edu.au, zxf@itee.uq.edu.au

More information

Automatic visual recognition for metro surveillance

Automatic visual recognition for metro surveillance Automatic visual recognition for metro surveillance F. Cupillard, M. Thonnat, F. Brémond Orion Research Group, INRIA, Sophia Antipolis, France Abstract We propose in this paper an approach for recognizing

More information

Automatic Video Caption Detection and Extraction in the DCT Compressed Domain

Automatic Video Caption Detection and Extraction in the DCT Compressed Domain Automatic Video Caption Detection and Extraction in the DCT Compressed Domain Chin-Fu Tsao 1, Yu-Hao Chen 1, Jin-Hau Kuo 1, Chia-wei Lin 1, and Ja-Ling Wu 1,2 1 Communication and Multimedia Laboratory,

More information

DATA and signal modeling for images and video sequences. Region-Based Representations of Image and Video: Segmentation Tools for Multimedia Services

DATA and signal modeling for images and video sequences. Region-Based Representations of Image and Video: Segmentation Tools for Multimedia Services IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 9, NO. 8, DECEMBER 1999 1147 Region-Based Representations of Image and Video: Segmentation Tools for Multimedia Services P. Salembier,

More information

AUTOMATIC VIDEO INDEXING

AUTOMATIC VIDEO INDEXING AUTOMATIC VIDEO INDEXING Itxaso Bustos Maite Frutos TABLE OF CONTENTS Introduction Methods Key-frame extraction Automatic visual indexing Shot boundary detection Video OCR Index in motion Image processing

More information

Object of interest discovery in video sequences

Object of interest discovery in video sequences Object of interest discovery in video sequences A Design Project Report Presented to Engineering Division of the Graduate School Of Cornell University In Partial Fulfillment of the Requirements for the

More information

THE AMOUNT of digital video is immeasurable and continues

THE AMOUNT of digital video is immeasurable and continues 538 IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 7, NO. 3, JUNE 2005 Joint Scene Classification and Segmentation Based on Hidden Markov Model Jincheng Huang, Member, IEEE, Zhu Liu, Senior Member, IEEE, and Yao

More information

A SHOT BOUNDARY DETECTION TECHNIQUE BASED ON LOCAL COLOR MOMENTS IN YC B C R COLOR SPACE

A SHOT BOUNDARY DETECTION TECHNIQUE BASED ON LOCAL COLOR MOMENTS IN YC B C R COLOR SPACE A SHOT BOUNDARY DETECTION TECHNIQUE BASED ON LOCAL COLOR MOMENTS IN YC B C R COLOR SPACE S.A.Angadi 1 and Vilas Naik 2 1 Department of Computer Science Engineering, Basaveshwar Engineering College,Bagalkot

More information

ABSTRACT 1. INTRODUCTION

ABSTRACT 1. INTRODUCTION ABSTRACT A Framework for Multi-Agent Multimedia Indexing Bernard Merialdo Multimedia Communications Department Institut Eurecom BP 193, 06904 Sophia-Antipolis, France merialdo@eurecom.fr March 31st, 1995

More information

Suspicious Activity Detection of Moving Object in Video Surveillance System

Suspicious Activity Detection of Moving Object in Video Surveillance System International Journal of Latest Engineering and Management Research (IJLEMR) ISSN: 2455-4847 ǁ Volume 1 - Issue 5 ǁ June 2016 ǁ PP.29-33 Suspicious Activity Detection of Moving Object in Video Surveillance

More information

TRADITIONAL adaptation approaches transform the video

TRADITIONAL adaptation approaches transform the video IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 8, NO. 3, JUNE 2006 433 Semantic Adaptation of Sport Videos With User-Centred Performance Analysis Marco Bertini, Rita Cucchiara, Member, IEEE, Alberto Del Bimbo,

More information

Lesson 11. Media Retrieval. Information Retrieval. Image Retrieval. Video Retrieval. Audio Retrieval

Lesson 11. Media Retrieval. Information Retrieval. Image Retrieval. Video Retrieval. Audio Retrieval Lesson 11 Media Retrieval Information Retrieval Image Retrieval Video Retrieval Audio Retrieval Information Retrieval Retrieval = Query + Search Informational Retrieval: Get required information from database/web

More information

MULTIVIEW REPRESENTATION OF 3D OBJECTS OF A SCENE USING VIDEO SEQUENCES

MULTIVIEW REPRESENTATION OF 3D OBJECTS OF A SCENE USING VIDEO SEQUENCES MULTIVIEW REPRESENTATION OF 3D OBJECTS OF A SCENE USING VIDEO SEQUENCES Mehran Yazdi and André Zaccarin CVSL, Dept. of Electrical and Computer Engineering, Laval University Ste-Foy, Québec GK 7P4, Canada

More information

HIERARCHICAL VISUAL DESCRIPTION SCHEMES FOR STILL IMAGES AND VIDEO SEQUENCES

HIERARCHICAL VISUAL DESCRIPTION SCHEMES FOR STILL IMAGES AND VIDEO SEQUENCES HIERARCHICAL VISUAL DESCRIPTION SCHEMES FOR STILL IMAGES AND VIDEO SEQUENCES Universitat Politècnica de Catalunya Barcelona, SPAIN philippe@gps.tsc.upc.es P. Salembier, N. O Connor 2, P. Correia 3 and

More information

Blur Space Iterative De-blurring

Blur Space Iterative De-blurring Blur Space Iterative De-blurring RADU CIPRIAN BILCU 1, MEJDI TRIMECHE 2, SAKARI ALENIUS 3, MARKKU VEHVILAINEN 4 1,2,3,4 Multimedia Technologies Laboratory, Nokia Research Center Visiokatu 1, FIN-33720,

More information

Context based optimal shape coding

Context based optimal shape coding IEEE Signal Processing Society 1999 Workshop on Multimedia Signal Processing September 13-15, 1999, Copenhagen, Denmark Electronic Proceedings 1999 IEEE Context based optimal shape coding Gerry Melnikov,

More information

IST MPEG-4 Video Compliant Framework

IST MPEG-4 Video Compliant Framework IST MPEG-4 Video Compliant Framework João Valentim, Paulo Nunes, Fernando Pereira Instituto de Telecomunicações, Instituto Superior Técnico, Av. Rovisco Pais, 1049-001 Lisboa, Portugal Abstract This paper

More information

ADAPTIVE PICTURE SLICING FOR DISTORTION-BASED CLASSIFICATION OF VIDEO PACKETS

ADAPTIVE PICTURE SLICING FOR DISTORTION-BASED CLASSIFICATION OF VIDEO PACKETS ADAPTIVE PICTURE SLICING FOR DISTORTION-BASED CLASSIFICATION OF VIDEO PACKETS E. Masala, D. Quaglia, J.C. De Martin Λ Dipartimento di Automatica e Informatica/ Λ IRITI-CNR Politecnico di Torino, Italy

More information

Scene Change Detection Based on Twice Difference of Luminance Histograms

Scene Change Detection Based on Twice Difference of Luminance Histograms Scene Change Detection Based on Twice Difference of Luminance Histograms Xinying Wang 1, K.N.Plataniotis 2, A. N. Venetsanopoulos 1 1 Department of Electrical & Computer Engineering University of Toronto

More information

Video Syntax Analysis

Video Syntax Analysis 1 Video Syntax Analysis Wei-Ta Chu 2008/10/9 Outline 2 Scene boundary detection Key frame selection 3 Announcement of HW #1 Shot Change Detection Goal: automatic shot change detection Requirements 1. Write

More information

One category of visual tracking. Computer Science SURJ. Michael Fischer

One category of visual tracking. Computer Science SURJ. Michael Fischer Computer Science Visual tracking is used in a wide range of applications such as robotics, industrial auto-control systems, traffic monitoring, and manufacturing. This paper describes a new algorithm for

More information

Detecting motion by means of 2D and 3D information

Detecting motion by means of 2D and 3D information Detecting motion by means of 2D and 3D information Federico Tombari Stefano Mattoccia Luigi Di Stefano Fabio Tonelli Department of Electronics Computer Science and Systems (DEIS) Viale Risorgimento 2,

More information

Saliency Detection for Videos Using 3D FFT Local Spectra

Saliency Detection for Videos Using 3D FFT Local Spectra Saliency Detection for Videos Using 3D FFT Local Spectra Zhiling Long and Ghassan AlRegib School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA ABSTRACT

More information

Motion and Tracking. Andrea Torsello DAIS Università Ca Foscari via Torino 155, Mestre (VE)

Motion and Tracking. Andrea Torsello DAIS Università Ca Foscari via Torino 155, Mestre (VE) Motion and Tracking Andrea Torsello DAIS Università Ca Foscari via Torino 155, 30172 Mestre (VE) Motion Segmentation Segment the video into multiple coherently moving objects Motion and Perceptual Organization

More information

Latent Variable Models for Structured Prediction and Content-Based Retrieval

Latent Variable Models for Structured Prediction and Content-Based Retrieval Latent Variable Models for Structured Prediction and Content-Based Retrieval Ariadna Quattoni Universitat Politècnica de Catalunya Joint work with Borja Balle, Xavier Carreras, Adrià Recasens, Antonio

More information

Comparison of Sequence Matching Techniques for Video Copy Detection

Comparison of Sequence Matching Techniques for Video Copy Detection Comparison of Sequence Matching Techniques for Video Copy Detection Arun Hampapur a, Ki-Ho Hyun b and Ruud Bolle a a IBM T.J Watson Research Center, 3 Saw Mill River Road, Hawthorne, NY 1532, USA b School

More information

Research on Construction of Road Network Database Based on Video Retrieval Technology

Research on Construction of Road Network Database Based on Video Retrieval Technology Research on Construction of Road Network Database Based on Video Retrieval Technology Fengling Wang 1 1 Hezhou University, School of Mathematics and Computer Hezhou Guangxi 542899, China Abstract. Based

More information

Real-Time Detection of Sport in MPEG-2 Sequences using High-Level AV-Descriptors and SVM

Real-Time Detection of Sport in MPEG-2 Sequences using High-Level AV-Descriptors and SVM Real-Time Detection of Sport in MPEG-2 Sequences using High-Level AV-Descriptors and SVM Ronald Glasberg 1, Sebastian Schmiedee 2, Hüseyin Oguz 3, Pascal Kelm 4 and Thomas Siora 5 Communication Systems

More information

Outline Introduction MPEG-2 MPEG-4. Video Compression. Introduction to MPEG. Prof. Pratikgiri Goswami

Outline Introduction MPEG-2 MPEG-4. Video Compression. Introduction to MPEG. Prof. Pratikgiri Goswami to MPEG Prof. Pratikgiri Goswami Electronics & Communication Department, Shree Swami Atmanand Saraswati Institute of Technology, Surat. Outline of Topics 1 2 Coding 3 Video Object Representation Outline

More information

Neural Network based textural labeling of images in multimedia applications

Neural Network based textural labeling of images in multimedia applications Neural Network based textural labeling of images in multimedia applications S.A. Karkanis +, G.D. Magoulas +, and D.A. Karras ++ + University of Athens, Dept. of Informatics, Typa Build., Panepistimiopolis,

More information

Multimedia Systems Video II (Video Coding) Mahdi Amiri April 2012 Sharif University of Technology

Multimedia Systems Video II (Video Coding) Mahdi Amiri April 2012 Sharif University of Technology Course Presentation Multimedia Systems Video II (Video Coding) Mahdi Amiri April 2012 Sharif University of Technology Video Coding Correlation in Video Sequence Spatial correlation Similar pixels seem

More information

CHAPTER 3 SHOT DETECTION AND KEY FRAME EXTRACTION

CHAPTER 3 SHOT DETECTION AND KEY FRAME EXTRACTION 33 CHAPTER 3 SHOT DETECTION AND KEY FRAME EXTRACTION 3.1 INTRODUCTION The twenty-first century is an age of information explosion. We are witnessing a huge growth in digital data. The trend of increasing

More information

Motion in 2D image sequences

Motion in 2D image sequences Motion in 2D image sequences Definitely used in human vision Object detection and tracking Navigation and obstacle avoidance Analysis of actions or activities Segmentation and understanding of video sequences

More information

Automatic Parameter Adaptation for Multi-Object Tracking

Automatic Parameter Adaptation for Multi-Object Tracking Automatic Parameter Adaptation for Multi-Object Tracking Duc Phu CHAU, Monique THONNAT, and François BREMOND {Duc-Phu.Chau, Monique.Thonnat, Francois.Bremond}@inria.fr STARS team, INRIA Sophia Antipolis,

More information

Data Hiding in Video

Data Hiding in Video Data Hiding in Video J. J. Chae and B. S. Manjunath Department of Electrical and Computer Engineering University of California, Santa Barbara, CA 9316-956 Email: chaejj, manj@iplab.ece.ucsb.edu Abstract

More information

Invariant Recognition of Hand-Drawn Pictograms Using HMMs with a Rotating Feature Extraction

Invariant Recognition of Hand-Drawn Pictograms Using HMMs with a Rotating Feature Extraction Invariant Recognition of Hand-Drawn Pictograms Using HMMs with a Rotating Feature Extraction Stefan Müller, Gerhard Rigoll, Andreas Kosmala and Denis Mazurenok Department of Computer Science, Faculty of

More information

A NOVEL FEATURE EXTRACTION METHOD BASED ON SEGMENTATION OVER EDGE FIELD FOR MULTIMEDIA INDEXING AND RETRIEVAL

A NOVEL FEATURE EXTRACTION METHOD BASED ON SEGMENTATION OVER EDGE FIELD FOR MULTIMEDIA INDEXING AND RETRIEVAL A NOVEL FEATURE EXTRACTION METHOD BASED ON SEGMENTATION OVER EDGE FIELD FOR MULTIMEDIA INDEXING AND RETRIEVAL Serkan Kiranyaz, Miguel Ferreira and Moncef Gabbouj Institute of Signal Processing, Tampere

More information

Hybrid Video Compression Using Selective Keyframe Identification and Patch-Based Super-Resolution

Hybrid Video Compression Using Selective Keyframe Identification and Patch-Based Super-Resolution 2011 IEEE International Symposium on Multimedia Hybrid Video Compression Using Selective Keyframe Identification and Patch-Based Super-Resolution Jeffrey Glaister, Calvin Chan, Michael Frankovich, Adrian

More information

Fast trajectory matching using small binary images

Fast trajectory matching using small binary images Title Fast trajectory matching using small binary images Author(s) Zhuo, W; Schnieders, D; Wong, KKY Citation The 3rd International Conference on Multimedia Technology (ICMT 2013), Guangzhou, China, 29

More information

Textural Features for Image Database Retrieval

Textural Features for Image Database Retrieval Textural Features for Image Database Retrieval Selim Aksoy and Robert M. Haralick Intelligent Systems Laboratory Department of Electrical Engineering University of Washington Seattle, WA 98195-2500 {aksoy,haralick}@@isl.ee.washington.edu

More information

Overlay Text Detection and Recognition for Soccer Game Indexing

Overlay Text Detection and Recognition for Soccer Game Indexing Overlay Text Detection and Recognition for Soccer Game Indexing J. Ngernplubpla and O. Chitsophuk, Member, IACSIT Abstract In this paper, new multiresolution overlaid text detection and recognition is

More information

Key Frame Extraction and Indexing for Multimedia Databases

Key Frame Extraction and Indexing for Multimedia Databases Key Frame Extraction and Indexing for Multimedia Databases Mohamed AhmedˆÃ Ahmed Karmouchˆ Suhayya Abu-Hakimaˆˆ ÃÃÃÃÃÃÈÃSchool of Information Technology & ˆˆÃ AmikaNow! Corporation Engineering (SITE),

More information

A Texture Feature Extraction Technique Using 2D-DFT and Hamming Distance

A Texture Feature Extraction Technique Using 2D-DFT and Hamming Distance A Texture Feature Extraction Technique Using 2D-DFT and Hamming Distance Author Tao, Yu, Muthukkumarasamy, Vallipuram, Verma, Brijesh, Blumenstein, Michael Published 2003 Conference Title Fifth International

More information

Text-Independent Speaker Identification

Text-Independent Speaker Identification December 8, 1999 Text-Independent Speaker Identification Til T. Phan and Thomas Soong 1.0 Introduction 1.1 Motivation The problem of speaker identification is an area with many different applications.

More information

SEMANTIC ANNOTATION AND TRANSCODING FOR SPORT VIDEOS

SEMANTIC ANNOTATION AND TRANSCODING FOR SPORT VIDEOS SEMANTIC ANNOTATION AND TRANSCODING FOR SPORT VIDEOS M. Bertini, A. Del Bimbo D.S.I. - Università di Firenze - Italy bertini,delbimbo@dsi.unifi.it A. Prati, R. Cucchiara D.I.I. - Università di Modena e

More information

AN EFFECTIVE APPROACH FOR VIDEO COPY DETECTION USING SIFT FEATURES
