Video Summarization and Browsing Using Growing Cell Structures
Irena Koprinska and James Clark
School of Information Technologies, The University of Sydney
Madsen Building F09, Sydney NSW 2006, Australia
{irena,

Abstract

We present a new approach for video summarization and browsing of MPEG-2 compressed video based on the Growing Cell Structures (GCS) neural algorithms. It first applies GCS to select keyframes for each shot and then clusters them using TreeGCS to form a hierarchical view of the video for efficient browsing. The keyframe selection is based on histogram features of the dc-images of I frames. It captures the video content well and outperforms two other approaches. The main advantage of the TreeGCS module is its ability to dynamically form a flexible hierarchy that depends on the video content.

I. INTRODUCTION

Recent developments in computing performance, multimedia compression and communication technologies have made possible the creation of digital video archives. Applications such as video-on-demand, digital TV and digital libraries generate and use large collections of video data. It is also expected that the storage of digital video at home will soon overtake the current analogue systems. However, unlike document databases that use keywords to quickly access data, video databases still lack techniques for efficient organization, searching and retrieval. Text-based video organization based on manual annotation is highly inefficient, subjective, and time consuming.

Recently, some content-based prototype systems [1,3,11,12] have been developed to automatically organize video in order to provide fast and meaningful nonlinear access to the relevant material. The generally accepted approach first breaks up the video stream into temporally homogeneous segments called shots [9]. Each shot is then represented by one or more keyframes. The shots are typically indexed using spatial features extracted from the keyframes (e.g. color, texture, shape) and temporal features extracted from the shot (e.g. motion, camera operations), and are organised into a sequential or hierarchical structure of keyframes. This representation allows the user to browse the content of the video and search quickly for sub-sequences of interest without the need to watch the entire video. The user can also query the video database. The retrieval is based on the similarity between the feature vector of the query and the feature vectors representing the shots.

Clustering has been successfully applied for both keyframe selection and video organization. Ferman et al. [4] cluster the frames within each shot using an iterative 3-means algorithm and select as a keyframe the frame closest to the centroid of the larger cluster. As the content of the shot may change significantly due to camera operations and object motion, in their subsequent work [3] the clustering algorithm is modified to extract more than one keyframe. Two keyframes are extracted for clusters with high intercluster variance (the frames closest to and farthest from the centroid). Frames with large deviations from the average luminance of the shot are selected as keyframes as well. Girgensohn and Boreczky [7] applied agglomerative hierarchical clustering to extract a predefined number of keyframes and used them to represent videotaped meetings and presentations. Drew and Au [2] proposed a new feature based on color histograms and then applied an agglomerative hierarchical clustering that merges clusters based on cluster variance and temporal distance. Yeung et al. [12] select one keyframe for each shot and then cluster the keyframes based on visual content and temporal distance to create a scene-transition graph. In ViBE [1] each shot is represented with a tree induced by hierarchical clustering of the frames. The video is then organised into a three-level similarity pyramid where each level contains groups of similar shots organized into a two-dimensional grid. The pyramid is created by clustering the shots based on temporal, motion, pseudo-semantic and shot-tree distance.

In this paper we present a new approach for keyframe selection and video browsing based on the Growing Cell Structures (GCS) neural algorithm. We use GCS to select keyframes representing the content of each shot. GCS finds the number of keyframes in an unsupervised fashion, and also maps similar frames to neighboring nodes, offering a better indication of the shot's structure. TreeGCS is then used to cluster the shots and provide a high-level hierarchical video representation. The main advantage over existing systems is that it creates a flexible hierarchical representation of the video content, i.e. the number of layers in the hierarchy and the number of clusters in each layer depend on the video content and do not have to be specified in advance. In addition, similar clusters (of frames or shots) are mapped onto neighboring nodes, which makes browsing more convenient. Our system operates directly on MPEG-2 compressed video, which allows faster operation and smaller storage requirements.

II. GROWING CELL STRUCTURE ALGORITHMS

A. GCS

GCS [5] is an incremental self-organizing neural algorithm, an extension of Kohonen's self-organising maps (SOM) [10]. It generates a mapping from high-dimensional input data to a lower-dimensional (typically two-dimensional) space. The main advantage of such a mapping is that it gives insight into the structure of the data due to two important properties: topology preservation (similar inputs are mapped onto neighboring neurons) and density preservation (regions of high input density are mapped onto neural structures with more neurons). An important advantage over SOM and most of the classical clustering algorithms (e.g. k-means) is that GCS is able to find a suitable network size and structure automatically, i.e. it does not require the number of clusters to be specified in advance. This is achieved through a process of controlled growing and removal of nodes. Unlike SOM, in GCS the number of neighboring neurons connected to a given neuron is not fixed. Finally, GCS is able to form discrete clusters, while in SOM the clusters remain connected and finding their boundaries is not always easy.

The GCS algorithm we implemented starts with a randomly initialised triangle of neurons. At each iteration the best matching unit and its topological neighbours are adapted toward the input vector. There is no "cooling schedule" as in SOM, where neighbourhood size and learning rate decrease with time.
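The adaptation step described above can be sketched as follows. This is only an illustration, not the authors' implementation: the function name `adapt`, the list-of-lists weight layout, and the adjacency dictionary `neighbours` are assumptions made here, and the default neighbour rate `e_i=0.002` is a placeholder value. Error counters, node insertion, and node deletion are deliberately left out.

```python
from math import dist

def adapt(weights, neighbours, x, e_bmu=0.06, e_i=0.002):
    """Perform one GCS-style adaptation step in place and return the BMU index.

    weights    -- list of weight vectors, one per neuron (hypothetical layout)
    neighbours -- dict mapping a neuron index to the indices of its direct
                  topological neighbours (hypothetical layout)
    x          -- input vector
    e_bmu, e_i -- learning rates for the winner and its neighbours; both are
                  constant, i.e. there is no cooling schedule. The e_i default
                  is an illustrative placeholder.
    """
    # Best matching unit: the neuron whose weight vector is closest to x.
    bmu = min(range(len(weights)), key=lambda k: dist(weights[k], x))

    # Move the winner toward the input by a fraction e_bmu of the difference.
    weights[bmu] = [w + e_bmu * (xi - w) for w, xi in zip(weights[bmu], x)]

    # Move each direct neighbour toward the input by the smaller fraction e_i.
    for n in neighbours[bmu]:
        weights[n] = [w + e_i * (xi - w) for w, xi in zip(weights[n], x)]
    return bmu
```

Because only the winner and its direct neighbours move, and the rates stay fixed, the update cost per input depends on the local connectivity rather than on the size of the whole network.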
New neurons are inserted at positions with high errors, where the current structure under-represents the input data distribution. Superfluous neurons are deleted from regions with low probability density. It is important that the deletion step maintains the consistency of the triangular structure. To ensure this we have implemented a simpler heuristic than Fritzke's tetrahedron-based one. The algorithm iterates until the stopping criterion is satisfied (the maximum number of epochs or the maximum network size is reached). Fritzke has also demonstrated superior performance of GCS over SOM in terms of topology preservation and distribution-modelling error [6]. The algorithm requires 7 user-specified parameters: maximum number of neurons or training epochs, insertion period, deletion period, learning rates for the winner (e_bmu) and its neighbourhood (e_i), and two error decay factors.

Fig. 1. GCS simulation results on four square shaped data

B. TreeGCS

TreeGCS [8] is a hierarchical clustering algorithm based on GCS. It maps high-dimensional input vectors onto a multi-depth two-dimensional hierarchy that preserves the topological ordering of the input space. The tree is generated dynamically and adapts to the underlying GCS structure. Initially the root of the tree points to one cluster that contains the initial GCS network. A split in the cluster results in adding a new node to the tree (Fig. 2). When clusters are deleted, the associated tree nodes are deleted and the resulting redundancies (if any) are removed. Our implementation strictly follows the original algorithm apart from the introduction of a hierarchy generation threshold (in [8] the tree is generated at the end of each GCS epoch). This threshold is the only user-specified parameter.

Fig. 2. Creating new nodes in TreeGCS when a cluster subdivides

III. VIDEOGCS

A. Data Pre-processing and Feature Extraction

Since MPEG was established as an international standard for compression of digital video, video is increasingly stored and moved in compressed format. This motivates the development of methods that process compressed video directly, because of the computational and storage savings (no need to decode and re-encode the video) and faster operation (lower data rate of compressed video). Our system operates directly on MPEG-2 encoded video.

MPEG-2 uses macroblock-based motion compensation to reduce temporal redundancy and the block-based Discrete Cosine Transform (DCT) to reduce spatial redundancy. The only information available in the compressed stream is the DCT coefficients of intra-coded blocks or residual errors, and the motion vectors. Our system uses the DC terms (i.e. the zero-frequency terms of the DCT coefficients) of intra-coded (I) frames. As each DC term is a scaled version of the block's average value, spatially reduced versions of the original images, called dc-images [11], can be constructed. The (i,j) pixel of the dc-image is the average value of the (i,j) block of the image (Fig. 3). For each dc-image we compute the 16-bin grayscale histogram. Histograms have been successfully used as an image representation as they are less sensitive to object movement, image rotation or variations in viewing angle and scale.

Fig. 3. A full image (352x288 pixels) and its dc image (44x36 pixels)

B. Keyframe Selection and Video Representation

After shot boundary detection, we use GCS to cluster the frames in each shot based on their 16-dimensional feature vectors. Depending on the content of the shot, GCS forms a different number of discrete clusters. For each of them, the frame closest to the centroid is selected as a keyframe. Because GCS preserves topology, similar frames are mapped to neighboring neurons.

The selected keyframes are further clustered using TreeGCS to create a hierarchical view of the video sequence, allowing the user to browse at different levels of content. The depth of the hierarchy and the number of nodes in each level depend on the video content. Each node corresponds to a cluster of similar shots, and can be represented by a single keyframe chosen as described above. The bottom-level nodes are associated with clusters of similar shots that are mapped onto a two-dimensional GCS grid, allowing efficient visualization and browsing.

IV. EXPERIMENTS

A. Video Sequences

We used 9 video sequences available from [14] and previously used for keyframe selection evaluation (Table I). As the videos were originally MPEG-1 compressed, we decompressed them. After that we concatenated them using cuts (in the order shown in Table I) to form one long sequence. This sequence was then MPEG-2 compressed and the DC terms were extracted. A total of 30 shot boundaries were detected, 11 of them gradual and the others abrupt. Thus the total number of shots was 31. The size of the original video frames was 352x240 pixels, hence the size of their dc-images was 44x30 pixels.

TABLE I
VIDEO STATISTICS
sequence | # frames | # shots
Canada day | 768 | 4: s0-s3
Capilano | 778 | 2: s4-s5
Dragon boat | 705 | 6: s6-s11
Jazz | 577 | 3: s12-s14
Professor | 130 | 1: s15
Steam clock | 648 | 1: s16
Walk with dragon | 795 | 3: s17-s19
Aqua | | : s20-s26
Beach | 740 | 4: s27-s30

B. GCS and TreeGCS Parameters

The following GCS parameters were used: number of iterations = 20000, insertion period = 200, deletion period = 2000, error decay factors 1 and 0.0004, and learning rates e_bmu = 0.06 for the winner and e_i for its neighbours. The hierarchy period of TreeGCS was set to 500. Our preliminary experiments showed that the ratio between the insertion and deletion periods is important. Before a deletion is performed, the GCS network has to grow sufficiently. This ensures that the clusters are not formed prematurely.

C. Keyframe Selection Results

The keyframe selection results of GCS are summarized in Table II. The column Correct gives the correct, human-produced results.
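As a reminder of what is being evaluated, the feature extraction of Section III.A and the selection rule of Section III.B can be sketched as below. This is a minimal illustration under stated assumptions, not the system's code: the function names, the representation of a dc-image as rows of integer gray levels in [0, 256), and the normalisation of the histogram by the pixel count are all choices made here. The cluster labels are assumed to come from some clustering step (GCS in the paper); the clustering itself is not reproduced.

```python
from math import dist

def gray_histogram(dc_image, bins=16, levels=256):
    """Return a normalised `bins`-bin gray-level histogram of a dc-image.

    dc_image is assumed to be a non-empty list of rows of ints in [0, levels).
    Normalising by the pixel count is an assumption of this sketch.
    """
    hist = [0] * bins
    count = 0
    for row in dc_image:
        for value in row:
            hist[value * bins // levels] += 1  # equal-width bins
            count += 1
    return [h / count for h in hist]

def select_keyframes(features, labels):
    """Return {cluster label: index of the frame closest to that cluster's centroid}.

    features -- one feature vector (e.g. a 16-bin histogram) per frame
    labels   -- one cluster label per frame, produced by any clustering step
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    keyframes = {}
    for label, members in clusters.items():
        dim = len(features[members[0]])
        # Centroid: component-wise mean of the cluster's feature vectors.
        centroid = [sum(features[i][d] for i in members) / len(members)
                    for d in range(dim)]
        # Keyframe: the member frame nearest to the centroid (Euclidean).
        keyframes[label] = min(members, key=lambda i: dist(features[i], centroid))
    return keyframes
```

Under this rule the number of keyframes per shot equals the number of clusters found, so a shot that produces a single cluster yields a single keyframe.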
It should be noted that our correct keyframes are slightly different from those reported in [2] for the following sequences: Capilano (1 fewer keyframe in the first shot), Dragon boat (2 fewer in the last shot) and Aquarium (3 more: 1 more in shot 1, 1 for shot 2, which was a missed shot in [2], and 1 more in shot 6). A comparison of GCS with two other approaches, HistInt [3] and Signatures [2], is presented in Table IV, based on the results reported in [2]. Both approaches use color histograms and work on uncompressed video. Some examples are shown in Figs. 4-6 (for GCS the grayscale dc-images are shown).
TABLE II
NUMBER OF KEYFRAMES GENERATED BY GCS
sequence | correct | generated | redundant | missed
Canada day | | | |
Capilano | | | |
Dragon boat | | | |
Jazz | | | |
Professor | | | |
Steam clock | | | |
Walk dragon | | | |
Aqua | | | |
Beach | | | |
Total | | | |

Overall GCS performs well and typically selects 1 keyframe for the low activity shots and several keyframes for the high activity shots. It misses just 3 keyframes but generates 13 redundant ones. In half of the cases these redundancies occur in sequences involving panning and zooming. For example, GCS typically generates two keyframes instead of one for shot 3 of Canada day (there is a small zoom and object tracking) and shot 1 (pan), and also for shot 3 of Beach (zoom and object movement). As the image (and its corresponding histogram) changes, GCS generates a new keyframe, but as the semantics do not change, the human does not select a new one.

In the other cases the redundant keyframes are selected for small and low activity shots. For example, two similar keyframes are generated for shot 4 of Aqua (Fig. 4), and three for shot 1 of Walk with the dragon. This happens because GCS always splits the cluster after a pre-specified number of iterations regardless of its quality. This drawback can be eliminated by modifying the GCS deletion step.

Fig. 4. Keyframe selection for the sequence Aqua: a) Correct (9 keyframes), b) GCS (10 keyframes), c) HistInt (36 keyframes), d) Signatures (4 keyframes)

Fig. 5. Keyframe selection for the sequence Steam: a) Correct, b) GCS, c) Signatures (1 keyframe), d) HistInt (6 keyframes)

Fig. 6. Keyframe selection for the sequence Walk with the dragon: a) Correct (4 keyframes), b) GCS (6 keyframes), c) HistInt (18 keyframes), d) Signatures (5 keyframes)

TABLE III
DEFINITION OF RECALL, PRECISION AND F1 MEASURE
keyframes | # assigned as correct | # not assigned as correct
# correct | tp | fn (missed)
# not correct | fp (redundant) | tn

$$P = \frac{tp}{tp + fp}, \qquad R = \frac{tp}{tp + fn}, \qquad F1 = \frac{2PR}{P + R}$$
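The definitions in Table III translate directly into code. The sketch below is illustrative only: the function name and the convention of returning 0.0 when a denominator is zero are assumptions made here, and the counts in the usage line are made-up numbers, not results from the experiments.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from keyframe counts.

    tp -- correct keyframes that were generated
    fp -- generated keyframes that are not correct (redundant)
    fn -- correct keyframes that were not generated (missed)
    Returning 0.0 for an empty denominator is a convention of this sketch.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1

# Hypothetical counts, for illustration only.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
```

F1 is the harmonic mean of P and R, so a method that generates many redundant keyframes (low P) cannot compensate with a high R alone.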
Nevertheless, GCS compares well with the other two approaches. HistInt tends to generate too many keyframes, while Signatures is able to generate a compact representation but with many misses and redundancies. We have also calculated Recall (R), Precision (P) and the F1 measure, which are standard performance measures in information retrieval (Table III). As can be seen from Table IV, overall GCS is the best approach.

TABLE IV
KEYFRAME SELECTION COMPARISON
method | correct | generated | redundant | missed | R [%] | P [%] | F1 [%]
GCS | | | | | | |
HistInt | | | | | | |
Signatures | | | | | | |

D. Hierarchical Video Representation Results

The hierarchical representation generated by clustering the keyframes using TreeGCS is shown in Fig. 7. It has organized the keyframes (and the shots they represent) into a 3-level structure. Each node corresponds to a cluster of similar shots, and can be represented by a single keyframe. As can be seen, the keyframes are grouped into two main clusters based on their gray-level histogram: lighter and darker. These two clusters are further split into 3 and 2 sub-clusters of similar shots, respectively. Similar sub-clusters appear close to each other in the tree. The biggest sub-cluster (sub-cluster 5) is less homogeneous than the others; if TreeGCS had been trained longer, it would have split it into further sub-clusters. The number of neurons in the five GCS grids was 8, 8, 11, 18 and 43, respectively. Within each of these bottom-level clusters, similar keyframes were mapped to neighboring neurons in the GCS grid. The keyframe closest to the cluster centroid was selected as the keyframe representing the cluster of similar shots (the framed pictures: s17, s6, s19, s16 and s27). Similarly, keyframes can also be chosen for the two nodes at level 1. Thus, the resulting structure allows the user to browse the video at different levels of detail.

The quality of the video summarization crucially depends on the quality of the features extracted to represent each shot. As the example shows, while the keyframe histogram is a useful feature, it may not be enough to capture the semantics of the video well and allow efficient retrieval. High-level semantic features would provide a more useful description but their automated extraction is an open research problem. One of the advantages of clustering-based keyframe selection and video organization is that new features can be easily incorporated. We plan to investigate the use of motion features characterizing each shot (based on the motion vectors that are directly available in the MPEG-2 stream), temporal features that prevent keyframes that are too distant in time from being grouped together, and also more semantically rich components such as text captions and teletext. The open framework also allows different distance metrics to be used, e.g. the histograms can be compared with the widely used chi-squared test.

Fig. 7. Hierarchical video representation

The main advantage of the hierarchical representation used in VideoGCS is the ability to dynamically form a hierarchy where the number of layers and the number of clusters in them depend on the video content. In the existing systems for video summarization the structure is fixed. For example, in [1] a three-level hierarchy is used with a fixed number of clusters in each level (e.g. 4, 16, 54). In [13] the number of levels and clusters in them was also pre-determined. The agglomerative hierarchical clustering approaches used in [7,12] generate dendrograms that cannot be visualized for large data sets and require the selection of a pre-defined number of nodes. TreeGCS also provides good visualization due to the underlying GCS algorithm, which maps high-dimensional inputs to a two-dimensional grid that is topology and density preserving. In contrast to SOM, it is able to find the cluster boundaries automatically.

V. CONCLUSION

In this paper we have presented a new approach for video summarization and browsing based on the GCS neural algorithms. The system, VideoGCS, processes MPEG-2 compressed video directly. It applies GCS to select keyframes for each shot and then clusters them using TreeGCS to form a hierarchical view of the video content for efficient browsing. The results show that the keyframe selection module captures the salient video content well and outperforms two other approaches. The generated hierarchy based on the grayscale histogram of keyframes is useful but it does not capture the video semantics. However, an advantage of the TreeGCS module over the existing systems is its ability to dynamically form a flexible hierarchy that depends on the video content.

Future work will include modification of the GCS algorithm to reduce the number of redundant keyframes for small and low activity shots, and also integration of complementary low-level and semantic features to improve summarization. Another interesting direction for future research is to apply VideoGCS to create video summaries on-line, as both GCS and TreeGCS can be used in an on-line mode.

ACKNOWLEDGMENT

This work was supported by the SESQUI grant Video Segmentation and Summarization from the University of Sydney. We are very grateful to Damien McMonigal for the extraction of the dc-images.

REFERENCES

[1] J.-Y. Chen, C. Taskiran, A. Albiol, E.J. Delp and C. Bouman, ViBE: A Compressed Video Database Structured for Active Browsing and Search, IEEE Trans. Multimedia.
[2] M.S. Drew and J. Au, Video Keyframe Production by Efficient Clustering of Compressed Chromaticity Signatures, ACM Multimedia.
[3] A.M. Ferman and A.M. Tekalp, Efficient Filtering and Clustering Methods for Temporal Video Representation and Visual Summarization, J. Visual Commun. & Image Rep., vol. 9.
[4] A.M. Ferman and A.M. Tekalp, Multiscale Context Extraction and Representation for Video Indexing, SPIE 3229, pp. 23-31.
[5] B. Fritzke, Growing Cell Structures - a Self-Organizing Network for Unsupervised and Supervised Learning, Neural Networks, vol. 7(9).
[6] B. Fritzke, Kohonen Feature Maps and Growing Cell Structures - A Performance Comparison, Adv. Neural Info. Processing.
[7] A. Girgensohn and J. Boreczky, Time-constrained Keyframe Selection Technique, Multim. Tools & Appl., v.11.
[8] V.J. Hodge and J. Austin, Hierarchical Growing Cell Structures: TreeGCS, IEEE Trans. Knowl. & Data Eng., v.13(2).
[9] I. Koprinska and S. Carrato, Temporal Video Segmentation: A Survey, Signal Processing: Image Commun., v.16.
[10] T. Kohonen, Self-Organizing Maps, 2nd ed., Springer-Verlag.
[11] B.-L. Yeo and B. Liu, Rapid Scene Analysis on Compressed Video, IEEE Trans. Circuits Syst. Video Technol., v.5(6).
[12] M. Yeung and B.-L. Yeo, Segmentation of Video by Clustering and Graph Analysis, Comp. Vis. & Image Und., v.71(1).
[13] D. Zhong, H. Zhang and S.-F. Chang, Clustering Methods for Video Browsing and Annotation, SPIE 2670.
[14]
Cluster Analysis Ying Shen, SSE, Tongji University Cluster analysis Cluster analysis groups data objects based only on the attributes in the data. The main objective is that The objects within a group
More informationRepresentation of 2D objects with a topology preserving network
Representation of 2D objects with a topology preserving network Francisco Flórez, Juan Manuel García, José García, Antonio Hernández, Departamento de Tecnología Informática y Computación. Universidad de
More informationTexture Classification by Combining Local Binary Pattern Features and a Self-Organizing Map
Texture Classification by Combining Local Binary Pattern Features and a Self-Organizing Map Markus Turtinen, Topi Mäenpää, and Matti Pietikäinen Machine Vision Group, P.O.Box 4500, FIN-90014 University
More informationTopological Correlation
Topological Correlation K.A.J. Doherty, R.G. Adams and and N. Davey University of Hertfordshire, Department of Computer Science College Lane, Hatfield, Hertfordshire, UK Abstract. Quantifying the success
More informationImage Segmentation. 1Jyoti Hazrati, 2Kavita Rawat, 3Khush Batra. Dronacharya College Of Engineering, Farrukhnagar, Haryana, India
Image Segmentation 1Jyoti Hazrati, 2Kavita Rawat, 3Khush Batra Dronacharya College Of Engineering, Farrukhnagar, Haryana, India Dronacharya College Of Engineering, Farrukhnagar, Haryana, India Global Institute
More informationVideo shot segmentation using late fusion technique
Video shot segmentation using late fusion technique by C. Krishna Mohan, N. Dhananjaya, B.Yegnanarayana in Proc. Seventh International Conference on Machine Learning and Applications, 2008, San Diego,
More informationDifferential Compression and Optimal Caching Methods for Content-Based Image Search Systems
Differential Compression and Optimal Caching Methods for Content-Based Image Search Systems Di Zhong a, Shih-Fu Chang a, John R. Smith b a Department of Electrical Engineering, Columbia University, NY,
More informationCompression of Light Field Images using Projective 2-D Warping method and Block matching
Compression of Light Field Images using Projective 2-D Warping method and Block matching A project Report for EE 398A Anand Kamat Tarcar Electrical Engineering Stanford University, CA (anandkt@stanford.edu)
More informationA 3-D Virtual SPIHT for Scalable Very Low Bit-Rate Embedded Video Compression
A 3-D Virtual SPIHT for Scalable Very Low Bit-Rate Embedded Video Compression Habibollah Danyali and Alfred Mertins University of Wollongong School of Electrical, Computer and Telecommunications Engineering
More informationLecture 10: Semantic Segmentation and Clustering
Lecture 10: Semantic Segmentation and Clustering Vineet Kosaraju, Davy Ragland, Adrien Truong, Effie Nehoran, Maneekwan Toyungyernsub Department of Computer Science Stanford University Stanford, CA 94305
More informationKey Frame Extraction and Indexing for Multimedia Databases
Key Frame Extraction and Indexing for Multimedia Databases Mohamed AhmedˆÃ Ahmed Karmouchˆ Suhayya Abu-Hakimaˆˆ ÃÃÃÃÃÃÈÃSchool of Information Technology & ˆˆÃ AmikaNow! Corporation Engineering (SITE),
More informationTexture Segmentation by Windowed Projection
Texture Segmentation by Windowed Projection 1, 2 Fan-Chen Tseng, 2 Ching-Chi Hsu, 2 Chiou-Shann Fuh 1 Department of Electronic Engineering National I-Lan Institute of Technology e-mail : fctseng@ccmail.ilantech.edu.tw
More informationVideo Compression An Introduction
Video Compression An Introduction The increasing demand to incorporate video data into telecommunications services, the corporate environment, the entertainment industry, and even at home has made digital
More informationVideo Representation. Video Analysis
BROWSING AND RETRIEVING VIDEO CONTENT IN A UNIFIED FRAMEWORK Yong Rui, Thomas S. Huang and Sharad Mehrotra Beckman Institute for Advanced Science and Technology University of Illinois at Urbana-Champaign
More informationLesson 3. Prof. Enza Messina
Lesson 3 Prof. Enza Messina Clustering techniques are generally classified into these classes: PARTITIONING ALGORITHMS Directly divides data points into some prespecified number of clusters without a hierarchical
More informationInteractive Video Retrieval System Integrating Visual Search with Textual Search
From: AAAI Technical Report SS-03-08. Compilation copyright 2003, AAAI (www.aaai.org). All rights reserved. Interactive Video Retrieval System Integrating Visual Search with Textual Search Shuichi Shiitani,
More informationUnsupervised Learning
Unsupervised Learning Unsupervised learning Until now, we have assumed our training samples are labeled by their category membership. Methods that use labeled samples are said to be supervised. However,
More informationContent-Based Image Retrieval of Web Surface Defects with PicSOM
Content-Based Image Retrieval of Web Surface Defects with PicSOM Rami Rautkorpi and Jukka Iivarinen Helsinki University of Technology Laboratory of Computer and Information Science P.O. Box 54, FIN-25
More informationMultimedia Systems Video II (Video Coding) Mahdi Amiri April 2012 Sharif University of Technology
Course Presentation Multimedia Systems Video II (Video Coding) Mahdi Amiri April 2012 Sharif University of Technology Video Coding Correlation in Video Sequence Spatial correlation Similar pixels seem
More information[Gidhane* et al., 5(7): July, 2016] ISSN: IC Value: 3.00 Impact Factor: 4.116
IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY AN EFFICIENT APPROACH FOR TEXT MINING USING SIDE INFORMATION Kiran V. Gaidhane*, Prof. L. H. Patil, Prof. C. U. Chouhan DOI: 10.5281/zenodo.58632
More informationMachine Learning. Unsupervised Learning. Manfred Huber
Machine Learning Unsupervised Learning Manfred Huber 2015 1 Unsupervised Learning In supervised learning the training data provides desired target output for learning In unsupervised learning the training
More informationClustering CS 550: Machine Learning
Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf
More informationVideo Compression MPEG-4. Market s requirements for Video compression standard
Video Compression MPEG-4 Catania 10/04/2008 Arcangelo Bruna Market s requirements for Video compression standard Application s dependent Set Top Boxes (High bit rate) Digital Still Cameras (High / mid
More informationRange Imaging Through Triangulation. Range Imaging Through Triangulation. Range Imaging Through Triangulation. Range Imaging Through Triangulation
Obviously, this is a very slow process and not suitable for dynamic scenes. To speed things up, we can use a laser that projects a vertical line of light onto the scene. This laser rotates around its vertical
More informationSelf-Organizing Maps for cyclic and unbounded graphs
Self-Organizing Maps for cyclic and unbounded graphs M. Hagenbuchner 1, A. Sperduti 2, A.C. Tsoi 3 1- University of Wollongong, Wollongong, Australia. 2- University of Padova, Padova, Italy. 3- Hong Kong
More informationCHAPTER 3 TUMOR DETECTION BASED ON NEURO-FUZZY TECHNIQUE
32 CHAPTER 3 TUMOR DETECTION BASED ON NEURO-FUZZY TECHNIQUE 3.1 INTRODUCTION In this chapter we present the real time implementation of an artificial neural network based on fuzzy segmentation process
More informationA Robust Video Hash Scheme Based on. 2D-DCT Temporal Maximum Occurrence
A Robust Video Hash Scheme Based on 1 2D-DCT Temporal Maximum Occurrence Qian Chen, Jun Tian, and Dapeng Wu Abstract In this paper, we propose a video hash scheme that utilizes image hash and spatio-temporal
More informationChapter 11.3 MPEG-2. MPEG-2: For higher quality video at a bit-rate of more than 4 Mbps Defined seven profiles aimed at different applications:
Chapter 11.3 MPEG-2 MPEG-2: For higher quality video at a bit-rate of more than 4 Mbps Defined seven profiles aimed at different applications: Simple, Main, SNR scalable, Spatially scalable, High, 4:2:2,
More informationClustering. CS294 Practical Machine Learning Junming Yin 10/09/06
Clustering CS294 Practical Machine Learning Junming Yin 10/09/06 Outline Introduction Unsupervised learning What is clustering? Application Dissimilarity (similarity) of objects Clustering algorithm K-means,
More informationInternational Journal of Emerging Technology and Advanced Engineering Website: (ISSN , Volume 2, Issue 4, April 2012)
A Technical Analysis Towards Digital Video Compression Rutika Joshi 1, Rajesh Rai 2, Rajesh Nema 3 1 Student, Electronics and Communication Department, NIIST College, Bhopal, 2,3 Prof., Electronics and
More informationDigital Image Stabilization and Its Integration with Video Encoder
Digital Image Stabilization and Its Integration with Video Encoder Yu-Chun Peng, Hung-An Chang, Homer H. Chen Graduate Institute of Communication Engineering National Taiwan University Taipei, Taiwan {b889189,
More informationSoftware Design Document
ÇANKAYA UNIVERSITY Software Design Document Content Based Video Segmentation Berk Can Özütemiz-201311049, Ece Nalçacı-201411040, Engin Öztürk-201311049 28/12/2017 Table of Contents 1. INTRODUCTION... 3
More information5. Hampapur, A., Jain, R., and Weymouth, T., Digital Video Segmentation, Proc. ACM Multimedia 94, San Francisco, CA, October, 1994, pp
5. Hampapur, A., Jain, R., and Weymouth, T., Digital Video Segmentation, Proc. ACM Multimedia 94, San Francisco, CA, October, 1994, pp. 357-364. 6. Kasturi, R. and Jain R., Dynamic Vision, in Computer
More informationResearch on Construction of Road Network Database Based on Video Retrieval Technology
Research on Construction of Road Network Database Based on Video Retrieval Technology Fengling Wang 1 1 Hezhou University, School of Mathematics and Computer Hezhou Guangxi 542899, China Abstract. Based
More informationA Fast Method for Textual Annotation of Compressed Video
A Fast Method for Textual Annotation of Compressed Video Amit Jain and Subhasis Chaudhuri Department of Electrical Engineering Indian Institute of Technology, Bombay, Mumbai - 400076. INDIA. ajain,sc @ee.iitb.ac.in
More informationcoding of various parts showing different features, the possibility of rotation or of hiding covering parts of the object's surface to gain an insight
Three-Dimensional Object Reconstruction from Layered Spatial Data Michael Dangl and Robert Sablatnig Vienna University of Technology, Institute of Computer Aided Automation, Pattern Recognition and Image
More informationAIIA shot boundary detection at TRECVID 2006
AIIA shot boundary detection at TRECVID 6 Z. Černeková, N. Nikolaidis and I. Pitas Artificial Intelligence and Information Analysis Laboratory Department of Informatics Aristotle University of Thessaloniki
More informationImage Segmentation Techniques
A Study On Image Segmentation Techniques Palwinder Singh 1, Amarbir Singh 2 1,2 Department of Computer Science, GNDU Amritsar Abstract Image segmentation is very important step of image analysis which
More informationIMAGE DENOISING TO ESTIMATE THE GRADIENT HISTOGRAM PRESERVATION USING VARIOUS ALGORITHMS
IMAGE DENOISING TO ESTIMATE THE GRADIENT HISTOGRAM PRESERVATION USING VARIOUS ALGORITHMS P.Mahalakshmi 1, J.Muthulakshmi 2, S.Kannadhasan 3 1,2 U.G Student, 3 Assistant Professor, Department of Electronics
More informationRegion Feature Based Similarity Searching of Semantic Video Objects
Region Feature Based Similarity Searching of Semantic Video Objects Di Zhong and Shih-Fu hang Image and dvanced TV Lab, Department of Electrical Engineering olumbia University, New York, NY 10027, US {dzhong,
More informationComparative Study of Partial Closed-loop Versus Open-loop Motion Estimation for Coding of HDTV
Comparative Study of Partial Closed-loop Versus Open-loop Motion Estimation for Coding of HDTV Jeffrey S. McVeigh 1 and Siu-Wai Wu 2 1 Carnegie Mellon University Department of Electrical and Computer Engineering
More informationNeTra-V: Towards an Object-based Video Representation
Proc. of SPIE, Storage and Retrieval for Image and Video Databases VI, vol. 3312, pp 202-213, 1998 NeTra-V: Towards an Object-based Video Representation Yining Deng, Debargha Mukherjee and B. S. Manjunath
More informationDepth. Common Classification Tasks. Example: AlexNet. Another Example: Inception. Another Example: Inception. Depth
Common Classification Tasks Recognition of individual objects/faces Analyze object-specific features (e.g., key points) Train with images from different viewing angles Recognition of object classes Analyze
More informationImage Segmentation for Image Object Extraction
Image Segmentation for Image Object Extraction Rohit Kamble, Keshav Kaul # Computer Department, Vishwakarma Institute of Information Technology, Pune kamble.rohit@hotmail.com, kaul.keshav@gmail.com ABSTRACT
More informationPattern based Residual Coding for H.264 Encoder *
Pattern based Residual Coding for H.264 Encoder * Manoranjan Paul and Manzur Murshed Gippsland School of Information Technology, Monash University, Churchill, Vic-3842, Australia E-mail: {Manoranjan.paul,
More informationINTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO
INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO ISO/IEC JTC1/SC29/WG11/ m3110 MPEG 97 February 1998/San
More information15 Data Compression 2014/9/21. Objectives After studying this chapter, the student should be able to: 15-1 LOSSLESS COMPRESSION
15 Data Compression Data compression implies sending or storing a smaller number of bits. Although many methods are used for this purpose, in general these methods can be divided into two broad categories:
More informationObject detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation
Object detection using Region Proposals (RCNN) Ernest Cheung COMP790-125 Presentation 1 2 Problem to solve Object detection Input: Image Output: Bounding box of the object 3 Object detection using CNN
More informationA Geometrical Key-frame Selection Method exploiting Dominant Motion Estimation in Video
A Geometrical Key-frame Selection Method exploiting Dominant Motion Estimation in Video Brigitte Fauvet, Patrick Bouthemy, Patrick Gros 2 and Fabien Spindler IRISA/INRIA 2 IRISA/CNRS Campus Universitaire
More informationUniversity of Florida CISE department Gator Engineering. Clustering Part 4
Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of
More informationChapter 7: Competitive learning, clustering, and self-organizing maps
Chapter 7: Competitive learning, clustering, and self-organizing maps António R. C. Paiva EEL 6814 Spring 2008 Outline Competitive learning Clustering Self-Organizing Maps What is competition in neural
More informationActive Image Database Management Jau-Yuen Chen
Active Image Database Management Jau-Yuen Chen 4.3.2000 1. Application 2. Concept This document is applied to the active image database management. The goal is to provide user with a systematic navigation
More information70 IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 6, NO. 1, FEBRUARY ClassView: Hierarchical Video Shot Classification, Indexing, and Accessing
70 IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 6, NO. 1, FEBRUARY 2004 ClassView: Hierarchical Video Shot Classification, Indexing, and Accessing Jianping Fan, Ahmed K. Elmagarmid, Senior Member, IEEE, Xingquan
More informationDepth Estimation for View Synthesis in Multiview Video Coding
MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Depth Estimation for View Synthesis in Multiview Video Coding Serdar Ince, Emin Martinian, Sehoon Yea, Anthony Vetro TR2007-025 June 2007 Abstract
More informationCHAPTER 4: CLUSTER ANALYSIS
CHAPTER 4: CLUSTER ANALYSIS WHAT IS CLUSTER ANALYSIS? A cluster is a collection of data-objects similar to one another within the same group & dissimilar to the objects in other groups. Cluster analysis
More information28 IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 12, NO. 1, JANUARY 2010
28 IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 12, NO. 1, JANUARY 2010 Camera Motion-Based Analysis of User Generated Video Golnaz Abdollahian, Student Member, IEEE, Cuneyt M. Taskiran, Member, IEEE, Zygmunt
More informationData Warehousing and Machine Learning
Data Warehousing and Machine Learning Preprocessing Thomas D. Nielsen Aalborg University Department of Computer Science Spring 2008 DWML Spring 2008 1 / 35 Preprocessing Before you can start on the actual
More informationA Two-Level Adaptive Visualization for Information Access to Open-Corpus Educational Resources
A Two-Level Adaptive Visualization for Information Access to Open-Corpus Educational Resources Jae-wook Ahn 1, Rosta Farzan 2, Peter Brusilovsky 1 1 University of Pittsburgh, School of Information Sciences,
More information