Exploring video content structure for hierarchical summarization


Multimedia Systems 10 (2004)
Digital Object Identifier (DOI) /s

Exploring video content structure for hierarchical summarization

Xingquan Zhu 1, Xindong Wu 1, Jianping Fan 2, Ahmed K. Elmagarmid 3, Walid G. Aref 3

1 Department of Computer Science, University of Vermont, Burlington, VT 05405, USA
2 Department of Computer Science, University of North Carolina, Charlotte, NC 28223, USA
3 Department of Computer Science, Purdue University, W. Lafayette, IN 47907, USA

Published online: 15 September 2004
© Springer-Verlag 2004

Abstract. In this paper, we propose a hierarchical video summarization strategy that explores video content structure to provide users with a scalable, multilevel video summary. First, video-shot-segmentation and keyframe-extraction algorithms are applied to parse video sequences into physical shots and discrete keyframes. Next, an affinity (self-correlation) matrix is constructed to merge visually similar shots into clusters (supergroups). Since high visual similarity between video shots does not necessarily imply that they belong to the same story unit, temporal information is then used to separate each supergroup into video groups of temporally adjacent shots (within a specified distance). A video-scene-detection algorithm is then proposed to merge temporally or spatially correlated video groups into scenario units. This is followed by a scene-clustering algorithm that eliminates visual redundancy among the units. A hierarchical video content structure with increasing granularity is constructed, from clustered scenes and video scenes through video groups to keyframes. Finally, we introduce a hierarchical video summarization scheme that executes various approaches at different levels of the video content hierarchy to statically or dynamically construct the video summary. Extensive experiments based on real-world videos have been performed to validate the effectiveness of the proposed approach.

Keywords: Hierarchical video summarization – Video content hierarchy – Video group detection – Video scene detection – Hierarchical clustering

Correspondence to: X. Zhu (xqzhu@cs.uvm.edu). This research has been supported by the NSF under grants EIA, IIS, EIA, and IIS, a grant from the state of Indiana 21st Century Fund, and by the U.S. Army Research Laboratory and the U.S. Army Research Office under grant DAAD.

1 Introduction

Owing to the decreased cost of storage devices, higher transmission rates, and improved compression techniques, digital videos are becoming available at an ever-increasing rate. However, the manner in which video content is presented for access (such as browsing and retrieval) has become a challenging task, both for video application systems and for users. Some approaches have been described in [1–3] for presenting the visual content of a video through hierarchical browsing, storyboard posters, etc. Users can quickly browse through a video sequence, navigate from one segment to another to rapidly get an overview of the video content, and zoom to different levels of detail to locate segments of interest. Research in the literature [3] has shown that, on average, there are about 200 shots in a 30-min video clip across various program types, such as news and drama. Assuming that one keyframe is selected to represent each shot, the presentation of 200 frames will impose a significant burden in terms of bandwidth and time.
Using spatially reduced images, commonly known as thumbnail images, one can reduce the size, but it will still be expensive if all shots must be shown for a quick browse of the content. Hence, a video summarization strategy is needed to provide users with a compact video digest that shows only parts of the video shots. Generally, a video summary is defined as a sequence of still or moving pictures (with or without audio) presenting the content of a video in such a way that the respective target group is rapidly provided with concise information about the content, while the essential message of the original is preserved [4]. Three kinds of video summary styles are commonly used:

A pictorial summary [3,7–13,18,19,25,30,31] is a collection of still images (possibly varying in size) arranged in time. The use of compact images provides the user with an overview of the video directly and efficiently; however, the motion and audio information of the video is lost.

A video skimming [14–16,21–24,30] is a collection of video clips (e.g., video shots) arranged in a time series. Compared with still pictorial summaries, video skimming is more attractive to users, since both motion and audio signals are preserved. However, the amount of time required for viewing a skimming suggests that skimmed video is not appropriate for quick browsing.

A data distribution map [20–22] is a synthetic picture that illustrates the distribution of some specific data in the database. Due to the inherent complexity of images and videos, using pictorial frames or video skimming to abstract a video database may be impractical. The data distribution map, therefore, appears to be an efficient mechanism for providing users with an overview of the whole dataset. Unfortunately, such a distribution map may not be as friendly and intuitive as pictorial frames or skimming, since synthetic points or curves may not make sense for naïve users.

A survey of video summarization is given in [5]. When users are provided with a compact video digest, they can obtain video content quickly and comprehensively. Moreover, the power of visual summaries can be helpful in many applications such as multimedia archives, video retrieval, home entertainment, and digital magazines. To achieve a suitable video summarization for the various applications above, we introduce a hierarchical video summarization strategy that uses pictorial frames to present the video summaries. In comparison with most approaches, which use only visual and temporal clustering, we integrate the video content structure and the visual redundancy among videos. Moreover, much emphasis has been put on scalable and hierarchical summarization. The final results consist of various levels of static or dynamic summaries that address the video content at different granularities (Sect. 2, Definition 1).

The rest of the paper is organized as follows. In the next section, we review related work on video summarization. Our overall system architecture is presented in Sect. 3. In Sect. 4, we introduce techniques for video content hierarchy detection that are used to construct the video content structure from keyframes, groups, and scenes to clustered scenes. With the constructed content structure as a basis, in Sect. 5 we discuss a hierarchical video summarization scheme that uses a hierarchical clustering algorithm to construct video summaries at different levels. In addition to supporting the passive browsing of statically constructed video summaries, a dynamic summarization scheme has also been integrated to allow users to specify their preferred summary length. In Sect. 6, the effectiveness of the proposed approach is validated using real-world videos. Concluding remarks are given in Sect. 7.

2 Related work

In practical terms, video summarization is an inseparable research area for many video applications, including video indexing, browsing, and retrieval [1,2,29]. The earliest research on video summarization is perhaps best exemplified by Video Magnifier [6], where video summaries were created from a series of low-resolution frames that were uniformly sampled from the original video sequence at certain time intervals. The obvious drawback of such an arrangement is the loss of content in short but important segments. Accordingly, various research efforts have addressed the generation of video overviews using more suitable approaches [7–25]. Basically, the quality of a generated video summary is influenced by the following factors:

1. How the selected summary units are visualized.
2. What kinds of units are selected to generate the summary.
3. Whether the summary is static or dynamic (Definitions 1 and 2 below).

Since most research assumes the video summary is generated for efficient browsing, indexing, and navigation, the pictorial summary is popularly used to visualize the video summary via icon frames [3,7,11–13,25,30,31], mosaic images [18,19,34], video objects [8–10], or a data distribution map [20–22].
A curve-simplification video-summarization strategy is introduced in [12], which maps each frame into a vector of high-dimensional features and segments the feature curve into units. The video summary is extracted according to the relationships between the units. In Video Manga [7], a pictorial video summary is presented with keyframes of various sizes, where the importance of each keyframe determines its size. Given a camera panning/tilting shot, however, none of the keyframes selected by the above strategies captures the underlying dynamics of the shot. Hence, the mosaic-based approach is utilized to generate a synthesized panoramic image that can represent the entire content in an intuitive manner [18,19,34]. Unfortunately, this approach works well only when specific camera motions are contained in the shot, and this critical drawback prevents it from being applied effectively to real-world videos. Instead of using features from the whole image for summarization, object-based strategies have been proposed [8–10]; however, given the challenges of object segmentation [17], it might be a long time before impressive results on general videos can be acquired.

Instead of abstracting videos with pictorial images, some approaches provide video skimming [14–16,21–24,30], which trims the original video into a short highlight stream. In [4,15], the most representative movie segments are extracted to produce a movie trailer. The approach in [24] applies different sampling rates to the video to generate a summary, where the sampling rate is controlled by the motion information in each shot. In addition to visual features, the strategies in [21–23] utilize closed-caption and speech signals to select segments of interest for skimming. Video skimming may be useful for some purposes, since a skimmed video stream is a more appealing browsing method for users. However, the amount of time required for viewing a skim suggests that it is not appropriate for a quick overview or for transmission.

Defining what kinds of units should be selected is the most challenging and subjective process in creating video summaries. Many algorithms assume that scenario units (such as story units) are more suitable for summarization [3,30,31,42] than other units, where video scenarios are detected using time-constrained clustering [3], speaker or subject change detection [30,31,42], etc. The summary is constructed using the representative frames (or shots) of the story units in the form of a storyboard or a skimming. A summarization scheme that integrates video scenarios outperforms most other strategies, since presenting scenarios in the summary may be the most effective way to tell the user what the video is about. Unfortunately, although many videos from our daily life have scenario evolution (i.e., it is possible to detect story units), a significant number of videos have no scenario structure, such as surveillance videos and sports videos. In these cases, a selection of video highlights might be more suitable for summarization. Generally, video highlights are selected by predefined rules, such as selecting frames with the highest contrast [4], close-up shots of leading actors, special events like explosions and gunfire [15], frames with faces, camera motions, or text [21–23], or by considering the time and data information of the video [14]. On the other hand, since domain knowledge may assist in acquiring semantic cues, some abstracting strategies have been designed for specific domains, such as home videos [14], stereoscopic videos [8], online presentation videos [16], and medical videos [25,30,42], where the rules and knowledge followed by the videos are used to analyze semantics [26–28]. However, another problem emerges: rather than abstracting the video, highlight strategies usually present selected important samples that address the video details but not the topic and theme; as a result, a glance at the constructed summary imparts very limited information about the video content. Moreover, since most highlight selection strategies are domain specific, they cannot easily be extended to other videos. Therefore, compared with scenario-based strategies, the generality of highlight-based strategies is limited.

For the generation of static or dynamic summaries, many approaches supply a dynamic scheme that integrates user specifications to generate a summary online. Usually, a dynamic strategy requires the user to directly or indirectly specify the summary parameters, e.g., the length of the summary [4,8,9,13–15,24] (number of summary frames or time duration) or the types of interesting features for determining the summary frames (e.g., slides, close-ups of people) [44]. In this way, the scalability of the summary is preserved, since each user is supplied with the summary he/she prefers. Unfortunately, a user's specification for the summary is not always reasonable for addressing video content, especially if the user is unfamiliar with the videos in the database. Conversely, the approaches in [3,7,25,30,42] construct static summaries beforehand and supply all users with the same summary (the one that has been optimized by the system). However, static methods have the drawback of depriving the user of scalable summaries, which may be desirable in some cases. A dynamic scheme maintains the scalability of the summary but may not find the best summary point for naive users, and a static scheme may supply users with an optimal point in generating the summary, but the scalability is lost. Obviously, neither is perfect, and a new approach that integrates both may provide a better solution.

Ideally, a video summary should briefly and concisely present the content of the input video source. It should be shorter than the original, focus on the content, and give users an appropriate overview of the whole video. However, the problem is that appropriateness varies from user to user, depending on the user's familiarity with the source and genre, and on a given user's particular goal in watching the summary. Hence, hierarchical summarization strategies [13,20,30,42] have been introduced to support various levels of summary and assist users in determining what is appropriate. In [13], a keyframe-based hierarchical strategy is presented, where keyframes are organized in a hierarchical manner from coarse to fine temporal resolution using a pairwise clustering method. Unfortunately, this strategy is unable to merge frames that are visually similar but temporally separated. By considering video content structure and domain knowledge, a multilevel static summary [30,42] has been constructed for medical video data.
This strategy, however, still limits the scalability of the summary by supplying only static summaries, and, moreover, it works only on medical video data.

All the reviews above support the following observations:

1. For efficient presentation and browsing purposes, a pictorial summary may outperform video skimming.
2. Since defining video highlights is a relatively subjective and domain-dependent process, if the video content structure is available, a scenario-based summarization strategy might be a better choice.
3. Due to the obvious problems in static and dynamic summarization, an efficient scheme should integrate the two strategies for maximum benefit.
4. To capture an appropriate summary for a video overview, a hierarchical strategy should be adopted.

Based on these observations, we propose a hierarchical video summarization strategy. We use pictorial keyframes to visualize the video content and introduce a hierarchical strategy to construct multilevel summaries, increasing in granularity from top to bottom. Moreover, instead of selecting video highlights, we use the semantically richer video units (scenes and groups) for summarization. In this paper, we concentrate on summarizing videos with content structures. For those having no story lines, a highlight selection scheme might be the best choice for summarization.

Before we explore video summarization techniques, it is worthwhile to define the terminology that will be used in our analysis.

Definition 1. A static summarization scheme is an approach that requires no user participation for summary generation. For all users of the same video, this strategy will generate the same summary, which is called a static summary in this paper.

Definition 2. A dynamic summarization scheme is an approach where user actions may result in differences in the generated summary. The summary generated by this scheme is called a dynamic summary in this paper.

Definition 3. The video content structure is defined as a hierarchy of clustered scenes, video scenes, video groups, and video shots (keyframes), increasing in granularity from top to bottom in addressing video content, as shown in Fig. 1. A content structure can be found in most videos in our daily life.

Definition 4. A video shot (denoted by S_i) is the basic element of a video; it records the frames resulting from a single continuous running of the camera, from the moment it is turned on to the moment it is turned off.

Definition 5. A keyframe (denoted by K_i) is a frame that is representative of the content of a given shot. Depending on the content variance of the shot, one or multiple keyframes can be extracted.

Definition 6. A video supergroup (denoted by SG_i) consists of visually similar shots within an entire video.

Definition 7. A video group (denoted by G_i) is an intermediate entity between physical shots and semantic scenes; it consists of visually similar shots that belong to the same story unit.

Definition 8. A video scene (denoted by SE_i) is a collection of semantically related and temporally adjacent groups depicting and conveying a high-level concept or story.

Definition 9. A clustered scene (denoted by CSE_i) is a collection of visually similar video scenes.
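To make Definitions 4–9 concrete, the hierarchy can be mirrored in a few simple container types. The Python sketch below is purely illustrative; the class and field names are ours and are not part of the original system:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Shot:               # Definition 4: a single continuous camera run
    shot_id: int          # temporal index order (ID) within the video
    keyframes: List[int] = field(default_factory=list)  # Definition 5: frame indices

@dataclass
class Group:              # Definition 7: visually similar, temporally close shots
    shots: List[Shot] = field(default_factory=list)

@dataclass
class Scene:              # Definition 8: semantically related, adjacent groups
    groups: List[Group] = field(default_factory=list)

@dataclass
class ClusteredScene:     # Definition 9: visually similar scenes merged together
    scenes: List[Scene] = field(default_factory=list)
```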

Fig. 1. Pictorial video content structure: from video frames and video shots (keyframes), through video groups (visually similar shots within a certain temporal distance) and video scenes (temporally interlaced video groups), up to clustered scenes (visually similar scenes).

Fig. 2. System architecture: video content structure detection (shot segmentation and keyframe extraction, keyframe grouping into supergroups, temporally adjacent video group detection, scene detection, and scene clustering) feeds a hierarchical video summarization component that builds summary levels 1, 2, 2+i (i = 1, 2, ..., X), 3+X, 4+X, and 5+X, stored in a summary database.

3 System architecture

Figure 2 shows that our system consists of two relatively independent components: video content structure detection and hierarchical video summarization. To explore the video content structure, shot segmentation and keyframe extraction are first applied to parse continuous video sequences into independent shots and discrete keyframes. For all extracted keyframes, an affinity matrix is constructed to explore the correlations among them. With the transformed affinity matrix, visually similar shots are merged into supergroups. For each supergroup, the video shots with temporal distances smaller than a predefined threshold are then taken as one video group. Generally, a video scene consists of temporally interlaced (or visually similar and neighboring) video groups that convey a relatively independent story unit. Accordingly, the temporal and visual information among groups is integrated to cluster video groups into scenes. To eliminate the visual redundancy caused by repeated scenes in different parts of the video, a scene-clustering strategy is employed to merge visually similar scenes into unique clusters. As a result, a video content hierarchy from clustered scenes, scenes, and groups to keyframes is constructed.

Given the detected video content structure, an (X+5)-level video summary (X indicating the number of dynamic summary levels, which is determined by the user's specification of the summary length) is constructed, where the summary becomes more concise and compact from lower to higher levels. The lowest summary level consists of the keyframes of all shots. By utilizing the affinity matrix, visually similar shots are clustered into supergroups. Any supergroup containing only one shot is likely to convey very limited content information. Hence, by eliminating supergroups with only one shot, the summary at the second level consists of all remaining keyframes. For each supergroup, a predefined threshold is applied to segment the temporally adjacent member shots into groups. Given any video group, the Minimal Spanning Tree algorithm [50] is utilized to cluster its keyframes into a hierarchical structure, and a dynamic summarization scheme is employed to select a certain number of representative frames from each group to form the user-specified dynamic summary (levels 2+i, i = 1, 2, ..., X). Beyond the dynamic summary levels, the summary at level X+3 is constructed by eliminating groups that contain fewer than three shots and do not interlace with any other groups, since they probably contain less scenario information. With the scene-clustering scheme, visual redundancy among similar scenes is partially removed. Consequently, the summary at level X+4 consists of representative frames of all clustered scenes. It is not unusual for different video scenes to contain visually similar video groups, and eliminating similar groups provides the user with a visually compact summary. Hence, the summary at the highest level is constructed by merging all representative frames at level X+4 that belong to the same supergroup.

4 Video content structure detection

Most videos in our daily life have their own content structure, as shown in Fig. 1. While browsing videos, the hidden structure behind the frames (scenes, groups, etc.) conveys themes and scenarios to us. Hence, the video summary should also fit this structure by presenting overviews at various levels and with various granularities.
To achieve this goal, we first explore the video content structure.

The simplest method of parsing video data for efficient browsing, retrieval, and navigation is to segment the continuous video sequence into physical shots [35,37,38] and then select a constant number of keyframes for each shot to depict its content [1,2,32,43,45,47]. Unfortunately, since the video shot is a physical unit, it is incapable of conveying independent scenario information. To this end, many solutions have been proposed to determine clusters of video shots that convey relatively higher-level video semantics. Zhong et al. [33] propose a strategy that clusters visually similar shots and supplies the user with a hierarchical structure for browsing. Similar strategies are found in [11,32,43,44]. However, spatial shot clustering considers only visual similarities, and the video context information is lost. To address this problem, Rui et al. [39] present a method that merges visually similar shots into groups and then constructs a video content table by considering temporal relationships among groups. A similar approach has been reported in [36]. In [41], a time-constrained shot-clustering strategy is proposed to parse temporally adjacent shots into clusters, and a scene transition graph is constructed to detect video scenes by utilizing the acquired cluster information. A temporally constrained shot-grouping strategy has also been discussed in [40,42] that considers the similarity between the current shot and the shots preceding and following it (within a small window) to determine the boundaries of video scenes.

In a narrow sense, both the temporal and the spatial clustering schemes above have disadvantages in exploring video content. Spatial clustering ignores temporal information, so video context information is not available in the final result. On the other hand, temporal clustering depends crucially on a predefined distance function that integrates the temporal window size and the visual similarity between shots, and this function may not fit user perceptions very well [36,39–42].

In this section, a new method is proposed to detect video content structure. We first discuss techniques to determine shots and keyframes. Then, an affinity matrix is constructed to cluster visually similar shots into supergroups. Although visually similar shots have a high probability of belonging to one scene, similar shots with a large temporal distance often belong to similar scenes in different parts of the video; e.g., similar dialogs between two actors may appear in different parts of a movie. Therefore, in each supergroup, adjacent shots whose temporal distance is less than a predefined threshold are segmented into video groups. Because video scenes usually consist of interlaced video groups (or visually similar and neighboring groups), a video-scene-detection algorithm is proposed that considers the visual and temporal information among video groups. Finally, we apply a scene-clustering scheme that merges visually similar scenes to remove visual redundancy and produce a compact summary.

4.1 Video shot detection and keyframe extraction

To support shot-based video content access, we have developed a shot-detection technique [38] that works on MPEG-compressed videos. It adaptively adjusts the threshold for shot detection by considering the activities in different video sequences. Unfortunately, such techniques are not able to adapt the threshold for different video shots within the same sequence. To adapt the thresholds to local activities within the same sequence, we adopt a small window (e.g., 2–3 s), and the threshold for each window is adapted to local visual activity by applying our automatic threshold detection and local activity analysis techniques. Figure 3 demonstrates the results for one video source in our system. As we can see, the threshold has been adapted to small changes between adjacent shots for successful shot segmentation, such as the changes between eyeballs in various shots in Fig. 3a.

Fig. 3. Video shot detection results from a medical video. a Detected video shots. b Corresponding frame differences and the determined threshold for different video shots, where the small window shows the local properties of the frame differences.

After shot segmentation, the keyframes for each shot are extracted using the following strategy:

1. Given shot S_i, its tenth frame is always taken as the first keyframe of the shot, denoted by K_1. (In this way, we may avoid selecting the gradual-effect frames between shots as keyframes.)

2. From the position of the most recently selected keyframe, sequentially calculate the similarity between each succeeding frame F_i and all formerly selected keyframes K_l (l = 1, 2, ..., N) using Eq. 1. If this value is less than a threshold T_keyframe (which is determined by the threshold used to detect the current shot), the current frame F_i is taken as a new keyframe and added to the keyframe set:

KFmSelect(F_i) = \max\{FmSim(F_i, K_l);\ l = 1, 2, \ldots, N\},   (1)

where FmSim(F_i, K_l) indicates the visual similarity between F_i and K_l, as defined by Eq. 2.

3. Iteratively execute step 2 until all frames in shot S_i have been processed.

To calculate the visual similarity between frames, three sets of visual features (a 256-dimensional HSV color histogram, 10-dimensional Tamura coarseness, and 8-dimensional Tamura directionality texture) are extracted. Suppose H_{i,j}, j ∈ [0, 255], TC_{i,j}, j ∈ [0, 9], and TD_{i,j}, j ∈ [0, 7] are the normalized color histogram, coarseness, and directionality texture of frame F_i. The similarity between F_i and F_j is defined by Eq. 2:

FmSim(F_i, F_j) = W_C \sum_{k=0}^{255} \min(H_{i,k}, H_{j,k}) + W_{TC}\left(1 - \sqrt{\sum_{k=0}^{9} (TC_{i,k} - TC_{j,k})^2}\right) + W_{TD}\left(1 - \sqrt{\sum_{k=0}^{7} (TD_{i,k} - TD_{j,k})^2}\right),   (2)

where W_C, W_TC, and W_TD indicate the weight of each feature. We set W_C = 0.6, W_TC = 0.2, and W_TD = 0.2 in our system.
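To make the similarity measure and the selection loop concrete, here is a minimal Python sketch. It assumes per-frame features (normalized HSV histogram plus Tamura coarseness and directionality vectors) have already been computed; the function and field names are ours, not the original implementation:

```python
import numpy as np

W_C, W_TC, W_TD = 0.6, 0.2, 0.2  # feature weights as given in the text

def fmsim(f1, f2):
    """Eq. 2: weighted color-histogram intersection plus Tamura-texture closeness.
    f1, f2 are dicts with 'hist' (256-d), 'coarse' (10-d), 'direct' (8-d) arrays."""
    color = np.minimum(f1["hist"], f2["hist"]).sum()
    tc = 1.0 - np.sqrt(((f1["coarse"] - f2["coarse"]) ** 2).sum())
    td = 1.0 - np.sqrt(((f1["direct"] - f2["direct"]) ** 2).sum())
    return W_C * color + W_TC * tc + W_TD * td

def extract_keyframes(shot_features, t_keyframe):
    """Steps 1-3: seed with the tenth frame, then add any frame whose maximum
    similarity to all selected keyframes (Eq. 1) falls below t_keyframe."""
    if not shot_features:
        return []
    first = min(9, len(shot_features) - 1)  # tenth frame, zero-based; the
    keyframes = [first]                     # short-shot guard is our addition
    for i in range(first + 1, len(shot_features)):
        kfm_select = max(fmsim(shot_features[i], shot_features[k]) for k in keyframes)
        if kfm_select < t_keyframe:         # dissimilar to every selected keyframe
            keyframes.append(i)             # step 2: F_i becomes a new keyframe
    return keyframes
```

Because each candidate is compared against all previously selected keyframes rather than only the most recent one, a frame that resembles any earlier keyframe is rejected, which is exactly the redundancy-avoiding behavior described below.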

Figure 4 presents the keyframes detected from a single shot, where Fig. 4a shows constantly sampled frames in the shot (a doctor examines a patient's face) and Fig. 4b shows the keyframes detected with our strategy. In some cases, similar frames may appear repeatedly in different locations of a shot. A strategy that considers only the similarity between the current frame F_i and the most recently selected keyframe [47] may result in redundancies among the keyframes. With our strategy, the selected keyframes are those that exhibit enough dissimilarity (low redundancy) and capture most of the visual variance of the shot.

Fig. 4. Keyframe extraction results. a Constantly sampled frames in a given shot, where similar frames appear repeatedly (from left to right, top to bottom). b Keyframes extracted with our strategy.

4.2 Video supergroup detection

After keyframe extraction, we apply a new strategy to merge visually similar shots into clusters (supergroups). The steps that follow will then segment the temporally adjacent shots within each supergroup into semantically richer units.

Assume N keyframes K_l (l = 1, ..., N) have been extracted from a given video. To explore the correlations among them, an affinity matrix M [48,49,52,53] is constructed, with the entries of the matrix given by Eq. 3:

M(i, j) = M(j, i) = \exp[-(1 - FmSim(K_i, K_j))^2 / \sigma];\quad i, j \in [1, N],   (3)

where FmSim(K_i, K_j) is the visual similarity between K_i and K_j (defined by Eq. 2) and σ is a constant scaling factor for stretching (σ = 0.5 in our system). In fact, M(i, j) is a self-correlation matrix, where the relationships among the keyframes of the current video can be visualized and analyzed directly, as shown in Fig. 7c, where the transformed affinity matrix of 105 keyframes is utilized to explore the correlations. The gray intensity in the figure indicates the similarity between elements: the higher the intensity, the greater the similarity.

To address the correlations within the affinity matrix M (in the following sections we use M as an abbreviation of M(i, j)), we adopt the Scott and Longuet-Higgins (SLH) [49] relocalization algorithm, since it has been successfully utilized in image segmentation [52,53] and video event detection [46]. Given an affinity matrix M and a number k, SLH outputs a newly transformed matrix Q that is calculated in the following steps [48]:

1. Calculate the eigenvectors of the affinity matrix M, rank the eigenvectors in descending order of their eigenvalues, and denote the ranked eigenvectors by E_1, E_2, ..., E_N.
2. Construct the matrix W whose columns are the first k eigenvectors of M, i.e., W = [E_1, E_2, ..., E_k, 0, ..., 0].
3. Normalize the rows of W with Eq. 4 so that they have unit Euclidean norm:

W(i, \cdot) = W(i, \cdot) / \|W(i, \cdot)\|.   (4)

4. Construct the matrix Q with Q = W W^T, i.e., Q_{i,j} = \sum_{r=1}^{k} W_{ir} W_{jr}. Each element Q_{i,j} can then be interpreted as an enhanced measurement of the interaction between keyframes K_i and K_j.
5. Segment the points by looking at the elements of Q. Ideally, the larger Q_{i,j} is, the higher the probability that keyframes K_i and K_j belong to one group.

Figure 7c represents the constructed Q matrix of the 105 keyframes with the top 10 eigenvectors.
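A compact sketch of this transform, combining Eq. 3 with SLH steps 1-5 and reusing the fmsim function from the previous sketch, is given below; treat it as illustrative rather than as the authors' code:

```python
import numpy as np

def slh_transform(keyframe_feats, k, sigma=0.5, fmsim=None):
    """Build the affinity matrix M (Eq. 3) and apply the SLH relocalization
    to obtain the enhanced interaction matrix Q."""
    n = len(keyframe_feats)
    M = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            s = fmsim(keyframe_feats[i], keyframe_feats[j])
            M[i, j] = np.exp(-(1.0 - s) ** 2 / sigma)          # Eq. 3
    # Step 1: eigenvectors of the symmetric matrix M, ranked by
    # descending eigenvalue (eigh returns them in ascending order).
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1]
    # Step 2: keep the top-k eigenvectors as the columns of W
    # (dropping the zero columns changes nothing in Q = W W^T).
    W = eigvecs[:, order[:k]]
    # Step 3 (Eq. 4): normalize each row of W to unit Euclidean norm.
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    W = W / np.where(norms == 0, 1.0, norms)
    # Step 4: Q = W W^T; a larger Q[i, j] suggests K_i and K_j share a group.
    return W @ W.T
```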
4.2.1 Q matrix processing

As indicated above, each element of matrix Q provides information about whether a pair of keyframes belongs to the same group. The problem of grouping the keyframes has thus been reduced to that of sorting the entries of matrix Q by swapping pairs of rows and columns until it becomes block diagonal, with each diagonal block indicating one cluster. Hence, the Q matrix processing for keyframe grouping consists of two steps: (1) sorting and (2) diagonal-block detection, as shown in Fig. 5.

By swapping pairs of rows and columns of matrix Q, highly correlated keyframes are merged together. To do this, we apply an energy-maximal resorting process that rearranges the matrix into diagonal-block style, as shown in Fig. 6. Given the Q matrix Q_{i,j} (i = 1, ..., N; j = 1, ..., N), select one row as the seed for sorting. At each iteration, say k, the current state is represented by a k × k submatrix Q^k that contains the keyframes sorted so far. To select a candidate keyframe for resorting, a cost function (Eq. 5) is given by the energy of the first k elements,

E_h^k = \sum_{l=1}^{k} Q_{l,h}^2 \quad \text{for } h = k+1, \ldots, N,   (5)

which represents the total energy of interaction between each candidate keyframe and the set of already sorted keyframes. By maximizing the cost function E_h^k, our search strategy selects the keyframe whose global energy of interaction with the current segmentation is the largest.

Fig. 5. Q matrix processing: sorting followed by diagonal-block detection.

Fig. 6. Q matrix sorting algorithm: at iteration k, columns k+1 to N are permuted, and the column with the highest norm is selected to form Q^{k+1}.

The updated state Q^{k+1} is obtained by augmenting Q^k with the column and row of the best selected keyframe. The column corresponding to this keyframe is first permuted with column k+1, followed by a permutation of the rows with the same indices. Matrix Q^{k+1} is then formed from the first (k+1) × (k+1) elements of the permuted matrix. As a result of this maximization strategy, submatrix Q^{k+1} has the maximal energy among all possible (k+1) × (k+1) submatrices of Q. The resorting steps above are executed recursively until all rows have been selected. The rows and columns of matrix Q are then sorted in descending order of similarity. Because visually similar keyframes have similar interactions with all other keyframes, the resorted matrix Q takes a roughly block-diagonal form, as shown in Fig. 5b, with each diagonal block representing one cluster of keyframes (supergroup). The successfully detected diagonal blocks will help us determine the final keyframe clusters.

To sort the Q matrix, a critical row (the seed row) must be selected before sorting. Since it is not possible to optimize the selection of such a seed, we randomly select several rows as seeds and determine the corresponding sorted Q matrices. Those that have distinct diagonal-block results are retained for further processing.

After the Q matrix has been sorted into diagonal blocks, various approaches might be utilized to detect the boundaries (clusters) of the blocks [52]. However, since we do not actually know how many diagonal blocks the Q matrix contains, the performance of these strategies might be degraded, because most of them need information about the number of clusters in advance. Therefore, a simple threshold scheme is applied to eliminate elements below a predefined threshold and detect the blocks along the diagonal. The elements detected in one block are taken as one supergroup. As indicated above, instead of using only one keyframe as the seed, we execute the algorithm several times using different seeds. The union of all detected supergroups is taken as the detection result. For example, if K_2, K_9, K_14, K_35 are detected as one supergroup in the first iteration, and K_2, K_9, K_22 are detected as one supergroup in the next iteration, their union, {K_2, K_9, K_14, K_22, K_35}, will be taken as the final detected supergroup.
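The energy-maximal resorting loop (Eq. 5) and a threshold-based block detection can be sketched as follows. The paper specifies a threshold on the Q entries but not an exact block-cutting rule, so the criterion in detect_blocks (minimum interaction with the current block) is our illustrative choice, and all names are ours:

```python
import numpy as np

def sort_q_matrix(Q, seed=0):
    """Greedy energy-maximal resorting (Eq. 5): repeatedly append the
    candidate with the largest interaction energy against the sorted
    prefix. Returns the permutation of keyframe indices."""
    n = Q.shape[0]
    order = [seed]
    remaining = [i for i in range(n) if i != seed]
    while remaining:
        # E_h^k = sum over sorted rows l of Q[l, h]^2 for each candidate h
        energies = [sum(Q[l, h] ** 2 for l in order) for h in remaining]
        order.append(remaining.pop(int(np.argmax(energies))))
    return order

def detect_blocks(Q, order, threshold):
    """Walk the sorted diagonal and start a new block whenever a keyframe's
    interaction with the current block drops below the threshold."""
    groups, current = [], [order[0]]
    for idx in order[1:]:
        if min(Q[idx, j] for j in current) >= threshold:
            current.append(idx)
        else:
            groups.append(current)
            current = [idx]
    groups.append(current)
    return groups
```

In the full procedure, sort_q_matrix would be run from several random seeds and the union of the resulting supergroups kept, exactly as described above.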

4.2.2 Keyframe grouping for supergroup detection

After the diagonal blocks have been detected, the keyframes belonging to the same block are taken as one supergroup, denoted by SG_i. A keyframe that does not belong to any diagonal block is taken as a separate supergroup. Given a video shot S_i, assume there are T keyframes K_l (l = 1, ..., T) contained in S_i. In the ideal situation, all T keyframes should be merged into one supergroup. Unfortunately, this assumption does not always hold, because a shot may exhibit significant visual variance. Suppose all T keyframes in S_i have been clustered into NSG supergroups SG_l (l = 1, ..., NSG), with T_1 keyframes contained in SG_1, T_2 keyframes contained in SG_2, and so on. Obviously, T_1 + T_2 + ... + T_NSG = T. The final supergroup to which S_i belongs is selected using Eq. 6:

S_i \in SG_t,\quad t = \arg\max_l \{T_l;\ l = 1, 2, \ldots, NSG\},   (6)

i.e., the supergroup with the maximal number of keyframes from S_i is selected as the target to which S_i belongs.

Figure 7 shows the pictorial results of our supergroup detection strategy applied to a real-world video with 105 keyframes. The sample frames are shown in Fig. 7a in their temporal order, from left to right and top to bottom. Figures 7c and d show the constructed Q matrix with the top ten eigenvectors and the sorted result. The sorted diagonal blocks can be used directly for supergroup detection. The representative frames of the ten detected supergroups are shown in Fig. 7b.

Fig. 7. Video supergroup detection. a Sampled frames from 105 keyframes (a kidney transplant surgery), in their original temporal order from left to right, top to bottom. b Ten detected supergroups. c Q matrix of the corresponding 105 keyframes. d Sorted Q matrix.

4.3 Video group detection

In the strategy above, the temporal order information among shots has not been considered. Although visually similar shots have a high probability of belonging to one scene, shots with a large temporal distance are more likely to belong to different scenarios. Accordingly, given any supergroup SG_i, adjacent shots in SG_i whose temporal distance is larger than a predefined threshold are segmented into different video groups.

Given two shots S_l and S_h in video V_i, assume their temporal index orders (shot IDs) are denoted by ID_{S_l} and ID_{S_h}, respectively. The temporal index order distance (IOD) between S_l and S_h is defined by Eq. 7:

IOD(S_l, S_h) = |ID_{S_l} - ID_{S_h}| - 1.   (7)

That is, the IOD between S_l and S_h is the number of shots between S_l and S_h in temporal order. Given a supergroup SG_i with NT > 1 shots and a specific threshold T_Group, the strategy below will segment all NT shots into corresponding groups:

1. Rank all shots S_l (l = 1, ..., NT) of SG_i in increasing order of their temporal index order (that is, the shot with the smallest index number (ID) is assigned to the first place, as shown in Fig. 8).
2. Initially, assume there are NT groups G_l (l = 1, ..., NT) in SG_i, with each group containing one shot.
3. Given any two neighboring groups G_i (with NT_i shots) and G_j (with NT_j shots) in SG_i, if the IOD between G_i and G_j is no larger than the threshold T_Group, G_i and G_j are merged into one group. Accordingly, we define the IOD between G_i and G_j, denoted by IOD(G_i, G_j), using Eq. 8:

IOD(G_i, G_j) = \min\{IOD(S_l, S_h);\ l = 1, \ldots, NT_i,\ S_l \in G_i;\ h = 1, \ldots, NT_j,\ S_h \in G_j\}.   (8)

If IOD(G_i, G_j) ≤ T_Group, G_i and G_j are merged into a new video group; otherwise they remain separate.

4. Iteratively execute step 3 until there are no more groups to be merged. Assume that NG groups (G_i, i = 1, ..., NG) remain; then for any two remaining groups G_l, G_h ∈ SG_i, IOD(G_l, G_h) > T_Group. All remaining units are taken as the video groups of SG_i.

Figure 8 presents an example of our group segmentation algorithm. Given a supergroup SG_i containing 12 shots, the ranked temporal index orders of all shots are 8, 14, 15, 17, ..., 41. Initially, each shot is taken as one group, and any two neighboring groups with an IOD no larger than T_Group are merged into one group (we set T_Group = 4 in our system; a detailed analysis of the selection of T_Group is given in Sect. 6.1). Iterative execution finally segments all shots in SG_i into four groups.

Fig. 8. Video group detection by temporal shot segmentation.
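Because each video group ends up holding a contiguous run of the ranked shot IDs, the iterative merging of steps 1-4 collapses to a single left-to-right pass; the sketch below rests on that observation, and its names are ours:

```python
T_GROUP = 4  # temporal-distance threshold used in the paper

def iod(id_a, id_b):
    """Eq. 7: number of shots lying between two shots in temporal order."""
    return abs(id_a - id_b) - 1

def segment_supergroup(shot_ids, t_group=T_GROUP):
    """Split a supergroup's shots (given by temporal index order) into video
    groups: neighboring shots with IOD <= t_group stay in one group."""
    if not shot_ids:
        return []
    ids = sorted(shot_ids)           # step 1: rank by temporal index order
    groups = [[ids[0]]]              # step 2: start with the first shot
    for sid in ids[1:]:
        if iod(groups[-1][-1], sid) <= t_group:
            groups[-1].append(sid)   # steps 3-4: extend the current group
        else:
            groups.append([sid])     # distance too large: open a new group
    return groups

# Fig. 8's example: shots with IDs 8, 14, 15, 17, ..., 41 are cut into a new
# group wherever more than four shots intervene between temporal neighbors.
```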
4.4 Video scene detection

Using the strategy above, all shots can be parsed into groups, with each group containing visually similar and temporally adjacent shots. However, as Definition 7 in Sect. 2 indicates, a video group is an intermediate entity between physical shots and semantic scenes. This means that a single video group may be incapable of acting as an independent story unit that conveys a video scenario. In practice, a video scene usually consists of temporally interlaced groups, or of visually similar and neighboring groups. In this section, a group-merging algorithm is proposed for video scene detection.

Before describing our scene-detection algorithm, we discuss the characterization of video scenes. Following the definitions in [51], we conclude that three types of scenes are widely used by filmmakers when they capture or edit videos:

N-type scenes: These scenes (also known as normal scenes) are characterized by a long-term consistency of chromatic composition, lighting conditions, sound, etc. All shots in the scene are assumed to exhibit some similarity in their visual or audio features. An example of this type of scene is shown in Fig. 9a, where various shots captured by different cameras are organized to report a news story on a flight show at an international aviation exhibition. Since all shots are captured in the same scene, they are somewhat similar in visual features, and all shots in the scene might be merged into one or several visually similar video groups.

M-type scenes: These scenes (also known as montage scenes) are characterized by widely different visuals (different in location, time of creation, and lighting conditions) that create a unity of theme through the manner in which they are juxtaposed. An example of this type of scene is shown in Fig. 9b, where three actors are involved in a dialog scene. The shots in the scene have a large visual variance and have been classified into four distinct groups (G_1, G_2, G_3, and G_4). However, these four groups are temporally interlaced to convey the same scenario information.

Hybrid scenes: Hybrid scenes consist of both N- and M-type parts, where both visually similar neighboring groups and temporally interlaced groups are utilized to convey a scenario unit. This is shown in Fig. 9c, where the beginning is a typical N-type scene and the end is an M-type scene.

Fig. 9. Examples of various types of scenes. a N-type scenes. b M-type scenes. c Hybrid scenes, where both N- and M-type scenes exist in the same story unit.

Using the above observations and definitions, we now introduce a temporally and spatially integrated strategy for parsing video groups into semantic scenes. Given groups G_i and G_j, Fig. 10 presents the three possible temporal relationships between them: interlaced, uninterlaced, and neighboring. To parse video semantic scenes, temporally interlaced video groups and visually similar neighboring groups should be merged into one scene, since they have a high probability of conveying the same scenario. Let ID_{G_i}^{b} and ID_{G_i}^{e} denote the minimal and maximal temporal index orders of the shots in G_i, as shown in Fig. 10a. To classify G_i and G_j as interlaced, uninterlaced, or neighboring, the temporal group distance (TGD) between G_i and G_j, TGD(G_i, G_j), is defined by Eq. 9:

TGD(G_i, G_j) = \begin{cases} ID_{G_j}^{b} - ID_{G_i}^{e}, & \text{if } ID_{G_j}^{b} \ge ID_{G_i}^{b} \\ ID_{G_i}^{b} - ID_{G_j}^{e}, & \text{if } ID_{G_j}^{b} < ID_{G_i}^{b} \end{cases}   (9)

If TGD(G_i, G_j) > 1, there is no interlacing between the shots in G_i and G_j, and G_i and G_j are classified as uninterlaced groups, as shown in Fig. 10a. If TGD(G_i, G_j) = 1, we classify G_i and G_j as neighboring groups, as shown in Fig. 10c. If TGD(G_i, G_j) < 0, G_i and G_j are classified as interlaced groups, as shown in Fig. 10b. Since a shot can be assigned to only one group, TGD(G_i, G_j) never equals 0.

Fig. 10. Group merging for scene detection. a Uninterlaced groups are kept unmerged. b Interlaced groups are merged into one unit. c Neighboring groups with similarity larger than the threshold T_Scene are merged into one unit.

After classifying any two video groups into the appropriate category, our scene detection follows the steps below.
1. Given any two video groups G_i and G_j, if TGD(G_i, G_j) < 0, G_i and G_j are merged into a new group by combining all shots in G_i and G_j. As shown in Fig. 10b, the interlaced groups G_i and G_j are merged to form a new group G_k, with numbers in different styles (bold italic vs. roman) indicating shots from different groups.

2. If TGD(G_i, G_j) = 1 and the similarity between G_i and G_j (given by Eq. 11) is larger than a given threshold T_Scene, G_i and G_j are merged into a new group. Otherwise, they remain unmerged, as shown in Fig. 10c.

3. Iteratively execute the steps above until no more groups can be merged. All remaining and newly generated groups are taken as video scenes.

To measure the similarity between groups for scene detection, we first consider the similarity between a shot and a group. Based on Eq. 2, given shot S_i and group G_j, the similarity between them is defined by Eq. 10:

StGpSim(S_i, G_j) = \max_{K_l \in S_i,\ K_h \in G_j} \{FmSim(K_l, K_h)\}.   (10)

That is, the similarity between S_i and G_j is the similarity of the most similar pair of keyframes drawn from S_i and G_j. In general, when we evaluate the similarity between two video groups with the naked eye, we usually take the group with fewer shots as the benchmark and then determine whether any shots in the other group are similar to the shots in the benchmark group. If most shots in the benchmark group are similar enough to the other group, the two groups are treated as highly similar [30,42]. Given groups G_i and G_j, let \hat{G}_{i,j} represent the group containing fewer shots and \tilde{G}_{i,j} denote the other group. Let NT(x) denote the number of shots in group x; then the similarity between G_i and G_j is given by Eq. 11:

GpSim(G_i, G_j) = \frac{1}{NT(\hat{G}_{i,j})} \sum_{i=1;\ S_i \in \hat{G}_{i,j}}^{NT(\hat{G}_{i,j})} StGpSim(S_i, \tilde{G}_{i,j}).   (11)

That is, the similarity between G_i and G_j is the average similarity between the shots in the benchmark group and their most similar shots in the other group.

Some typical scene detection results on real-world videos (movies, news, and medical videos) are presented in Fig. 11, where the video shots in the same story units are successfully detected. In some situations, the scene boundaries are too vague to be distinguished even by the naked eye; we do not believe any state-of-the-art method can attain satisfactory results on such videos. However, for scenes with relatively clear boundaries, our method obtains attractive results (as demonstrated in Sect. 6.3), and these results will guide us in constructing hierarchical video summaries.

Fig. 11. Typical examples of successfully detected video scenes in various types of videos (movies, news, and medical videos), with each row denoting one scene.

4.5 Video-scene clustering

Thus far, a video content structure from video scenes and groups to keyframes has been constructed. Nevertheless, since our final goal is to use the video content structure to construct video summaries, a more compact level beyond the scenes is necessary for creating an overview of the video. This is accomplished by clustering visually similar scenes that appear at different places in the video. Given a scene SE_i in video V_k, assume there are NT shots contained in SE_i, and assume further that all shots in SE_i have been merged into NSG_i supergroups SG_l (l = 1, ..., NSG_i). The union of all these supergroups is denoted by Eq. 12:

USG(SE_i) = SG_1 \cup SG_2 \cup \cdots \cup SG_{NSG_i}.   (12)

Obviously, given any two scenes SE_i and SE_j, if USG(SE_i) ⊆ USG(SE_j), all shots in SE_i come from the same supergroups as those of SE_j. Hence, the visual redundancy between SE_i and SE_j can be eliminated by merging them into one clustered scene.
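As a closing illustration, the scene-detection rules of Eqs. 9-11 and steps 1-3 can be sketched as follows; the data layout (groups as lists of shot IDs, a keyframes_of lookup, and the fmsim measure of Sect. 4.1) is our assumption, not the original implementation:

```python
def tgd(g1, g2):
    """Eq. 9: temporal group distance between two groups of shot IDs."""
    b1, e1, b2, e2 = min(g1), max(g1), min(g2), max(g2)
    return b2 - e1 if b2 >= b1 else b1 - e2

def st_gp_sim(shot_kfs, group_kfs, fmsim):
    """Eq. 10: best keyframe match between a shot and a group."""
    return max(fmsim(k1, k2) for k1 in shot_kfs for k2 in group_kfs)

def gp_sim(g1, g2, keyframes_of, fmsim):
    """Eq. 11: average best-match similarity, benchmarked on the smaller group.
    keyframes_of maps a shot ID to that shot's keyframe features."""
    bench, other = (g1, g2) if len(g1) <= len(g2) else (g2, g1)
    other_kfs = [kf for s in other for kf in keyframes_of(s)]
    return sum(st_gp_sim(keyframes_of(s), other_kfs, fmsim) for s in bench) / len(bench)

def detect_scenes(groups, keyframes_of, fmsim, t_scene):
    """Steps 1-3: merge interlaced groups (TGD < 0) and visually similar
    neighboring groups (TGD == 1 and GpSim > t_scene) until a fixpoint."""
    groups = [list(g) for g in groups]
    merged = True
    while merged:
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = tgd(groups[i], groups[j])
                if d < 0 or (d == 1 and gp_sim(groups[i], groups[j],
                                               keyframes_of, fmsim) > t_scene):
                    groups[i] = sorted(groups[i] + groups[j])
                    del groups[j]
                    merged = True
                    break
            if merged:
                break
    return groups  # the remaining units are the video scenes
```

Under the same layout, the scene-clustering test of Eq. 12 reduces to a set comparison: if the set of supergroups covered by SE_i is a subset of those covered by SE_j, SE_i adds no new visual content and the two scenes can be merged.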


More information

ELEC Dr Reji Mathew Electrical Engineering UNSW

ELEC Dr Reji Mathew Electrical Engineering UNSW ELEC 4622 Dr Reji Mathew Electrical Engineering UNSW Review of Motion Modelling and Estimation Introduction to Motion Modelling & Estimation Forward Motion Backward Motion Block Motion Estimation Motion

More information

CS 664 Segmentation. Daniel Huttenlocher

CS 664 Segmentation. Daniel Huttenlocher CS 664 Segmentation Daniel Huttenlocher Grouping Perceptual Organization Structural relationships between tokens Parallelism, symmetry, alignment Similarity of token properties Often strong psychophysical

More information

Using Subspace Constraints to Improve Feature Tracking Presented by Bryan Poling. Based on work by Bryan Poling, Gilad Lerman, and Arthur Szlam

Using Subspace Constraints to Improve Feature Tracking Presented by Bryan Poling. Based on work by Bryan Poling, Gilad Lerman, and Arthur Szlam Presented by Based on work by, Gilad Lerman, and Arthur Szlam What is Tracking? Broad Definition Tracking, or Object tracking, is a general term for following some thing through multiple frames of a video

More information

Module 7 VIDEO CODING AND MOTION ESTIMATION

Module 7 VIDEO CODING AND MOTION ESTIMATION Module 7 VIDEO CODING AND MOTION ESTIMATION Lesson 20 Basic Building Blocks & Temporal Redundancy Instructional Objectives At the end of this lesson, the students should be able to: 1. Name at least five

More information

Video Representation. Video Analysis

Video Representation. Video Analysis BROWSING AND RETRIEVING VIDEO CONTENT IN A UNIFIED FRAMEWORK Yong Rui, Thomas S. Huang and Sharad Mehrotra Beckman Institute for Advanced Science and Technology University of Illinois at Urbana-Champaign

More information

Outline Introduction MPEG-2 MPEG-4. Video Compression. Introduction to MPEG. Prof. Pratikgiri Goswami

Outline Introduction MPEG-2 MPEG-4. Video Compression. Introduction to MPEG. Prof. Pratikgiri Goswami to MPEG Prof. Pratikgiri Goswami Electronics & Communication Department, Shree Swami Atmanand Saraswati Institute of Technology, Surat. Outline of Topics 1 2 Coding 3 Video Object Representation Outline

More information

Visual Representations for Machine Learning

Visual Representations for Machine Learning Visual Representations for Machine Learning Spectral Clustering and Channel Representations Lecture 1 Spectral Clustering: introduction and confusion Michael Felsberg Klas Nordberg The Spectral Clustering

More information

The ToCAI Description Scheme for Indexing and Retrieval of Multimedia Documents 1

The ToCAI Description Scheme for Indexing and Retrieval of Multimedia Documents 1 The ToCAI Description Scheme for Indexing and Retrieval of Multimedia Documents 1 N. Adami, A. Bugatti, A. Corghi, R. Leonardi, P. Migliorati, Lorenzo A. Rossi, C. Saraceno 2 Department of Electronics

More information

CIE L*a*b* color model

CIE L*a*b* color model CIE L*a*b* color model To further strengthen the correlation between the color model and human perception, we apply the following non-linear transformation: with where (X n,y n,z n ) are the tristimulus

More information

Latest development in image feature representation and extraction

Latest development in image feature representation and extraction International Journal of Advanced Research and Development ISSN: 2455-4030, Impact Factor: RJIF 5.24 www.advancedjournal.com Volume 2; Issue 1; January 2017; Page No. 05-09 Latest development in image

More information

CHAPTER 6 IDENTIFICATION OF CLUSTERS USING VISUAL VALIDATION VAT ALGORITHM

CHAPTER 6 IDENTIFICATION OF CLUSTERS USING VISUAL VALIDATION VAT ALGORITHM 96 CHAPTER 6 IDENTIFICATION OF CLUSTERS USING VISUAL VALIDATION VAT ALGORITHM Clustering is the process of combining a set of relevant information in the same group. In this process KM algorithm plays

More information

SOM+EOF for Finding Missing Values

SOM+EOF for Finding Missing Values SOM+EOF for Finding Missing Values Antti Sorjamaa 1, Paul Merlin 2, Bertrand Maillet 2 and Amaury Lendasse 1 1- Helsinki University of Technology - CIS P.O. Box 5400, 02015 HUT - Finland 2- Variances and

More information

Video Alignment. Final Report. Spring 2005 Prof. Brian Evans Multidimensional Digital Signal Processing Project The University of Texas at Austin

Video Alignment. Final Report. Spring 2005 Prof. Brian Evans Multidimensional Digital Signal Processing Project The University of Texas at Austin Final Report Spring 2005 Prof. Brian Evans Multidimensional Digital Signal Processing Project The University of Texas at Austin Omer Shakil Abstract This report describes a method to align two videos.

More information

A 12-STEP SORTING NETWORK FOR 22 ELEMENTS

A 12-STEP SORTING NETWORK FOR 22 ELEMENTS A 12-STEP SORTING NETWORK FOR 22 ELEMENTS SHERENAZ W. AL-HAJ BADDAR Department of Computer Science, Kent State University Kent, Ohio 44240, USA KENNETH E. BATCHER Department of Computer Science, Kent State

More information

Video scenes clustering based on representative shots

Video scenes clustering based on representative shots ISSN 746-7233, England, UK World Journal of Modelling and Simulation Vol., No. 2, 2005, pp. -6 Video scenes clustering based on representative shots Jun Ye, Jian-liang Li 2 +, C. M. Mak 3 Dept. Applied

More information

CORRELATION BASED CAR NUMBER PLATE EXTRACTION SYSTEM

CORRELATION BASED CAR NUMBER PLATE EXTRACTION SYSTEM CORRELATION BASED CAR NUMBER PLATE EXTRACTION SYSTEM 1 PHYO THET KHIN, 2 LAI LAI WIN KYI 1,2 Department of Information Technology, Mandalay Technological University The Republic of the Union of Myanmar

More information

Integration of Global and Local Information in Videos for Key Frame Extraction

Integration of Global and Local Information in Videos for Key Frame Extraction Integration of Global and Local Information in Videos for Key Frame Extraction Dianting Liu 1, Mei-Ling Shyu 1, Chao Chen 1, Shu-Ching Chen 2 1 Department of Electrical and Computer Engineering University

More information

AUTOMATIC VIDEO INDEXING

AUTOMATIC VIDEO INDEXING AUTOMATIC VIDEO INDEXING Itxaso Bustos Maite Frutos TABLE OF CONTENTS Introduction Methods Key-frame extraction Automatic visual indexing Shot boundary detection Video OCR Index in motion Image processing

More information

9/8/2016. Characteristics of multimedia Various media types

9/8/2016. Characteristics of multimedia Various media types Chapter 1 Introduction to Multimedia Networking CLO1: Define fundamentals of multimedia networking Upon completion of this chapter students should be able to define: 1- Multimedia 2- Multimedia types and

More information

DATA and signal modeling for images and video sequences. Region-Based Representations of Image and Video: Segmentation Tools for Multimedia Services

DATA and signal modeling for images and video sequences. Region-Based Representations of Image and Video: Segmentation Tools for Multimedia Services IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 9, NO. 8, DECEMBER 1999 1147 Region-Based Representations of Image and Video: Segmentation Tools for Multimedia Services P. Salembier,

More information

Fast Non-Linear Video Synopsis

Fast Non-Linear Video Synopsis Fast Non-Linear Video Synopsis Alparslan YILDIZ, Adem OZGUR and Yusuf Sinan AKGUL {yildiz, akgul}@bilmuh.gyte.edu.tr, aozgur@gyte.edu.tr GIT Vision Lab - http://vision.gyte.edu.tr Gebze Institute of Technology

More information

A new predictive image compression scheme using histogram analysis and pattern matching

A new predictive image compression scheme using histogram analysis and pattern matching University of Wollongong Research Online University of Wollongong in Dubai - Papers University of Wollongong in Dubai 00 A new predictive image compression scheme using histogram analysis and pattern matching

More information

Video Summarization Using MPEG-7 Motion Activity and Audio Descriptors

Video Summarization Using MPEG-7 Motion Activity and Audio Descriptors Video Summarization Using MPEG-7 Motion Activity and Audio Descriptors Ajay Divakaran, Kadir A. Peker, Regunathan Radhakrishnan, Ziyou Xiong and Romain Cabasson Presented by Giulia Fanti 1 Overview Motivation

More information

Biometrics Technology: Image Processing & Pattern Recognition (by Dr. Dickson Tong)

Biometrics Technology: Image Processing & Pattern Recognition (by Dr. Dickson Tong) Biometrics Technology: Image Processing & Pattern Recognition (by Dr. Dickson Tong) References: [1] http://homepages.inf.ed.ac.uk/rbf/hipr2/index.htm [2] http://www.cs.wisc.edu/~dyer/cs540/notes/vision.html

More information

Hierarchical Video Summarization Based on Video Structure and Highlight

Hierarchical Video Summarization Based on Video Structure and Highlight Hierarchical Video Summarization Based on Video Structure and Highlight Yuliang Geng, De Xu, and Songhe Feng Institute of Computer Science and Technology, Beijing Jiaotong University, Beijing, 100044,

More information

Scalable storyboards in handheld devices: applications and evaluation metrics

Scalable storyboards in handheld devices: applications and evaluation metrics Multimedia Tools and Applications manuscript No. (will be inserted by the editor) Scalable storyboards in handheld devices: applications and evaluation metrics Luis Herranz Shuqiang Jiang Received: date

More information

Browsing News and TAlk Video on a Consumer Electronics Platform Using face Detection

Browsing News and TAlk Video on a Consumer Electronics Platform Using face Detection MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Browsing News and TAlk Video on a Consumer Electronics Platform Using face Detection Kadir A. Peker, Ajay Divakaran, Tom Lanning TR2005-155

More information

Multimedia Database Systems. Retrieval by Content

Multimedia Database Systems. Retrieval by Content Multimedia Database Systems Retrieval by Content MIR Motivation Large volumes of data world-wide are not only based on text: Satellite images (oil spill), deep space images (NASA) Medical images (X-rays,

More information

Unsupervised learning in Vision

Unsupervised learning in Vision Chapter 7 Unsupervised learning in Vision The fields of Computer Vision and Machine Learning complement each other in a very natural way: the aim of the former is to extract useful information from visual

More information

Segmentation Computer Vision Spring 2018, Lecture 27

Segmentation Computer Vision Spring 2018, Lecture 27 Segmentation http://www.cs.cmu.edu/~16385/ 16-385 Computer Vision Spring 218, Lecture 27 Course announcements Homework 7 is due on Sunday 6 th. - Any questions about homework 7? - How many of you have

More information

A Miniature-Based Image Retrieval System

A Miniature-Based Image Retrieval System A Miniature-Based Image Retrieval System Md. Saiful Islam 1 and Md. Haider Ali 2 Institute of Information Technology 1, Dept. of Computer Science and Engineering 2, University of Dhaka 1, 2, Dhaka-1000,

More information

CHAPTER 2 TEXTURE CLASSIFICATION METHODS GRAY LEVEL CO-OCCURRENCE MATRIX AND TEXTURE UNIT

CHAPTER 2 TEXTURE CLASSIFICATION METHODS GRAY LEVEL CO-OCCURRENCE MATRIX AND TEXTURE UNIT CHAPTER 2 TEXTURE CLASSIFICATION METHODS GRAY LEVEL CO-OCCURRENCE MATRIX AND TEXTURE UNIT 2.1 BRIEF OUTLINE The classification of digital imagery is to extract useful thematic information which is one

More information

Reduction of Periodic Broadcast Resource Requirements with Proxy Caching

Reduction of Periodic Broadcast Resource Requirements with Proxy Caching Reduction of Periodic Broadcast Resource Requirements with Proxy Caching Ewa Kusmierek and David H.C. Du Digital Technology Center and Department of Computer Science and Engineering University of Minnesota

More information

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009 Learning and Inferring Depth from Monocular Images Jiyan Pan April 1, 2009 Traditional ways of inferring depth Binocular disparity Structure from motion Defocus Given a single monocular image, how to infer

More information

E-Companion: On Styles in Product Design: An Analysis of US. Design Patents

E-Companion: On Styles in Product Design: An Analysis of US. Design Patents E-Companion: On Styles in Product Design: An Analysis of US Design Patents 1 PART A: FORMALIZING THE DEFINITION OF STYLES A.1 Styles as categories of designs of similar form Our task involves categorizing

More information

Efficient Acquisition of Human Existence Priors from Motion Trajectories

Efficient Acquisition of Human Existence Priors from Motion Trajectories Efficient Acquisition of Human Existence Priors from Motion Trajectories Hitoshi Habe Hidehito Nakagawa Masatsugu Kidode Graduate School of Information Science, Nara Institute of Science and Technology

More information

Scene Change Detection Based on Twice Difference of Luminance Histograms

Scene Change Detection Based on Twice Difference of Luminance Histograms Scene Change Detection Based on Twice Difference of Luminance Histograms Xinying Wang 1, K.N.Plataniotis 2, A. N. Venetsanopoulos 1 1 Department of Electrical & Computer Engineering University of Toronto

More information

One image is worth 1,000 words

One image is worth 1,000 words Image Databases Prof. Paolo Ciaccia http://www-db. db.deis.unibo.it/courses/si-ls/ 07_ImageDBs.pdf Sistemi Informativi LS One image is worth 1,000 words Undoubtedly, images are the most wide-spread MM

More information

CSE 258 Lecture 5. Web Mining and Recommender Systems. Dimensionality Reduction

CSE 258 Lecture 5. Web Mining and Recommender Systems. Dimensionality Reduction CSE 258 Lecture 5 Web Mining and Recommender Systems Dimensionality Reduction This week How can we build low dimensional representations of high dimensional data? e.g. how might we (compactly!) represent

More information

CONTENT MODEL FOR MOBILE ADAPTATION OF MULTIMEDIA INFORMATION

CONTENT MODEL FOR MOBILE ADAPTATION OF MULTIMEDIA INFORMATION CONTENT MODEL FOR MOBILE ADAPTATION OF MULTIMEDIA INFORMATION Maija Metso, Antti Koivisto and Jaakko Sauvola MediaTeam, MVMP Unit Infotech Oulu, University of Oulu e-mail: {maija.metso, antti.koivisto,

More information

CHAPTER 4: CLUSTER ANALYSIS

CHAPTER 4: CLUSTER ANALYSIS CHAPTER 4: CLUSTER ANALYSIS WHAT IS CLUSTER ANALYSIS? A cluster is a collection of data-objects similar to one another within the same group & dissimilar to the objects in other groups. Cluster analysis

More information

Research on Construction of Road Network Database Based on Video Retrieval Technology

Research on Construction of Road Network Database Based on Video Retrieval Technology Research on Construction of Road Network Database Based on Video Retrieval Technology Fengling Wang 1 1 Hezhou University, School of Mathematics and Computer Hezhou Guangxi 542899, China Abstract. Based

More information

Multi-View Image Coding in 3-D Space Based on 3-D Reconstruction

Multi-View Image Coding in 3-D Space Based on 3-D Reconstruction Multi-View Image Coding in 3-D Space Based on 3-D Reconstruction Yongying Gao and Hayder Radha Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48823 email:

More information

A nodal based evolutionary structural optimisation algorithm

A nodal based evolutionary structural optimisation algorithm Computer Aided Optimum Design in Engineering IX 55 A dal based evolutionary structural optimisation algorithm Y.-M. Chen 1, A. J. Keane 2 & C. Hsiao 1 1 ational Space Program Office (SPO), Taiwan 2 Computational

More information

Short Survey on Static Hand Gesture Recognition

Short Survey on Static Hand Gesture Recognition Short Survey on Static Hand Gesture Recognition Huu-Hung Huynh University of Science and Technology The University of Danang, Vietnam Duc-Hoang Vo University of Science and Technology The University of

More information

Introduzione alle Biblioteche Digitali Audio/Video

Introduzione alle Biblioteche Digitali Audio/Video Introduzione alle Biblioteche Digitali Audio/Video Biblioteche Digitali 1 Gestione del video Perchè è importante poter gestire biblioteche digitali di audiovisivi Caratteristiche specifiche dell audio/video

More information

Mobile Human Detection Systems based on Sliding Windows Approach-A Review

Mobile Human Detection Systems based on Sliding Windows Approach-A Review Mobile Human Detection Systems based on Sliding Windows Approach-A Review Seminar: Mobile Human detection systems Njieutcheu Tassi cedrique Rovile Department of Computer Engineering University of Heidelberg

More information

Multimedia Databases. Wolf-Tilo Balke Younès Ghammad Institut für Informationssysteme Technische Universität Braunschweig

Multimedia Databases. Wolf-Tilo Balke Younès Ghammad Institut für Informationssysteme Technische Universität Braunschweig Multimedia Databases Wolf-Tilo Balke Younès Ghammad Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de Previous Lecture Audio Retrieval - Query by Humming

More information

28 IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 12, NO. 1, JANUARY 2010

28 IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 12, NO. 1, JANUARY 2010 28 IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 12, NO. 1, JANUARY 2010 Camera Motion-Based Analysis of User Generated Video Golnaz Abdollahian, Student Member, IEEE, Cuneyt M. Taskiran, Member, IEEE, Zygmunt

More information

A Semi-Automatic 2D-to-3D Video Conversion with Adaptive Key-Frame Selection

A Semi-Automatic 2D-to-3D Video Conversion with Adaptive Key-Frame Selection A Semi-Automatic 2D-to-3D Video Conversion with Adaptive Key-Frame Selection Kuanyu Ju and Hongkai Xiong Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, China ABSTRACT To

More information

Available online at ScienceDirect. Procedia Computer Science 87 (2016 ) 12 17

Available online at  ScienceDirect. Procedia Computer Science 87 (2016 ) 12 17 Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 87 (2016 ) 12 17 4th International Conference on Recent Trends in Computer Science & Engineering Segment Based Indexing

More information

At the end of the chapter, you will learn to: Present data in textual form. Construct different types of table and graphs

At the end of the chapter, you will learn to: Present data in textual form. Construct different types of table and graphs DATA PRESENTATION At the end of the chapter, you will learn to: Present data in textual form Construct different types of table and graphs Identify the characteristics of a good table and graph Identify

More information

Video Key-Frame Extraction using Entropy value as Global and Local Feature

Video Key-Frame Extraction using Entropy value as Global and Local Feature Video Key-Frame Extraction using Entropy value as Global and Local Feature Siddu. P Algur #1, Vivek. R *2 # Department of Information Science Engineering, B.V. Bhoomraddi College of Engineering and Technology

More information

CHAPTER 3 WAVELET DECOMPOSITION USING HAAR WAVELET

CHAPTER 3 WAVELET DECOMPOSITION USING HAAR WAVELET 69 CHAPTER 3 WAVELET DECOMPOSITION USING HAAR WAVELET 3.1 WAVELET Wavelet as a subject is highly interdisciplinary and it draws in crucial ways on ideas from the outside world. The working of wavelet in

More information

Visual interfaces for a semantic content-based image retrieval system

Visual interfaces for a semantic content-based image retrieval system Visual interfaces for a semantic content-based image retrieval system Hagit Hel-Or a and Dov Dori b a Dept of Computer Science, University of Haifa, Haifa, Israel b Faculty of Industrial Engineering, Technion,

More information

Clustering Methods for Video Browsing and Annotation

Clustering Methods for Video Browsing and Annotation Clustering Methods for Video Browsing and Annotation Di Zhong, HongJiang Zhang 2 and Shih-Fu Chang* Institute of System Science, National University of Singapore Kent Ridge, Singapore 05 *Center for Telecommunication

More information

A Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks

A Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 8, NO. 6, DECEMBER 2000 747 A Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks Yuhong Zhu, George N. Rouskas, Member,

More information

MATRIX BASED INDEXING TECHNIQUE FOR VIDEO DATA

MATRIX BASED INDEXING TECHNIQUE FOR VIDEO DATA Journal of Computer Science, 9 (5): 534-542, 2013 ISSN 1549-3636 2013 doi:10.3844/jcssp.2013.534.542 Published Online 9 (5) 2013 (http://www.thescipub.com/jcs.toc) MATRIX BASED INDEXING TECHNIQUE FOR VIDEO

More information

An explicit feature control approach in structural topology optimization

An explicit feature control approach in structural topology optimization th World Congress on Structural and Multidisciplinary Optimisation 07 th -2 th, June 205, Sydney Australia An explicit feature control approach in structural topology optimization Weisheng Zhang, Xu Guo

More information

HIERARCHICAL VISUAL DESCRIPTION SCHEMES FOR STILL IMAGES AND VIDEO SEQUENCES

HIERARCHICAL VISUAL DESCRIPTION SCHEMES FOR STILL IMAGES AND VIDEO SEQUENCES HIERARCHICAL VISUAL DESCRIPTION SCHEMES FOR STILL IMAGES AND VIDEO SEQUENCES Universitat Politècnica de Catalunya Barcelona, SPAIN philippe@gps.tsc.upc.es P. Salembier, N. O Connor 2, P. Correia 3 and

More information

Short Run length Descriptor for Image Retrieval

Short Run length Descriptor for Image Retrieval CHAPTER -6 Short Run length Descriptor for Image Retrieval 6.1 Introduction In the recent years, growth of multimedia information from various sources has increased many folds. This has created the demand

More information

CSE 255 Lecture 5. Data Mining and Predictive Analytics. Dimensionality Reduction

CSE 255 Lecture 5. Data Mining and Predictive Analytics. Dimensionality Reduction CSE 255 Lecture 5 Data Mining and Predictive Analytics Dimensionality Reduction Course outline Week 4: I ll cover homework 1, and get started on Recommender Systems Week 5: I ll cover homework 2 (at the

More information

Feature Selection Using Principal Feature Analysis

Feature Selection Using Principal Feature Analysis Feature Selection Using Principal Feature Analysis Ira Cohen Qi Tian Xiang Sean Zhou Thomas S. Huang Beckman Institute for Advanced Science and Technology University of Illinois at Urbana-Champaign Urbana,

More information

Clustering and Visualisation of Data

Clustering and Visualisation of Data Clustering and Visualisation of Data Hiroshi Shimodaira January-March 28 Cluster analysis aims to partition a data set into meaningful or useful groups, based on distances between data points. In some

More information

AN EFFICIENT BATIK IMAGE RETRIEVAL SYSTEM BASED ON COLOR AND TEXTURE FEATURES

AN EFFICIENT BATIK IMAGE RETRIEVAL SYSTEM BASED ON COLOR AND TEXTURE FEATURES AN EFFICIENT BATIK IMAGE RETRIEVAL SYSTEM BASED ON COLOR AND TEXTURE FEATURES 1 RIMA TRI WAHYUNINGRUM, 2 INDAH AGUSTIEN SIRADJUDDIN 1, 2 Department of Informatics Engineering, University of Trunojoyo Madura,

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)

More information

6. Dicretization methods 6.1 The purpose of discretization

6. Dicretization methods 6.1 The purpose of discretization 6. Dicretization methods 6.1 The purpose of discretization Often data are given in the form of continuous values. If their number is huge, model building for such data can be difficult. Moreover, many

More information

Introduction to digital image classification

Introduction to digital image classification Introduction to digital image classification Dr. Norman Kerle, Wan Bakx MSc a.o. INTERNATIONAL INSTITUTE FOR GEO-INFORMATION SCIENCE AND EARTH OBSERVATION Purpose of lecture Main lecture topics Review

More information

Self-Organizing Maps for cyclic and unbounded graphs

Self-Organizing Maps for cyclic and unbounded graphs Self-Organizing Maps for cyclic and unbounded graphs M. Hagenbuchner 1, A. Sperduti 2, A.C. Tsoi 3 1- University of Wollongong, Wollongong, Australia. 2- University of Padova, Padova, Italy. 3- Hong Kong

More information

View Synthesis for Multiview Video Compression

View Synthesis for Multiview Video Compression View Synthesis for Multiview Video Compression Emin Martinian, Alexander Behrens, Jun Xin, and Anthony Vetro email:{martinian,jxin,avetro}@merl.com, behrens@tnt.uni-hannover.de Mitsubishi Electric Research

More information

STUDYING THE FEASIBILITY AND IMPORTANCE OF GRAPH-BASED IMAGE SEGMENTATION TECHNIQUES

STUDYING THE FEASIBILITY AND IMPORTANCE OF GRAPH-BASED IMAGE SEGMENTATION TECHNIQUES 25-29 JATIT. All rights reserved. STUDYING THE FEASIBILITY AND IMPORTANCE OF GRAPH-BASED IMAGE SEGMENTATION TECHNIQUES DR.S.V.KASMIR RAJA, 2 A.SHAIK ABDUL KHADIR, 3 DR.S.S.RIAZ AHAMED. Dean (Research),

More information

Chapter 3 Image Registration. Chapter 3 Image Registration

Chapter 3 Image Registration. Chapter 3 Image Registration Chapter 3 Image Registration Distributed Algorithms for Introduction (1) Definition: Image Registration Input: 2 images of the same scene but taken from different perspectives Goal: Identify transformation

More information