Adaptive Fusion of Human Visual Sensitive Features for Surveillance Video Summarization

MD. MUSFEQUS SALEHIN 1,* AND MANORANJAN PAUL 1
1 School of Computing and Mathematics, Charles Sturt University, Bathurst, NSW 2795, Australia
* Corresponding author: msalehin@csu.edu.au
Compiled March 31, 2017

Surveillance video cameras capture a large amount of continuous video stream every day. To analyze or investigate any significant event, identifying it within this huge volume of video data is a laborious and tedious job if done manually. Existing approaches sometimes neglect key frames with significant visual content and/or select unimportant frames with low or no activity. To solve this problem, a video summarization technique is proposed in this paper by combining three multi-modal human visual sensitive features: foreground objects, motion information, and visual saliency. In a video stream, foreground objects are among the most important contents of a video, as they contain more detailed information and play a major role in important events. Moreover, motion is another stimulus of a video that attracts human visual attention significantly. Motion information is therefore calculated in the spatial as well as the frequency domain. Spatial motion information can locate object motion accurately; however, it is sensitive to illumination changes. Frequency-domain motion information, on the other hand, is robust to illumination changes, although it is easily affected by noise. Therefore, motion information in both the spatial and frequency domains is employed. Furthermore, the visual attention cue is a sensitive feature indicating the level of a user's attention and is useful for determining key frames. As these features individually cannot perform very well, they are combined to obtain better results. For this purpose, an adaptive linear weighted fusion scheme is proposed to combine the features and rank video frames for summarization. Experimental results reveal that the proposed method outperforms the state-of-the-art methods. © 2017 Optical Society of America

1. INTRODUCTION

Every day an enormous amount of surveillance video is captured throughout the whole world for providing security, monitoring, preventing crime, controlling traffic, and so on. In general, a number of surveillance video cameras are set up in different places of a building, business area, or congested area, and these cameras are connected to a monitoring server for storage and investigation. Storing this huge volume of video data requires tremendous memory space. In addition, to find any important event in the stored video for investigation or analysis, operators need to access the stored videos. This process is very tedious, time consuming, and not cost effective. To solve these problems, a method for generating a shorter version of the original video containing the important events is highly desirable for memory management and information retrieval. Video summarization is the process of selecting the most informative frames so that the summary contains all the necessary events or information of a long video while rejecting unnecessary frames to keep the summarized video as concise as possible. Therefore, a good video summarization method has several important properties.
First, it must have the capability to produce a video with all significant incidents of the original video. Second, it should be able to generate a much smaller version of the provided long video. Third, it should not contain repetitive information. The main purpose of video summarization is to represent a long original video in a condensed version in such a way that a user can get an overall idea of the events that occurred in the entire video within a constrained amount of time. Existing approaches sometimes neglect key frames with significant visual content and/or select unimportant frames with low or no activity. In a video stream, foreground objects are among the most important contents of a video, as they contain more detailed information and play a major role in important events [1]. A frame with a large foreground object area is more informative than one with little or no foreground.

Moreover, motion is another stimulus of a video that attracts human visual attention significantly [2]. Human activity, which is also an important content of a video, can be easily represented by motion information. Furthermore, the visual attention cue is a sensitive feature indicating the level of a user's attention for determining key frames [3]. This attention information is used to model the human perception system for understanding the content of the video [3]. Motivated by the above-mentioned findings, a video summarization scheme is proposed in this paper based on foreground objects, their motion, and the visual attention cue in a video. A frame with a larger absolute foreground area has a higher probability of presenting an important event. To obtain the absolute size of the foreground object areas, Gaussian mixture-based parametric dynamic background modeling (DBM) [4] is applied in the proposed approach. However, not all frames with larger foreground areas within an event should be selected as key frames if they are similar and have little or no motion among them compared to the background scene. To acquire complete information about object motion in a video, object motion is extracted not only in the spatial domain but also in the frequency domain. Although spatial motion information is able to indicate object motion related to important events, it is sensitive to illumination changes [5][6] and it provides overestimated motion areas comprising object areas, uncovered background areas, and occluded areas compared to the background scene. On the other hand, motion information in the frequency domain is invariant to illumination [5][6]; however, it is easily affected by noise [5][6]. To obtain motion information in the spatial domain, the consecutive frame difference (CFD) is applied. To obtain object motion in the frequency domain, we use the phase correlation (PC) technique due to its remarkable accuracy and its robustness to uniform variations of illumination and signal noise in images [7]. Besides the size of the foreground and the motion information, visual saliency is also important for key frame extraction [3]. For this, we employ the visual attention cue using the graph-based visual saliency (GBVS) method [8]. There are several advantages of using the GBVS approach: it predicts human fixations within a frame accurately, it yields higher saliency values at the centre of the image plane, and it draws attention to salient regions robustly. The GBVS method prepares a saliency map based on the spatial contrast within a frame, whether or not the frame contains any foreground object. It is revealed in the proposed method that when any object movement occurs in a frame, the saliency value at almost every point of the saliency map changes. This observation motivates us to consider the difference of saliency maps between two consecutive frames as a feature for video summarization, because the saliency difference map provides more distinguishable information than a single saliency map. In the proposed method, a novel adaptive fusion scheme is also introduced for combining the features. The weights of the fusion are learnt during a training session; therefore, the scheme provides an opportunity to adjust the result of video summarization. Finally, the summary of the video is generated as per the skimming ratio provided by the user; otherwise, the proposed approach generates the summary of the original video based on the default skimming ratio.
Therefore, the contributions of the paper are as follows:
1. We introduce a novel feature, namely the peak of the phase shift obtained from the phase correlation technique, and apply it to video summarization in order to extract motion information in the frequency domain that is robust to illumination changes;
2. We introduce another novel feature, namely the saliency difference, to exploit temporal salient information, as a single saliency map does not indicate salient changes;
3. We develop an adaptive fusion scheme to combine the different features, as the contents are not always the same within a video or among all videos.

The structure of the remaining paper is as follows. Section 2 reviews related research. The proposed method is described in Section 3. Experimental results as well as detailed discussions are provided in Section 4. Finally, concluding remarks are drawn in Section 5.

2. RELATED RESEARCH

In the literature, different approaches have been proposed for summarizing various types of videos. These videos can be categorized into egocentric video, user-generated video, movie, endoscopic video, and surveillance video. Egocentric video is generally captured by wearable cameras for socio-behavioral analysis of the camera wearer's daily life. For summarizing this type of video, region saliency is predicted in [9] using a regression model and storyboards are generated based on region importance scores. In [10], story-driven egocentric video is summarized by discovering the most influential objects within a video. Gaze tracking information is applied in [11] for egocentric video summarization using sub-modular function maximization. User-generated video is usually captured by handheld cameras or smart phones by non-professionals. For summarizing user-generated video, an adaptive sub-modular maximization function is applied in [12]. A collaborative sparse coding model is utilized in [13] for generating summaries of the same type of videos. Web images are used in [14] to enhance the process of summarizing user-generated video. Category-specific (e.g., birthday party) user video is summarized in [15] by automatically segmenting the video temporally, scoring each segment with a support vector machine (SVM), and selecting the highest-scoring segments. In [16], the Deep Event Network (DevNet) is introduced for high-level event detection and spatio-temporal localization of important evidence in user-generated video. For detecting important events in user-generated video, low-level and semantic-level visual and audio features are applied in [17]. Movies or films captured by professional cinematographers contain high-definition video and audio for entertainment. To summarize movies, aural, visual, and textual features are merged in [18]. As role communities contain information about previous and later scenes, a network of role communities is applied in [19] for movie summarization. Film comics are generated using eye-tracking data in [20]. To summarize endoscopic video, the ORB (Oriented FAST and Rotated BRIEF) key-point descriptor is applied in [21]. An unsupervised learning method using visual and temporal descriptors is proposed in [22] to partition frames into homogeneous categories; the most typical frames are then selected to summarize the endoscopic video. In [23], image moments, curvature, and multi-scale contrast are combined to generate a saliency map for each frame, which is used to select key frames for endoscopic video summarization. A hidden Markov model-based framework is introduced in [24] for endoscopic video summarization.
However, the importance of surveillance video summarization for providing security, monitoring a restricted area, preventing crime, and controlling traffic [1] is higher than that of other types of video summarization (e.g., egocentric, user-generated, movie, etc.), because the main purpose of user-generated or egocentric video summarization is to summarize daily social activities [9][10][11][12][13][14].

Therefore, we are motivated to propose a framework to summarize surveillance video. To summarize surveillance video, an object-centered technique is applied in [25]. A dynamic videobook is proposed in [26] for representing surveillance video in a hierarchical order. A learned distance metric is introduced in [27] for summarizing nursery school surveillance video. In [28], motion saliency is calculated based on a dynamic visual saliency model using integral-image-based temporal gradients; informative key frames are then extracted based on motion contrast and the salient object's coverage ratio. Maximum a posteriori probability (MAP) is used in [29] for summary generation. The dynamic visual saliency is calculated by temporal gradients and the static saliency is measured by the discrete cosine transform (DCT) in [30], and a non-linear weighted fusion method is applied to combine the static and dynamic saliency. In [31], the correlation of RGB color channels, color histograms, and moments of inertia are combined to extract key frames. In [32], each video frame is divided into 8×8 blocks and the DCT is applied to them; the DC term of each block is then extracted to construct a DC image that is 64 times smaller than the original frame. The DC image is converted into the HSV color space and a 256-dimensional color histogram is computed. After that, zero-mean normalized cross correlation is applied to the color histograms to select representative frames. Finally, color distribution and gradient orientation are applied for redundant frame removal. Recently, a method was proposed in [1] for surveillance video summarization. A single-view summary is generated in this approach for each sensor independently. For this purpose, the MPEG-7 color layout descriptor is applied to each video frame and an online Gaussian mixture model (GMM) is used for clustering. The key frames are selected based on the parameters of the clusters. As the decision of selecting or neglecting a frame is based on the continuous updates of these clustering parameters, a video segment is extracted instead of individual key frames. A video summarization technique using a single type of descriptor (i.e., a color descriptor) at the frame level with an on-line learning (i.e., GMM) strategy provides very good performance if the video has a uni-modal phenomenon; however, the technique may not perform well if the video has multi-modal phenomena such as illumination change, variation of local motion, and occlusion. To overcome this, we need to use multi-modal features for selecting key frames, as a uni-modal feature sometimes fails to capture a specific phenomenon. For example, foreground objects sometimes do not carry explicit motion information. Again, spatial motion among adjacent frames does not provide appropriate motion information for key frame selection in the case of illumination changes [5][6], and sometimes it provides overestimated motion areas comprising object areas, uncovered background areas, and occluded areas. Moreover, motion information in the frequency domain is able to provide better motion information in the case of illumination change; however, it is sensitive to noise and it suffers from a localization problem [6].
Furthermore, visual saliency is able to predict human fixations in a frame accurately; however, it may highlight a non-interesting area where significant spatial contrast exists. Therefore, we propose a novel method combining all these human visual sensitive features. In addition, a machine learning-based classifier, the support vector machine (SVM), is also used to classify key frames using the proposed features. The results indicate that SVM-based key frame selection does not always provide a very good fusion of the features. As a result, a new adaptive fusion scheme is proposed for combining these features.

3. THE PROPOSED METHOD

The proposed scheme is based on the area of foreground objects, their motion information in the spatial and frequency domains, and visual saliency difference information. The main steps of the proposed method are (A) foreground object extraction, (B) motion information calculation in the spatial domain, (C) motion estimation in the frequency domain, (D) visual saliency difference calculation, (E) adaptive linear weighted fusion of these features, and (F) video summary generation with flexible length. The flow chart of the proposed method is shown in Fig. 1, and each step is explained in the subsequent subsections.

Fig. 1. The conceptual framework of the proposed summarization method: multi-modal features are extracted from the video frames, fused either with weights learned from a user-provided summary or with the default weights, and the resulting ranking of frames is cut according to the user-provided or the system's default skimming ratio to select key frames for the summarized video.

A. Foreground Object Extraction

Foreground objects are the most informative parts of a video stream as they contain more detailed information and play a major role in important events [1]. In order to obtain the foreground object information in a video frame, Gaussian mixture-based DBM [4][33] is applied. In this DBM, each pixel is modeled by K Gaussian distributions (K = 3), and each Gaussian model represents either the background or different foreground objects over time, i.e., across frames. For instance, suppose a pixel intensity x_t at time t is modeled by the k-th Gaussian with recent value γ_k^t, mean ε_k^t, standard deviation σ_k^t, and weight ρ_k^t such that Σ_k ρ_k^t = 1. The learning parameter α is used to update parameter values such as the mean and standard deviation. At the beginning, the system contains an empty set of Gaussian models.

Fig. 2. A representation of the features applied to the bl-18 video; the first, second, third, fourth, and fifth rows represent the area of foreground objects, spatial motion information, frequency-domain motion information, visual saliency difference, and the adaptive fusion of all features, respectively. The red and black lines represent ground truth key frames and the threshold value for video summarization, respectively.

After observing the first pixel (t = 1), a new Gaussian model (k = 1) is generated with γ_k^t = ε_k^t = x_t, standard deviation σ_k^t = 30, and an initial weight ρ_k^t. Then, for each new observation of the pixel intensity x_t at the same location at time t, the method tries to find a matched model among the existing models such that |x_t − ε_k| ≤ 2.5σ_k. If a matched model is found, its parameters are updated as in [4][33]; otherwise, a new Gaussian model is introduced in the same way as for the first pixel. Interested readers may find the detailed explanation of Gaussian mixture-based DBM in [4][33]. In the proposed method, each coloured video frame is converted into a gray scale image I(t) and DBM [4][33] is applied to obtain its corresponding gray scale background frame B(t). To obtain the foreground objects in a frame, the difference between I(t) and B(t) is calculated. In this way, a foreground pixel U_{i,j}(t) is obtained as follows:

U_{i,j}(t) = |I_{i,j}(t) − B_{i,j}(t)|   (1)

where (i, j) is the pixel position. After that, the summation of U_{i,j}(t) is used as the area-of-foreground feature Γ(t), which is obtained by the following equation:

Γ(t) = Σ_{i=1}^{b} Σ_{j=1}^{c} U_{i,j}(t)   (2)

where b and c represent the number of rows and columns of U, respectively. Fig. 2 shows a demonstration of the key frame selection strategy based on the individual features for the bl-18 video, illustrating each feature's strength for video summarization. The first row of Fig. 2 reveals that the size of the foreground based on DBM [4][33] provides good key frame detection; however, the model takes a little time to absorb non-moving objects (which had motion previously) into the background due to the adaptive learning process. As a result, some unnecessary frames, i.e., frames after the movement of an object has stopped, might be selected as key frames if only the area of the foreground is considered. Therefore, considering the foreground object alone is not sufficient to generate a better video summary. To overcome this problem, a motion feature is applied in the proposed scheme in addition to the foreground feature. Again, according to psychological theories of human attention, static attention cues are less informative than motion information [2]. Therefore, motion information in the spatial and frequency domains is included in the proposed method in addition to the foreground object.
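As a rough illustration of step A, the sketch below computes the per-frame foreground-area feature Γ(t) of Eqs. (1)–(2). It is a minimal sketch, assuming OpenCV's MOG2 background subtractor as a stand-in for the Gaussian mixture-based DBM of [4][33] (MOG2 is a related but not identical mixture model); the function name and parameter values are illustrative only.

```python
import cv2
import numpy as np

def foreground_area_feature(video_path):
    """Per-frame foreground-area feature Gamma(t), cf. Eqs. (1)-(2).

    cv2.createBackgroundSubtractorMOG2 is used here as a stand-in for the
    Gaussian mixture-based dynamic background model (DBM) of [4][33].
    """
    cap = cv2.VideoCapture(video_path)
    # A mixture of Gaussians models each pixel, analogous to the DBM.
    bg_model = cv2.createBackgroundSubtractorMOG2(detectShadows=False)
    gamma = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # I(t)
        bg_model.apply(gray)                             # update the pixel-wise model
        background = bg_model.getBackgroundImage()       # B(t)
        u = cv2.absdiff(gray, background)                # Eq. (1): U(t) = |I(t) - B(t)|
        gamma.append(float(u.sum()))                     # Eq. (2): Gamma(t)
    cap.release()
    return np.array(gamma)
```

The per-frame sums returned by such a routine correspond, before normalization, to the kind of curve shown in the first row of Fig. 2.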

B. Motion Information Calculation in Spatial Domain

Human beings usually give more attention to the moving objects in a video [2] in order to understand an event. To obtain object motion information in the spatial domain, the CFD is computed from two consecutive colour frames F(t−1) and F(t) at times t−1 and t, respectively. To find the spatial motion information, the absolute colour difference between these frames is calculated. Therefore, the spatial motion information S_{i,j}(t) at pixel (i, j) and time t can be obtained by the following equation:

S_{i,j}(t) = |F_{i,j}(t) − F_{i,j}(t−1)|   (3)

where (i, j) is the pixel position. The spatial motion feature Υ(t) at time t is obtained by summing all values of S_{i,j}(t) as follows:

Υ(t) = Σ_{i=1}^{b} Σ_{j=1}^{c} S_{i,j}(t)   (4)

where b and c represent the number of rows and columns of S, respectively. In Fig. 2, the motion feature Υ(t) is shown in the second row. It is obvious that by combining the foreground feature Γ(t) and the motion feature Υ(t), unnecessary frames obtained by the foreground feature alone can be removed (see the first and second rows of Fig. 2). Again, motion alone is not a very good feature, because a tiny foreground object with significant motion is less attractive and less informative than a large foreground object with sufficient motion. Therefore, the combination of foreground and motion information can provide a better result in the key frame selection process. To explain this visually, Fig. 3 shows some frames containing only a small portion of a human head. Although these frames contain sufficient motion information in the human head areas, they are not suitable candidates to be key frames, as they do not have enough foreground area; thus, these frames should not be selected as key frames. Moreover, the CFD is sensitive to illumination changes [5][6] and it provides overestimated motion areas comprising object areas, uncovered background areas, and occluded areas.

Fig. 3. An illustration of a small object with adequate motion information (taken from the bl-3 video, including frames no. 717, 723, and 729). These frames should not be selected as key frames.

C. Motion Information Extraction in Frequency Domain

To overcome the problems of spatial motion information, motion information is also calculated in the frequency domain. Motion estimation in the frequency domain has some advantages over motion estimated in the spatial domain [6]: it is robust to global changes of illumination and to motion estimation near object boundaries. To obtain motion information in the frequency domain, each frame is divided into a number of blocks of 16×16 pixels. Then, the phase correlation technique [7][34] is applied between the current block and the reference block. The phase correlation peak (i.e., the magnitude of the motion accuracy) extracted by the phase correlation method is used as the motion indicator β for that block. The phase difference θ is calculated between the current block and its co-located reference block after applying the Fast Fourier Transform (FFT) to each block, using the following equation:

θ = ifft(e^{j(η_ref − η_cur)})   (5)

where η_ref and η_cur represent the phase of the FFT of the reference and current block, respectively. The maximum phase correlation value is calculated as

θ_max = max(θ)   (6)

and the motion indicator β, representing the amount of movement, is then obtained by

β = 1 − θ_max   (7)

If the value of β for a block is greater than a threshold δ, the block is considered to contain sufficient motion information. In the proposed method, a fixed value of δ is used. All the values greater than δ are summed to obtain the motion information Φ(t) in the frequency domain:

Φ(t) = Σ_{l=1}^{L} Σ_{m=1}^{M} β_{l,m}(t), summed over all blocks (l, m) with β_{l,m}(t) > δ   (8)

where L and M represent the number of rows and columns of 16×16 blocks in the gray scale image I, respectively.
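As a concrete illustration of step C, the following minimal sketch computes the frequency-domain motion feature of Eqs. (5)–(8) for one pair of gray scale frames using NumPy, under the reconstruction of Eq. (7) given above. The threshold value `delta` and the function name are assumptions for illustration; the paper's exact threshold δ is not reproduced here.

```python
import numpy as np

def frequency_motion_feature(cur_gray, ref_gray, block=16, delta=0.15):
    """Frequency-domain motion feature Phi(t) for one frame pair, cf. Eqs. (5)-(8).

    `delta` is a hypothetical threshold used only for illustration.
    """
    h, w = cur_gray.shape
    phi = 0.0
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            ref_blk = ref_gray[y:y + block, x:x + block].astype(np.float64)
            cur_blk = cur_gray[y:y + block, x:x + block].astype(np.float64)
            eta_ref = np.angle(np.fft.fft2(ref_blk))
            eta_cur = np.angle(np.fft.fft2(cur_blk))
            # Eq. (5): inverse FFT of the pure phase difference
            theta = np.fft.ifft2(np.exp(1j * (eta_ref - eta_cur)))
            theta_max = np.max(np.abs(theta))   # Eq. (6): correlation peak magnitude
            beta = 1.0 - theta_max              # Eq. (7): motion indicator
            if beta > delta:                    # Eq. (8): accumulate moving blocks
                phi += beta
    return phi
```

Applying this function to every consecutive frame pair yields the raw per-frame curve of the kind shown in Fig. 5, which is then smoothed as described next.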
The phase differences calculated in the frequency domain by applying the phase correlation technique to different blocks of a frame of the bl-14 video are shown in Fig. 4. No motion is represented in block (4, 4), which has only a single peak almost equal to 1 (Fig. 4(d)). A single motion (block (8, 9)) with a peak value equal to 0.8 and complex motion (block (8, 3)) with values less than 0.2 are shown in Figs. 4(e) and 4(f), respectively. The magnitude of the peak value varies inversely with motion. The figure reveals that if a block has little or no motion, the magnitude of the peak is close to one; if a block has complex motion (which cannot be represented by a single translational motion using phase correlation), the magnitude is close to zero; and if a block has motion that can be represented by a single translational motion, the magnitude is around 0.5. Thus, the block-wise magnitude obtained by phase correlation can be a good feature for video summarization in the case of illumination changes and for capturing local motion. The obtained magnitude values change abruptly, as shown in Fig. 5. Therefore, we apply Savitzky-Golay filtering [35] with window size ω (see the values in Table 1) to smooth the data; the main advantage of this filter is that it preserves local maxima [35]. After smoothing the data, the motion information Φ(t) in the frequency domain is obtained. In Fig. 2, the third row represents the motion information in the frequency domain. It is clear from the curve that it represents the motion information more accurately than the spatial motion information, because it is robust to global changes of illumination and to motion estimation near object boundaries. However, it also generates some extra motion information due to noise, and the frequency-based approach suffers from a localization problem [6]. Therefore, motion information is calculated in both the spatial and frequency domains for generating the video summary.
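A short sketch of the smoothing step: SciPy's savgol_filter can be applied to the raw per-frame values. The odd window length 101 approximates the ω = 100 used for the bl-14 video in Fig. 5 (savgol_filter expects an odd window), and polyorder=3 is an assumed setting, not one reported in the paper.

```python
import numpy as np
from scipy.signal import savgol_filter

# phi_raw: raw per-frame frequency-domain motion values Phi(1..Y),
# e.g. obtained with frequency_motion_feature() above.
phi_raw = np.random.rand(12900)          # placeholder signal for illustration only
phi_smooth = savgol_filter(phi_raw, window_length=101, polyorder=3)
```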

Fig. 4. An example of the motion generated in each block of a frame of the bl-14 video; (a) frame no. 3868, (b) the frame difference between frames 3868 and 3869 (multiplied by 6 for better visualization), (c) frame no. 3869; the phase correlation peak with no motion, single motion, and complex motion is represented in (d), (e), and (f), respectively.

Fig. 5. A representation of the frequency-domain motion information obtained by phase correlation and smoothed by Savitzky-Golay filtering [35] for the bl-14 video with window size 100. The green, blue, and red lines indicate the ground truth frames and the raw and smoothed frequency-domain motion information, respectively.

D. Visual Saliency Difference Calculation

The visual attention cue is a significant sensitive feature indicating the level of a user's attention for determining key frames [3]. In order to calculate the visual attention, the GBVS method [8] is applied to each frame of the video stream. The GBVS method uses a graph algorithm because of its computational power, topographical structure, and parallel nature. A fully connected directed graph is obtained by connecting the nodes of a feature map. The weights of the directed edges are assigned according to the dissimilarity and the proximity of the two nodes they connect. A Markov chain is defined on this directed graph, and the equilibrium distribution of this chain reflects the activation values. Later, to concentrate the activation values on the most salient regions, another weighted graph is constructed from these activation values, where the weight of each edge is assigned based on the activation values of the two nodes it connects. A Markov chain over this graph again yields an equilibrium distribution over the nodes. In this way, the more attractive regions obtain larger saliency values, and a more uniform and informative saliency map is obtained. It is revealed in the proposed method that when any object movement occurs in a frame, the saliency value at almost every point of the saliency map changes. This observation motivates the calculation of the difference of saliency maps between two consecutive frames, because the saliency difference map provides more distinguishable information than a single saliency map. After obtaining the saliency maps, the sum of the saliency difference between two consecutive frames is used as another feature. Suppose that, at times t and t−1, two consecutive colour frames are F(t) and F(t−1) and their corresponding visual saliency maps are V(t) and V(t−1), respectively. The saliency difference H(t) between V(t) and V(t−1) is calculated as follows:

H_{i,j}(t) = |V_{i,j}(t) − V_{i,j}(t−1)|   (9)

where (i, j) is the pixel position. Following that, the visual saliency difference feature Λ(t) is obtained by the following equation:

Λ(t) = Σ_{i=1}^{b} Σ_{j=1}^{c} H_{i,j}(t)   (10)

where b and c represent the number of rows and columns of H, respectively. The visual saliency difference for the bl-18 video is shown in the fourth row of Fig. 2. It is easily visible that there are some foreground objects (first row) and a small amount of motion information (second row) around a frame range that is not part of the ground truth. In this case, adding the saliency difference feature to the other features (foreground and motion) provides a better result, as shown in the fourth row of Fig. 2. The difference of the saliency maps of two consecutive frames keeps the relative information between them. However, the visual saliency difference provides neither foreground information nor accurate motion information.
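To illustrate step D, the sketch below computes the saliency difference feature Λ(t) of Eqs. (9)–(10). Since GBVS [8] has no standard OpenCV implementation, OpenCV's spectral-residual static saliency (available in the opencv-contrib build) is used here purely as a stand-in saliency model; it is not the saliency method used in the paper.

```python
import cv2
import numpy as np

def saliency_difference_feature(frames):
    """Visual saliency difference feature Lambda(t), cf. Eqs. (9)-(10).

    Spectral-residual saliency (opencv-contrib) is a stand-in for GBVS [8].
    """
    saliency = cv2.saliency.StaticSaliencySpectralResidual_create()
    lam = [0.0]          # the first frame has no predecessor
    prev_map = None
    for frame in frames:
        ok, sal_map = saliency.computeSaliency(frame)    # V(t), values in [0, 1]
        sal_map = sal_map.astype(np.float64)
        if prev_map is not None:
            h = np.abs(sal_map - prev_map)               # Eq. (9): H(t) = |V(t) - V(t-1)|
            lam.append(float(h.sum()))                    # Eq. (10): Lambda(t)
        prev_map = sal_map
    return np.array(lam)
```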
E. Adaptive Linear Weighted Fusion

In this approach, a novel adaptive linear weighted fusion scheme is proposed to combine the features and rank each frame according to its representativeness in the video. Before applying the fusion, each feature is converted into a z-score Z(t) using the following equation:

Z(t) = (X(t) − µ) / ϱ   (11)

where X(t) is a feature value at time t, µ is the mean, and ϱ is the standard deviation of the feature values; the z-score Z(t) is a normalized form of X(t). In this scheme, z-score normalization is the preferred method because it produces meaningful information about each data point and provides better results in the presence of outliers than min-max normalization [36]. The weighted linear fusion is obtained as follows:

R(t) = w_1 Z_Γ(t) + w_2 Z_Υ(t) + w_3 Z_Φ(t) + w_4 Z_Λ(t)   (12)

where R(t) is the fusion value; Z_Γ(t), Z_Υ(t), Z_Φ(t), and Z_Λ(t) are the z-score normalizations of the foreground feature Γ(t), the spatial motion feature Υ(t), the frequency-domain motion feature Φ(t), and the visual saliency difference Λ(t), respectively, at time t; and w_1, w_2, w_3, and w_4 are the weights assigned to Z_Γ(t), Z_Υ(t), Z_Φ(t), and Z_Λ(t), respectively. The weight values are obtained through a learning process. In the learning step, a small video segment containing important event(s) and unnecessary frames is used; frames within the important event(s) are labelled as key frames and the remaining frames are considered non-key frames. Each weight is assigned a value between 0 and 100, and the four weights (w_1, w_2, w_3, and w_4) are chosen such that their sum equals 100. Different combinations of weight values are applied in Eq. (12). For each combination, the fusion values are calculated and sorted in descending order, and as many frames are selected from the top as there are key frames in the training segment. The selected frames are then matched with the key frames, and the combination is given a score based on the similarity between the selected frames and the key frames. Finally, the set of weight values with the highest score among all combinations is selected and used for the entire video. After that, the fusion values R(1), R(2), R(3), ..., R(Y) (where Y is the total number of frames in the video) are calculated using Eq. (12) with the weight values obtained during the learning phase and sorted in descending order. In Fig. 2, the last row presents the fusion values for the bl-18 video of the BL-7F dataset [1]. The proposed fusion method combines all the features in such a way that it successfully suppresses the unnecessary frames and highlights the most informative frames.

F. Video Summary Generation with User Preferences

In the final step, a set of key frames is selected from the sorted fusion values. The proposed approach allows the user to select the skimming ratio λ for the summarized video; otherwise, the default value of λ is used. From the sorted fusion values R(1), R(2), R(3), ..., R(Y), video frames are selected from the top based on λ. Finally, the summarized video is produced from these selected frames, keeping their sequential order in the original video. In Fig. 2, the black line in the last row indicates the threshold value corresponding to λ. From this figure, it is easily seen that the proposed method provides results consistent with the ground truth.
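A minimal sketch of the weight-learning step described above: all weight combinations on a coarse grid summing to 100 are tried on the training segment, each combination is scored by how many of the labelled key frames appear among the top-ranked frames, and the best combination is kept. The grid step and the function signature are assumptions introduced for illustration.

```python
import itertools
import numpy as np

def learn_weights(z_features, key_frame_idx, step=5):
    """Grid-search the fusion weights of Eq. (12) on a short training segment.

    z_features: array of shape (4, Y) with the z-scored features
    (foreground, spatial motion, frequency motion, saliency difference).
    key_frame_idx: indices labelled as key frames in the training segment.
    `step` is an assumed grid resolution, not a value from the paper.
    """
    key_set = set(key_frame_idx)
    n_keys = len(key_set)
    best_score, best_w = -1, None
    grid = range(0, 101, step)
    for w1, w2, w3 in itertools.product(grid, repeat=3):
        w4 = 100 - w1 - w2 - w3              # the four weights must sum to 100
        if w4 < 0:
            continue
        r = np.dot([w1, w2, w3, w4], z_features)   # Eq. (12): fusion values R(t)
        top = np.argsort(r)[::-1][:n_keys]         # take as many frames as there are key frames
        score = len(key_set.intersection(top.tolist()))
        if score > best_score:
            best_score, best_w = score, (w1, w2, w3, w4)
    return best_w
```

Once the weights are learned, the summary follows the same ranking rule: R(t) is computed for the whole video with the learned weights, sorted in descending order, and the top λ·Y frames are kept in their original temporal order.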

4. RESULTS AND DISCUSSION

The proposed method is evaluated on the publicly available BL-7F dataset [1] and the Office and Office Lobby datasets [37]. In the BL-7F dataset, 19 surveillance videos are taken from fixed surveillance cameras located on the seventh floor of the Barry Lam Building in National Taiwan University. The duration of each video is 7 minutes 10 seconds, and each contains 12,900 frames. This dataset also provides a complete list of selected key frames as the ground truth for each video. In the Office dataset [37], four videos are collected with stably held but non-fixed cameras; the main difficulties are camera vibration and different lighting conditions. Similarly, three videos are collected in the Office Lobby dataset [37] with stably held but non-fixed cameras; however, they contain crowded scenes with richer activities compared to the Office dataset. The ground truth key frames for both the Office and Office Lobby datasets are also publicly available.

Fig. 6. An example of foreground object extraction. (a) Frame no. 820 (gray scale) of the bl-1 video, (b) the background frame of (a), and (c) the foreground objects.

In Fig. 6, an illustration of the foreground objects extracted by the proposed method is shown. Frame no. 820 of the bl-1 video of the BL-7F dataset is selected to represent the foreground area. This frame is converted into a gray scale image and shown in Fig. 6(a). The corresponding gray scale background image of frame 820 obtained by the DBM-based method is shown in Fig. 6(b). The foreground objects of frame 820 after applying Eq. (1) are shown in Fig. 6(c). Fig. 6 demonstrates that the proposed technique is capable of extracting the foreground region using DBM [4] for selecting important event information.

Fig. 7. An illustration of frame-to-frame motion information estimation; (a) and (b) are frames no. 820 and 819 of the bl-1 video, respectively, and (c) is the object motion between frames no. 820 and 819.

In Fig. 7, the CFD for spatial motion information extracted by the proposed method is presented. Two consecutive frames (820 and 819) of the bl-1 video from the BL-7F dataset (Figs. 7(a) and 7(b)) are selected for this purpose. The object motion information between these two consecutive frames obtained using Eq. (3) is shown in Fig. 7(c). Fig. 7 confirms that the proposed method is very competent at estimating spatial motion information.

In Fig. 8, an example of the visual saliency difference calculation is displayed. Figures 8(a) and 8(d) show frames 820 and 819, respectively, of the bl-1 video of the BL-7F dataset. Figures 8(b) and 8(e) are the corresponding saliency maps obtained by the GBVS algorithm [8]. The saliency maps overlaid on Figs. 8(a) and 8(d) are shown in Figs. 8(c) and 8(f), respectively. It is easily visible that the most salient regions in Figs. 8(c) and 8(f) are not very attractive; this information does not provide an accurate indication for selecting key frames. To overcome this problem, the difference of the two consecutive saliency maps is calculated and used as one of the features. Figure 8(g) shows the difference between the saliency maps of Figs. 8(b) and 8(e). For better visualization, Fig. 8(g) is multiplied by 5 and displayed in Fig. 8(h).

Fig. 8. A representation of the visual saliency difference calculation; (a) and (d) are two consecutive frames (820 and 819) of the bl-1 video, (b) and (e) are the saliency maps of (a) and (d) obtained by [8], (c) and (f) are the saliency maps overlaid on (a) and (d), (g) is the saliency map difference between (b) and (e), (h) is (g) multiplied by 5 for clear visualization, and (i) is the saliency map difference overlaid on (a).
The saliency map difference of Fig. 8(g) overlaid on Fig. 8(a) is shown in Fig. 8(i). It is observed from Fig. 8(i) that the difference of the visual saliency maps represents the salient region more accurately. The motion information in the frequency domain extracted by the phase correlation technique is shown in Fig. 9. Frames no. 740 and 741 of the bl-0 video are shown in Figs. 9(a) and 9(b), respectively, and the motion information extracted by the phase correlation technique is presented in Fig. 9(c).

Fig. 9. An example of motion information extracted by the phase correlation method; (a) and (b) are frames no. 820 and 819 of the bl-1 video, and (c) is the motion obtained by the phase correlation technique.

To evaluate the proposed method, an objective comparison has been performed. For this purpose, a set of evaluation metrics including precision, recall, and F-measure is computed.

Fig. 10. F-measures of the area of foreground objects (F_foreground), spatial motion (F_spatial), saliency difference (F_saliency), and frequency-domain motion (F_frequency) features, and of the proposed fusion (F_proposed), for the BL-7F, Lobby, and Office datasets.

The definitions of precision and recall are as follows:

Precision = t_p / (t_p + f_p)   (13)

Recall = t_p / (t_p + f_n)   (14)

where t_p is the number of frames selected by both a method and the ground truth, f_p is the number of frames selected by a method but not by the ground truth, and f_n is the number of frames selected by the ground truth but not by a method. However, neither precision nor recall alone provides a good indication of the quality of a summary; for example, a method can offer high precision but poor recall, or vice versa. To be efficient and robust, a method must achieve both high precision and high recall. To represent this, the F-measure is defined by combining precision and recall as follows:

F1-measure = 2 × Precision × Recall / (Precision + Recall)   (15)

High values of both precision and recall yield a high F-measure; thus, a method with a high F-measure value indicates a better summarization technique.
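For reference, a direct implementation of Eqs. (13)–(15) on sets of selected and ground-truth frame indices might look as follows; the function name is illustrative.

```python
def summarization_scores(selected, ground_truth):
    """Precision, recall and F-measure of Eqs. (13)-(15) for a frame selection."""
    selected, ground_truth = set(selected), set(ground_truth)
    tp = len(selected & ground_truth)     # selected by both the method and the ground truth
    fp = len(selected - ground_truth)     # selected by the method only
    fn = len(ground_truth - selected)     # selected by the ground truth only
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1
```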
The F-measures of the proposed method using only foreground objects (F_foreground), only spatial motion (F_spatial), only the saliency difference (F_saliency), only motion in the frequency domain (F_frequency), and the combination of all features are shown in Fig. 10. Examining this graph, it is evident that the method using only foreground objects performs better than the other individual features in bl-7, bl-8, bl-9, bl-10, bl-11, bl-15, bl-16, lobby-1, lobby-2, and office-3. If only spatial motion is considered, it performs best in the bl-0, bl-2, bl-4, bl-12, bl-14, bl-15, bl-16, bl-17, lobby-0, and lobby-1 videos. Again, the method applying only the phase correlation technique outperforms the other features in the bl-1, bl-3, bl-5, bl-6, bl-18, office-0, office-1, office-2, and office-3 videos; in the case of illumination changes it performs better than the other features. The saliency difference feature performs better than the frequency-domain motion feature in lobby-1 and lobby-2, and better than the area-of-foreground feature in the office-2 video. The proposed approach therefore combines all these features and performs better than the GMM-based method in all videos of the BL-7F, Office, and Lobby datasets.

The proposed approach is compared with the single-view video summarization results provided by the GMM-based method [1], the saliency directed prioritization (SDP) method [28], and the summarization in compressed domain (SCD) method [32]. These are the most relevant state-of-the-art methods for summarizing surveillance video. As the proposed method applies a Gaussian mixture model, we compare it with another GMM-based method. There are key differences between the GMM-based method [1] and the proposed method. Firstly, the GMM-based method works at the frame level, whereas the Gaussian mixture-based DBM [33] applied in the proposed method works at the pixel level. Secondly, the GMM-based method utilizes a color descriptor as its feature, while the proposed method uses human visual sensitive features, namely foreground objects, motion information in the spatial and frequency domains, and the visual saliency map difference. Since the proposed method employs the saliency map difference as a feature, we compare it with the recently proposed SDP method, which also applies a saliency-based technique. The proposed method applies the phase correlation (PC) technique and the SCD method implements the discrete cosine transform (DCT); as both PC and DCT work in the frequency domain, the proposed method is also compared with SCD.

In Fig. 11, a number of ground truth frames of the bl-11 video of the BL-7F dataset [1] and the results obtained by the GMM-based method as well as the proposed method are shown. Although there is significant content in frames 9963 and 12523, the GMM-based method fails to select these frames. In contrast, the proposed method is capable of selecting these frames successfully. The main reason for this success is that the proposed method combines the area of foreground objects, the visual saliency difference, and the frequency- and spatial-domain motion information.

In this proposed method, the user-preferred skimming ratio λ is set to the total number of ground truth key frames for each video. It is found that the introduced scheme generates more accurate results if λ + 2% of λ frames are selected from the ranked, sorted list of R(1), R(2), R(3), ..., R(Y), where Y is the total number of frames in a video. The default value of λ is set to 20% of the total number of frames of a video; this skimming ratio is also consistent with some other existing methods [38][39]. In Fig. 12, the skimming ratio of the ground truth key frames over the total number of frames and the default skimming ratio (20% of the total video frames) for the BL-7F, Lobby, and Office datasets [1][37] are shown. It is clear from the graph that the default skimming ratio is almost consistent with the ground truth skimming ratio provided in [1][37].

The values of the different weights (w_1, w_2, w_3, w_4) and the window size ω used in the proposed method, obtained by the adaptive fusion method, are shown in Table 1. The table reveals that the values of the weights w_1, w_2, w_3, and w_4 vary from 5% to 85% depending on the nature of the video. The average weights of the foreground size and the motion features are larger than the weight of the saliency feature. We observe this because, for a video, motion and the amount of foreground are the two most prominent human visual features compared to the saliency variation within frames.

Fig. 11. Evaluation of key frame extraction for the bl-11 video of the BL-7F dataset; the first, second, third, and fourth columns indicate the frame number, the ground truth, and the results obtained by the GMM-based method [1] and the proposed method, respectively. Two of the ground truth frames are not selected by the GMM-based method.

The results for precision, recall, and F-measure of the proposed method, the GMM-based method (intra-view) [1], the SDP-based method (intra-view) [28], and the SCD-based method [32] are shown in Table 2. It is observed from Table 2 that the mean F1-measure for the BL-7F dataset obtained by the proposed method is 92.5, whereas those achieved by the GMM-based, SDP-based, and SCD-based methods are 66.6, 79.8, and 41.6, respectively. For the Lobby dataset, the mean F1-measure of the proposed method is 83.0, which is higher than those of the GMM-based (80.0), SDP-based (74.0), and SCD-based (48.1) methods. In the case of the Office dataset, the highest mean F1-measure is also obtained by the proposed method; the means of the F1-measure obtained in the Office dataset by the GMM-based, SDP-based, and SCD-based methods are 53.0, 61.7, and 21.5, respectively. The proposed method achieves a higher F1-measure than the existing and relevant methods for all videos of the BL-7F, Office, and Lobby datasets except the bl-12 video of the BL-7F dataset. Table 2 also indicates that the proposed method not only performs with higher accuracy, but the variance of its performance across the different videos of the BL-7F, Office, and Lobby datasets is also more consistent compared to the existing and relevant state-of-the-art methods.

The F-measures of the proposed method with the adaptive weight scheme, the average-weight approach, the default skimming ratio, and SVM, along with the GMM-based approach [1], are shown in Fig. 13. In the proposed method, the well-known SVM library LIBSVM [40] is applied to train a model, which is then applied to a set of test images to obtain key frames. We employ the radial basis function (RBF) as the kernel function, as it maps the features non-linearly into a high-dimensional space so that non-linear relationships between class labels and attributes can be handled [40]. From this graph, it is observed that the proposed method with adaptive weights performs better than the recently proposed state-of-the-art GMM-based approach [1] in all videos of the BL-7F, Office, and Lobby datasets except the bl-12 video of the BL-7F dataset, and better than the other approaches (the proposed method with the average weight scheme, the default skimming ratio, and the SVM-based scheme). The proposed method with average weights achieves results similar to the adaptive-weight version in the lobby-0 and lobby-2 videos of the Lobby dataset; however, it performs worse in the bl-2, bl-12, and bl-15 videos. The proposed method with the default skimming ratio performs almost the same as the GMM-based method for the bl-0, bl-2, bl-6, bl-14, and bl-17 videos of the BL-7F dataset [1] and the office-0, office-1, and office-2 videos of the Office dataset [37]. However, it performs worse in the bl-11 and bl-12 videos of the BL-7F dataset [1] and the lobby-0, lobby-1, and lobby-2 videos of the Office Lobby dataset [37]. The main reason for the worse performance on these videos is that the default skimming ratio (20%) is much smaller than the ground truth skimming ratio (Fig. 12); therefore, the proposed method with the default skimming ratio misses some key frames. To overcome this problem, we allow the user to select the skimming ratio.
The proposed method with SVM achieves results as good as the proposed method with adaptive weights in the bl-15 video of the BL-7F dataset. In contrast, it shows lower performance in the bl-17 video of the BL-7F dataset, lobby-0, lobby-1, and lobby-2 of the Office Lobby dataset [37], and office-1 and office-2 of the Office dataset [37].

Fig. 12. A comparison of the skimming ratio provided by the ground truth and the default skimming ratio proposed by the system for each video.

Fig. 13. F-measures of the GMM-based method (intra-view) [1] and of the proposed method with adaptive weights, average weights, SVM, and the default skimming ratio.

Table 1. The weight values (w_1, w_2, w_3, w_4) and window sizes ω obtained by the proposed adaptive fusion scheme for each video of the BL-7F (bl-0 to bl-18), Lobby (lobby-0 to lobby-2), and Office (office-0 to office-3) datasets, together with their averages.

The GMM-based approach attains poor performance in the bl-0, bl-1, bl-3, bl-4, bl-5, bl-6, bl-7, bl-8, bl-9, bl-10, bl-11, bl-13, bl-14, bl-16, and bl-18 videos of the BL-7F dataset, and in office-0 and office-3 of the Office dataset [37]. The main reason for the poor performance of the GMM-based method [1] is that it applies only the MPEG-7 color layout descriptor as a feature; it does not consider pixel-wise foreground objects, motion, or human visual saliency information. As the proposed method considers these human visual sensitive features, it outperforms the GMM-based method. The proposed method with the average weight scheme does not perform better than the proposed method with adaptive weights because the contents are not the same for all videos; therefore, fixed weight values tuned for one video do not guarantee a better result for another video. Again, the proposed method with SVM shows inferior performance to the proposed method with adaptive weights. The results of SVM depend on the choice of kernel; moreover, discrete data do not always provide better results with an SVM classifier [41].

However, the GMM-based approach performs best only in the bl-12 video of the BL-7F dataset. After observing the key frames extracted by the proposed method for the bl-12 video, the reasons for this failure have been explored. In the bl-12 video, there are some frames with significant objects and motion that are nevertheless not selected as ground truth frames according to [1]; conversely, although some frames contain no foreground object and/or motion, they are considered ground truth. For example, frames no. 4083, 4120, and 4563 contain a sufficient amount of object and motion, as shown in the first row of Fig. 14; in these frames it is clearly visible that a person is working near the door. However, these frames are not selected as ground truth (key frames). On the other hand, no object or significant motion exists in the frames shown in the second row of Fig. 14; nonetheless, they are selected as key frames (ground truth). No explanation of this is found in [1].

Fig. 14. Sample frames of bl-12 that are not selected as ground truth (first row) and that are considered key frames (second row).

5. CONCLUSION

In this paper, an effective and robust framework is proposed to summarize surveillance video by combining human visual sensitive features, namely the area of foreground objects, motion information in the spatial and frequency domains, and the visual saliency difference of adjacent frames. According to [1], foreground objects usually contain detailed information about the video content. Moreover, human beings naturally give more attention to object motion in a video [2]. Furthermore, the visual attention cue is a sensitive feature indicating the level of a user's attention for determining key frames [3].


Dynamic visual attention: competitive versus motion priority scheme

Dynamic visual attention: competitive versus motion priority scheme Dynamic visual attention: competitive versus motion priority scheme Bur A. 1, Wurtz P. 2, Müri R.M. 2 and Hügli H. 1 1 Institute of Microtechnology, University of Neuchâtel, Neuchâtel, Switzerland 2 Perception

More information

Motion in 2D image sequences

Motion in 2D image sequences Motion in 2D image sequences Definitely used in human vision Object detection and tracking Navigation and obstacle avoidance Analysis of actions or activities Segmentation and understanding of video sequences

More information

A new predictive image compression scheme using histogram analysis and pattern matching

A new predictive image compression scheme using histogram analysis and pattern matching University of Wollongong Research Online University of Wollongong in Dubai - Papers University of Wollongong in Dubai 00 A new predictive image compression scheme using histogram analysis and pattern matching

More information

Classification of objects from Video Data (Group 30)

Classification of objects from Video Data (Group 30) Classification of objects from Video Data (Group 30) Sheallika Singh 12665 Vibhuti Mahajan 12792 Aahitagni Mukherjee 12001 M Arvind 12385 1 Motivation Video surveillance has been employed for a long time

More information

Robotics Programming Laboratory

Robotics Programming Laboratory Chair of Software Engineering Robotics Programming Laboratory Bertrand Meyer Jiwon Shin Lecture 8: Robot Perception Perception http://pascallin.ecs.soton.ac.uk/challenges/voc/databases.html#caltech car

More information

Multi-Camera Calibration, Object Tracking and Query Generation

Multi-Camera Calibration, Object Tracking and Query Generation MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Multi-Camera Calibration, Object Tracking and Query Generation Porikli, F.; Divakaran, A. TR2003-100 August 2003 Abstract An automatic object

More information

CS 223B Computer Vision Problem Set 3

CS 223B Computer Vision Problem Set 3 CS 223B Computer Vision Problem Set 3 Due: Feb. 22 nd, 2011 1 Probabilistic Recursion for Tracking In this problem you will derive a method for tracking a point of interest through a sequence of images.

More information

Dense Image-based Motion Estimation Algorithms & Optical Flow

Dense Image-based Motion Estimation Algorithms & Optical Flow Dense mage-based Motion Estimation Algorithms & Optical Flow Video A video is a sequence of frames captured at different times The video data is a function of v time (t) v space (x,y) ntroduction to motion

More information

SEMI-BLIND IMAGE RESTORATION USING A LOCAL NEURAL APPROACH

SEMI-BLIND IMAGE RESTORATION USING A LOCAL NEURAL APPROACH SEMI-BLIND IMAGE RESTORATION USING A LOCAL NEURAL APPROACH Ignazio Gallo, Elisabetta Binaghi and Mario Raspanti Universitá degli Studi dell Insubria Varese, Italy email: ignazio.gallo@uninsubria.it ABSTRACT

More information

JPEG compression of monochrome 2D-barcode images using DCT coefficient distributions

JPEG compression of monochrome 2D-barcode images using DCT coefficient distributions Edith Cowan University Research Online ECU Publications Pre. JPEG compression of monochrome D-barcode images using DCT coefficient distributions Keng Teong Tan Hong Kong Baptist University Douglas Chai

More information

Evaluation of Moving Object Tracking Techniques for Video Surveillance Applications

Evaluation of Moving Object Tracking Techniques for Video Surveillance Applications International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347 5161 2015INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Research Article Evaluation

More information

Adaptive Background Mixture Models for Real-Time Tracking

Adaptive Background Mixture Models for Real-Time Tracking Adaptive Background Mixture Models for Real-Time Tracking Chris Stauffer and W.E.L Grimson CVPR 1998 Brendan Morris http://www.ee.unlv.edu/~b1morris/ecg782/ 2 Motivation Video monitoring and surveillance

More information

Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging

Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging 1 CS 9 Final Project Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging Feiyu Chen Department of Electrical Engineering ABSTRACT Subject motion is a significant

More information

CS 231A Computer Vision (Fall 2012) Problem Set 3

CS 231A Computer Vision (Fall 2012) Problem Set 3 CS 231A Computer Vision (Fall 2012) Problem Set 3 Due: Nov. 13 th, 2012 (2:15pm) 1 Probabilistic Recursion for Tracking (20 points) In this problem you will derive a method for tracking a point of interest

More information

Short Run length Descriptor for Image Retrieval

Short Run length Descriptor for Image Retrieval CHAPTER -6 Short Run length Descriptor for Image Retrieval 6.1 Introduction In the recent years, growth of multimedia information from various sources has increased many folds. This has created the demand

More information

Learning a Manifold as an Atlas Supplementary Material

Learning a Manifold as an Atlas Supplementary Material Learning a Manifold as an Atlas Supplementary Material Nikolaos Pitelis Chris Russell School of EECS, Queen Mary, University of London [nikolaos.pitelis,chrisr,lourdes]@eecs.qmul.ac.uk Lourdes Agapito

More information

Motion Estimation and Optical Flow Tracking

Motion Estimation and Optical Flow Tracking Image Matching Image Retrieval Object Recognition Motion Estimation and Optical Flow Tracking Example: Mosiacing (Panorama) M. Brown and D. G. Lowe. Recognising Panoramas. ICCV 2003 Example 3D Reconstruction

More information

Segmentation and Tracking of Partial Planar Templates

Segmentation and Tracking of Partial Planar Templates Segmentation and Tracking of Partial Planar Templates Abdelsalam Masoud William Hoff Colorado School of Mines Colorado School of Mines Golden, CO 800 Golden, CO 800 amasoud@mines.edu whoff@mines.edu Abstract

More information

CHAPTER 9. Classification Scheme Using Modified Photometric. Stereo and 2D Spectra Comparison

CHAPTER 9. Classification Scheme Using Modified Photometric. Stereo and 2D Spectra Comparison CHAPTER 9 Classification Scheme Using Modified Photometric Stereo and 2D Spectra Comparison 9.1. Introduction In Chapter 8, even we combine more feature spaces and more feature generators, we note that

More information

C. Premsai 1, Prof. A. Kavya 2 School of Computer Science, School of Computer Science Engineering, Engineering VIT Chennai, VIT Chennai

C. Premsai 1, Prof. A. Kavya 2 School of Computer Science, School of Computer Science Engineering, Engineering VIT Chennai, VIT Chennai Traffic Sign Detection Via Graph-Based Ranking and Segmentation Algorithm C. Premsai 1, Prof. A. Kavya 2 School of Computer Science, School of Computer Science Engineering, Engineering VIT Chennai, VIT

More information

International Journal of Advance Engineering and Research Development

International Journal of Advance Engineering and Research Development Scientific Journal of Impact Factor (SJIF): 4.72 International Journal of Advance Engineering and Research Development Volume 4, Issue 11, November -2017 e-issn (O): 2348-4470 p-issn (P): 2348-6406 Comparative

More information

MOVING OBJECT DETECTION USING BACKGROUND SUBTRACTION ALGORITHM USING SIMULINK

MOVING OBJECT DETECTION USING BACKGROUND SUBTRACTION ALGORITHM USING SIMULINK MOVING OBJECT DETECTION USING BACKGROUND SUBTRACTION ALGORITHM USING SIMULINK Mahamuni P. D 1, R. P. Patil 2, H.S. Thakar 3 1 PG Student, E & TC Department, SKNCOE, Vadgaon Bk, Pune, India 2 Asst. Professor,

More information

Bus Detection and recognition for visually impaired people

Bus Detection and recognition for visually impaired people Bus Detection and recognition for visually impaired people Hangrong Pan, Chucai Yi, and Yingli Tian The City College of New York The Graduate Center The City University of New York MAP4VIP Outline Motivation

More information

Guided Image Super-Resolution: A New Technique for Photogeometric Super-Resolution in Hybrid 3-D Range Imaging

Guided Image Super-Resolution: A New Technique for Photogeometric Super-Resolution in Hybrid 3-D Range Imaging Guided Image Super-Resolution: A New Technique for Photogeometric Super-Resolution in Hybrid 3-D Range Imaging Florin C. Ghesu 1, Thomas Köhler 1,2, Sven Haase 1, Joachim Hornegger 1,2 04.09.2014 1 Pattern

More information

Digital Image Processing. Prof. P.K. Biswas. Department of Electronics & Electrical Communication Engineering

Digital Image Processing. Prof. P.K. Biswas. Department of Electronics & Electrical Communication Engineering Digital Image Processing Prof. P.K. Biswas Department of Electronics & Electrical Communication Engineering Indian Institute of Technology, Kharagpur Image Segmentation - III Lecture - 31 Hello, welcome

More information

A Feature Point Matching Based Approach for Video Objects Segmentation

A Feature Point Matching Based Approach for Video Objects Segmentation A Feature Point Matching Based Approach for Video Objects Segmentation Yan Zhang, Zhong Zhou, Wei Wu State Key Laboratory of Virtual Reality Technology and Systems, Beijing, P.R. China School of Computer

More information

Text Extraction in Video

Text Extraction in Video International Journal of Computational Engineering Research Vol, 03 Issue, 5 Text Extraction in Video 1, Ankur Srivastava, 2, Dhananjay Kumar, 3, Om Prakash Gupta, 4, Amit Maurya, 5, Mr.sanjay kumar Srivastava

More information

The SIFT (Scale Invariant Feature

The SIFT (Scale Invariant Feature The SIFT (Scale Invariant Feature Transform) Detector and Descriptor developed by David Lowe University of British Columbia Initial paper ICCV 1999 Newer journal paper IJCV 2004 Review: Matt Brown s Canonical

More information

MULTIVIEW REPRESENTATION OF 3D OBJECTS OF A SCENE USING VIDEO SEQUENCES

MULTIVIEW REPRESENTATION OF 3D OBJECTS OF A SCENE USING VIDEO SEQUENCES MULTIVIEW REPRESENTATION OF 3D OBJECTS OF A SCENE USING VIDEO SEQUENCES Mehran Yazdi and André Zaccarin CVSL, Dept. of Electrical and Computer Engineering, Laval University Ste-Foy, Québec GK 7P4, Canada

More information

Estimating the wavelength composition of scene illumination from image data is an

Estimating the wavelength composition of scene illumination from image data is an Chapter 3 The Principle and Improvement for AWB in DSC 3.1 Introduction Estimating the wavelength composition of scene illumination from image data is an important topics in color engineering. Solutions

More information

Lecture 6: Multimedia Information Retrieval Dr. Jian Zhang

Lecture 6: Multimedia Information Retrieval Dr. Jian Zhang Lecture 6: Multimedia Information Retrieval Dr. Jian Zhang NICTA & CSE UNSW COMP9314 Advanced Database S1 2007 jzhang@cse.unsw.edu.au Reference Papers and Resources Papers: Colour spaces-perceptual, historical

More information

Salient Region Detection and Segmentation in Images using Dynamic Mode Decomposition

Salient Region Detection and Segmentation in Images using Dynamic Mode Decomposition Salient Region Detection and Segmentation in Images using Dynamic Mode Decomposition Sikha O K 1, Sachin Kumar S 2, K P Soman 2 1 Department of Computer Science 2 Centre for Computational Engineering and

More information

CORRELATION BASED CAR NUMBER PLATE EXTRACTION SYSTEM

CORRELATION BASED CAR NUMBER PLATE EXTRACTION SYSTEM CORRELATION BASED CAR NUMBER PLATE EXTRACTION SYSTEM 1 PHYO THET KHIN, 2 LAI LAI WIN KYI 1,2 Department of Information Technology, Mandalay Technological University The Republic of the Union of Myanmar

More information

Extracting Spatio-temporal Local Features Considering Consecutiveness of Motions

Extracting Spatio-temporal Local Features Considering Consecutiveness of Motions Extracting Spatio-temporal Local Features Considering Consecutiveness of Motions Akitsugu Noguchi and Keiji Yanai Department of Computer Science, The University of Electro-Communications, 1-5-1 Chofugaoka,

More information

Online Pattern Recognition in Multivariate Data Streams using Unsupervised Learning

Online Pattern Recognition in Multivariate Data Streams using Unsupervised Learning Online Pattern Recognition in Multivariate Data Streams using Unsupervised Learning Devina Desai ddevina1@csee.umbc.edu Tim Oates oates@csee.umbc.edu Vishal Shanbhag vshan1@csee.umbc.edu Machine Learning

More information

Summarization of Egocentric Moving Videos for Generating Walking Route Guidance

Summarization of Egocentric Moving Videos for Generating Walking Route Guidance Summarization of Egocentric Moving Videos for Generating Walking Route Guidance Masaya Okamoto and Keiji Yanai Department of Informatics, The University of Electro-Communications 1-5-1 Chofugaoka, Chofu-shi,

More information

Detecting and Identifying Moving Objects in Real-Time

Detecting and Identifying Moving Objects in Real-Time Chapter 9 Detecting and Identifying Moving Objects in Real-Time For surveillance applications or for human-computer interaction, the automated real-time tracking of moving objects in images from a stationary

More information

Obtaining Feature Correspondences

Obtaining Feature Correspondences Obtaining Feature Correspondences Neill Campbell May 9, 2008 A state-of-the-art system for finding objects in images has recently been developed by David Lowe. The algorithm is termed the Scale-Invariant

More information

Cellular Learning Automata-Based Color Image Segmentation using Adaptive Chains

Cellular Learning Automata-Based Color Image Segmentation using Adaptive Chains Cellular Learning Automata-Based Color Image Segmentation using Adaptive Chains Ahmad Ali Abin, Mehran Fotouhi, Shohreh Kasaei, Senior Member, IEEE Sharif University of Technology, Tehran, Iran abin@ce.sharif.edu,

More information

Moving Object Segmentation Method Based on Motion Information Classification by X-means and Spatial Region Segmentation

Moving Object Segmentation Method Based on Motion Information Classification by X-means and Spatial Region Segmentation IJCSNS International Journal of Computer Science and Network Security, VOL.13 No.11, November 2013 1 Moving Object Segmentation Method Based on Motion Information Classification by X-means and Spatial

More information

Experimentation on the use of Chromaticity Features, Local Binary Pattern and Discrete Cosine Transform in Colour Texture Analysis

Experimentation on the use of Chromaticity Features, Local Binary Pattern and Discrete Cosine Transform in Colour Texture Analysis Experimentation on the use of Chromaticity Features, Local Binary Pattern and Discrete Cosine Transform in Colour Texture Analysis N.Padmapriya, Ovidiu Ghita, and Paul.F.Whelan Vision Systems Laboratory,

More information

DYNAMIC BACKGROUND SUBTRACTION BASED ON SPATIAL EXTENDED CENTER-SYMMETRIC LOCAL BINARY PATTERN. Gengjian Xue, Jun Sun, Li Song

DYNAMIC BACKGROUND SUBTRACTION BASED ON SPATIAL EXTENDED CENTER-SYMMETRIC LOCAL BINARY PATTERN. Gengjian Xue, Jun Sun, Li Song DYNAMIC BACKGROUND SUBTRACTION BASED ON SPATIAL EXTENDED CENTER-SYMMETRIC LOCAL BINARY PATTERN Gengjian Xue, Jun Sun, Li Song Institute of Image Communication and Information Processing, Shanghai Jiao

More information

Background subtraction in people detection framework for RGB-D cameras

Background subtraction in people detection framework for RGB-D cameras Background subtraction in people detection framework for RGB-D cameras Anh-Tuan Nghiem, Francois Bremond INRIA-Sophia Antipolis 2004 Route des Lucioles, 06902 Valbonne, France nghiemtuan@gmail.com, Francois.Bremond@inria.fr

More information

Graph-based High Level Motion Segmentation using Normalized Cuts

Graph-based High Level Motion Segmentation using Normalized Cuts Graph-based High Level Motion Segmentation using Normalized Cuts Sungju Yun, Anjin Park and Keechul Jung Abstract Motion capture devices have been utilized in producing several contents, such as movies

More information

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate

More information

ELEC Dr Reji Mathew Electrical Engineering UNSW

ELEC Dr Reji Mathew Electrical Engineering UNSW ELEC 4622 Dr Reji Mathew Electrical Engineering UNSW Review of Motion Modelling and Estimation Introduction to Motion Modelling & Estimation Forward Motion Backward Motion Block Motion Estimation Motion

More information

A novel supervised learning algorithm and its use for Spam Detection in Social Bookmarking Systems

A novel supervised learning algorithm and its use for Spam Detection in Social Bookmarking Systems A novel supervised learning algorithm and its use for Spam Detection in Social Bookmarking Systems Anestis Gkanogiannis and Theodore Kalamboukis Department of Informatics Athens University of Economics

More information

Colour Segmentation-based Computation of Dense Optical Flow with Application to Video Object Segmentation

Colour Segmentation-based Computation of Dense Optical Flow with Application to Video Object Segmentation ÖGAI Journal 24/1 11 Colour Segmentation-based Computation of Dense Optical Flow with Application to Video Object Segmentation Michael Bleyer, Margrit Gelautz, Christoph Rhemann Vienna University of Technology

More information

Computationally Efficient Serial Combination of Rotation-invariant and Rotation Compensating Iris Recognition Algorithms

Computationally Efficient Serial Combination of Rotation-invariant and Rotation Compensating Iris Recognition Algorithms Computationally Efficient Serial Combination of Rotation-invariant and Rotation Compensating Iris Recognition Algorithms Andreas Uhl Department of Computer Sciences University of Salzburg, Austria uhl@cosy.sbg.ac.at

More information

Lecture 9: Hough Transform and Thresholding base Segmentation

Lecture 9: Hough Transform and Thresholding base Segmentation #1 Lecture 9: Hough Transform and Thresholding base Segmentation Saad Bedros sbedros@umn.edu Hough Transform Robust method to find a shape in an image Shape can be described in parametric form A voting

More information

Saliency Detection for Videos Using 3D FFT Local Spectra

Saliency Detection for Videos Using 3D FFT Local Spectra Saliency Detection for Videos Using 3D FFT Local Spectra Zhiling Long and Ghassan AlRegib School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA ABSTRACT

More information

CS4733 Class Notes, Computer Vision

CS4733 Class Notes, Computer Vision CS4733 Class Notes, Computer Vision Sources for online computer vision tutorials and demos - http://www.dai.ed.ac.uk/hipr and Computer Vision resources online - http://www.dai.ed.ac.uk/cvonline Vision

More information

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009 Learning and Inferring Depth from Monocular Images Jiyan Pan April 1, 2009 Traditional ways of inferring depth Binocular disparity Structure from motion Defocus Given a single monocular image, how to infer

More information

Automatic Colorization of Grayscale Images

Automatic Colorization of Grayscale Images Automatic Colorization of Grayscale Images Austin Sousa Rasoul Kabirzadeh Patrick Blaes Department of Electrical Engineering, Stanford University 1 Introduction ere exists a wealth of photographic images,

More information

Spatial Outlier Detection

Spatial Outlier Detection Spatial Outlier Detection Chang-Tien Lu Department of Computer Science Northern Virginia Center Virginia Tech Joint work with Dechang Chen, Yufeng Kou, Jiang Zhao 1 Spatial Outlier A spatial data point

More information

Change detection using joint intensity histogram

Change detection using joint intensity histogram Change detection using joint intensity histogram Yasuyo Kita National Institute of Advanced Industrial Science and Technology (AIST) Information Technology Research Institute AIST Tsukuba Central 2, 1-1-1

More information

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION 6.1 INTRODUCTION Fuzzy logic based computational techniques are becoming increasingly important in the medical image analysis arena. The significant

More information

AN EFFICIENT BATIK IMAGE RETRIEVAL SYSTEM BASED ON COLOR AND TEXTURE FEATURES

AN EFFICIENT BATIK IMAGE RETRIEVAL SYSTEM BASED ON COLOR AND TEXTURE FEATURES AN EFFICIENT BATIK IMAGE RETRIEVAL SYSTEM BASED ON COLOR AND TEXTURE FEATURES 1 RIMA TRI WAHYUNINGRUM, 2 INDAH AGUSTIEN SIRADJUDDIN 1, 2 Department of Informatics Engineering, University of Trunojoyo Madura,

More information

Multidimensional Image Registered Scanner using MDPSO (Multi-objective Discrete Particle Swarm Optimization)

Multidimensional Image Registered Scanner using MDPSO (Multi-objective Discrete Particle Swarm Optimization) Multidimensional Image Registered Scanner using MDPSO (Multi-objective Discrete Particle Swarm Optimization) Rishiganesh V 1, Swaruba P 2 PG Scholar M.Tech-Multimedia Technology, Department of CSE, K.S.R.

More information

EE368 Project Report CD Cover Recognition Using Modified SIFT Algorithm

EE368 Project Report CD Cover Recognition Using Modified SIFT Algorithm EE368 Project Report CD Cover Recognition Using Modified SIFT Algorithm Group 1: Mina A. Makar Stanford University mamakar@stanford.edu Abstract In this report, we investigate the application of the Scale-Invariant

More information

Last update: May 4, Vision. CMSC 421: Chapter 24. CMSC 421: Chapter 24 1

Last update: May 4, Vision. CMSC 421: Chapter 24. CMSC 421: Chapter 24 1 Last update: May 4, 200 Vision CMSC 42: Chapter 24 CMSC 42: Chapter 24 Outline Perception generally Image formation Early vision 2D D Object recognition CMSC 42: Chapter 24 2 Perception generally Stimulus

More information

Performance Evaluation of Monitoring System Using IP Camera Networks

Performance Evaluation of Monitoring System Using IP Camera Networks 1077 Performance Evaluation of Monitoring System Using IP Camera Networks Maysoon Hashim Ismiaal Department of electronic and communications, faculty of engineering, university of kufa Abstract Today,

More information

Feature extraction. Bi-Histogram Binarization Entropy. What is texture Texture primitives. Filter banks 2D Fourier Transform Wavlet maxima points

Feature extraction. Bi-Histogram Binarization Entropy. What is texture Texture primitives. Filter banks 2D Fourier Transform Wavlet maxima points Feature extraction Bi-Histogram Binarization Entropy What is texture Texture primitives Filter banks 2D Fourier Transform Wavlet maxima points Edge detection Image gradient Mask operators Feature space

More information

Video shot segmentation using late fusion technique

Video shot segmentation using late fusion technique Video shot segmentation using late fusion technique by C. Krishna Mohan, N. Dhananjaya, B.Yegnanarayana in Proc. Seventh International Conference on Machine Learning and Applications, 2008, San Diego,

More information

Color Image Segmentation Using a Spatial K-Means Clustering Algorithm

Color Image Segmentation Using a Spatial K-Means Clustering Algorithm Color Image Segmentation Using a Spatial K-Means Clustering Algorithm Dana Elena Ilea and Paul F. Whelan Vision Systems Group School of Electronic Engineering Dublin City University Dublin 9, Ireland danailea@eeng.dcu.ie

More information

Comparative Study of ROI Extraction of Palmprint

Comparative Study of ROI Extraction of Palmprint 251 Comparative Study of ROI Extraction of Palmprint 1 Milind E. Rane, 2 Umesh S Bhadade 1,2 SSBT COE&T, North Maharashtra University Jalgaon, India Abstract - The Palmprint region segmentation is an important

More information

Image Processing. Image Features

Image Processing. Image Features Image Processing Image Features Preliminaries 2 What are Image Features? Anything. What they are used for? Some statements about image fragments (patches) recognition Search for similar patches matching

More information

Figure 1 shows unstructured data when plotted on the co-ordinate axis

Figure 1 shows unstructured data when plotted on the co-ordinate axis 7th International Conference on Computational Intelligence, Communication Systems and Networks (CICSyN) Key Frame Extraction and Foreground Modelling Using K-Means Clustering Azra Nasreen Kaushik Roy Kunal

More information

A Model of Dynamic Visual Attention for Object Tracking in Natural Image Sequences

A Model of Dynamic Visual Attention for Object Tracking in Natural Image Sequences Published in Computational Methods in Neural Modeling. (In: Lecture Notes in Computer Science) 2686, vol. 1, 702-709, 2003 which should be used for any reference to this work 1 A Model of Dynamic Visual

More information

Saliency Extraction for Gaze-Contingent Displays

Saliency Extraction for Gaze-Contingent Displays In: Workshop on Organic Computing, P. Dadam, M. Reichert (eds.), Proceedings of the 34th GI-Jahrestagung, Vol. 2, 646 650, Ulm, September 2004. Saliency Extraction for Gaze-Contingent Displays Martin Böhme,

More information

A Fast Moving Object Detection Technique In Video Surveillance System

A Fast Moving Object Detection Technique In Video Surveillance System A Fast Moving Object Detection Technique In Video Surveillance System Paresh M. Tank, Darshak G. Thakore, Computer Engineering Department, BVM Engineering College, VV Nagar-388120, India. Abstract Nowadays

More information

ROBUST LINE-BASED CALIBRATION OF LENS DISTORTION FROM A SINGLE VIEW

ROBUST LINE-BASED CALIBRATION OF LENS DISTORTION FROM A SINGLE VIEW ROBUST LINE-BASED CALIBRATION OF LENS DISTORTION FROM A SINGLE VIEW Thorsten Thormählen, Hellward Broszio, Ingolf Wassermann thormae@tnt.uni-hannover.de University of Hannover, Information Technology Laboratory,

More information

Segmentation and Grouping

Segmentation and Grouping Segmentation and Grouping How and what do we see? Fundamental Problems ' Focus of attention, or grouping ' What subsets of pixels do we consider as possible objects? ' All connected subsets? ' Representation

More information

Pattern based Residual Coding for H.264 Encoder *

Pattern based Residual Coding for H.264 Encoder * Pattern based Residual Coding for H.264 Encoder * Manoranjan Paul and Manzur Murshed Gippsland School of Information Technology, Monash University, Churchill, Vic-3842, Australia E-mail: {Manoranjan.paul,

More information