Fabio Remondino. Tracking of human movements in image space
Table of contents

1. Introduction
2. Human tracking overview
3. Data acquisition
4. Algorithms overview
   4.1 The least square matching tracker
   4.2 Object tracking
   4.3 The Shi-Tomasi-Kanade tracker
   4.4 Detection and tracking of moving objects
5. Feature selection for tracking human body parts
6. Results
   6.1 Least square matching tracking
   6.2 Shi-Tomasi-Kanade tracker
   6.3 Detection and tracking of moving objects
   6.4 Object tracking
7. Conclusions
8. Future works
Bibliography
1. Introduction

Human motion analysis is receiving increasing attention from researchers in different fields of study. The interest is motivated by a wide spectrum of applications, such as athletic performance analysis, surveillance, man-machine interfaces, video-conferencing, human-computer interaction and motion capture (games and animation). A complete model of a human consists of both the movements and the shape of the body. Many of the available systems treat the two modeling processes as separate even though they are closely related. Depending on the application (animation, visualization, medical imaging), different methods can be used for the measurement of the body shape: laser scanner, infra-red light scanner, photogrammetry, structured light. The modeling of the movement is often obtained by capturing the motion with tracking processes: this can be achieved with photogrammetric methods, electromagnetic or mechanical sensor systems and image-based methods. In general the tracking process can be described as the establishment of correspondences of the image structure between consecutive frames, based on features related to position, velocity, shape, color and texture. The main problem is to establish the corresponding features in different images automatically. Tracking is required for 2D and 3D object localization and it is also used for object detection, classification and identification. The main goals of motion studies are to detect moving regions (points, features, areas), estimate the motion, model articulated objects and interpret the motion.
It is a very hard task because:
- the appearance of people can vary dramatically from frame to frame;
- people can appear in arbitrary poses;
- the human body can deform in complex ways;
- tracked points can be occluded, resulting in ambiguities and multiple interpretations;
- tracked points (joints) are often not well observable (clothing hides the underlying structure);
- it is a geometrically under-constrained problem (images are 2D entities of a 3D world).
This work focuses on the tracking of movements of humans in monocular sequences of images. Section 2 gives a general overview of tracking techniques, including motion capture, human modelling processes and moving object detection. In Section 3 the techniques used for image acquisition and the contrast enhancement process are presented. In Section 4 an overview of the implemented algorithms is given, while a short description of interest features is contained in Section 5. Finally, Section 6 shows all the results for the validation of the algorithms.
2. Human tracking overview

The main problem of tracking humans (and in particular human movements) is how to capture the position and motion in space of the articulated parts of the human body. Typically the tracking process involves matching between frames using pixels, points, lines and blobs based on their motion, shape or other visual information. Tracking the movements of persons and modeling the different parts of the human body are two closely related applications. There are two main techniques to capture human motion [2]:

(a) Tracking using body markers
These tracking systems can be divided into [13]:
1. Systems which employ sensors on the body that sense artificial external sources (e.g. an electromagnetic field) or natural external sources. These systems provide 3D world-based information, but their workspace and accuracy are generally limited due to the use of the external sources, and their form factor restricts their use to medium and larger sized body parts.
2. Systems which employ an external sensor that senses artificial sources or markers on the body (e.g. an electro-optical system that tracks reflective markers) or natural sources on the body (e.g. a video-camera based system that tracks the pupil and cornea). These systems generally suffer from occlusion and a limited workspace.
3. Systems which employ sensors and sources that are both on the body (e.g. a glove with piezoresistive flex sensors). The sensors generally have small form factors and are therefore especially suitable for tracking small body parts. These systems allow for the capture of any body movement and an unlimited workspace, but generally do not provide 3D world-based information.
In figure 2.1 some systems for motion capture are presented.

Fig.2.1: Different systems for motion capture. Left and right: retro-reflective markers.
Middle: an electro-mechanical system.

All these techniques are used especially in motion capture, where the objects' position and orientation in physical space are recorded as information in a suitable form that animators can use to control elements in a computer-generated scene. The disadvantages of these techniques are:
- displacement of the markers during movement introduces uncertainty in the results;
- difficulty of placement on complex articulations (like shoulders and knees);
- rigidity in movement (psychological effects);
- difficult calibration of the system.
The main advantage is the capability of some systems to process the data and produce 3D results in real time.

(b) Tracking without markers (marker-free methods)
Marker-free methods are based on image sequence processing/analysis. These methods are often model-based; the image sequences can be acquired either from one camera (monocular vision) or from multiple cameras (multi-view). In the monocular case, different approaches can be used to track the human body: matching point features, contour extraction (sensitive to noise), 3-D geometric primitives projected onto the images [13], probabilistic models of the joint positions [22], particle filtering [3], active part decomposition. In the multi-view approach, multiple cameras simultaneously acquire different views of the person, and the 3-D body poses and motions at each time instant can be recovered from the multi-image sequences [7]. The marker-free methods offer the subject complete freedom of movement, which is not the case when tracking with markers. Image understanding and extrapolation of the third dimension are the main problems of these methods, especially in monocular vision. In this case the 3D coordinates can be inferred from the 2D image coordinates, e.g. using a Bayesian approach and a set of training data [11] or by fitting the projection of a three-dimensional person model through the sequence [21, 24]. The main problems of these approaches are the models of the different parts of the body (using cylinders, cones, elliptical cylinders), the large number of degrees of freedom of the model (body joints, rotations, orientations) and the modeling of the motion (prediction of the next steps). In the multi-image approach, stereo vision can be used to extract 3D information from the sequence.

Fig.2.2: Left: geometric primitives projected onto the image [21]. Right: a volumetric human model [1]

The interest in human motion analysis can also be limited to detecting moving objects in image sequences.
In applications such as real-time tracking, monitoring of wide-area sites or surveillance, tracking approaches based on moving object localization and body shape or body boundary tracking are used (fig.2.3). The moving objects can be identified in the images using background subtraction or optical flow. If a motion of the camera is also present, a rectification of the frames must be performed in order to apply the background knowledge [12]. Moving objects in the scene are often segmented, while occlusion problems can be solved using temporal analysis and trajectory prediction (Kalman filter) [17].

Fig.2.3: Moving shapes tracking
3. Data acquisition

Four sequences (fig.3.1, 3.2, 3.3, 3.4) have been acquired with a Sony DCR-VX700E, a Sony digital handycam that records images in digital format on a mini DV tape. The images are stored in DV format with a size of 720x576 pixels and 24 bit color resolution. The DV format is a proprietary Sony compressed digital audio and video recording standard. As CCD cameras are interlaced, i.e. a full frame is split into two different fields which are recorded and read out consecutively, the odd and even lines of an image are captured at different times and a saw-tooth pattern is created during the digitizing process. For this reason only the odd (or even) lines of an image are used in the algorithm, reducing the resolution in the vertical direction by 50 per cent. Two other sequences (fig.3.5, 3.6) have been acquired by digitizing an old VHS tape. In this case too, the digitization process creates a saw-tooth pattern in the images; therefore reduced images are used for the validation of the algorithm.

Fig.3.1: Sequence of 24 frames of a walking man: the camera is rotating on a tripod
Fig.3.2: Sequence of 60 frames: the camera is still and the guy is just raising his arms

The two sequences acquired from the VHS tape (fig.3.5, 3.6) have very low resolution because of the video tape and the digitization process (RAZOR software). No attempt at enhancing the frames was successful: different filters and also motion blur compensation did not achieve good results. Therefore just a local contrast enhancement has been done.
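The field-splitting step described above can be sketched in a few lines. This is only an illustrative sketch, not the report's implementation; the function name and the odd/even indexing convention are assumptions:

```python
import numpy as np

def deinterlace(frame: np.ndarray, field: str = "odd") -> np.ndarray:
    """Return a single field of an interlaced frame.

    Keeping only the odd (or even) lines avoids the saw-tooth pattern
    caused by the two fields being exposed at different times, at the
    cost of halving the vertical resolution.
    """
    start = 1 if field == "odd" else 0
    return frame[start::2, ...]

# A 576-line PAL frame, as in the 720x576 DV sequences:
frame = np.zeros((576, 720), dtype=np.uint8)
half = deinterlace(frame)   # 288 lines, half the vertical resolution
```

The resulting half-height field can then be fed to the trackers directly, or stretched back to the original height if a square pixel aspect is needed for display.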
Fig.3.3: Sequence of 9 frames acquired from VHS tape
Fig.3.4: Sequence of 10 frames from VHS tape
Fig.3.5: Sequence of 100 frames: two people are walking one towards the other. Their trajectories are perpendicular to the camera, which is still and far away from them
Fig.3.6: Sequence of 50 frames of moving people walking towards the camera
4. Algorithms overview

In this section the implemented algorithms are described: least squares matching tracker, object tracking and extraction, Shi-Tomasi-Kanade tracker, detection and tracking of moving objects.

4.1 The least square matching tracker

The basic idea of this algorithm is to track a selected point through a sequence of images using least squares matching (LSM). The process is based on the adaptive least squares matching technique [9] and is similar to [4]. Assume two image regions are given as discrete two-dimensional functions f(x,y) and g(x,y), with f(x,y) the template in one image and g(x,y) the patch in the other image; a correspondence is established if

f(x,y) = g(x,y)   (4.1)

Because of random effects (noise) in both images, the above equation is not consistent. Therefore a noise vector e(x,y) is added, resulting in

f(x,y) - e(x,y) = g(x,y)   (4.2)

The location of the function values g(x,y) must be determined in order to provide the match point. This is achieved by minimizing a goal function which measures the distances between the grey levels in the template and in the other patch. The goal function to be minimized in this approach is the L2-norm of the residuals of the least squares estimation. Eq.(4.2) can be considered as a non-linear observation equation which models the vector of observations f(x,y) with a function g(x,y), whose location in the other image must be estimated. The location is usually described by shift parameters which are estimated with respect to an initial position of g(x,y). In order to account for a variety of systematic image deformations and to obtain a better match, image shaping parameters (affine image shaping) and radiometric corrections can be introduced besides the shift parameters [9].
An affine transformation is often used and the pixel coordinates of the matched point are computed as

x_new = a0 + a1*x + a2*y   (4.3.1)
y_new = b0 + b1*x + b2*y   (4.3.2)

where the 6 parameters of the affine transformation must be estimated from eq.(4.2) by minimizing the sum of the squares of the differences between the grey values in the image patches. The function g(x,y) in eq.(4.2) is linearized with respect to the unknown parameters and the obtained linear system is iterated using a Gauss-Markov method [9]. The implemented algorithm uses two images, one as template and the other as search image. The patches in the search image are modified by the affine transformation (translation, rotation, shearing and scaling) and the corresponding point is found in the search image after some iterations. Fig.4.1 shows the result of the least squares matching: the red box is the selected patch in the template image and the green box represents the affinely transformed patch in the search image (emphasized).

Fig.4.1: LSM algorithm: template image (left) and search image (right)
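The 6-parameter mapping of eq.(4.3) can be illustrated directly. This is a minimal sketch of the patch-coordinate warp only (the function name and parameter packing are assumptions), not the LSM estimation itself:

```python
import numpy as np

def affine_warp(coords: np.ndarray, a, b) -> np.ndarray:
    """Apply eq.(4.3): x_new = a0 + a1*x + a2*y, y_new = b0 + b1*x + b2*y.

    coords: (N, 2) array of (x, y) pixel coordinates of the patch.
    a, b:   the six affine parameters (a0, a1, a2) and (b0, b1, b2).
    """
    x, y = coords[:, 0], coords[:, 1]
    x_new = a[0] + a[1] * x + a[2] * y
    y_new = b[0] + b[1] * x + b[2] * y
    return np.stack([x_new, y_new], axis=1)

# Identity parameters (a = (0,1,0), b = (0,0,1)) leave the patch unchanged;
# non-zero a0, b0 are the pure shift estimated by the matcher.
pts = np.array([[10.0, 20.0], [11.0, 20.0]])
shifted = affine_warp(pts, a=(2.0, 1.0, 0.0), b=(3.0, 0.0, 1.0))
```

In the real LSM iteration these six parameters are the unknowns of the linearized system; the warp above is only the forward model that resamples the search patch at each iteration.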
In [4] three sequences of images from three synchronized cameras are available: spatial correspondences between the three images at the same instant t and also temporal correspondences between subsequent frames of each camera are computed, and the 3D trajectory can be determined. In our case the algorithm works with monocular sequences of images and only temporal correspondences can be found. The fundamental operations of the tracking process are three:
1. predict the position in the next frame;
2. search the position with the highest cross-correlation value;
3. establish the point in the next frame using least squares matching.
If the images have been taken at near time instants, they are strongly related to each other and the image positions of two corresponding features are very similar. Therefore, for the frame at time t+1, the predicted position of a point is the same as at time t (fig.4.2). Around this position a search box is defined (blue box) and scanned for the position with the highest cross-correlation. This position is considered an approximation of the exact position of the point to be tracked. The LSM algorithm is then applied at that position (red cross) and the result of the matching is considered the exact position of the tracked point in the next frame.

Fig.4.2: The cross-correlation process to find the approximation for LSM. Frame at time t: in red the patch for LSM. Frame at time t+1: in blue the search area for cross-correlation

For the frame at time t+2 a linear prediction of the position of the point from the two previous frames is computed (fig.4.3). Then a search box is defined around this predicted position and the point with the highest cross-correlation is used for the LSM computation. For the next frames a linear prediction (based on the previous positions) is always computed, even if a more sophisticated interpolation could be implemented (splines or a Kalman filter, especially after occlusions).
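The cross-correlation search of step 2 can be sketched as follows. This is an illustrative sketch with assumed function names and an assumed top-left patch convention, not the report's implementation:

```python
import numpy as np

def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized cross-correlation coefficient of two equal-size patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def best_match(template, search_img, top_left, half_box, step=1):
    """Scan a (2*half_box+1)^2 neighbourhood of the predicted patch
    position (row, col of the patch's top-left corner) and return the
    location with the highest correlation coefficient."""
    h, w = template.shape
    cy, cx = top_left
    best, best_pos = -1.0, top_left
    for dy in range(-half_box, half_box + 1, step):
        for dx in range(-half_box, half_box + 1, step):
            y, x = cy + dy, cx + dx
            patch = search_img[y:y + h, x:x + w]
            if patch.shape != template.shape:
                continue          # partly outside the image
            rho = ncc(template, patch)
            if rho > best:
                best, best_pos = rho, (y, x)
    return best_pos, best
```

The returned position is only the integer-pixel approximation; in the tracker it is handed to LSM, which refines it to sub-pixel accuracy with the affine parameters.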
As the algorithm works with monocular sequences, few automatic controls on the corresponding matched points can be performed. In order to verify the reliability of the tracked points, two post-processing verifications have been implemented:
1. cross-correlation computation: it checks whether the matched point is reliable between two frames. If the cross-correlation coefficient of a point in two consecutive images is smaller than a predefined threshold value, the point is rejected;
2. distance between two matched joints: this test can be performed if the camera does not zoom and is stationary, or if its movements are slower than the moving objects; in these cases a distance can be computed, in each frame, between two points on the body that must remain at the same distance (e.g. foot-knee, wrist-shoulder). The difference of this distance in two consecutive frames is then calculated, and if the difference does not belong to a predefined domain, the tracked point is rejected.

Fig.4.3: Linear prediction (times t-1, t, t+1) to find the approximated position of the point

A cross-correlation computation has also been implemented to recover lost points after occlusions. The user must manually select the last image where the point is visible and the image where the point reappears. The process finds the new position after the occlusion using a suitable window; these coordinates are considered an approximation of the point and the LSM is applied to compute the correct position. If the tracked points have been selected in correspondence of the human joints, a final animation of the tracked points can be done and the 2D trajectories can be drawn.

4.2 Object tracking

A tracking process can also involve the extraction of parts of objects using a few tracked points. Using an image matching process [4] which establishes many correspondences in three consecutive images, it is possible to extract the full body (or part of it) through the sequence. The process is based on the adaptive least squares method [9] and automatically determines a dense set of corresponding points between the images, starting from a few seed points sparsely distributed on the surface to extract. The template image is divided into polygonal regions according to which of the seed points is closest (Voronoi tessellation) (fig.4.4).

Fig.4.4: Search strategy for the establishment of correspondences between images (o seed points, . matched points)

Starting from the seed points and using a user-defined border of the object of interest, the algorithm tries to match corresponding points in three consecutive images. The central image is used as template and the other two as search images. The matcher searches for the corresponding points in the two images independently.
The process starts from a selected point, shifts horizontally in the template and in the search images and applies the LSM algorithm at the shifted location. If the quality of the matching is good, the matched point is stored and the process continues horizontally until it reaches the region boundaries. The covering of the entire polygonal region of a seed point is achieved by sequential horizontal and vertical shifts. In monocular sequences the reliability of the matched surfaces depends only on the matching parameters; in multi-view sequences a control can be done using the computed 3D coordinates to check for wrong correspondences [5].
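The visiting order produced by the sequential horizontal and vertical shifts can be sketched as a breadth-first growth from the seed, bounded by the seed's polygonal region. This is only a sketch of the scheduling (function name, queue-based formulation and unit shift are assumptions); the real matcher additionally runs LSM at every visited location and stops a branch when the match quality is poor:

```python
import numpy as np
from collections import deque

def grow_region(seed, region_mask, shift=1):
    """Visit the pixels of a seed point's region by sequential
    horizontal and vertical shifts starting from the seed, i.e. the
    order in which new correspondences would be attempted."""
    h, w = region_mask.shape
    visited = np.zeros_like(region_mask, dtype=bool)
    queue = deque([seed])
    visited[seed] = True
    order = []
    while queue:
        y, x = queue.popleft()
        order.append((y, x))
        # horizontal shifts first, then vertical ones
        for dy, dx in ((0, shift), (0, -shift), (shift, 0), (-shift, 0)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and region_mask[ny, nx] and not visited[ny, nx]:
                visited[ny, nx] = True
                queue.append((ny, nx))
    return order
```

Because the growth is confined to `region_mask`, unmatched holes (e.g. low-texture areas) simply stay unvisited, which is why the report closes the remaining gaps by searching from all directions around them.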
To evaluate the quality of the matched points the following indicators are used:
- the a posteriori standard deviation of the least squares adjustment;
- the standard deviation of the shift in the x and y directions.
If the quality of the matching is not satisfactory, the algorithm recomputes the process, changing some parameters, like a smaller shift from the neighbor or a bigger patch size. At the end of the process a cloud of 2D points is obtained (fig.4.5, second row), even if some holes due to unanalyzed areas can appear in the results: the algorithm tries to close these gaps by searching from all directions around them. If the holes are in areas with low texture, the matching does not find many correspondences; the results can therefore be improved by increasing the number of seed points in these areas or by using neighborhood information.

Fig.4.5: Triplet of successive frames and found 2D correspondences

4.3 The Shi-Tomasi-Kanade tracker

In this section the Shi-Tomasi-Kanade tracker [14, 19, 23] is briefly described. In general, any function of three variables I(x,y,t), where the space variables x and y as well as the time variable t are discrete and suitably bounded, can represent the intensity of an image sequence. If the camera moves, the patterns of image intensities change in a complex way; but images taken at near time instants are usually strongly related to each other, because in general they refer to the same scene taken from only slightly different viewpoints. Consider an image sequence I(x,t), with x = [u,v]^T the coordinates of an image point. If the time sampling frequency is sufficiently high, we can assume that small image regions are displaced but their intensities remain unchanged. Therefore I(x,t) is not arbitrary but satisfies

I(x,t) = I(δ(x), t+Δt)   (4.4)

where δ(x) is the motion field, specifying the warping that is applied to image points between the time instants t and t+Δt.
The fast-sampling hypothesis allows us to approximate the motion with a translation, that is, δ(x) = x + d, where d is a displacement vector. So a later image, taken at time t+Δt, can be obtained by moving every point in the current image, taken at time t, by a suitable amount d. As the image motion model is not perfect, and because of image noise, equation (4.4) is not exactly satisfied and can be written as

I(x,t) = I(δ(x), t+Δt) + n(x)   (4.5)

where n is a noise function.
The tracker's task is to compute the displacement d for a number of selected points for each pair of successive frames in the sequence. The displacement is computed by minimizing the SSD (Sum of Squared Differences) residual

ε = Σ_W [ I(x+d, t+Δt) - I(x,t) ]²   (4.6)

where W is a small image window centered on the point for which d is computed. By plugging the first-order Taylor expansion of I(x+d, t+Δt) into eq.(4.6) and imposing that the derivatives with respect to d are zero, we obtain the linear system

G d = e   (4.7)

where

G = Σ_W [ Iu²    Iu·Iv
          Iu·Iv  Iv²  ]   (4.8.1)

with

Iu = ∂I/∂u,  Iv = ∂I/∂v   (4.8.2)

and e, the error vector, is

e = Σ_W It [ Iu  Iv ]^T   (4.8.3)

with It = ∂I/∂t. The derivatives of the function I can be computed with finite pixel differences, but there are always problems with image noise and local minima. A better solution can be achieved by convolving the function with a special filter (Gaussian kernel). The tracker is based on eq.(4.7): given a pair of successive frames, d is the solution of (4.7), that is d = G⁻¹e, and is used to compute the position in the new frame. The procedure is iterated according to a Newton-Raphson scheme until the displacement converges. The translation model δ(x) = x + d cannot account for certain transformations of the feature window we are tracking, for instance rotation, scaling and shear. An affine motion model is more accurate [19]:

[ x+u ]   [ a1  a2 ] [ x ]   [ a5 ]
[ y+v ] = [ a3  a4 ] [ y ] + [ a6 ]   (4.9)

because two rotations, two translations, a scale in x/y and a shear are considered.
The affine model computes the δ(x) of eq.(4.4) as

δ(x) = Ax + d   (4.10)

where d is a displacement and A is a 2x2 matrix accounting for the affine warping, which can be written as A = 1 + D, with D = [d_ij] a deformation matrix and 1 the identity matrix. As in the translational case, the motion parameters D and d are estimated by minimizing the SSD residual

ε = Σ_W [ I(Ax+d, t+Δt) - I(x,t) ]²   (4.11)

Equation (4.11) is differentiated with respect to the unknown entries of the matrix D and the vector d and the results are set to zero. Linearizing the resulting system by Taylor expansion, we obtain the linear system

T z = a   (4.12)

where

z = [ d11  d12  d21  d22  d1  d2 ]^T   (4.13.1)

contains the unknown entries of the deformation matrix D and of the displacement vector d;

a = Σ_W It [ u·Iu  u·Iv  v·Iu  v·Iv  Iu  Iv ]^T   (4.13.2)

is the error vector that depends on the differences between the two images; and

T = Σ_W [ U    V^T
          V    G  ]   (4.13.3)

where U is a 4x4 matrix containing the products of the first 4 elements of the vector a with each of these elements, V is a 2x4 matrix containing the products of the elements Iu and Iv with the first 4 elements of a, and G is as in equation (4.8.1). Finally, equation (4.12) can be solved iteratively for the entries of z. In both cases (translational and affine model) the feature selection is very important. In [19] it is recommended that T (or G) be well conditioned, i.e. the ratio between the largest and the smallest eigenvalue of T (or G) should not be too big (corner selection). Once the displacement has been found and the new position of the point has been determined, a control on the new position must be done. The control is computed with a cross-correlation process: given a template window around the point in frame n and a slave window around the matched point in frame n+1, a cross-correlation coefficient ρ is computed. The corresponding feature in frame n+1 is accepted if the computed ρ is bigger than a user-defined threshold value ρ0.
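One Newton step of the translational tracker (build G and e from the derivatives inside the window W, then solve G d = e) can be sketched as follows. This is a minimal sketch, not the report's code: the function name is assumed, derivatives are plain finite differences via `np.gradient` instead of the Gaussian-filtered derivatives recommended above, and no iteration or ρ-based acceptance test is included:

```python
import numpy as np

def klt_translation(I0: np.ndarray, I1: np.ndarray, center, half_win):
    """One step of eq.(4.7) for the window W centered at (row, col):
    returns the displacement d = [d_u, d_v] solving G d = e."""
    y, x = center
    sl = (slice(y - half_win, y + half_win + 1),
          slice(x - half_win, x + half_win + 1))
    Iu = np.gradient(I0, axis=1)[sl]   # spatial derivative along u (columns)
    Iv = np.gradient(I0, axis=0)[sl]   # spatial derivative along v (rows)
    It = (I1 - I0)[sl]                 # temporal difference
    G = np.array([[np.sum(Iu * Iu), np.sum(Iu * Iv)],
                  [np.sum(Iu * Iv), np.sum(Iv * Iv)]])
    e = -np.array([np.sum(It * Iu), np.sum(It * Iv)])
    return np.linalg.solve(G, e)       # d = G^-1 e
```

In the full tracker this step is repeated (Newton-Raphson) with the window re-sampled at the updated position, and the affine version solves the analogous 6x6 system T z = a instead.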
Usually the STK tracker is not used for tracking human movements in image sequences; but if the images have been taken at near time instants they are usually strongly related to each other, and this (extended) tracker can give quite good results for not very long sequences of highly textured images.

4.4 Detection and tracking of moving objects

In applications like video-surveillance and the monitoring of human activities, the main idea is to detect and track moving objects (people, vehicles, etc.) as they move through the scene. Considering one image, the regions of moving objects should be separated from the static environment. To identify and separate the moving objects, different approaches have been proposed: background subtraction [17], 2D active shape models [18], a combination of motion, skin color and face detection [8]. If the camera is stationary, or its movements are very small compared to those of the objects, a simple subtraction of two consecutive frames can be used (fig.4.6-c). The resulting image has much larger values for the moving components of the frame than for the stationary components. A moving object produces two regions having large values:
1. a front region of the object, caused by the covering of the background by the object;
2. a rear region of the object, caused by the uncovering of the background by the object.
Therefore, by thresholding the image it is possible to detect the rear region of the moving object. The threshold value is determined by experiment. The binary thresholded image can contain some noise, which can easily be removed with an erosion process or with a median filter (fig.4.6-d).

Fig.4.6: Example of image subtraction. Two frames of a sequence (a, b). Binary image after absolute image difference, with noise (c): black pixels represent movements. Result after median filter (d)

Once the moving objects have been localized, their bounding boxes can be computed.
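The difference-threshold-filter chain above can be sketched in a few lines. The function names and the majority-vote formulation of the binary median are assumptions of this sketch, not the report's implementation:

```python
import numpy as np

def moving_pixels(frame_a: np.ndarray, frame_b: np.ndarray, thresh: int) -> np.ndarray:
    """Binary motion mask: absolute difference of two consecutive
    frames followed by a threshold (chosen by experiment)."""
    diff = np.abs(frame_a.astype(np.int16) - frame_b.astype(np.int16))
    return diff > thresh

def median3(mask: np.ndarray) -> np.ndarray:
    """3x3 median filter on a binary mask: a pixel survives only if at
    least 5 of the 9 pixels in its neighbourhood are set, which removes
    isolated noise pixels while keeping solid moving regions."""
    m = mask.astype(np.uint8)
    p = np.pad(m, 1)
    stack = [p[i:i + m.shape[0], j:j + m.shape[1]]
             for i in range(3) for j in range(3)]
    return np.sum(stack, axis=0) >= 5
```

The `int16` cast avoids the wrap-around that subtracting 8-bit frames directly would cause; the filtered mask is then ready for the projection step that follows.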
For this purpose a vertical projection of the binary image is first computed (fig.4.7). The different objects in the image are often already visible in this projection. The position of the objects along the horizontal axis is determined by slicing the vertical projection. If the counted number of pixels in a slice is higher than a threshold, the slice is identified as an area of moving activity. This is done for all the slices along the horizontal axis, and finally the adjacent slices with moving activity are joined together, obtaining a set of areas where moving activity has been detected (fig.4.7). The size of the slices can be adapted to the specific conditions of the acquired images. The smaller the slices are, the better the precision of the detected areas will be; but if the slices are too small, then different moving objects could be detected as a single moving object. The threshold for the identification of a slice as a moving area depends on the size of the slices and has to be determined by experiment.

Fig.4.7: Vertical projection (left) with two peaks representing the two men. Vertical lines (right) delimiting the moving objects

The same process is then performed with the horizontal projections of the different areas determined along the horizontal axis. The horizontal projection of a person is sometimes divided into two different moving areas: indeed, the middle of the body is usually not moving during the walk and is therefore not detected. Once the moving areas are detected, the bounding boxes can be obtained.

Fig.4.8: Horizontal projections of the x-axis areas (left) and computed bounding boxes (right)

In the case of occlusions (two people walking one towards the other), it can be difficult to divide the vertical projection into its components. To avoid this problem, the center of gravity is computed and the boxes are calculated with respect to this center. Occlusions can also be predicted, detected and handled properly by estimating the positions and velocities of the objects and projecting these estimates onto the image plane [14]. Once the boxes have been computed, it is possible to visualize the moving foreground regions using background subtraction.
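The slicing and joining of the vertical projection can be sketched as follows; the same function applied to the transposed mask gives the horizontal projections of each detected area. This is an illustrative sketch with assumed names and parameters, not the report's code:

```python
import numpy as np

def moving_slices(mask: np.ndarray, slice_width: int, min_pixels: int):
    """Slice the vertical projection of the binary motion mask along the
    horizontal axis, mark the slices containing enough moving pixels and
    join adjacent active slices, returning (x_start, x_end) intervals."""
    proj = mask.sum(axis=0)                      # vertical projection
    n = mask.shape[1] // slice_width
    active = [proj[i * slice_width:(i + 1) * slice_width].sum() >= min_pixels
              for i in range(n)]
    boxes, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i                            # a new moving area begins
        elif not a and start is not None:
            boxes.append((start * slice_width, i * slice_width))
            start = None
    if start is not None:                        # area touching the border
        boxes.append((start * slice_width, n * slice_width))
    return boxes
```

As the report notes, `slice_width` trades precision against over-merging, and `min_pixels` must be tuned together with it by experiment.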
5. Feature selection for tracking human body parts

Regardless of the method used for tracking, not all parts of an image contain motion information. Moreover, along an edge we can only determine the motion component orthogonal to the edge, so we must take care in selecting the features to follow through the sequence. In general, to avoid these difficulties, only regions with enough texture are used. In fact a single pixel cannot be tracked unless it has a very distinctive brightness with respect to all of its neighbors. As a consequence, it is often hard or impossible to determine where a single pixel has moved in the subsequent frame based only on local information. Because of these problems, we do not track single points but windows containing good features and sufficient texture. The point features are usually extracted by local operators, often called interest operators. The attributes are computed within a rectangular or circular window, in selected or in all directions, and are usually compared to a threshold to decide whether a feature is good or not. Many feature point extractors have been proposed in recent years [6, 10, 20]. Concerning all these interest operators, some common characteristics can be found:
1. they work with a predefined or arbitrary idea of what a good window looks like;
2. they assume that a good feature is independent of the tracking algorithm;
3. they often find features well trackable only in pure translation;
4. they often find features which are good only in the first frames.
So the resulting features are not guaranteed to be the best for the tracking algorithm over the whole sequence. A feature point should therefore be consistently detectable, and should have enough information in its neighborhood, over the different frames. Concerning tracking operations, researchers have proposed to track features such as corners, windows with high spatial frequency content, or regions where some mix of second-order derivatives is sufficiently high [19].
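The well-conditioning criterion of [19] (both eigenvalues of the gradient matrix G large) gives a simple texturedness score for a candidate window. This is a minimal sketch under assumed names and plain finite-difference gradients, not the report's implementation:

```python
import numpy as np

def texturedness(img: np.ndarray, center, half_win) -> float:
    """Smallest eigenvalue of the 2x2 gradient matrix G of eq.(4.8.1)
    over the window.  Both eigenvalues large: corner-like texture,
    trackable.  One small eigenvalue: an edge (motion determined only
    orthogonally to it) or a flat region."""
    y, x = center
    sl = (slice(y - half_win, y + half_win + 1),
          slice(x - half_win, x + half_win + 1))
    Iu = np.gradient(img, axis=1)[sl]
    Iv = np.gradient(img, axis=0)[sl]
    G = np.array([[np.sum(Iu * Iu), np.sum(Iu * Iv)],
                  [np.sum(Iu * Iv), np.sum(Iv * Iv)]])
    return float(np.linalg.eigvalsh(G)[0])   # ascending order: [0] is λ_min
```

Windows whose score exceeds a threshold are accepted as good features; a flat region scores near zero and a pure edge also scores near zero, matching the aperture-problem discussion above.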
But for tracking human body movements, as we want to extract 2D or 3D information from the tracked points, we cannot take the features randomly all over the body, as an interest operator would do, or just in correspondence of edges: we must select precise points (joints). We are interested in capturing the movement of the human body, therefore we should select points which can define the motion. Usually points in correspondence of the head, shoulders, elbows, wrists, hips, knees and ankles are selected. Once this set of points has been extracted from the image, a human skeleton can be drawn (fig.5.1).

Fig.5.1: Skeleton of the human body (EPFL)
6. Results

After selecting some points of interest, we can apply the different algorithms to track the points. The first two parts of this chapter present the results obtained with the least squares matching tracker and the Shi-Tomasi-Kanade tracker. The results of the detection of moving objects, their tracking and the computation of the bounding boxes are presented in the third part, while the tracking of a whole object and its visualization is shown in the last part. All the results are in image space: 3D coordinates will be recovered in future works.

6.1 Least square matching tracking

The least squares matching tracking process starts from some points selected on the image. These results consider points selected manually and in particular positions (fig.6.1.1), as we want to extract a skeleton of the human body. Using this set of coordinates, the algorithm computes the corresponding points in the other frames. The parameter file used in the computation contains:
- used/not used flags for the parameters of the affine transformation;
- max sigma 0 of the matching;
- max sigma-x and sigma-y in the computation of the affine parameters a0 and b0;
- max value for the affine parameters a0 and b0 (translation parameters);
- size of the window in the template and search image for LSM;
- size of the window in the search image for cross-correlation between the first and second frame;
- size of the window in the search image for cross-correlation in the next frames;
- step for the cross-correlation computation in the search image;
- size of a bigger window in the search image for cross-correlation when the value of the LSM is not satisfactory.
A result is stored when the computed values of the three sigmas and of the two translation parameters are smaller than the default ones in the parameter file. The default value for sigma 0 is 25.0 and for sigma-x and sigma-y it is 0.20; usually all 6 parameters of the affine transformation are used and the max value for a0 and b0 is set to 4.0.
A post-processing computation checks the reliability of the matched points by computing the cross-correlation coefficient between consecutive frames. The default threshold is 0.75, but it can be decreased for low resolution images. In the next pages some results of the LSM tracking process are shown. Fig.6.1.1: Points selected on the image
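The reliability check can be sketched as a normalized cross-correlation between the grey values of the patches around a point in two consecutive frames; a value below the threshold flags a doubtful match. A minimal pure-Python sketch (the flat-list patch representation is a simplifying assumption):

```python
import math

def ncc(template, patch):
    """Normalized cross-correlation between two equal-size grey-value
    patches, given as flat lists of pixel intensities. Returns a value
    in [-1, 1]; 1 means a perfect (up to brightness/contrast) match."""
    n = len(template)
    mt = sum(template) / n
    mp = sum(patch) / n
    num = sum((t - mt) * (p - mp) for t, p in zip(template, patch))
    den = math.sqrt(sum((t - mt) ** 2 for t in template) *
                    sum((p - mp) ** 2 for p in patch))
    return num / den if den > 0 else 0.0  # zero variance: no texture

# A tracked point is accepted when the coefficient between consecutive
# frames exceeds the threshold (0.75 by default, lower for low-resolution
# images, as described in the text).
ACCEPT = 0.75
print(ncc([10, 20, 30, 40], [12, 22, 31, 41]) > ACCEPT)
```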
The first sequence was acquired from a VHS tape and has very low resolution; the camera pans following the walking man and 10 frames were available. 14 points were selected in the first frame; at the end of the process, 10 points had been tracked over the whole sequence (fig.6.1.2). The average cross-correlation coefficient of all the points is . The LSM algorithm worked with a sigma 0 of 30, while sigma-x and sigma-y were fixed to . Frame nr.1 Frame nr.5 Frame nr.9 Fig.6.1.2: Some frames of the sequence with the tracked points
In the next sequence, consisting of 60 frames, 14 points were selected in correspondence of body joints: head, neck, shoulders, elbows, wrists, hips, knees and ankles. After 10 frames the points corresponding to the elbows were lost, while all the other joints were tracked over the whole sequence (fig.6.1.3). The sigma-y was fixed to 0.30 because the subject was moving his arms in the vertical direction and the images have half resolution vertically, as only the odd lines are used. (a) frame nr.1 (b) frame nr.11 (c) frame nr.30 (d) frame nr.50 (e) frame nr.60 Fig.6.1.3: Points tracked in a sequence of 60 frames
Because of the clothes, when the subject moved his arms the folds of the sweater changed, so points selected where the folds moved strongly were not matched (or not matched well). The cross-correlation coefficient between tracked points in two consecutive frames was calculated and the results are summarized in Table 1. All 12 points tracked over the sequence had a cross-correlation coefficient bigger than 0.9. If the camera is still and stays approximately at the same distance from the subject, another check on the tracked points can be done by computing the differences of the distances between pairs of points whose distance is fixed, namely foot-knee, neck-shoulder or neck-head. Figure 6.1.4 shows the computed differences of the distances in all the frames. There is just one big outlier (with a difference of 4 pixels), while all the other differences lie in the interval [-2.4, +2.2] pixels, that is an average error of one pixel for every matched point. The big outlier may be due to the folds of the sweater on the wrist, as said before.
Table 1: Average cross-correlation coefficient of the tracked points: Pt1 wrist left, Pt2 shoulder left, Pt3 neck, Pt4 head, Pt5 shoulder right, Pt6 wrist right, Pt7 hip left, Pt8 hip right, Pt9 knee left, Pt10 knee right, Pt11 ankle left, Pt12 ankle right
Fig.6.1.4: Differences of the distances between some joints (head-neck, foot-knee left/right, wrist-shoulder left/right) over the sequence of 60 frames
Once the 2D coordinates of the joints are computed, it is possible to build (for now only in 2D) a skeleton of the human body and represent the stylized person through the whole sequence. An animation has been created and a visualization is shown in fig.6.1.5 and fig.6.1.6, with cylindric reconstruction of the human body parts.
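The fixed-distance consistency check just described can be sketched as follows; the frame representation as a list of joint dictionaries is an illustrative assumption:

```python
import math

def dist(p, q):
    """Euclidean distance between two 2D points in pixels."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def distance_differences(frames, pair):
    """For a joint pair whose true distance is fixed (e.g. neck-head),
    return the per-frame differences from the first-frame distance.
    Large differences flag mismatched points.

    frames: list of dicts mapping joint name -> (x, y)."""
    a, b = pair
    ref = dist(frames[0][a], frames[0][b])
    return [dist(f[a], f[b]) - ref for f in frames]

frames = [
    {"neck": (100, 40), "head": (100, 20)},
    {"neck": (102, 41), "head": (102, 21.5)},
]
print(distance_differences(frames, ("neck", "head")))
```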
Fig.6.1.5: Visualization of the computed 2D skeleton of the human body
Fig.6.1.6: Cylindric reconstruction of the human skeleton from the 2D points computed with the LSM tracking
Another sequence is presented in fig.6.1.7. (a) frame nr.2 (b) frame nr.6 (c) frame nr.12 (d) frame nr.15 (e) frame nr.22 Fig.6.1.7: Tracked frames with occlusion of some points
This sequence is composed of 24 frames, with 13 points selected in correspondence of joints. A point on the left wrist was lost after 9 frames because of occlusion; the points on the left leg were also lost due to occlusion. When occlusions occur a point can be wrongly matched; from the analysis of the cross-correlation results it is possible to remove the outlier and to track the point again after the occlusion. The points on the leg were recovered after the occlusion using a cross-correlation process (fig.6.1.9). A template is cut around the point in the last image where it is visible; the search area is taken from the image where the point reappears (the user must select both images). The point is found at the centre of the window with the biggest cross-correlation coefficient. The LSM algorithm can then track the recovered points in the remaining frames (fig.6.1.8). Fig.6.1.8: Some frames of the sequence with recovered points after occlusions
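The recovery procedure can be sketched as an exhaustive template search: slide the template cut around the point's last visible position over the search image and keep the window centre with the highest correlation. A simplified pure-Python illustration (square windows and full-image search are assumptions for brevity):

```python
import math

def ncc(a, b):
    """Normalized cross-correlation of two equal-size flat patches."""
    n = len(a); ma = sum(a) / n; mb = sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) *
                    sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else -1.0  # flat window: reject

def patch(img, r, c, h, w):
    """Flat list of grey values of the h-by-w window at (r, c)."""
    return [img[r + i][c + j] for i in range(h) for j in range(w)]

def recover_point(template_img, tr, tc, search_img, size):
    """Find where a lost point reappears: correlate the template (cut at
    the point's last visible position) against every window of the search
    image and return the centre of the best-scoring window."""
    tpl = patch(template_img, tr, tc, size, size)
    best, best_rc = -2.0, None
    rows, cols = len(search_img), len(search_img[0])
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            score = ncc(tpl, patch(search_img, r, c, size, size))
            if score > best:
                best, best_rc = score, (r + size // 2, c + size // 2)
    return best_rc, best
```

In practice the search would be restricted to a window around the predicted position rather than the whole frame, and the recovered point is then handed back to the LSM tracker.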
Fig.6.1.9: Cross-correlation procedure to recover a point lost because of occlusion
The mean cross-correlation coefficient of the points over the whole sequence is 0.88 and the differences of the distances between joints lie in the interval [-2.5, +2.5] pixels. The graph of the differences of the distances is shown in fig.6.1.10. Fig.6.1.10: Differences of the distances between some joints
A final visualization of the sequence with the reconstructed human skeleton is shown in fig.6.1.11. Fig.6.1.11: Visualization (every 3 frames) of the skeleton built with the tracked points
The last sequence was acquired from a VHS tape; the camera was moving, following the running man, and 9 frames were used. 12 points were selected in the first frame and 7 points were tracked over the whole sequence. The LSM sigma 0 was equal to 30, while the cross-correlation coefficient had an average of . In fig.6.1.12 some frames of the sequence are presented with the stylized skeleton overlaid. Frame nr.1 Frame nr.5 Frame nr.9 Fig.6.1.12: A low resolution sequence of 9 frames: the camera is moving following the running man. 7 points have been tracked in all the frames.
6.2 Shi-Tomasi-Kanade tracker

The core of the STK algorithm was already available on the web; a GUI to select and visualize the tracked points and a routine to run the process over a whole sequence have been added. Given two consecutive frames I(x,t) and J(x,t+1), the principal steps of the program are:
- compute the matrix T (or G) and the vector a (or e) of eq. (4.12): the image gradients in both windows are computed with a Gaussian kernel (for fast convergence);
- compute the translation d (in the first few iterations) and the affine parameters (in the last iterations) such that the SSD difference of I(Ax+d) - J(x) is minimized (equation 4.11);
- re-warp J with sub-pixel 2D bilinear interpolation using the computed affine motion;
- check the SSD error.
For every point the algorithm computes n iterations and selects the affine motion parameters with the smallest SSD error. The algorithm is very time consuming. In the first sequence of 24 frames all the points were tracked (recovering those lost to occlusions with the cross-correlation process previously described). The results are shown in fig.6.2.1. Frame nr.1 Frame nr.9 Frame nr.15 Frame nr.21 Fig.6.2.1: Four frames of the sequence. In red the points tracked between consecutive frames, in yellow the reconstructed human skeleton.
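The translation step that opens each point's iteration can be sketched as follows. This is a minimal, translation-only illustration of solving the G d = e system with central-difference gradients; the Gaussian weighting, the affine refinement of the later iterations and the bilinear re-warping are omitted:

```python
def lk_translation_step(I, J, r0, c0, half):
    """One translation-only step between frames I and J for the feature
    at (r0, c0), using a (2*half+1)-square window. Builds the 2x2
    gradient matrix G and the error vector e, then solves G d = e.
    Returns the displacement (dx, dy) or None for untrackable windows."""
    gxx = gxy = gyy = ex = ey = 0.0
    for r in range(r0 - half, r0 + half + 1):
        for c in range(c0 - half, c0 + half + 1):
            gx = (I[r][c + 1] - I[r][c - 1]) / 2.0  # central differences
            gy = (I[r + 1][c] - I[r - 1][c]) / 2.0
            diff = I[r][c] - J[r][c]                # frame difference
            gxx += gx * gx; gxy += gx * gy; gyy += gy * gy
            ex += diff * gx; ey += diff * gy
    det = gxx * gyy - gxy * gxy
    if abs(det) < 1e-12:
        return None  # window lacks texture: not a good feature to track
    dx = (gyy * ex - gxy * ey) / det
    dy = (gxx * ey - gxy * ex) / det
    return dx, dy
```

Since the step comes from a first-order linearization, the estimate is only approximate and the algorithm iterates, re-warping J and repeating until the SSD error stops decreasing.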
The cross-correlation coefficient ρ between consecutive frames has an average of . The mean SSD (Sum of Square Differences) error over the whole sequence was , while the differences of the distances between selected joints lie in the interval [-3, +2.3] pixels (except one big outlier of 4 pixels) (see fig.6.2.2). Fig.6.2.2: Graph with the computed differences of the distances between selected joints. Only one big outlier is present while the other values belong to the interval [-3, +2.3] pixels
In the next sequence, 30 frames were used to validate the algorithm. In the first frame (fig.6.2.3a) 14 points were selected in correspondence of human joints; in the last frames (fig.6.2.3d,e) 10 points were still tracked, while the others were lost due to a small cross-correlation coefficient and a big SSD. (a) frame nr.1 (b) frame nr.5 (c) frame nr.9 (d) frame nr.20 (e) frame nr.30 Fig.6.2.3: Some frames of the sequence with the points tracked with the STK algorithm. In yellow the reconstructed human skeleton
From the results shown in fig.6.2.3 we can see that the point on the lower left border of the sweater seems incorrect in the last frames; however, a movement of the sweater following the lifting of the arms is clearly visible. Nevertheless the cross-correlation coefficient of that point through the sequence is . With the sequences acquired from VHS tape the STK algorithm didn't give very good results: the selected points were tracked with reliable precision for just 2-3 frames and were then lost or mismatched. The STK algorithm needs very good features (in particular in case of movements) and very good texture around the point to be tracked.
6.3 Detection and tracking of moving objects

The detection and tracking of moving objects has been tested on two sequences in which two people were walking. The program works with a sequence of n frames and outputs the images with the different moving objects in colored boxes. The first sequence (100 frames) shows motions that are roughly on a linear path: the trajectories are linear and parallel to the camera plane, and occlusions occur as one man passes directly in front of the other. In the results (fig.6.3.1) there are two color-coded boxes, one for each tracked object. Frame nr.9 Frame nr.45 Frame nr.47 Frame nr.71 Fig.6.3.1: Results of moving people detection: tracking before, during and after occlusions
In fig.6.3.1, the first column shows the projections of the pixels along the vertical and horizontal axes. It is easy to divide the vertical projection into its components when there are no occlusions, but when they occur it can be difficult to distinguish the two parts (peaks) of the projection. To overcome this problem, the center of gravity of the projections is computed and used to assign the bounding box to the correct object. The middle column of fig.6.3.1 shows the computed bounding boxes projected on the image differences, while the last column presents the projections of the boxes on the original image. Occlusions are visible in the second and third row: the bounding boxes are not very precise because the vertical projections overlap and the limits of the boxes are based only on these projections. More sophisticated computations, such as temporal analysis or trajectory prediction, could be implemented.
The moving foreground regions can be visualized with background subtraction. This part of the process is not automatic, but it could be if a model of the empty scene were available [17]: once the bounding boxes have been computed, an image where the area inside the boxes contains only background is selected. Then a subtraction between the two windows is performed and the moving foreground is reconstructed with a few passes of erosion and dilation (fig.6.3.2). Frame nr.9 Frame nr.45 Frame nr.71 Fig.6.3.2: Foreground moving regions detected by background subtraction
In the second sequence (fig.6.3.3), 50 frames were available; two people were walking towards the stationary camera and their trajectories were not perpendicular to the camera plane.
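The box computation from the axis projections described above can be sketched as follows; this single-object illustration leaves out the slicing of the projections into per-object components, but includes the centre-of-gravity helper used to assign a box when projections overlap:

```python
def bounding_box(diff_mask):
    """Bounding box of the changed pixels in a binary difference image,
    derived from the projections of the pixels onto the image axes."""
    col_proj = [sum(col) for col in zip(*diff_mask)]  # vertical projection
    row_proj = [sum(row) for row in diff_mask]        # horizontal projection
    cols = [i for i, v in enumerate(col_proj) if v > 0]
    rows = [i for i, v in enumerate(row_proj) if v > 0]
    if not rows or not cols:
        return None  # no motion detected in this frame
    return min(rows), min(cols), max(rows), max(cols)  # top, left, bottom, right

def centre_of_gravity(proj):
    """Centre of gravity of a projection; used to assign a bounding box
    to the correct object when two projections overlap during occlusion."""
    total = sum(proj)
    return sum(i * v for i, v in enumerate(proj)) / total if total else None
```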
Frame nr.5 Frame nr.25 Frame nr.45 Fig.6.3.3: Bounding boxes of two moving people walking towards the camera
The computed bounding boxes depend on the sliced projections, and the size of the slices can be adapted to the specific conditions of the frames. The projections depend on the image differences; therefore in some frames small movements of the humans (e.g. the feet) are not included in the boxes. Frame nr.45 Fig.6.3.4: Foreground regions detected by background subtraction
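The foreground reconstruction by background subtraction followed by a few passes of erosion and dilation can be sketched like this; the 4-neighbourhood structuring element and the threshold value are illustrative assumptions:

```python
def threshold_diff(frame, background, t):
    """Binary foreground mask: pixels whose grey value differs from the
    background model by more than the threshold t."""
    return [[1 if abs(f - b) > t else 0 for f, b in zip(fr, br)]
            for fr, br in zip(frame, background)]

def erode(mask):
    """4-neighbourhood erosion: removes isolated noise pixels."""
    h, w = len(mask), len(mask[0])
    return [[1 if (mask[r][c] and 0 < r < h - 1 and 0 < c < w - 1
                   and mask[r - 1][c] and mask[r + 1][c]
                   and mask[r][c - 1] and mask[r][c + 1])
             else 0 for c in range(w)] for r in range(h)]

def dilate(mask):
    """4-neighbourhood dilation: grows the surviving regions back."""
    h, w = len(mask), len(mask[0])
    def on(r, c):
        return 0 <= r < h and 0 <= c < w and mask[r][c]
    return [[1 if (on(r, c) or on(r - 1, c) or on(r + 1, c)
                   or on(r, c - 1) or on(r, c + 1))
             else 0 for c in range(w)] for r in range(h)]
```

Applying erosion then dilation (a morphological opening) discards the small speckles produced by image noise while keeping the compact foreground region of the moving person.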
6.4 Object tracking

To complete the tracking procedure, once a few points of an object have been tracked over a sequence, it is possible to extract and visualize the whole moving object by establishing many correspondences in some images, starting from the few tracked points. A cloud of 2D points is obtained and visualized by displaying the matched grey values of the image. Using the sequence of fig.3.1, groups of three frames were created and the middle frame used as template image. The seed points were tracked with the LSM tracker (fig.6.4.1, upper row) over the whole sequence and then used to establish the correspondences. A cloud of points was computed in every frame and projected onto the image (fig.6.4.1, second row). In fig.6.4.2 all the matched grey values of the triplet are displayed. Fig.6.4.1: Computed correspondences in a triplet of images Fig.6.4.2: Object extraction from the computed 2D correspondences in a triplet of images.
The central image of fig.6.4.2 is the template image: the number of matched correspondences is bigger than in the search images, where many more holes are present due to unanalyzed areas. The gaps can be due to poor texture, low contrast, or wrong matching. In figure 6.4.3 some b/w and color results of the sequence are shown. Frame nr.2 Frame nr.14 Frame nr.23 Fig.6.4.3: Central frames of the triplets: visualized matched points representing the computed 2D correspondences extracted from b/w images (upper rows). Tracked object in color images (bottom row).
A problem encountered in object tracking is texture: even if the image has high resolution, the matching process does not work with low texture, leaving big holes in regions where the texture of the subject is uniform (the central part of the trousers or of the sweater). Some indicators that evaluate the quality of the results are shown in Table 2 as averages over the whole sequence.
Table 2: Some indicators to evaluate the quality of the process: mean sigma 0 and std. dev., mean sigma-x and std. dev., mean sigma-y and std. dev.
In the second sequence, consisting of 10 frames, the process worked quite well but many gaps occurred in the results (fig.6.4.4). The seed points used for the measurement were computed with the LSM tracker; there were 18 points in the first triplet, but in the successive frames the number decreased because of incorrect matching or occlusions; therefore in the next triplets some seed points were added. The holes in the tracked object are bigger than in the other sequences because of the low resolution of the images (fig.6.4.5). Fig.6.4.4: A triplet of the sequence: in the middle column the template image with the matched grey values, at the borders the correspondences found in the search images. In Table 3 the indicators of the process are presented.
Table 3: Some indicators to evaluate the quality of the matching, averaged over all 9 frames: mean sigma 0 and std. dev., mean sigma-x and std. dev., mean sigma-y and std. dev.
Fig.6.4.5: Central templates of the next three triplets of the sequence: big lacks of texture on the tracked object are visible because of the few seed points and the correspondences not found
In fig.6.4.6 and 6.4.7 other triplets are shown. In this sequence the tracked model was moving only his arms; in the first experiments only 14 points were selected as seed points. Fig.6.4.6: Triplets of a sequence: template image with 16 seed points (upper row). Central template image and search images at the borders with the computed 2D clouds of correspondences (lower row)
But big holes occurred because of unmatched points (high sigma 0) in regions of uniform texture. It was necessary to add two more points on the torso of the man in order to extract the whole body (fig.6.4.6, 6.4.7). Also here the matching algorithm failed in regions with low contrast or homogeneous texture (fig.6.4.7, 6.4.8), as homologous points cannot be assigned reliably, or corresponding points cannot be found at all in the images. Fig.6.4.7: Object extraction from the triplet of images. In order: first search image (frame t-1), template image (frame t), second search image (frame t+1) Fig.6.4.8: Central template image of the next three triplets with the found correspondences.
7. Conclusions

An overview of some methods for the detection and tracking of human movements in image space has been presented. Two algorithms that track points in image sequences have been used: the first is based on classic photogrammetric least square matching, the other on a model of affine image changes proposed by Shi, Tomasi and Kanade and available on the net. Both algorithms were tested on different sequences and the best results came from the LSM tracking. This algorithm can work with longer sequences, with better precision, and is more reliable than the other; moreover the LSM tracker can also work with low texture images and, if no occlusions occur, no big outliers are present. On the other hand, the STK algorithm needs very good texture around the points to track, and an efficient outlier rejection scheme as well; it is a very good tracker for indoor sequences full of features (corners) with high texture, but it is very time consuming.
The object tracking algorithm produced nice results when the images had good, non-uniform texture and the seed points were well spread over the object to measure; in low resolution images many holes occurred in the results. It can be considered a process for object extraction based on tracked points and image matching.
The detection algorithm is an automatic process to determine the bounding boxes of moving people in a sequence of frames; it is a very simple implementation but it can work with long sequences and cope with occlusions. The precision of the boxes depends on the projections of the pixels and their slices; therefore the choice of the threshold value used to compute the image difference was very important.

8. Future works

1. The LSM tracker must be improved in outlier rejection. The cross-correlation process should be integrated in the main algorithm to reject mismatched points in real time rather than in post-processing.
2.
A more accurate and refined process to detect and track objects in case of occlusions should be added: occlusions could be predicted and handled with more sophisticated algorithms, while foreground extraction could be performed with a better background subtraction technique.
3. A camera model could be defined to reconstruct the 3D world from the image coordinates extracted with the tracking process.
4. The object tracking algorithm could be improved by adding neighborhood information in the matching process, to close the gaps that occurred in the results.
More informationComputer Vision 2. SS 18 Dr. Benjamin Guthier Professur für Bildverarbeitung. Computer Vision 2 Dr. Benjamin Guthier
Computer Vision 2 SS 18 Dr. Benjamin Guthier Professur für Bildverarbeitung Computer Vision 2 Dr. Benjamin Guthier 1. IMAGE PROCESSING Computer Vision 2 Dr. Benjamin Guthier Content of this Chapter Non-linear
More informationEE795: Computer Vision and Intelligent Systems
EE795: Computer Vision and Intelligent Systems Spring 2012 TTh 17:30-18:45 FDH 204 Lecture 11 140311 http://www.ee.unlv.edu/~b1morris/ecg795/ 2 Outline Motion Analysis Motivation Differential Motion Optical
More informationColour Segmentation-based Computation of Dense Optical Flow with Application to Video Object Segmentation
ÖGAI Journal 24/1 11 Colour Segmentation-based Computation of Dense Optical Flow with Application to Video Object Segmentation Michael Bleyer, Margrit Gelautz, Christoph Rhemann Vienna University of Technology
More informationSUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS
SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS Cognitive Robotics Original: David G. Lowe, 004 Summary: Coen van Leeuwen, s1460919 Abstract: This article presents a method to extract
More informationLow Cost Motion Capture
Low Cost Motion Capture R. Budiman M. Bennamoun D.Q. Huynh School of Computer Science and Software Engineering The University of Western Australia Crawley WA 6009 AUSTRALIA Email: budimr01@tartarus.uwa.edu.au,
More information1 (5 max) 2 (10 max) 3 (20 max) 4 (30 max) 5 (10 max) 6 (15 extra max) total (75 max + 15 extra)
Mierm Exam CS223b Stanford CS223b Computer Vision, Winter 2004 Feb. 18, 2004 Full Name: Email: This exam has 7 pages. Make sure your exam is not missing any sheets, and write your name on every page. The
More informationDynamic Time Warping for Binocular Hand Tracking and Reconstruction
Dynamic Time Warping for Binocular Hand Tracking and Reconstruction Javier Romero, Danica Kragic Ville Kyrki Antonis Argyros CAS-CVAP-CSC Dept. of Information Technology Institute of Computer Science KTH,
More informationComputer Vision I. Announcements. Fourier Tansform. Efficient Implementation. Edge and Corner Detection. CSE252A Lecture 13.
Announcements Edge and Corner Detection HW3 assigned CSE252A Lecture 13 Efficient Implementation Both, the Box filter and the Gaussian filter are separable: First convolve each row of input image I with
More informationVisual Tracking (1) Tracking of Feature Points and Planar Rigid Objects
Intelligent Control Systems Visual Tracking (1) Tracking of Feature Points and Planar Rigid Objects Shingo Kagami Graduate School of Information Sciences, Tohoku University swk(at)ic.is.tohoku.ac.jp http://www.ic.is.tohoku.ac.jp/ja/swk/
More informationAnno accademico 2006/2007. Davide Migliore
Robotica Anno accademico 6/7 Davide Migliore migliore@elet.polimi.it Today What is a feature? Some useful information The world of features: Detectors Edges detection Corners/Points detection Descriptors?!?!?
More informationStructure from Motion. Prof. Marco Marcon
Structure from Motion Prof. Marco Marcon Summing-up 2 Stereo is the most powerful clue for determining the structure of a scene Another important clue is the relative motion between the scene and (mono)
More informationMulti-stable Perception. Necker Cube
Multi-stable Perception Necker Cube Spinning dancer illusion, Nobuyuki Kayahara Multiple view geometry Stereo vision Epipolar geometry Lowe Hartley and Zisserman Depth map extraction Essential matrix
More informationRuch (Motion) Rozpoznawanie Obrazów Krzysztof Krawiec Instytut Informatyki, Politechnika Poznańska. Krzysztof Krawiec IDSS
Ruch (Motion) Rozpoznawanie Obrazów Krzysztof Krawiec Instytut Informatyki, Politechnika Poznańska 1 Krzysztof Krawiec IDSS 2 The importance of visual motion Adds entirely new (temporal) dimension to visual
More informationMassachusetts Institute of Technology Department of Computer Science and Electrical Engineering 6.801/6.866 Machine Vision QUIZ II
Massachusetts Institute of Technology Department of Computer Science and Electrical Engineering 6.801/6.866 Machine Vision QUIZ II Handed out: 001 Nov. 30th Due on: 001 Dec. 10th Problem 1: (a (b Interior
More informationLocal Feature Detectors
Local Feature Detectors Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr Slides adapted from Cordelia Schmid and David Lowe, CVPR 2003 Tutorial, Matthew Brown,
More informationBiometrics Technology: Image Processing & Pattern Recognition (by Dr. Dickson Tong)
Biometrics Technology: Image Processing & Pattern Recognition (by Dr. Dickson Tong) References: [1] http://homepages.inf.ed.ac.uk/rbf/hipr2/index.htm [2] http://www.cs.wisc.edu/~dyer/cs540/notes/vision.html
More informationCS664 Lecture #18: Motion
CS664 Lecture #18: Motion Announcements Most paper choices were fine Please be sure to email me for approval, if you haven t already This is intended to help you, especially with the final project Use
More informationA Vision System for Automatic State Determination of Grid Based Board Games
A Vision System for Automatic State Determination of Grid Based Board Games Michael Bryson Computer Science and Engineering, University of South Carolina, 29208 Abstract. Numerous programs have been written
More informationEE795: Computer Vision and Intelligent Systems
EE795: Computer Vision and Intelligent Systems Spring 2012 TTh 17:30-18:45 WRI C225 Lecture 04 130131 http://www.ee.unlv.edu/~b1morris/ecg795/ 2 Outline Review Histogram Equalization Image Filtering Linear
More informationComputer Vision Lecture 20
Computer Perceptual Vision and Sensory WS 16/17 Augmented Computing Computer Perceptual Vision and Sensory WS 16/17 Augmented Computing Computer Perceptual Vision and Sensory WS 16/17 Augmented Computing
More informationProduct information. Hi-Tech Electronics Pte Ltd
Product information Introduction TEMA Motion is the world leading software for advanced motion analysis. Starting with digital image sequences the operator uses TEMA Motion to track objects in images,
More informationComputer Vision Lecture 20
Computer Perceptual Vision and Sensory WS 16/76 Augmented Computing Many slides adapted from K. Grauman, S. Seitz, R. Szeliski, M. Pollefeys, S. Lazebnik Computer Vision Lecture 20 Motion and Optical Flow
More informationTowards the completion of assignment 1
Towards the completion of assignment 1 What to do for calibration What to do for point matching What to do for tracking What to do for GUI COMPSCI 773 Feature Point Detection Why study feature point detection?
More informationImage Processing Fundamentals. Nicolas Vazquez Principal Software Engineer National Instruments
Image Processing Fundamentals Nicolas Vazquez Principal Software Engineer National Instruments Agenda Objectives and Motivations Enhancing Images Checking for Presence Locating Parts Measuring Features
More informationOptical flow and tracking
EECS 442 Computer vision Optical flow and tracking Intro Optical flow and feature tracking Lucas-Kanade algorithm Motion segmentation Segments of this lectures are courtesy of Profs S. Lazebnik S. Seitz,
More informationCS4733 Class Notes, Computer Vision
CS4733 Class Notes, Computer Vision Sources for online computer vision tutorials and demos - http://www.dai.ed.ac.uk/hipr and Computer Vision resources online - http://www.dai.ed.ac.uk/cvonline Vision
More informationCOMPARATIVE STUDY OF DIFFERENT APPROACHES FOR EFFICIENT RECTIFICATION UNDER GENERAL MOTION
COMPARATIVE STUDY OF DIFFERENT APPROACHES FOR EFFICIENT RECTIFICATION UNDER GENERAL MOTION Mr.V.SRINIVASA RAO 1 Prof.A.SATYA KALYAN 2 DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING PRASAD V POTLURI SIDDHARTHA
More informationIntroduction to behavior-recognition and object tracking
Introduction to behavior-recognition and object tracking Xuan Mo ipal Group Meeting April 22, 2011 Outline Motivation of Behavior-recognition Four general groups of behaviors Core technologies Future direction
More informationAutomatic Generation of Animatable 3D Personalized Model Based on Multi-view Images
Automatic Generation of Animatable 3D Personalized Model Based on Multi-view Images Seong-Jae Lim, Ho-Won Kim, Jin Sung Choi CG Team, Contents Division ETRI Daejeon, South Korea sjlim@etri.re.kr Bon-Ki
More informationVisual motion. Many slides adapted from S. Seitz, R. Szeliski, M. Pollefeys
Visual motion Man slides adapted from S. Seitz, R. Szeliski, M. Pollefes Motion and perceptual organization Sometimes, motion is the onl cue Motion and perceptual organization Sometimes, motion is the
More informationComputer Vision I - Filtering and Feature detection
Computer Vision I - Filtering and Feature detection Carsten Rother 30/10/2015 Computer Vision I: Basics of Image Processing Roadmap: Basics of Digital Image Processing Computer Vision I: Basics of Image
More informationSIFT: SCALE INVARIANT FEATURE TRANSFORM SURF: SPEEDED UP ROBUST FEATURES BASHAR ALSADIK EOS DEPT. TOPMAP M13 3D GEOINFORMATION FROM IMAGES 2014
SIFT: SCALE INVARIANT FEATURE TRANSFORM SURF: SPEEDED UP ROBUST FEATURES BASHAR ALSADIK EOS DEPT. TOPMAP M13 3D GEOINFORMATION FROM IMAGES 2014 SIFT SIFT: Scale Invariant Feature Transform; transform image
More informationModel-Based Human Motion Capture from Monocular Video Sequences
Model-Based Human Motion Capture from Monocular Video Sequences Jihun Park 1, Sangho Park 2, and J.K. Aggarwal 2 1 Department of Computer Engineering Hongik University Seoul, Korea jhpark@hongik.ac.kr
More informationIntroduction to Medical Imaging (5XSA0) Module 5
Introduction to Medical Imaging (5XSA0) Module 5 Segmentation Jungong Han, Dirk Farin, Sveta Zinger ( s.zinger@tue.nl ) 1 Outline Introduction Color Segmentation region-growing region-merging watershed
More informationFiltering Images. Contents
Image Processing and Data Visualization with MATLAB Filtering Images Hansrudi Noser June 8-9, 010 UZH, Multimedia and Robotics Summer School Noise Smoothing Filters Sigmoid Filters Gradient Filters Contents
More informationComplex Sensors: Cameras, Visual Sensing. The Robotics Primer (Ch. 9) ECE 497: Introduction to Mobile Robotics -Visual Sensors
Complex Sensors: Cameras, Visual Sensing The Robotics Primer (Ch. 9) Bring your laptop and robot everyday DO NOT unplug the network cables from the desktop computers or the walls Tuesday s Quiz is on Visual
More informationBIL Computer Vision Apr 16, 2014
BIL 719 - Computer Vision Apr 16, 2014 Binocular Stereo (cont d.), Structure from Motion Aykut Erdem Dept. of Computer Engineering Hacettepe University Slide credit: S. Lazebnik Basic stereo matching algorithm
More informationBasic relations between pixels (Chapter 2)
Basic relations between pixels (Chapter 2) Lecture 3 Basic Relationships Between Pixels Definitions: f(x,y): digital image Pixels: q, p (p,q f) A subset of pixels of f(x,y): S A typology of relations:
More informationLecture 4: Spatial Domain Transformations
# Lecture 4: Spatial Domain Transformations Saad J Bedros sbedros@umn.edu Reminder 2 nd Quiz on the manipulator Part is this Fri, April 7 205, :5 AM to :0 PM Open Book, Open Notes, Focus on the material
More informationDense 3-D Reconstruction of an Outdoor Scene by Hundreds-baseline Stereo Using a Hand-held Video Camera
Dense 3-D Reconstruction of an Outdoor Scene by Hundreds-baseline Stereo Using a Hand-held Video Camera Tomokazu Satoy, Masayuki Kanbaray, Naokazu Yokoyay and Haruo Takemuraz ygraduate School of Information
More informationReal-Time Scene Reconstruction. Remington Gong Benjamin Harris Iuri Prilepov
Real-Time Scene Reconstruction Remington Gong Benjamin Harris Iuri Prilepov June 10, 2010 Abstract This report discusses the implementation of a real-time system for scene reconstruction. Algorithms for
More informationEE368 Project Report CD Cover Recognition Using Modified SIFT Algorithm
EE368 Project Report CD Cover Recognition Using Modified SIFT Algorithm Group 1: Mina A. Makar Stanford University mamakar@stanford.edu Abstract In this report, we investigate the application of the Scale-Invariant
More informationDense 3D Reconstruction. Christiano Gava
Dense 3D Reconstruction Christiano Gava christiano.gava@dfki.de Outline Previous lecture: structure and motion II Structure and motion loop Triangulation Today: dense 3D reconstruction The matching problem
More informationUsing temporal seeding to constrain the disparity search range in stereo matching
Using temporal seeding to constrain the disparity search range in stereo matching Thulani Ndhlovu Mobile Intelligent Autonomous Systems CSIR South Africa Email: tndhlovu@csir.co.za Fred Nicolls Department
More informationCS 4495 Computer Vision A. Bobick. Motion and Optic Flow. Stereo Matching
Stereo Matching Fundamental matrix Let p be a point in left image, p in right image l l Epipolar relation p maps to epipolar line l p maps to epipolar line l p p Epipolar mapping described by a 3x3 matrix
More informationCOMPUTER AND ROBOT VISION
VOLUME COMPUTER AND ROBOT VISION Robert M. Haralick University of Washington Linda G. Shapiro University of Washington T V ADDISON-WESLEY PUBLISHING COMPANY Reading, Massachusetts Menlo Park, California
More information