Managing and Mining Video-Sensor Data


Edward Y. Chang, Ankur Jain, Navneet Panda, Yuan-Fang Wang, Gang Wu, and Yi Wu
Department of Electrical Engineering & Computer Science, University of California, Santa Barbara, CA

Abstract

Sensor research has been undergoing a quiet revolution, promising significant impact on a broad range of applications relating to national security, health care, the environment, and more. In this work, we focus on building large-scale video-sensor networks, addressing several large-scale data-management and data-mining issues. We outline our preliminary results in two research areas: sensor-network resource management, and statistical methods for sensor-data mining. We also discuss directions that future research might take.

1 Introduction

Video sensors (video cameras) and wireless networks are becoming ubiquitous features of modern life. The confluence of these two technologies now makes it possible to construct wireless ad hoc networks of multiple video sensors that can be rapidly deployed, dynamically configured, and continuously operated to provide highly available coverage for environment monitoring and security surveillance. While many of these extended eyes are being installed at an unprecedented pace, the intelligence needed for computers to interpret video-surveillance events is still rather unsophisticated. In addition, algorithms that scale well with the number of sensors and the high volume of data have yet to be developed for effectively managing and mining video-data streams. To develop a brain behind a large number of optical eyes to support (semi-)automatic event sensing, we have been developing statistical algorithms to improve the two major phases of a distributed, mobile surveillance system: data management and event mining [35, 122]. The data-management phase integrates multi-source data to detect and extract motion trajectories from video sources. (While motion analysis is the charter of the Computer Vision community, our work focuses on resource management and data integration.) The event-mining phase deals with classifying events by their relevance to a query. The research challenges of the two phases are summarized as follows:

Data management. Data management deals with sampling and filtering data at the sensors, transmitting the (possibly noisy or partial) data to the server, and fusing distributed data to extract motion trajectories. Data management comes up against two major research challenges: distributed sensor and sensor-network management, and spatio-temporal data fusion. Given a query and its precision requirement, the sensor network may need to move some cameras and reconfigure itself to see the event of interest. For instance, when a far-field camera detects some movement, a pan/tilt/zoom camera may be instructed to zoom onto the moving object, and at the same time more resources (network and processing bandwidth) are allocated to these cameras. This reconfiguration must be performed in such a way that the most useful data can be collected, given the resource constraints, to answer queries. Once useful data have been collected from the cameras, the next challenge is to integrate observations from multiple cameras to build spatio-temporal patterns (to perform a spatio-temporal join) that can best describe events in the environment. Such integration is necessary to improve surveillance coverage and reliability, and to deal with transient object-tracking obstacles such as spatial occlusion and scene clutter.

Event mining.
Event mining deals with mapping motion trajectories (sequence data) to semantics (e.g., benign and suspicious events). Most traditional statistical learning algorithms cannot be directly applied to variable-length sequence data, which may also exhibit temporal ordering. In addition, positive events (i.e., the sought-for hazardous events) are always significantly outnumbered by negative events in the training data. In such an imbalanced set of training data, the class boundary tends to skew toward the minority class and becomes very sensitive to noise. Furthermore, the best feature-to-event mapping is often application-, task-, and user-dependent. To provide useful results, the event recognizer must adapt its distance function to the circumstances as needed.

To answer the above challenges, we have been working on four research tasks to advance fundamental theories and develop statistical methods that can significantly improve the operation of video-sensor networks, the quality of data fusion, and the accuracy of event analysis.

1. Sensor-network resource management (Section 2). We have been developing statistical methods to manage networks for conserving resources, including power at the sensor nodes, as well as network bandwidth and other system resources at the server.

2. Statistical sensor-data fusion (Section 3). We have been devising algorithms to fuse spatially and temporally overlapped data for reliable event detection. Our research focus is on enhancing the reliability of existing object-tracking algorithms by performing both sensor-to-server data fusion and server-to-sensor information dissemination.

3. Sequence-data to event mapping (Section 4). Many abstractions (e.g., Fourier, wavelets, string descriptors) have been proposed in the past to represent sequence data. However, we believe that the best abstraction should be event-dependent. Therefore, our approach is first to extract multi-resolution descriptors from sequence data, and then to rely on the algorithms that we subsequently develop to learn the best descriptor combination for a target semantic. Furthermore, in contrast to existing approaches (e.g., the widely used Hidden Markov Models), our statistical methods require a much smaller amount of training data to model a target event. Reducing training data (or sample complexity) is critical for event detection, since training data for target events are often difficult to collect.

4. Context-based distance-function formulation (Section 5). The choice of a good distance function plays a key role in any information-retrieval task. We propose to formulate distance functions in a task- and query-dependent way. (Most traditional data-mining tasks employ a distance function such as the L1- or L2-norm universally, without considering the characteristics of queries or the preferences of users.) Our distance-function formulation method consists of two steps: 1) using active learning to acquire application- and query-dependent information, and 2) applying kernel alignment to modify the input space in an efficient, non-linear way.

The remainder of this paper presents, for each of the above four tasks, its related work, preliminary results, and future research plans.

2 Sensor-network Resource Management

In a distributed sensor network, cameras record continuous high-volume video streams. Because of the high data volume and rapid rate, it is infeasible for an untethered, battery-powered sensor node to transmit a large quantity of raw data to a server for processing [10, 53, 87]. To conserve resources (network bandwidth, storage, and CPU), many recent papers [3, 7, 30, 95, 113] propose methods to reduce the amount of data delivered to the server. In these schemes, no data communication takes place as long as the server can answer queries within the specified precision constraints. This research task investigates statistical methods for meeting a specified query precision while consuming a minimum amount of resources. A major shortcoming of the existing solutions is that they are often ad hoc, as explained in [7] by Widom and Motwani, and are highly application-dependent.
To seek a unified solution for managing distributed streams, we treat resource management in a sensor network as fundamentally a filtering problem: an effective stream-filtering algorithm should filter out the maximum amount of data while the query-precision constraints are still met at the server. We introduce our Dual Kalman Filter (DKF) architecture [66] as a general and adaptive solution to the stream-resource-management problem. Figure 1 depicts the role of the proposed DKF model in a typical sensor-network architecture. A user (on the left-hand side of the figure) issues to the server an event query with certain precision constraints. The server activates a Kalman Filter, denoted KF_s, and at the same time the target sensor activates a mirror filter with the same parameters, denoted KF_m. The dual filters KF_s and KF_m predict future data values. Only when the filter KF_m at the sensor fails to predict future data within the precision constraint (thus preventing KF_s from making an accurate prediction at the server) does the sensor send updates to the server. For instance, if no interesting event is taking place at a sensor, no data transmission is made to the server. When multiple events are taking place at a sensor, multiple KF_s/KF_m pairs are invoked to track the events. Significant bandwidth conservation can be achieved if a reliable and accurate data-prediction mechanism is employed, and server resources can be allocated to the sensors where actions are taking place. We have proposed the Kalman Filter as such a mechanism for its simplicity, efficiency, and provable optimality under fairly general conditions. Our preliminary results indicate that the DKF shows promise in the several scenarios with which we experimented [66]. We will evaluate the extended Kalman Filter and other models to identify the best solution(s) for given data-stream characteristics.
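To make the dual-filter protocol concrete, below is a minimal sketch (our illustration, not code from [66]): a sensor-side mirror filter predicts each incoming value and transmits an update only when the prediction misses by more than the precision constraint δ. The 1-D constant-velocity model, the class and function names, and all parameter values are assumptions made for illustration.

```python
import numpy as np

class KF1D:
    """Minimal 1-D constant-velocity Kalman filter (state: [position, velocity])."""
    def __init__(self, q=1e-3, r=1e-2):
        self.x = np.zeros(2)             # state estimate
        self.P = np.eye(2)               # state covariance
        self.F = np.array([[1., 1.],     # state transition (unit time step)
                           [0., 1.]])
        self.H = np.array([[1., 0.]])    # we observe position only
        self.Q = q * np.eye(2)           # process-noise covariance
        self.R = np.array([[r]])         # measurement-noise covariance

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[0]                 # predicted position

    def update(self, z):
        y = z - self.H @ self.x                      # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)     # Kalman gain
        self.x = self.x + (K @ y).ravel()
        self.P = (np.eye(2) - K @ self.H) @ self.P

def run_mirror_filter(stream, delta):
    """Sensor-side loop: transmit only when the mirror filter's
    prediction misses the true value by more than delta."""
    kf_m = KF1D()                        # mirror filter at the sensor
    updates = 0
    for z in stream:
        pred = kf_m.predict()            # the server's KF_s predicts this too
        if abs(pred - z) > delta:        # precision constraint violated:
            kf_m.update(z)               # incorporate z locally ...
            updates += 1                 # ... and send z to update KF_s
    return updates

rng = np.random.default_rng(0)
stream = np.cumsum(rng.normal(0.5, 0.1, 1000))   # a drifting measurement stream
print(run_mirror_filter(stream, delta=2.0), "updates out of 1000 samples")
```

Because the server runs an identical twin filter fed by the same updates, it reproduces every prediction the sensor made, so the precision constraint holds at the server without continuous transmission.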

2.1 Relationship to Current State of Knowledge

Algorithms for sensor-network resource management or stream management have received increased attention in the database community over the past two years. Major research directions include conserving computational and communication resources [3, 95], optimal storage algorithms, stream mining [130], query optimizers [10], and query solvers [64]. A comprehensive survey of the issues in data streams is presented in [10, 53]. Data streams have also been treated as time series; ideas from control theory have been borrowed for the purposes of approximation [23] and mining [130]. Table 1 presents a comparative overview of our work contrasted with three other major data-stream projects: STREAM [7, 95], AURORA [3, 113], and COUGAR [129, 26]. None of the compared approaches uses a prediction scheme, nor do they degrade gracefully when the input data are noisy. Furthermore, they cannot exploit partial information about stream-arrival characteristics (if available) to boost their performance. Our general framework, in contrast, can be applied to any streaming application by simply modifying the state-transition matrix used in the Kalman Filter formulation. We advocate the use of the Kalman Filter (KF) for stream filtering, since the KF has been well studied and widely applied to data-filtering and smoothing problems for decades. The Kalman Filter can easily be customized to handle varying stream characteristics, sensor noise, and time variance to meet the requirements specified in [87]. The same filtering framework can be adapted to address a large variety of stream-resource-management problems, providing a unified paradigm that is both powerful and versatile. Due to space limitations, we refer readers to [66] for an in-depth discussion of the rationale behind our choice of the Kalman Filter.

Figure 1: The Dual Kalman Filter (DKF) architecture. A user's precision constraint and smoothing factor enter the query processor at the central server; each server-side filter KF_s^i is paired with a mirror filter KF_m^i (precision width δ_i) at remote source i, and a smoothing KF (KF_c^i, with state-transition matrix F_i) at each source turns noisy data into smooth data before filtering. Only filtered data cross the communication network.

2.2 Preliminary Results

We have employed the DKF in our multi-camera surveillance prototype for analyzing vehicle/human behavior in a parking lot. One experiment we conducted concerns moving-vehicle tracking. In this experiment, each moving object had two attributes: location (X and Y coordinates) and velocity (speed and heading). We used a uniform random-number generator to generate different slopes of the velocity vector at random intervals of time, and generated different speeds of the object at random time intervals in a similar manner. Thus the object could randomly change its speed and heading, and then continue on that linear path for a randomly generated length of time. The maximum speed of the object was limited to 500 units, whereas the slope could change arbitrarily. Using this model, we constructed the data set shown in Figure 2(a), sampled at a fixed rate. We tested the performance of the Kalman Filter approach on two different state models:

Constant model: the system is modeled such that the latest updated value is the best prediction for the future. This model is conceptually similar to the standard cached-approximation model. The measurement consists of just the position of the object in two-dimensional space, i.e., its X and Y coordinates.

Linear model: here we take the rate of change of the position into consideration when predicting future values.
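The two state models differ only in the state vector and the state-transition matrix fed to the filter. The following sketch shows the two formulations side by side (the unit time step and the observation matrices are our illustrative assumptions):

```python
import numpy as np

# Constant model: the best prediction for the future is the latest value.
# State is just [x, y]; in spirit this matches the cached-approximation model.
F_constant = np.eye(2)
H_constant = np.eye(2)                   # observe [x, y] directly

# Linear model: the rate of change is part of the state.
# State is [x, y, vx, vy]; position is extrapolated one time step ahead.
dt = 1.0                                 # assumed unit time step
F_linear = np.array([[1, 0, dt, 0],
                     [0, 1, 0, dt],
                     [0, 0, 1,  0],
                     [0, 0, 0,  1]], dtype=float)
# Only position is measured in both models, never velocity:
H_linear = np.hstack([np.eye(2), np.zeros((2, 2))])
```

With F_constant, the filter's prediction never moves ahead of the last update, which is why its update rate matches plain caching below; F_linear extrapolates along the current velocity and can coast through linear segments without transmitting.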
Figure 2(b) shows comparative results of the two Kalman Filter models against the cached-approximation scheme. Measurements are taken in the form of position. Given a precision constraint δ, a point is updated to the server if the error in either the X or the Y value exceeds δ. In both models, only the position is recorded, not a measurement of the rate of change of the coordinate values. As is evident from Figure 2(b), the percentage of updates is the same whether we use caching or the constant model, because the constant model reduces to the caching scheme when the rate of change of values is not considered. If we use the linear model, however, utilization of the communication resource is cut down substantially at moderate precision widths. As the precision width increases, communication-resource utilization drops, and all three models show comparable performance. We also observe that the DKF performs at least as well as the caching scheme, even in a worst-case scenario.

2.3 Future Work

Many variations are possible in formulating a resource-management problem. What is unique about the Kalman Filter, as we explain in detail in [66], is that it can be customized to form workable solutions for all these formulations. The adaptation involves both simplification (e.g., the static Kalman Filter or recursive least squares) and generalization (e.g., the extended Kalman Filter).

Our future work will further tap into the strengths of the Kalman Filter. We will investigate the following issues and incorporate their solutions into our model:

- Tuning system parameters for multiple queries with multiple attributes.
- Developing solutions for adaptively adjusting the sampling rate based on the innovation sequence.
- Evaluating models for handling data streams that exhibit non-linear patterns. Candidate models include particle filters and several extended Kalman Filters.

Table 1: Summary of existing solutions and the advantages of using the Kalman Filter.

STREAM. Proposed solution: adaptive precision bounds; approximate values and dynamic precision widths cached at the server; the best estimate for the future is the last cached value; does not work for noisy data (no data smoothing). Advantage of the Kalman Filter: a prediction algorithm can reduce communication overhead even further, and on-line data smoothing helps provide query answers even for noisy data.

AURORA. Proposed solution: static precision widths; resource management using dynamic sampling rates based on loss/gain ratios; bounds do not change with the input characteristics of the stream. Advantage of the Kalman Filter: the prediction mechanism is based on input characteristics, so the output is sensitive to input values.

COUGAR. Proposed solution: partial query processing in the wireless network to prevent unnecessary data from being forwarded (load shedding); does not use any approximation or caching scheme. Advantage of the Kalman Filter: the prediction scheme gives better results, reducing load adaptively rather than dropping chunks of data indiscriminately.

Figure 2: A resource-conservation example using the Kalman Filter: (a) moving-object data set; (b) number of updates.

3 Statistical Sensor-Data Fusion

The server receives video streams from distributed cameras, each of which has limited spatial and temporal coverage, is potentially noisy, and is susceptible to occlusion and scene clutter. (To conserve bandwidth, a video camera can send just detected motion patterns, not raw video frames, to the server.) To achieve wide-area coverage, data from the cameras must be fused. Fusing spatially and temporally overlapped data is a challenging task, since cameras may have different sampling rates and resolutions, and some cameras may be mobile. We propose here a hierarchical master-slave fusion scheme. Referring to Fig. 3, at the bottom level, each slave station tracks the movements of scene objects semi-independently. The local trajectories are then relayed to a master station to be fused into a consistent, global representation. This represents a bottom-up analysis paradigm. Furthermore, as each individual camera has a limited field of view, and occlusion occurs due to scene clutter, we also employ a top-down analysis module that disseminates fused information from the master station to the slave stations. This top-down information-dissemination process assists in tracking, cross-validation, and error recovery should a camera lose track of an object.

Figure 3: Two-level Kalman Filter configuration.

3.1 Relationship to Current State of Knowledge

To construct descriptions of motion events, we must be able to track object movement. Object tracking has been extensively studied in the Computer Vision community (e.g., [6, 12, 36, 37, 49, 51, 62, 63, 75, 101, 99, 100, 108, 124, 126, 128]). For this database project, we will not develop new object-tracking algorithms.
Rather, we will enhance the reliability of existing algorithms by performing data fusion.

Sensor-data fusion refers to the task of combining multiple-sensor data in a complementary and synergistic way to improve data availability, reduce noise, and improve the robustness of the analysis. Sensor data can be fused for multiple sensors of the same or different types, and the fusion can be done at the data, feature, or decision level. Data- and feature-fusion strategies are often used for combining heterogeneous sensor data (e.g., fusing inertial, ultrasonic, and vision sensors for mobile-robotics applications [18, 31, 82, 83, 88, 94, 119]) and for fusing multi-image modalities (e.g., infrared and vision sensors) for target recognition and scene interpretation [86, 90, 68, 69, 54]. IBR (image-based rendering) techniques [29, 38, 81, 25, 85, 91, 92, 110, 112, 105] can also be considered a data-fusion strategy in which single or multiple sensors, often of the same kind, are used to construct an environment map. Decision-fusion strategies have their roots in pattern recognition [45, 50, 117]; these strategies have many well-established algorithms [14, 20, 21, 44, 46, 76, 57, 103, 104, 43] that are readily applicable to sensor-data fusion. Our unique contribution is in using two-level Kalman Filters with both bottom-up and top-down analysis for data fusion and information dissemination from and to multiple sensors, thus improving tracking reliability.

3.2 Preliminary Results

We used the Kalman Filter [22, 84] as the tool for fusing information spatially and temporally from multiple cameras to detect motion events. Suppose that a vehicle (or a person) is moving in the parking lot. Its trajectory is described in the global reference system by P. The trajectory may be observed in camera i as p_i, where 1 ≤ i ≤ n (n is the number of cameras used). The goal is then to optimally track, correlate, and fuse the individual camera trajectories into a consistent, global description.[1] We formulate the solution as a two-level hierarchy of Kalman Filters. Referring to Fig. 3, at the bottom level of the hierarchy we employ, for each camera, a Kalman Filter to estimate independently the position, velocity, and acceleration of the vehicle, based on the tracked image trajectory of the vehicle in the local camera reference frame.

[1] There are two issues that need to be addressed here: registration and correspondence. First, to fuse measurements from multiple sensors into one global estimate, two registration processes are needed: spatial registration to establish the transformation among different camera coordinate frames, and temporal registration to synchronize multiple local clocks. These techniques are well established in the literature [32, 42, 49, 55, 58, 109, 128, 131], and we have developed algorithms to accomplish both spatial and temporal registration [67]. Second, it may be difficult to synchronize the activities observed in multiple cameras; the question is how to disambiguate the correspondence of multiple trajectories. Spatial and temporal trajectory correspondence can be established through the camera-registration and stereopsis-correspondence processes, which are well-established techniques in photogrammetry and computer vision. For our discussion, we assume that these problems can be solved: we can achieve spatial and temporal registration of trajectories and disambiguate among multiple trajectories. (Interested readers are referred to our recent paper [67] for more details.)
Or, in Kalman Filter jargon, the position, velocity, and acceleration vectors establish the state of the system, while the image trajectory serves as the observation of the system state. At the top level of the hierarchy, we use a single Kalman Filter to estimate the vehicle's position, velocity, and acceleration in the global world reference frame, this time using the estimated positions, velocities, and accelerations from the multiple cameras as observations (the solid feed-upward lines in Fig. 3). This is possible because camera calibration and registration [32, 49, 58, 128, 131] are used to derive the transform matrices (the camera-to-world and world-to-camera matrices in Fig. 3). These matrices allow p_i, measured in the reference frame of an individual camera, to be related to P in the global world system.

An interesting scenario occurs when one (or more) of the cameras in the sensor network loses track of an object. This can happen because of scene clutter, self- and mutual-occlusion, or the tracked object exiting the field of view of a camera, among many other possibilities. The camera could switch from a track mode into a re-acquire mode by searching the whole image for telltale signs of the object; however, doing so inevitably slows down event processing and introduces a high degree of uncertainty into the resulting event description. Instead, we allow the dissemination of fused information to individual cameras (the dashed feed-downward lines in Fig. 3) to help guide the re-acquisition process. The Kalman Filter, being a flexible information-fusion algorithm, can readily use the fused information (instead of sensor data) for maintaining and updating state vectors. This hierarchical feed-upward (for sensor-data fusion) and feed-downward (for information dissemination) filter structure thus provides a powerful and flexible mechanism for joining sensor data spatially.

We have collected many hours of video using multiple video cameras in a parking lot. The video frames depicted both human and vehicular motion. The motion patterns for vehicles included entering, exiting, turning, backing up, circling, and zig-zag driving, among many more. For human motion, we recorded actions involving both individuals and groups, with patterns such as following, following-and-gaining, stalking, congregating, splitting, and loitering, among many others. Some of these patterns (like zig-zag driving and stalking) were acted out by our group members, while others represented behaviors commonly observed in the parking lot. Due to space limitations, we show only two sample results here. Sample results for tracking the movements of people in a parking lot are shown in Fig. 4(a) and (b). Of the three cameras we used, the views of two were partially occluded by parked cars.[2] The individual camera trajectories could therefore be broken. However, by using our two-level filter structure, we were able to fill in the gaps, smooth out sensor noise, and fuse the individual trajectories into a complete, global description.

[2] The camera positions in these figures indicate only the general directions of camera placement. The actual cameras were placed much farther away from the scene and always pointed at the parking lot.
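As an illustration of the bottom-up half of this hierarchy, the sketch below fuses per-camera world-frame estimates by weighting each with its inverse covariance, a simple single-step stand-in for the master filter's observation update. The function name, covariance values, and static formulation are our assumptions, not the system's actual code.

```python
import numpy as np

def fuse_camera_estimates(estimates):
    """Combine per-camera state estimates (already transformed into the
    world frame) into one global estimate, weighting each estimate by
    its inverse covariance (the standard information-filter fusion).

    estimates: list of (x_i, P_i) pairs, where x_i is a camera's
    world-frame estimate and P_i its covariance from the slave filter.
    """
    info = sum(np.linalg.inv(P) for _, P in estimates)       # total information
    mean = sum(np.linalg.inv(P) @ x for x, P in estimates)
    P_global = np.linalg.inv(info)
    return P_global @ mean, P_global

# Example: two cameras observing the same vehicle position; camera 2 is
# partially occluded, hence its larger covariance and smaller weight.
x1, P1 = np.array([10.0, 5.0]), np.diag([0.5, 0.5])
x2, P2 = np.array([10.6, 4.7]), np.diag([4.0, 4.0])
x_global, P_global = fuse_camera_estimates([(x1, P1), (x2, P2)])
print(x_global)   # dominated by the better-tracked camera 1
```

In the full system, the master Kalman Filter ingests such estimates as observations over time, and the fused state can be disseminated back down to a camera in re-acquire mode.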

Figure 4: (a) A simulated stalking behavior in a parking lot and (b) trajectories of the sample stalking behavior; (c) and (d) show similar data-fusion results for vehicular motion. In these figures, the solid line is the fused trajectory; '.' marks the tracked trajectory from camera 1, 'x' the trajectory from camera 2, and 'o' the trajectory from camera 3.

Fig. 4(c) and (d) show the analysis of a vehicle's driving pattern when two cameras were used. Note that even with a very small overlap in the fields of view of the two cameras, and with a circling motion covering a large spatial area (hence each camera observed only part of the motion trajectory), we were able to fuse the individual camera trajectories to arrive at a complete description.

3.3 Future Work

We plan to investigate the following issues and incorporate their solutions into our existing model:

- Modifying the hierarchical Kalman-Filter model to perform spatial joins for mobile sources.
- Improving the robustness of the Kalman Filter by allowing the tracking of multiple state estimates per object. While the Kalman Filter is a simple and powerful mechanism for state estimation, its validity is questionable if the assumptions on the prior and the noise do not hold. Furthermore, there are situations where multiple hypotheses must be kept until a later time, when more visual evidence has been gathered to validate some and discredit others. For example, if two or more persons enter the field of view of a camera in such a way that their silhouettes overlap significantly, or one completely occludes another, it can be difficult for the tracking algorithm to discern whether such a moving region corresponds to a single person or to several. The single-person hypothesis might be kept until it can be safely discredited. We will address this problem using hypothesis-and-verification paradigms through sampling [75, 97, 62, 63].

4 Sequence-data to Event Mapping

To interpret video-sensor data, we need to map sequence data (motion trajectories) to events. A sequence datum is defined as an ordered set of items S = (s_1, s_2, ..., s_m); the items are logically contiguous, and each item denotes a set of attributes that varies across applications. Given a set of sequences X that can be partitioned into a labeled subset L and an unlabeled subset U, the task of sequence-data learning is to learn a discriminative function f from L using a learning algorithm; then, using f, we can predict the labels of the unlabeled sequences in U. To conduct supervised learning with a small number of training instances, the discriminant approach has been shown to be much more effective [65] than the generative approach (as in HMMs). In particular, SVMs require only the boundary instances (support vectors) to participate in a class prediction, and hence require a much smaller amount of training data than other methods.

Unfortunately, traditional kernel functions (such as polynomial and RBF functions) that have been employed with SVMs assume a feature space of fixed dimension. They cannot be applied to sequence data, which are variable-length in nature. We will design kernel functions that can effectively handle variable-length sequence data. To conduct supervised learning, we first need to extract useful information (features) from sequence data to form representations [132]. Although many representations have been proposed in the past (see Section 4.1.1 for a detailed discussion), we believe that the best representation should be event-dependent. Therefore, our approach is first to extract multi-resolution descriptors from sequence data, and then to rely on the algorithms that we subsequently develop to learn the best descriptor combination for a target semantic. For instance, a motion pattern can be depicted as a sequence of symbolic strings at the coarse level, while detailed information such as velocity and acceleration is recorded at the refined levels. If an event concerns only the turning pattern of a vehicle, then the coarse-level symbolic representation may be adequate; otherwise, proper secondary structures should be used. To support multi-resolution learning, we will 1) design kernel functions to characterize similarity at individual resolution levels, and 2) research kernel-fusion mechanisms to integrate kernels at multiple levels. For both individual kernel design and kernel fusion, we will prove the kernels to be mathematically valid and verify them to be effective.

4.1 Relationship to Current State of Knowledge

Sequence-data learning involves two major subtasks: description of sequence data, and development of a sequence-data learning algorithm. Related work in these two areas is summarized as follows.

4.1.1 Sequence-data Representation

Many sequence-data descriptions have been introduced in the past; Figure 5 summarizes the more popular ones in the literature. Traditional descriptions can be roughly divided into numeric-valued and symbolic-valued types. Numeric-based descriptions represent raw sequences as sequences of transformed numeric values. For example, the Discrete Fourier Transform (DFT) [48] uses the Fourier transform to represent the original sequences; the Discrete Wavelet Transform (DWT) [60] applies the wavelet transform as the sequence representation; Singular Value Decomposition (SVD) [77] uses eigenvalues to provide information about the underlying structure of the data; piecewise linear approximation (PLA) [96] approximates each subsequence by a linear function, concatenating the coefficients of the linear functions into a new sequence representation; and piecewise aggregate approximation (PAA) [70] segments each sequence into a fixed number of subsequences and concatenates the mean values of the subsequences into a data-reduced representation. Symbolic representations such as natural language [56] and strings [28] are treated as a simplified transformation of the original data that retains much of the important temporal information. Although a symbolic representation does not retain much numeric detail, it enjoys the advantage of improved computational efficiency; the analysis of symbolic data is also often less sensitive to measurement noise [41].

Figure 5: Sequence-representation literature: numeric-valued representations (Discrete Fourier Transform, Discrete Wavelet Transform, Singular Value Decomposition, piecewise linear approximation, piecewise aggregate approximation) and symbolic-valued representations (natural language, strings).
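As a small illustration of two of the representations above, the sketch below computes a PAA descriptor and a coarse symbolic string from a toy trajectory. The quantile-based discretization and all names are our assumptions, not the exact constructions of [70] or [28].

```python
import numpy as np

def paa(seq, n_segments):
    """Piecewise aggregate approximation: split the sequence into
    n_segments roughly equal pieces and keep each piece's mean."""
    pieces = np.array_split(np.asarray(seq, dtype=float), n_segments)
    return np.array([p.mean() for p in pieces])

def symbolize(seq, alphabet="abcd"):
    """A coarse symbolic view: z-normalize, then bin each value into
    equiprobable regions (a quantile-based discretization, for illustration)."""
    z = (seq - seq.mean()) / seq.std()
    cuts = np.quantile(z, np.linspace(0, 1, len(alphabet) + 1)[1:-1])
    return "".join(alphabet[np.searchsorted(cuts, v)] for v in z)

traj = np.sin(np.linspace(0, 4 * np.pi, 200)) + 0.1 * np.random.randn(200)
coarse = paa(traj, 8)                 # numeric, data-reduced representation
print(coarse.round(2), symbolize(coarse))
```

The PAA vector retains coarse numeric shape, while the symbolic string discards magnitude detail for robustness and efficiency, matching the trade-off described above.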
For a specific application, choosing the best representation is a challenging research problem. Keogh's work reports a comprehensive series of experiments showing that no sequence-data representation works best for all kinds of datasets [71]; the best representation is actually application-dependent. The goal of this research task is to find the best representation (the best combination of descriptors) in an application-dependent and query-dependent manner.

4.1.2 Sequence-data Learning

Being able to measure the similarity between data instances accurately is fundamental to learning. Traditional sequence-data similarity measurements include Minkowski metrics [77] and non-Minkowski metrics [74]. More sophisticated sequence-distance measures, such as dynamic time warping (DTW) [9, 72], piecewise normalization [61], suffix trees [59], edit distance [19], and cosine wavelets [60], have been investigated for sequences of variable length. Another popular method for modeling and predicting temporal sequences is based on HMMs [98, 16]. HMMs model sequential dependencies by treating the sequence as a Markov chain, and have been shown to be effective in applications such as speech recognition. Despite their success, HMMs may require a significant amount of training data when the model size is large.

4.2 Preliminary Results

The kernel-design task is to find a valid and meaningful kernel for sequence data in two steps: the first step is to design a kernel for each sequence descriptor, and the second is to fuse the multiple kernels in an optimal way.

4.2.1 Individual Kernel Design

In this thread, we design new kernels for sequence-data learning. SVMs [116] are the most popular kernel-based methods, but SVMs can be applied only to training data that reside in a vector space.

The basic form of an SVM classifying an input datum x is expressed as

f(x) = \mathrm{sign}\Big( \sum_i \alpha_i y_i K(x_i, x) + b \Big),   (1)

where Φ is a mapping function that maps input vectors into the feature space; ⟨·, ·⟩ denotes the inner-product operator; x_i is the i-th training sample; y_i is its class label; and α_i is its Lagrange multiplier. The kernel function is K(x_i, x) = ⟨Φ(x_i), Φ(x)⟩, and b is the bias. For sequence data, in particular variable-length sequences, we lack the basis function Φ for mapping sequences of various lengths to spaces of different dimensions. Fortunately, the embedding of a finite set of points is entirely specified by writing a finite-dimensional kernel matrix. Put another way, as long as we have a positive definite kernel matrix that characterizes the sequence-data similarity, we can use kernel methods [80]. Hence, the design task is reduced to formulating a kernel matrix satisfying two requirements: a semantic requirement and a mathematical requirement. Regarding the semantic requirement, the kernel matrix must capture the similarity in local and global structure between the sequence data. As to the mathematical requirement, a valid kernel matrix must be symmetric and positive semi-definite [106] to ensure that the projected feature space exists.

A natural way to define the similarity between two sequences is via pair-wise string-alignment scores [93, 111]. Two sequences of variable lengths can be aligned by matching symbols at corresponding positions and inserting gap symbols ('-') at unaligned positions. An alignment is a mutual arrangement of two sequences, showing where the two sequences are similar and where they differ; the better aligned two sequences are, the more similar they are. By performing alignment on the given sequences, we can build a matrix S in which S_ij is the pairwise similarity between sequences i and j. However, there is one potential problem with the matrix S: though S is symmetric, it might not be positive semi-definite. To remedy the problem, we propose to also consider transitive similarity when measuring pair-wise similarity. To motivate our approach, Figure 6 provides an example of transitive similarity between data, with each node denoting one data instance in the 2-D space. Assume P1, P2, and P3 form an equilateral triangle, meaning that the distances between them are all the same. However, if we take the data distribution into consideration, we notice that more data are located between P1 and P2, or between P2 and P3, than between P1 and P3. More likely, P1 and P3 belong to the same class, linked transitively through P2, whereas a point without such intermediate support would be an outlier. Therefore, we need to define a kernel matrix that considers both pair-wise similarity and transitive similarity. Intuitively, a transitive relationship helps characterize the similarity between data more accurately by taking the data distribution into account. Furthermore, we have proved the following two propositions, which show that when a given similarity is symmetric, taking the transitive relationship into consideration results in a legal kernel.

Figure 6: Example of transitive similarity (points P1, P2, and P3).

Proposition 4.1 Denote S_ij as the similarity between sequences i and j obtained from pair-wise string-alignment scores. If a matrix K is defined as

K = e^{S} = \sum_{k=0}^{\infty} \frac{S^k}{k!},   (2)

then K is a semantically valid kernel, reflecting the similarity relationships between sequences, including transitive similarity.

Proposition 4.2 K = e^{S} is a mathematically valid kernel: it is symmetric and positive semi-definite.
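Reading Eq. (2) as the matrix exponential of the alignment-score matrix S, the sketch below computes K = e^{βS} via the eigendecomposition of the symmetric S and checks positive definiteness on a toy example; the β parameter and the score values are illustrative assumptions.

```python
import numpy as np

def exponential_kernel(S, beta=1.0):
    """K = exp(beta*S): matrix exponential of a symmetric similarity
    matrix S. Since S^k chains k-step similarities, K aggregates both
    direct and transitive similarity; its eigenvalues exp(beta*w_i)
    are all positive, so K is positive definite."""
    S = (S + S.T) / 2.0                  # enforce symmetry
    w, V = np.linalg.eigh(S)             # S = V diag(w) V^T
    return (V * np.exp(beta * w)) @ V.T  # V diag(exp(beta*w)) V^T

S = np.array([[1.0, 0.8, 0.1],           # toy pairwise alignment scores:
              [0.8, 1.0, 0.8],           # s1~s2 and s2~s3 align well,
              [0.1, 0.8, 1.0]])          # s1~s3 aligns poorly
K = exponential_kernel(S)
print(np.linalg.eigvalsh(K).min() > 0)   # True: positive definite
print(K.round(2))                        # K[0,2] is lifted by the s1-s2-s3 path
```

Note how the entry for the poorly aligned pair (s1, s3) is raised by the chain through s2, which is exactly the transitive-similarity effect motivated by Figure 6.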
4.2.2 Kernel Fusion

After formulating individual kernels, the next step is to fuse them. Each individual kernel extracts a specific type of information from the given data, thereby providing a partial view of the data; kernel fusion forms a complete picture of the relationships between the different components of the original sequence data. Assume R is a relation that decomposes an instance x into a D-tuple of parts (x^1, ..., x^D), and let kernel K_d measure the similarity between the d-th parts. In different contexts, not all levels' descriptors should be considered equally important; kernel fusion provides the flexibility to learn which level should matter more for the target learning task. Possible fusion rules are the weighted sum and the tensor product, since kernels are provably closed under sums and products. The weighted sum is formulated as

K(x, z) = \sum_{d=1}^{D} \mu_d K_d(x^d, z^d),   (3)

and the tensor-product formulation is

K(x, z) = \prod_{d=1}^{D} K_d(x^d, z^d).   (4)
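A minimal sketch of the two fusion rules of Eqs. (3) and (4), operating on Gram matrices computed at two resolution levels; the matrices, weights, and function names are our illustrative assumptions (the product rule is applied element-wise on the Gram matrices, i.e., as a Schur product).

```python
import numpy as np

def fuse_weighted_sum(kernels, weights):
    """Eq. (3): K = sum_d mu_d * K_d, a valid kernel for mu_d >= 0."""
    return sum(m * K for m, K in zip(weights, kernels))

def fuse_product(kernels):
    """Eq. (4): K = prod_d K_d, taken element-wise on the Gram
    matrices; kernels are closed under products (Schur product)."""
    out = np.ones_like(kernels[0])
    for K in kernels:
        out = out * K
    return out

# Gram matrices from two resolution levels (e.g., a coarse symbolic
# kernel and a fine velocity/acceleration kernel) over the same data:
K_coarse = np.array([[1.0, 0.9], [0.9, 1.0]])
K_fine = np.array([[1.0, 0.2], [0.2, 1.0]])
print(fuse_weighted_sum([K_coarse, K_fine], [0.7, 0.3]))
print(fuse_product([K_coarse, K_fine]))
```

In our current practice, the weights μ_d would be chosen by cross-validation, which is the expense that motivates the convex-optimization formulation discussed next.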

4.3 Future Work

We have proved that our kernel-design and kernel-fusion algorithms both produce mathematically valid kernels, and the excitement has just begun. Regarding kernel design, our immediate plan is to verify the effectiveness of the exponential kernel through extensive empirical studies on several datasets, including UCI time-series data, video-surveillance motion trajectories, and RNA sequences. Beyond the exponential kernel, there are a couple of promising candidate kernels that we plan to investigate. Regarding kernel fusion, we currently find the best fusion model (e.g., the weights of the individual kernels) through cross-validation, which can be time-consuming; we plan to investigate whether the fusion problem can be formulated and solved as a convex optimization problem. In addition, we have started investigating a nonlinear kernel-fusion scheme in [125, 123], and the results are promising.

5 Context-based Distance Function Formulation

Interpreting a video event is a context-dependent task. For instance, a vehicle circling an empty parking lot at night is suspicious, whereas the same pattern taking place in the daytime, in a full or an empty parking lot, may be benign. A stalking pattern in a parking lot may raise a security concern, but the same pattern in a grocery store can be incidental and harmless. Thus, video-event recognition must consider contextual information. Context-based information access was identified as a key future database-research area at a recent National Science Foundation workshop [2]. At the heart of context-based information access is the formulation of an application- and user-dependent distance function for measuring data similarity. An accurate measurement of similarity based on contextual information is essential for personalizing many database tasks, such as clustering, indexing, and retrieval [8, 13, 73]; the quality of the distance function significantly affects the success of finding meaningful results [4, 5, 17, 47, 52, 78]. To date, the most widely used distance metric is perhaps the Euclidean distance, because of its intuitive nature and ease of computation. In the last two decades, much work has been devoted to transforming the Euclidean distance (or, more generally, the L_p-norm) by weighting the features based on their importance for a target task [5, 47]. Weighting the features is equivalent to performing a linear transformation on the space formed by the features. Linear models enjoy the twin advantages of simplicity of description and efficiency of computation, but this same simplicity is insufficient to model similarity for many real-world datasets. For example, it has been shown in image/video retrieval that a query concept is typically a nonlinear combination of perceptual features [102, 114]. Linear models can be too restrictive for mapping features to semantics, and hence unsuitable. To support flexible and effective context-based distance-function formulation, we will research a nonlinear feature-transformation procedure called kernel alignment. At first it might seem that a nonlinear transformation would suffer from high model and computational complexity. Our kernel-alignment procedure avoids these problems by employing the kernel trick, which lets us generalize distance-based algorithms to operate in the feature space (defined next), usually nonlinearly related to the input space. The input space (denoted I) is the original space in which the data vectors are located; the feature space (denoted F) is the space to which the data vectors are projected, linearly or nonlinearly. The advantage of using the kernel trick is that, instead of explicitly determining the coordinates of the data vectors in the feature space, the distance computation in F can be performed efficiently in I through a kernel function.
Specifically, given two vectors x and y, the kernel function K(x, y) is defined as the inner product of Φ(x) and Φ(y), where Φ is a basis function that maps x and y from I to F. The inner product between two vectors can be thought of as a measure of their similarity; therefore, K(x, y) returns the similarity of x and y in F. The distance between x and y in terms of the kernel is defined as

d(x, y)^2 = \|\Phi(x) - \Phi(y)\|^2 = K(x, x) - 2K(x, y) + K(y, y).

Since a kernel function can be either linear or nonlinear, the traditional feature-weighting approach (e.g., [5]) is just a special case of kernel alignment.
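For instance, with an RBF kernel the feature space is infinite-dimensional, yet the kernel distance above is computed entirely in the input space. A minimal sketch (γ and the sample vectors are our assumptions):

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    """RBF kernel: similarity of x and y in F without ever computing Phi."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def kernel_distance_sq(x, y, k=rbf):
    """Squared feature-space distance via the kernel trick:
    ||Phi(x) - Phi(y)||^2 = K(x,x) - 2 K(x,y) + K(y,y)."""
    return k(x, x) - 2 * k(x, y) + k(y, y)

x, y = np.array([0.0, 0.0]), np.array([1.0, 2.0])
print(kernel_distance_sq(x, y))   # computed entirely in the input space I
```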

What is a good distance function, and how do we use contextual information in formulating such a function to interpret video events? We aim to answer these questions in two thrusts, theory and algorithm.

Theory thrust. We will derive theories showing how to optimally transform a given distance function by using contextual information. We are particularly interested in designing transformation methods that are both efficient and flexible in modeling complex distance functions.

Algorithm thrust. We will develop models and algorithms to effectively collect contextual information, and then use that information to transform distance functions.

5.1 Relationship to Current State of Knowledge

Distance-function learning approaches can be broadly divided into metric-learning and kernel-learning approaches. In the rest of this section we discuss representative work in each.

5.1.1 Metric Learning

Metric learning attempts to find the optimal linear transformation of a given set of data vectors to better characterize the similarity between them. The transformation itself is linear, but the data vectors may first be mapped to a new set of vectors using a nonlinear function. The transformation of the data vectors is equivalent to assigning weights to the features of the vectors; therefore, metric learning is often called feature weighting. The metric-learning approach is given a set of data vectors {x_i} and similarity information in the form of a similar set S, where (x_i, x_j) ∈ S if x_i and x_j are similar. Metric learning aims to learn a distance metric d(x_i, x_j) between data vectors that respects the similarity information. Mathematically, the distance metric can be represented as

d(x, y) = \sqrt{(\phi(x) - \phi(y))^\top A \, (\phi(x) - \phi(y))},   (5)

where A needs to be positive (semi-)definite to satisfy the metric properties of non-negativity and the triangle inequality. The choice of the basis function φ and the scaling matrix A differentiates the various metric-learning algorithms. Wettschereck et al. [120] review the performance of feature-weighting algorithms, with emphasis on their performance for the k-nearest-neighbor classifier. Here we discuss only a few representative algorithms (for the others, please refer to [120]).

A number of papers in the current literature address the problem of learning distance metrics using side information in the form of groups of similar vectors [11, 127]. Side information can be user-provided information on the similarity characteristics of a subset of the data. The work of [11] uses Relevant Component Analysis (RCA) to efficiently learn a full-rank Mahalanobis metric [89]. The authors use equivalence relations as the side information: they compute the pooled within-group covariance

C = \frac{1}{n} \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ji} - m_j)(x_{ji} - m_j)^\top,

where m_j is the mean of the j-th group of vectors, and k and n_j denote the number of groups and the number of samples in the j-th group, respectively. The matrix C^{-1/2} is used for the transformation, and the inverse of C serves as the Mahalanobis matrix. Xing et al. [127] treat the same problem as a convex optimization problem, thus producing local-optima-free algorithms; they present techniques for learning the weighting matrix in both the diagonal and the full-matrix cases. The major difference between the two approaches is that RCA uses closed-form expressions, whereas [127] uses iterative methods that can be sensitive to parameter tuning and are computationally expensive. C. Aggarwal [5] discusses a systematic framework for designing distance functions sensitive to particular characteristics of the data. The models used are the parametric Minkowski model and the parametric cosine model, both of which attempt to minimize the error with respect to the training data. The parametric Minkowski model can be thought of as feature weighting in I, and the parametric cosine model as an inner product in I.

In summary, metric learning aims to learn a good distance function by computing optimal feature weightings in the input space. Clearly, this linear transformation is restrictive in terms of modeling complex semantics. Although one can perform a nonlinear transformation of the features via a basis function in I, the resulting computational complexity renders this approach impractical. The kernel-learning approach, which we discuss next, successfully addresses this concern about computational complexity.
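A small sketch of Eq. (5) with φ the identity, together with an RCA-style construction of A from side information (the pooled within-group covariance, inverted); the toy groups and all names are our assumptions, not the exact algorithm of [11].

```python
import numpy as np

def mahalanobis(x, y, A):
    """Eq. (5) with phi = identity: d(x, y) = sqrt((x-y)^T A (x-y)).
    A must be positive (semi-)definite for d to be a metric; a diagonal
    A is plain feature weighting, a full A also rotates the space."""
    d = x - y
    return np.sqrt(d @ A @ d)

# RCA-style sketch: pool the within-group covariance of user-provided
# groups of similar vectors, and use its inverse as A, so directions
# that vary within a group count less toward the distance.
rng = np.random.default_rng(1)
groups = [rng.normal(size=(20, 2)) * [1.0, 0.2],          # toy group 1
          rng.normal(size=(30, 2)) * [1.0, 0.2] + 3.0]    # toy group 2
C = sum(np.cov(g.T, bias=True) * len(g) for g in groups) / sum(len(g) for g in groups)
A = np.linalg.inv(C)
print(mahalanobis(np.zeros(2), np.ones(2), A))
```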
5.1.2 Kernel Learning

Kernel-based methods implicitly map a set of data vectors in I to some high-dimensional (possibly infinite-dimensional) feature space F, using a (usually nonlinear) basis function Φ: I → F. The kernel K is defined as an inner product between two basis functions in F: K(x, y) = ⟨Φ(x), Φ(y)⟩. Kernel-based methods use these inner products as a similarity measure (theoretical justifications are presented in [107]). The kernel provides an elegant way of dealing with nonlinear algorithms by reducing them to linear ones in F [107]: any linear algorithm that can be carried out in terms of inner products can be made nonlinear by substituting an a-priori kernel. A typical example is Support Vector Machines [24]. The requirement for choosing a valid K is that it be symmetric and positive (semi-)definite. The distance between x and y in F can then be computed via the kernel trick:

d(x, y)^2 = \|\Phi(x) - \Phi(y)\|^2 = K(x, x) - 2K(x, y) + K(y, y).

An important advantage of using kernels stems from the ease of computing this inner product (similarity measure) in F without actually having to know Φ, as long as the chosen kernel is a positive (semi-)definite function. Much work [15, 24, 107] has been done on classification, clustering, and regression methods using indirect computation of the kernel K, and hence of the distance d. Because of the kernel's central role, a poor kernel function can lead to significantly poor performance. Cristianini et al. [40] introduced the notion of kernel alignment to measure the similarity between two kernels, or between a given kernel and a target function. Geometrically, given a set of training instances, the alignment score is defined as the cosine of the angle between the two kernel matrices,[3] after flattening the two matrices into vectors. They also proposed the notion of the ideal kernel (K*), toward which any given kernel is supposed to be aligned (we discuss this further in Section 5.2). Based on the ideas of kernel alignment and the ideal kernel, a couple of researchers have recently begun to develop processes for learning a kernel directly from the training data, instead of choosing an a-priori kernel.

[3] Given a kernel function K and a set of instances X, the kernel matrix (Gram matrix) is the matrix of all possible inner products of pairs from X: G_ij = K(x_i, x_j).
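The alignment score of [40] reduces to a few lines once the Gram matrices are in hand; below is a sketch computing the alignment between a prior RBF Gram matrix and the ideal kernel of a toy labeled set (the sample data are our assumptions).

```python
import numpy as np

def alignment(K1, K2):
    """Empirical kernel alignment [40]: cosine of the angle between two
    Gram matrices flattened into vectors, <K1,K2>_F / (||K1||_F ||K2||_F)."""
    num = np.sum(K1 * K2)                        # Frobenius inner product
    return num / (np.linalg.norm(K1) * np.linalg.norm(K2))

X = np.array([[0.0], [0.2], [1.0], [1.2]])       # toy 1-D instances
y = np.array([1, 1, -1, -1])                     # class labels
K_prior = np.exp(-np.square(X - X.T))            # RBF Gram matrix
K_ideal = (np.outer(y, y) > 0).astype(float)     # 1 iff same class (Eq. 7)
print(alignment(K_prior, K_ideal))               # how well K fits the labels
```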

Cristianini et al. [40] proposed a transductive-learning [118] method that learns the kernel matrix by optimizing the eigenvalue coefficients in the spectral decomposition of the full kernel matrix over both training and test data. Crammer et al. [39] argued that alignment may not be a good measure when the kernel magnitude is large; they therefore formulated kernel-matrix learning as a boosting process [34], so that a good kernel matrix can be constructed by weighting simple base kernels obtained by solving the generalized eigenvector problem. Both methods are transductive and may not generalize to unseen instances, unless the algorithm is repeated with the new instances incorporated into the training procedure (a rather inefficient way to classify an unseen data instance). In summary, though these recent methods suggest interesting ways to modify a given kernel function, some important questions remain to be answered, such as: Is the ideal kernel really ideal? Is the modification optimal? Is the resulting kernel a valid one? We have recently developed a kernel-alignment algorithm that performs a linear transformation on the prior kernel matrix, and we have theoretically proven that our method leads to a valid distance function. We report our preliminary results and discuss future work in the next two sections.

5.2 Preliminary Results

Let us consider a two-class classification problem with a training dataset {(x_i, y_i)}, where y_i ∈ {+1, −1}. We can consider K(x_i, x_j) as a similarity measure between instances x_i and x_j. For instance, when an RBF function is employed, the value of K(x_i, x_j) ranges from 0 to 1: it approaches 0 when x_i and x_j are infinitely far apart (dissimilar) in the input space, and 1 when they are infinitely close (similar). The parameters associated with a kernel determine the pairwise similarity measures; thus, the choice of a good kernel and its parameters is equivalent to the choice of a good distance function for measuring similarity. When we have perfect knowledge of the class membership y_i of each instance, we can write the ideal similarity matrix generated by the ideal kernel K* of [40] as

K*(x_i, x_j) = 1 if y_i = y_j, and 0 otherwise.   (7)

An ideal kernel can be constructed on a training dataset where the class labels are known. Unfortunately, the ideal kernel overfits the training data, so it cannot propagate prior knowledge to unseen data. The work of [40] proposes to measure the alignment of a kernel K with the ideal kernel using their Frobenius product ⟨K, K*⟩_F: kernel alignment uses the alignment score with the ideal kernel K* to indicate the degree to which the kernel fits the training data. However, since the ideal kernel itself is a trivial kernel, aligning a function toward it ([79]) may not lead to improved results.

We have devised a kernel-alignment algorithm that performs a linear transformation on a prior kernel matrix. We have theoretically proven that our method leads to a valid distance function. More importantly, we have also shown that, since the optimization is performed in a convex space, the solution obtained is globally optimal; consequently, given a prior kernel and side information, the alignment needs to be performed just once. Our empirical results show that our proposed method outperforms competing methods on a variety of testbeds [121]. More specifically, we use a linear transformation model in F, not in I, to idealize K. The kernel matrix is then modified as follows (where γ_d ≥ γ_s ≥ 0):

\tilde{K}_{ij} = K_{ij} + \gamma_s if y_i = y_j, and \tilde{K}_{ij} = K_{ij} - \gamma_d if y_i \neq y_j.   (8)
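Under our reconstruction of Eq. (8) above, the modification is a per-entry shift of the Gram matrix. The sketch below applies it to a toy two-class set and checks that the alignment to the ideal kernel improves; the γ values, data, and names are illustrative assumptions, not the tuned settings of [121].

```python
import numpy as np

def idealize(K, y, gamma_s=0.1, gamma_d=0.2):
    """Eq. (8) in spirit: nudge the prior Gram matrix toward the ideal
    kernel -- raise entries of same-class pairs by gamma_s and lower
    those of different-class pairs by gamma_d (gamma_d >= gamma_s, per
    the emphasis on separating dissimilar pairs). Values illustrative."""
    same = np.outer(y, y) > 0
    return np.where(same, K + gamma_s, K - gamma_d)

X = np.array([[0.0], [0.3], [1.0], [1.3]])
y = np.array([1, 1, -1, -1])
K = np.exp(-np.square(X - X.T))                  # prior RBF Gram matrix
K_tilde = idealize(K, y)
K_ideal = (np.outer(y, y) > 0).astype(float)     # ideal kernel of Eq. (7)
align = lambda A, B: np.sum(A * B) / (np.linalg.norm(A) * np.linalg.norm(B))
print(align(K, K_ideal), "->", align(K_tilde, K_ideal))   # alignment improves
```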
In what follows, we present two important propositions for which we have provided proofs. Proposition 5.1 demonstrates that, under some constraints on γ_s and γ_d, our proposed idealized kernel in Eq. 8 is a valid kernel. Proposition 5.2 mathematically demonstrates that, under the constraints from Proposition 5.1, the idealized kernel of Eq. 8 guarantees a better alignment to the ideal kernel K* than the prior kernel K achieves. In both propositions, we assume that γ_d ≥ γ_s: this constraint means that we place more emphasis on decreasing the kernel value of dissimilar instance-pairs, in line with the spirit of achieving maximum distance between dissimilar pairs to keep the separating margin large.

Proposition 5.1 Under the assumption that γ_d ≥ γ_s ≥ 0, the idealized kernel K̃ is positive definite if the prior kernel K is positive definite.

Proposition 5.2 The kernel matrix of the idealized kernel K̃ obtains a better alignment to the ideal kernel matrix K* than the prior kernel matrix K does, under the constraints of Proposition 5.1. Moreover, a smaller γ_s or γ_d would induce a higher alignment score.

5.3 Future Work

We plan to further address the following three research issues:

Ideal kernel. We are not satisfied with the ideal kernel proposed by Cristianini et al. [40], and will investigate what a better ideal kernel might entail. Our preliminary conjecture is that an ideal kernel needs to take the data distribution into consideration in order to avoid overfitting. We believe that the exponential kernel (discussed in Section 4), which has strong links to graph theory, can potentially provide a data-dependent ideal kernel. We will identify other candidates and conduct extensive experiments to validate them.


More information

Multiple-Choice Questionnaire Group C

Multiple-Choice Questionnaire Group C Family name: Vision and Machine-Learning Given name: 1/28/2011 Multiple-Choice naire Group C No documents authorized. There can be several right answers to a question. Marking-scheme: 2 points if all right

More information

The Comparative Study of Machine Learning Algorithms in Text Data Classification*

The Comparative Study of Machine Learning Algorithms in Text Data Classification* The Comparative Study of Machine Learning Algorithms in Text Data Classification* Wang Xin School of Science, Beijing Information Science and Technology University Beijing, China Abstract Classification

More information

Fusion of Radar and EO-sensors for Surveillance

Fusion of Radar and EO-sensors for Surveillance of Radar and EO-sensors for Surveillance L.J.H.M. Kester, A. Theil TNO Physics and Electronics Laboratory P.O. Box 96864, 2509 JG The Hague, The Netherlands kester@fel.tno.nl, theil@fel.tno.nl Abstract

More information

Dimension Reduction CS534

Dimension Reduction CS534 Dimension Reduction CS534 Why dimension reduction? High dimensionality large number of features E.g., documents represented by thousands of words, millions of bigrams Images represented by thousands of

More information

Machine Learning Techniques for Data Mining

Machine Learning Techniques for Data Mining Machine Learning Techniques for Data Mining Eibe Frank University of Waikato New Zealand 10/25/2000 1 PART VII Moving on: Engineering the input and output 10/25/2000 2 Applying a learner is not all Already

More information

An Introduction to Content Based Image Retrieval

An Introduction to Content Based Image Retrieval CHAPTER -1 An Introduction to Content Based Image Retrieval 1.1 Introduction With the advancement in internet and multimedia technologies, a huge amount of multimedia data in the form of audio, video and

More information

Eye Detection by Haar wavelets and cascaded Support Vector Machine

Eye Detection by Haar wavelets and cascaded Support Vector Machine Eye Detection by Haar wavelets and cascaded Support Vector Machine Vishal Agrawal B.Tech 4th Year Guide: Simant Dubey / Amitabha Mukherjee Dept of Computer Science and Engineering IIT Kanpur - 208 016

More information

SYDE Winter 2011 Introduction to Pattern Recognition. Clustering

SYDE Winter 2011 Introduction to Pattern Recognition. Clustering SYDE 372 - Winter 2011 Introduction to Pattern Recognition Clustering Alexander Wong Department of Systems Design Engineering University of Waterloo Outline 1 2 3 4 5 All the approaches we have learned

More information

A Weighted Majority Voting based on Normalized Mutual Information for Cluster Analysis

A Weighted Majority Voting based on Normalized Mutual Information for Cluster Analysis A Weighted Majority Voting based on Normalized Mutual Information for Cluster Analysis Meshal Shutaywi and Nezamoddin N. Kachouie Department of Mathematical Sciences, Florida Institute of Technology Abstract

More information

Toward Building a Robust and Intelligent Video Surveillance System: A Case Study Edward Chang and Yuan-Fang Wang

Toward Building a Robust and Intelligent Video Surveillance System: A Case Study Edward Chang and Yuan-Fang Wang Toward Building a Robust and Intelligent Video Surveillance System: A Case Study Edward Chang and Yuan-Fang Wang Douglas R. Lanman CS 295-1: Sensor Data Management 28 Sept. 2005 1 Outline Introduction

More information

Data mining with sparse grids

Data mining with sparse grids Data mining with sparse grids Jochen Garcke and Michael Griebel Institut für Angewandte Mathematik Universität Bonn Data mining with sparse grids p.1/40 Overview What is Data mining? Regularization networks

More information

Unsupervised learning in Vision

Unsupervised learning in Vision Chapter 7 Unsupervised learning in Vision The fields of Computer Vision and Machine Learning complement each other in a very natural way: the aim of the former is to extract useful information from visual

More information

Energy Conservation of Sensor Nodes using LMS based Prediction Model

Energy Conservation of Sensor Nodes using LMS based Prediction Model Energy Conservation of Sensor odes using LMS based Prediction Model Anagha Rajput 1, Vinoth Babu 2 1, 2 VIT University, Tamilnadu Abstract: Energy conservation is one of the most concentrated research

More information

Prof. Fanny Ficuciello Robotics for Bioengineering Visual Servoing

Prof. Fanny Ficuciello Robotics for Bioengineering Visual Servoing Visual servoing vision allows a robotic system to obtain geometrical and qualitative information on the surrounding environment high level control motion planning (look-and-move visual grasping) low level

More information

Exploring Curve Fitting for Fingers in Egocentric Images

Exploring Curve Fitting for Fingers in Egocentric Images Exploring Curve Fitting for Fingers in Egocentric Images Akanksha Saran Robotics Institute, Carnegie Mellon University 16-811: Math Fundamentals for Robotics Final Project Report Email: asaran@andrew.cmu.edu

More information

Unsupervised Learning

Unsupervised Learning Outline Unsupervised Learning Basic concepts K-means algorithm Representation of clusters Hierarchical clustering Distance functions Which clustering algorithm to use? NN Supervised learning vs. unsupervised

More information

The Encoding Complexity of Network Coding

The Encoding Complexity of Network Coding The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network

More information

Combine the PA Algorithm with a Proximal Classifier

Combine the PA Algorithm with a Proximal Classifier Combine the Passive and Aggressive Algorithm with a Proximal Classifier Yuh-Jye Lee Joint work with Y.-C. Tseng Dept. of Computer Science & Information Engineering TaiwanTech. Dept. of Statistics@NCKU

More information

The Curse of Dimensionality

The Curse of Dimensionality The Curse of Dimensionality ACAS 2002 p1/66 Curse of Dimensionality The basic idea of the curse of dimensionality is that high dimensional data is difficult to work with for several reasons: Adding more

More information

CS 229 Midterm Review

CS 229 Midterm Review CS 229 Midterm Review Course Staff Fall 2018 11/2/2018 Outline Today: SVMs Kernels Tree Ensembles EM Algorithm / Mixture Models [ Focus on building intuition, less so on solving specific problems. Ask

More information

Robust PDF Table Locator

Robust PDF Table Locator Robust PDF Table Locator December 17, 2016 1 Introduction Data scientists rely on an abundance of tabular data stored in easy-to-machine-read formats like.csv files. Unfortunately, most government records

More information

CS 521 Data Mining Techniques Instructor: Abdullah Mueen

CS 521 Data Mining Techniques Instructor: Abdullah Mueen CS 521 Data Mining Techniques Instructor: Abdullah Mueen LECTURE 2: DATA TRANSFORMATION AND DIMENSIONALITY REDUCTION Chapter 3: Data Preprocessing Data Preprocessing: An Overview Data Quality Major Tasks

More information

Slides for Data Mining by I. H. Witten and E. Frank

Slides for Data Mining by I. H. Witten and E. Frank Slides for Data Mining by I. H. Witten and E. Frank 7 Engineering the input and output Attribute selection Scheme-independent, scheme-specific Attribute discretization Unsupervised, supervised, error-

More information

Random Forest A. Fornaser

Random Forest A. Fornaser Random Forest A. Fornaser alberto.fornaser@unitn.it Sources Lecture 15: decision trees, information theory and random forests, Dr. Richard E. Turner Trees and Random Forests, Adele Cutler, Utah State University

More information

MSA220 - Statistical Learning for Big Data

MSA220 - Statistical Learning for Big Data MSA220 - Statistical Learning for Big Data Lecture 13 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Clustering Explorative analysis - finding groups

More information

Data Preprocessing. Slides by: Shree Jaswal

Data Preprocessing. Slides by: Shree Jaswal Data Preprocessing Slides by: Shree Jaswal Topics to be covered Why Preprocessing? Data Cleaning; Data Integration; Data Reduction: Attribute subset selection, Histograms, Clustering and Sampling; Data

More information

Locally Weighted Learning for Control. Alexander Skoglund Machine Learning Course AASS, June 2005

Locally Weighted Learning for Control. Alexander Skoglund Machine Learning Course AASS, June 2005 Locally Weighted Learning for Control Alexander Skoglund Machine Learning Course AASS, June 2005 Outline Locally Weighted Learning, Christopher G. Atkeson et. al. in Artificial Intelligence Review, 11:11-73,1997

More information

Machine Learning / Jan 27, 2010

Machine Learning / Jan 27, 2010 Revisiting Logistic Regression & Naïve Bayes Aarti Singh Machine Learning 10-701/15-781 Jan 27, 2010 Generative and Discriminative Classifiers Training classifiers involves learning a mapping f: X -> Y,

More information

Kernel Methods & Support Vector Machines

Kernel Methods & Support Vector Machines & Support Vector Machines & Support Vector Machines Arvind Visvanathan CSCE 970 Pattern Recognition 1 & Support Vector Machines Question? Draw a single line to separate two classes? 2 & Support Vector

More information

Semi-Supervised Clustering with Partial Background Information

Semi-Supervised Clustering with Partial Background Information Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject

More information

Object and Action Detection from a Single Example

Object and Action Detection from a Single Example Object and Action Detection from a Single Example Peyman Milanfar* EE Department University of California, Santa Cruz *Joint work with Hae Jong Seo AFOSR Program Review, June 4-5, 29 Take a look at this:

More information

Bagging for One-Class Learning

Bagging for One-Class Learning Bagging for One-Class Learning David Kamm December 13, 2008 1 Introduction Consider the following outlier detection problem: suppose you are given an unlabeled data set and make the assumptions that one

More information

Overview Citation. ML Introduction. Overview Schedule. ML Intro Dataset. Introduction to Semi-Supervised Learning Review 10/4/2010

Overview Citation. ML Introduction. Overview Schedule. ML Intro Dataset. Introduction to Semi-Supervised Learning Review 10/4/2010 INFORMATICS SEMINAR SEPT. 27 & OCT. 4, 2010 Introduction to Semi-Supervised Learning Review 2 Overview Citation X. Zhu and A.B. Goldberg, Introduction to Semi- Supervised Learning, Morgan & Claypool Publishers,

More information

CS 229 Final Project Report Learning to Decode Cognitive States of Rat using Functional Magnetic Resonance Imaging Time Series

CS 229 Final Project Report Learning to Decode Cognitive States of Rat using Functional Magnetic Resonance Imaging Time Series CS 229 Final Project Report Learning to Decode Cognitive States of Rat using Functional Magnetic Resonance Imaging Time Series Jingyuan Chen //Department of Electrical Engineering, cjy2010@stanford.edu//

More information

BSB663 Image Processing Pinar Duygulu. Slides are adapted from Selim Aksoy

BSB663 Image Processing Pinar Duygulu. Slides are adapted from Selim Aksoy BSB663 Image Processing Pinar Duygulu Slides are adapted from Selim Aksoy Image matching Image matching is a fundamental aspect of many problems in computer vision. Object or scene recognition Solving

More information

A Course in Machine Learning

A Course in Machine Learning A Course in Machine Learning Hal Daumé III 13 UNSUPERVISED LEARNING If you have access to labeled training data, you know what to do. This is the supervised setting, in which you have a teacher telling

More information

Understanding Clustering Supervising the unsupervised

Understanding Clustering Supervising the unsupervised Understanding Clustering Supervising the unsupervised Janu Verma IBM T.J. Watson Research Center, New York http://jverma.github.io/ jverma@us.ibm.com @januverma Clustering Grouping together similar data

More information

CHAPTER 3 DIFFERENT DOMAINS OF WATERMARKING. domain. In spatial domain the watermark bits directly added to the pixels of the cover

CHAPTER 3 DIFFERENT DOMAINS OF WATERMARKING. domain. In spatial domain the watermark bits directly added to the pixels of the cover 38 CHAPTER 3 DIFFERENT DOMAINS OF WATERMARKING Digital image watermarking can be done in both spatial domain and transform domain. In spatial domain the watermark bits directly added to the pixels of the

More information

Data Mining: Concepts and Techniques. Chapter 9 Classification: Support Vector Machines. Support Vector Machines (SVMs)

Data Mining: Concepts and Techniques. Chapter 9 Classification: Support Vector Machines. Support Vector Machines (SVMs) Data Mining: Concepts and Techniques Chapter 9 Classification: Support Vector Machines 1 Support Vector Machines (SVMs) SVMs are a set of related supervised learning methods used for classification Based

More information

Support Vector Machines

Support Vector Machines Support Vector Machines RBF-networks Support Vector Machines Good Decision Boundary Optimization Problem Soft margin Hyperplane Non-linear Decision Boundary Kernel-Trick Approximation Accurancy Overtraining

More information

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)

More information

4.12 Generalization. In back-propagation learning, as many training examples as possible are typically used.

4.12 Generalization. In back-propagation learning, as many training examples as possible are typically used. 1 4.12 Generalization In back-propagation learning, as many training examples as possible are typically used. It is hoped that the network so designed generalizes well. A network generalizes well when

More information

Face Recognition using Eigenfaces SMAI Course Project

Face Recognition using Eigenfaces SMAI Course Project Face Recognition using Eigenfaces SMAI Course Project Satarupa Guha IIIT Hyderabad 201307566 satarupa.guha@research.iiit.ac.in Ayushi Dalmia IIIT Hyderabad 201307565 ayushi.dalmia@research.iiit.ac.in Abstract

More information

Clustering CS 550: Machine Learning

Clustering CS 550: Machine Learning Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf

More information

Visible and Long-Wave Infrared Image Fusion Schemes for Situational. Awareness

Visible and Long-Wave Infrared Image Fusion Schemes for Situational. Awareness Visible and Long-Wave Infrared Image Fusion Schemes for Situational Awareness Multi-Dimensional Digital Signal Processing Literature Survey Nathaniel Walker The University of Texas at Austin nathaniel.walker@baesystems.com

More information

Semi-supervised learning and active learning

Semi-supervised learning and active learning Semi-supervised learning and active learning Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Combining classifiers Ensemble learning: a machine learning paradigm where multiple learners

More information

Some Applications of Graph Bandwidth to Constraint Satisfaction Problems

Some Applications of Graph Bandwidth to Constraint Satisfaction Problems Some Applications of Graph Bandwidth to Constraint Satisfaction Problems Ramin Zabih Computer Science Department Stanford University Stanford, California 94305 Abstract Bandwidth is a fundamental concept

More information

Leave-One-Out Support Vector Machines

Leave-One-Out Support Vector Machines Leave-One-Out Support Vector Machines Jason Weston Department of Computer Science Royal Holloway, University of London, Egham Hill, Egham, Surrey, TW20 OEX, UK. Abstract We present a new learning algorithm

More information

Color Local Texture Features Based Face Recognition

Color Local Texture Features Based Face Recognition Color Local Texture Features Based Face Recognition Priyanka V. Bankar Department of Electronics and Communication Engineering SKN Sinhgad College of Engineering, Korti, Pandharpur, Maharashtra, India

More information

CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES CS6220: DATA MINING TECHNIQUES Image Data: Classification via Neural Networks Instructor: Yizhou Sun yzsun@ccs.neu.edu November 19, 2015 Methods to Learn Classification Clustering Frequent Pattern Mining

More information

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review Gautam Kunapuli Machine Learning Data is identically and independently distributed Goal is to learn a function that maps to Data is generated using an unknown function Learn a hypothesis that minimizes

More information

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1 Big Data Methods Chapter 5: Machine learning Big Data Methods, Chapter 5, Slide 1 5.1 Introduction to machine learning What is machine learning? Concerned with the study and development of algorithms that

More information

Naïve Bayes for text classification

Naïve Bayes for text classification Road Map Basic concepts Decision tree induction Evaluation of classifiers Rule induction Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Support

More information

Chapter 10. Conclusion Discussion

Chapter 10. Conclusion Discussion Chapter 10 Conclusion 10.1 Discussion Question 1: Usually a dynamic system has delays and feedback. Can OMEGA handle systems with infinite delays, and with elastic delays? OMEGA handles those systems with

More information

CS 664 Segmentation. Daniel Huttenlocher

CS 664 Segmentation. Daniel Huttenlocher CS 664 Segmentation Daniel Huttenlocher Grouping Perceptual Organization Structural relationships between tokens Parallelism, symmetry, alignment Similarity of token properties Often strong psychophysical

More information

Chapter 7. Conclusions and Future Work

Chapter 7. Conclusions and Future Work Chapter 7 Conclusions and Future Work In this dissertation, we have presented a new way of analyzing a basic building block in computer graphics rendering algorithms the computational interaction between

More information

Scan Scheduling Specification and Analysis

Scan Scheduling Specification and Analysis Scan Scheduling Specification and Analysis Bruno Dutertre System Design Laboratory SRI International Menlo Park, CA 94025 May 24, 2000 This work was partially funded by DARPA/AFRL under BAE System subcontract

More information

Transductive Learning: Motivation, Model, Algorithms

Transductive Learning: Motivation, Model, Algorithms Transductive Learning: Motivation, Model, Algorithms Olivier Bousquet Centre de Mathématiques Appliquées Ecole Polytechnique, FRANCE olivier.bousquet@m4x.org University of New Mexico, January 2002 Goal

More information

Random projection for non-gaussian mixture models

Random projection for non-gaussian mixture models Random projection for non-gaussian mixture models Győző Gidófalvi Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 92037 gyozo@cs.ucsd.edu Abstract Recently,

More information

2. LITERATURE REVIEW

2. LITERATURE REVIEW 2. LITERATURE REVIEW CBIR has come long way before 1990 and very little papers have been published at that time, however the number of papers published since 1997 is increasing. There are many CBIR algorithms

More information

Geometric Computations for Simulation

Geometric Computations for Simulation 1 Geometric Computations for Simulation David E. Johnson I. INTRODUCTION A static virtual world would be boring and unlikely to draw in a user enough to create a sense of immersion. Simulation allows things

More information

Data mining with Support Vector Machine

Data mining with Support Vector Machine Data mining with Support Vector Machine Ms. Arti Patle IES, IPS Academy Indore (M.P.) artipatle@gmail.com Mr. Deepak Singh Chouhan IES, IPS Academy Indore (M.P.) deepak.schouhan@yahoo.com Abstract: Machine

More information

Chapter 15 Introduction to Linear Programming

Chapter 15 Introduction to Linear Programming Chapter 15 Introduction to Linear Programming An Introduction to Optimization Spring, 2015 Wei-Ta Chu 1 Brief History of Linear Programming The goal of linear programming is to determine the values of

More information

CHAPTER 5 PROPAGATION DELAY

CHAPTER 5 PROPAGATION DELAY 98 CHAPTER 5 PROPAGATION DELAY Underwater wireless sensor networks deployed of sensor nodes with sensing, forwarding and processing abilities that operate in underwater. In this environment brought challenges,

More information

Image Transformation Techniques Dr. Rajeev Srivastava Dept. of Computer Engineering, ITBHU, Varanasi

Image Transformation Techniques Dr. Rajeev Srivastava Dept. of Computer Engineering, ITBHU, Varanasi Image Transformation Techniques Dr. Rajeev Srivastava Dept. of Computer Engineering, ITBHU, Varanasi 1. Introduction The choice of a particular transform in a given application depends on the amount of

More information

Supervised Learning with Neural Networks. We now look at how an agent might learn to solve a general problem by seeing examples.

Supervised Learning with Neural Networks. We now look at how an agent might learn to solve a general problem by seeing examples. Supervised Learning with Neural Networks We now look at how an agent might learn to solve a general problem by seeing examples. Aims: to present an outline of supervised learning as part of AI; to introduce

More information

Topics in Machine Learning

Topics in Machine Learning Topics in Machine Learning Gilad Lerman School of Mathematics University of Minnesota Text/slides stolen from G. James, D. Witten, T. Hastie, R. Tibshirani and A. Ng Machine Learning - Motivation Arthur

More information

Defining a Better Vehicle Trajectory With GMM

Defining a Better Vehicle Trajectory With GMM Santa Clara University Department of Computer Engineering COEN 281 Data Mining Professor Ming- Hwa Wang, Ph.D Winter 2016 Defining a Better Vehicle Trajectory With GMM Christiane Gregory Abe Millan Contents

More information

DM6 Support Vector Machines

DM6 Support Vector Machines DM6 Support Vector Machines Outline Large margin linear classifier Linear separable Nonlinear separable Creating nonlinear classifiers: kernel trick Discussion on SVM Conclusion SVM: LARGE MARGIN LINEAR

More information

Search Engines. Information Retrieval in Practice

Search Engines. Information Retrieval in Practice Search Engines Information Retrieval in Practice All slides Addison Wesley, 2008 Classification and Clustering Classification and clustering are classical pattern recognition / machine learning problems

More information

A Neural Network for Real-Time Signal Processing

A Neural Network for Real-Time Signal Processing 248 MalkofT A Neural Network for Real-Time Signal Processing Donald B. Malkoff General Electric / Advanced Technology Laboratories Moorestown Corporate Center Building 145-2, Route 38 Moorestown, NJ 08057

More information

Image Mosaicing with Motion Segmentation from Video

Image Mosaicing with Motion Segmentation from Video Image Mosaicing with Motion Segmentation from Video Augusto Román and Taly Gilat EE392J Digital Video Processing Winter 2002 Introduction: Many digital cameras these days include the capability to record

More information

Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011

Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011 Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011 1. Introduction Reddit is one of the most popular online social news websites with millions

More information

CPSC 340: Machine Learning and Data Mining. Kernel Trick Fall 2017

CPSC 340: Machine Learning and Data Mining. Kernel Trick Fall 2017 CPSC 340: Machine Learning and Data Mining Kernel Trick Fall 2017 Admin Assignment 3: Due Friday. Midterm: Can view your exam during instructor office hours or after class this week. Digression: the other

More information

Formal Model. Figure 1: The target concept T is a subset of the concept S = [0, 1]. The search agent needs to search S for a point in T.

Formal Model. Figure 1: The target concept T is a subset of the concept S = [0, 1]. The search agent needs to search S for a point in T. Although this paper analyzes shaping with respect to its benefits on search problems, the reader should recognize that shaping is often intimately related to reinforcement learning. The objective in reinforcement

More information

Chapter 9 Object Tracking an Overview

Chapter 9 Object Tracking an Overview Chapter 9 Object Tracking an Overview The output of the background subtraction algorithm, described in the previous chapter, is a classification (segmentation) of pixels into foreground pixels (those belonging

More information

An Approach for Reduction of Rain Streaks from a Single Image

An Approach for Reduction of Rain Streaks from a Single Image An Approach for Reduction of Rain Streaks from a Single Image Vijayakumar Majjagi 1, Netravati U M 2 1 4 th Semester, M. Tech, Digital Electronics, Department of Electronics and Communication G M Institute

More information

Subspace Clustering with Global Dimension Minimization And Application to Motion Segmentation

Subspace Clustering with Global Dimension Minimization And Application to Motion Segmentation Subspace Clustering with Global Dimension Minimization And Application to Motion Segmentation Bryan Poling University of Minnesota Joint work with Gilad Lerman University of Minnesota The Problem of Subspace

More information

Cluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1

Cluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Cluster Analysis Mu-Chun Su Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Introduction Cluster analysis is the formal study of algorithms and methods

More information

Equation to LaTeX. Abhinav Rastogi, Sevy Harris. I. Introduction. Segmentation.

Equation to LaTeX. Abhinav Rastogi, Sevy Harris. I. Introduction. Segmentation. Equation to LaTeX Abhinav Rastogi, Sevy Harris {arastogi,sharris5}@stanford.edu I. Introduction Copying equations from a pdf file to a LaTeX document can be time consuming because there is no easy way

More information

FACE RECOGNITION USING SUPPORT VECTOR MACHINES

FACE RECOGNITION USING SUPPORT VECTOR MACHINES FACE RECOGNITION USING SUPPORT VECTOR MACHINES Ashwin Swaminathan ashwins@umd.edu ENEE633: Statistical and Neural Pattern Recognition Instructor : Prof. Rama Chellappa Project 2, Part (b) 1. INTRODUCTION

More information

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer

More information