Context-Aware Activity Modeling using Hierarchical Conditional Random Fields


Yingying Zhu, Nandita M. Nayak, and Amit K. Roy-Chowdhury

Abstract: In this paper, rather than modeling activities in videos individually, we jointly model and recognize related activities in a scene using both motion and context features. This is motivated by the observation that activities related in space and time rarely occur independently and can serve as the context for each other. We propose a two-layer conditional random field model that represents the action segments and activities in a hierarchical manner. The model allows the integration of both motion and various context features at different levels and automatically learns the statistics that capture the patterns of the features. With weakly labeled training data, the learning problem is formulated as a max-margin problem and is solved by an iterative algorithm. Rather than generating activity labels for individual activities, our model simultaneously predicts an optimum structural label for the related activities in the scene. We show promising results on the UCLA Office Dataset and the VIRAT Ground Dataset that demonstrate the benefit of hierarchical modeling of related activities using both motion and context features.

Index Terms: Activity localization and recognition, context-aware activity model, hierarchical conditional random field.

1 INTRODUCTION

It has been demonstrated in [28] that context is significant in human visual systems. As there is no formal definition of context in computer vision, we consider all the detected objects and motion regions as providing contextual information about each other. Activities in natural scenes rarely happen independently. The spatial layout of activities and their sequential patterns provide useful cues for their understanding. Consider the activities that happen in the same spatio-temporal region in Fig. 1: the existence of the nearby car gives information about what the person (bounded by the red circle) is doing, and the relative position of the person of interest and the car says that activities (b) and (c) are very different from activity (a). Moreover, just focusing on the person, it may be hard to tell what the person is doing in (b) and (c) - opening the vehicle trunk or closing the vehicle trunk. If we knew that these activities occurred around the same vehicle along time, it would be immediately clear that in (b) the person is opening the vehicle trunk and in (c) the person is closing the vehicle trunk. This example shows the importance of spatial and temporal relationships for activity recognition.

Fig. 1. An example that demonstrates the importance of context in activity recognition: panels (a), (b) and (c) show activities at different times and illustrate their spatial and temporal relationships. The motion region surrounding the person of interest is located by a red circle; the interacting vehicle is located by a blue bounding box.

1.1 Overview of the Framework

Many existing works on activity recognition assume that the temporal locations of the activities are known [], [27].

This work was partially supported under ONR grant N and NSF grant IIS. Y. Zhu is with the Department of Electrical and Computer Engineering, University of California, Riverside (yzhu00@ucr.edu). N. M. Nayak is with the Department of Computer Science and Engineering, University of California, Riverside (nandita.nayak@ .ucr.edu). A. K. Roy-Chowdhury is with the Department of Electrical and Computer Engineering, University of California, Riverside (amitr@ee.ucr.edu).
In practice, activity-based analysis of videos should involve reasoning about motion regions, objects involved in these motion regions, and spatio-temporal relationships between the motion regions. We focus on the problem of detecting activities of interest in continuous videos without prior information about the locations of the activities. The main challenge is to develop a representation of the continuous video that respects the spatio-temporal relationships of the activities. To achieve this goal, we build upon existing well-known feature descriptors and spatio-temporal context representations that, when combined together, provide a powerful framework to model activities in continuous videos. An activity can be considered as a union of action segments or actions that are neighbors to each other closely in space and time. We provide an integrated framework that conducts multiple stages of video analysis, starting with motion localization. The detected motion regions are divided into action segments, which are considered as the elements of activities, using a motion segmentation algorithm based on the nonlinear dynamic model (NDM) in [5]. The goal then is to generate smoothed activity labels, which

are optimum in a global sense, for the action segments, and thus obtain semantically meaningful activity regions and corresponding activity labels. Towards this goal, we perform an initial labeling to group adjacent action segments into semantically meaningful activities using a baseline activity detector. Any existing activity detection method, such as a sliding-window bag-of-words (BOW) with a support vector machine (SVM) [25], can be used in this step. We call the labeled groups of action segments the candidate activities. Candidate activities that are related to each other in space and time are grouped together into activity sets. For each set, the underlying activities are jointly modeled and recognized with the proposed two-layer conditional random field model, which models the hierarchical relationship between the action segments and activities. We refer to this proposed two-layer Hierarchical-CRF simply as Hierarchical-CRF for simplicity of expression. First, the action layer is modeled as a linear-chain CRF with the activity labels of the action segments as the random variables. Latent activity variables, which represent the detected activities, are then introduced in the hidden activity layer. Doing so, action-activity consistency and intra-activity potentials, as the higher-order smoothness potentials, can be introduced into the model to smooth the preliminary activity labels in the action layer. Finally, the activity-layer variables, whose underlying activities are within the neighborhoods of each other in space and time, are connected to utilize the spatial and temporal relationships between activities. The resulting model is the action-based two-layer Hierarchical-CRF model.

Potentials in and between the action and activity layers are developed to represent the motion and context patterns of individual variables and groups of them at both the action and activity levels, as well as action-activity consistency patterns between variables in the two layers. The action-activity potentials upon sets of action nodes and their corresponding activity nodes are introduced between the action and activity layers. Such potentials, as smoothness potentials, are used to enforce label consistency of action segments within activity regions while allowing for label inconsistency in certain circumstances. This allows the rectification of the preliminary activity labels of action segments during the inference of the Hierarchical-CRF model according to the motion and context patterns in and between actions and activities.

Fig. 2 shows the framework of our approach. Given a video, we detect the motion regions using background subtraction. Then, the segmentation algorithm aims to divide a continuous motion region into action segments, whose motion pattern is consistent and is different from that of adjacent segments. These action segments, as the nodes in the action layer, are modeled as a linear-chain CRF and the proposed Hierarchical-CRF model is built accordingly as described above. The model parameters are learned automatically from weakly-labeled training data with the location and labels of activities of interest.

Fig. 2. The left graph shows the video representation of an activity set with n motion segments and m candidate activities. The right graph shows the graphical representation of our Hierarchical-CRF model. The white nodes are the action variables and the gray nodes are the hidden activity variables. Note that the observations associated with the model variables are not shown for clarity of representation.
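To make the graph construction above concrete, the following is a minimal sketch, under assumed data containers (ActionSegment, CandidateActivity) and an assumed `related` test between activities (e.g. an alpha-neighborhood rule), of how the two-layer structure of Fig. 2 could be assembled. It is an illustration of the description above, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of assembling the two-layer graph of Fig. 2
# from detected action segments and candidate activities. The data containers and the
# `related` test are illustrative assumptions.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class ActionSegment:
    idx: int
    track_id: int             # trajectory the segment belongs to
    t_start: float
    t_end: float

@dataclass
class CandidateActivity:
    idx: int
    segment_idxs: List[int]   # action clique a_c produced by the baseline detector
    t_start: float
    t_end: float

@dataclass
class HierarchicalCRFGraph:
    action_edges: List[Tuple[int, int]] = field(default_factory=list)                   # E^a
    action_activity_cliques: List[Tuple[int, List[int]]] = field(default_factory=list)  # C^ah
    activity_edges: List[Tuple[int, int]] = field(default_factory=list)                 # E^h

def build_graph(segments: List[ActionSegment],
                activities: List[CandidateActivity],
                related: Callable[[CandidateActivity, CandidateActivity], bool]) -> HierarchicalCRFGraph:
    g = HierarchicalCRFGraph()
    # 1) linear chain over consecutive segments of the same trajectory (action layer)
    last_on_track = {}
    for s in sorted(segments, key=lambda s: (s.track_id, s.t_start)):
        prev = last_on_track.get(s.track_id)
        if prev is not None:
            g.action_edges.append((prev.idx, s.idx))
        last_on_track[s.track_id] = s
    # 2) one latent activity node per candidate activity, tied to its action clique
    for a in activities:
        g.action_activity_cliques.append((a.idx, list(a.segment_idxs)))
    # 3) pairwise edges between related activities (activity layer)
    for i, s in enumerate(activities):
        for d in activities[i + 1:]:
            if related(s, d):
                g.activity_edges.append((s.idx, d.idx))
    return g
```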
Image-level features are detected and organized to form the context for activities. Common-sense domain knowledge about the activities of interest is used to guide the formulation of these context features within activities from the weakly-labeled training data. We utilize a structural model in a max-margin framework, iteratively inferring the hidden activity variables and learning the parameters of the different layers. For testing, the action segments, which are merged together and assigned activity labels by the preliminary activity detection method, are relabeled through inference on the learned Hierarchical-CRF model.

1.2 Main Contributions

The main contribution of this work is three-fold. (i) We combine low-level motion segmentation with a high-level activity model under one framework. With the detected individual action segments as the elements of activities, we design a Hierarchical-CRF model that jointly models the related activities in the scene. (ii) We propose a weakly supervised approach that utilizes context within and between actions and activities, which provides helpful cues for activity recognition. The proposed model integrates motion and various context features within and between actions and activities into a unified model. The proposed model can localize and label activities in continuous videos simultaneously, in the presence of multiple actors in the scene interacting with each other or acting independently. (iii) With a task-oriented discriminative approach, the model learning problem is formulated as a max-margin problem and is solved by an Expectation Maximization approach.

2 RELATED WORK

Many existing works exploring context focus on interactions among features, objects and actions [], [3], [4], [34], [39], environmental conditions such as spatial locations of certain activities in the scene [23], and temporal relationships between activities [24], [35]. Spatio-temporal constraints across activities in a wide-area scene are rarely considered.

Motion segmentation and action recognition are done simultaneously in [26]. The proposed algorithm models the temporal order of actions while ignoring the spatial relationships between actions. The work in [35] models a complex activity by a variable-duration hidden Markov model on equal-length temporal segments. It decomposes a complex activity into sequential actions, which are the context of each other. However, it considers only the temporal relationships, while ignoring the spatial relationships between actions. The AND-OR graph [2], [3], [32] is a powerful tool for activity representation. It has been used for multi-scale analysis of human activities in [2], where α, β, γ procedures were defined for a bottom-up cost-sensitive inference of low-level action detection. However, the learning and inference processes of AND-OR graphs become more complex as the graph grows large and more and more activities are learned. In [20], [2], a structural model is proposed to learn both feature-level and action-level interactions of group members. This method labels each image with a group activity label. How to smooth the labeling results along time is a problem that is not addressed in the paper. Also, these methods aim to recognize group activities and are not suitable in our scenario, where activities cannot be considered as parts of larger activities. In [4], complex activities are represented as spatio-temporal graphs representing multi-scale video segments and their hierarchical relationships. Existing higher-order models [6], [7], [9], [42] propose the use of higher-order potentials that encourage the smoothness of variables within cliques of the graph. Higher-order graphical models have been frequently used in image segmentation, object recognition, etc. However, few works exist in the field of activity recognition. We propose a novel method that explicitly models the action- and activity-level motion and context patterns with a Hierarchical-CRF model and uses them in the inference stage for recognition.

The problem of simultaneous tracking and activity recognition was addressed in [6], [5]. In these works, tracking and action/activity recognition are expected to benefit each other through an iterative process that maximizes a decomposable potential function which consists of tracking potentials and action/activity potentials. However, only collective activities are considered in [6], [5], in which the individual persons of interest have a common goal in terms of activity. This work addresses the general problem of activity recognition, when individual persons in the scene may conduct heterogeneous activities. The inference method on a structural model proposed in [20], [2] searches through the graphical structure in order to find the one that maximizes the potential function. Though this inference method is computationally less intensive than exhaustive search, it is still time consuming. As an alternative, greedy search has been used for inference in object recognition [8].

This paper has major differences with our previous work in [45]. In [45], we proposed a structural SVM to explicitly model the durations, motion, intra-activity context and the spatio-temporal relationships between the activities. In this work, we develop a hierarchical model which represents the related activities in a hidden activity layer, which interacts with a lower-level action layer. Representing activities as hidden activity variables simplifies the inference problem, by associating each hidden activity with a small set of neighboring action segments, and enables efficient iterative learning and inference algorithms.
Furthermore, the modeling of more aspects of the activities of interest adds additional feature functions that measure both action and activity variables. Since more information about the activities to be recognized is modeled, the recognition accuracy is improved, as demonstrated by the experiments.

3 MODEL FORMULATION FOR CONTEXT-AWARE ACTIVITY REPRESENTATION

In this section, we describe how the higher-order conditional random field (CRF) modeling of activities, which integrates activity durations, motion features and various context features within and across activities, is built upon automatically detected action segments to jointly model related activities in space and time.

3.1 Video Preprocessing

Assume there are M + 1 classes of activities in the scene, including a background class with label 0 and M classes of interest with labels 1, ..., M (the background class can be omitted if all the activity classes in the scene are known). Our goal is to locate and label each activity of interest in videos. Given a continuous video, background subtraction [46] is used to locate the moving objects. Moving persons are identified, and local trajectories of moving persons are generated (any existing tracking method like [33] can be used). Spatio-temporal Interest Point (STIP) features [22] are generated only for these motion regions. Thus, STIPs generated by noise, such as slight tree shaking, camera jitter and motion of shadows, are avoided. Each motion region is segmented into action segments using the motion segmentation based on the method in [5] with STIP histograms as the model observation. The detailed motion segmentation algorithm is described in Section 5.1.

3.2 Hierarchical-CRF Models for Activities

The problem of activity recognition in continuous videos requires two main tasks: to detect motion regions and to label these detected motion regions. The detection and labeling problems can be solved simultaneously as proposed in [26], or separately as proposed in [44], [45]. For the latter, candidate action or activity regions are usually detected before the labeling task. The problem of activity recognition is then converted to a problem of labeling, that is, to assign each candidate region an optimum activity label. The CRF is a discriminative model often used for labeling problems of images and image objects. Essentially, a CRF can be considered as a special version of a Markov Random Field (MRF) where the variable potentials are conditioned on the observed data. Let x be the model

observations and y be the label variables. The posterior distribution p(y | x, ω) of the label variables over the CRF is a Gibbs distribution and is usually represented as

$$p(y \mid x, \omega) = \frac{1}{Z(x,\omega)} \exp\Big(\sum_{c \in C} \omega^T \phi_c(x, y_c)\Big), \qquad (1)$$

where ω is a model weight vector, which needs to be learned from training data, Z(x, ω) is a normalizing constant called the partition function, and $\phi_c(x, y_c)$ is a feature vector derived from the observation x and the label vector $y_c$ in the clique c. The potential function of the CRF model given the observations x and model weight vector ω is defined as

$$\psi(y \mid x, \omega) = \sum_{c \in C} \omega^T \phi_c(x, y_c). \qquad (2)$$

For the development of the Hierarchical-CRF model, the action layer is first modeled as a linear-chain CRF. Activity-layer variables, which are associated with detected activities, are then introduced for the smoothing of the action-layer variables. Finally, the activity-layer variables are connected to represent the spatial and temporal relationships between activities. The evolution of the proposed two-layer Hierarchical-CRF model from the one-layer CRF model is shown in Fig. 3. Details on the development of these models are described in the following sub-sections. The various feature vectors used for the calculation of the potentials are described in Section 3.3.

3.2.1 Action-based Linear-chain CRF

We first describe the linear-chain CRF model in Fig. 3(a). We first define the following items: the intra-action potential $\psi_\nu(y_i^a \mid x, \omega)$, which measures the compatibility of the observed feature of action segment i and its label $y_i^a$; and the inter-action potential $\psi_\varepsilon(y_i^a, y_j^a \mid x, \omega)$, which measures the consistency between two connected action segments i and j. Let $V^a$ be the set of vertices, each representing an action segment as an element of the action layer, and let $E^a$ denote the set of connected action pairs. The potential function of the action-layer linear-chain CRF is

$$\psi(y^a \mid x, \omega) = \sum_{i \in V^a} \psi_\nu(y_i^a \mid x, \omega) + \sum_{ij \in E^a} \psi_\varepsilon(y_i^a, y_j^a \mid x, \omega) = \sum_{i \in V^a} \omega_{\nu, y_i^a}^{a\,T} \phi_\nu(x_i^a, y_i^a) + \sum_{ij \in E^a} \omega_{\varepsilon, y_i^a, y_j^a}^{a\,T} \phi_\varepsilon(x_i^a, x_j^a, y_i^a, y_j^a), \qquad (3)$$

where $\phi_\nu(x_i^a, y_i^a)$ is the intra-action feature vector that describes action segment i, and $\omega_{\nu, y_i^a}^{a}$ is the weight vector of the intra-action features for class $y_i^a$. $\phi_\varepsilon(x_i^a, x_j^a, y_i^a, y_j^a)$ is the inter-action feature, which is derived from the labels $y_i^a$, $y_j^a$ and the intra-action feature vectors $x_i^a$ and $x_j^a$, and $\omega_{\varepsilon, y_i^a, y_j^a}^{a}$ is the weight vector of the inter-action features for the class pair $(y_i^a, y_j^a)$.
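As an illustration of Eq. (3), the following sketch scores a labeling of the action layer given precomputed features; the dictionary-based weight layout (one weight vector per class and per class pair) is an assumption made for clarity, not the authors' code.

```python
import numpy as np

# Sketch of evaluating the action-layer potential of Eq. (3). Assumed layout:
# w_nu[c] is the intra-action weight vector for class c, and w_eps[(ci, cj)]
# is the inter-action weight vector for the class pair (ci, cj).
def chain_potential(y_a, phi_nu, phi_eps, edges, w_nu, w_eps):
    """y_a: list of action labels; phi_nu[i]: intra-action feature (numpy array) of
    segment i; phi_eps[(i, j)]: inter-action feature for edge (i, j); edges: list of (i, j)."""
    score = sum(float(np.dot(w_nu[y_a[i]], phi_nu[i])) for i in range(len(y_a)))
    score += sum(float(np.dot(w_eps[(y_a[i], y_a[j])], phi_eps[(i, j)])) for i, j in edges)
    return score
```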
3.2.2 Incorporating Higher Order Potentials

According to experimental observations, action segments in a candidate activity region, which are generated by activity detection methods [44], tend to have the same activity labels. However, consistent labeling is not guaranteed due to inaccurate detections. Let an action clique $a_c$ denote the union of action segments in a candidate activity c. The linear-chain CRF can be converted to a higher-order CRF by adding a latent activity variable $y_c^h$, representing the label of c, for each action clique $a_c$. All action variables associated with the same activity variable are connected. Then, the associated higher-order potential $\psi_c(y_c^a \mid x, \omega)$ is introduced to encourage action segments in the clique $a_c$ to take the same label, while still allowing some of them to have different labels without additional penalty. The resulting CRF model is shown in Fig. 3(b). The potential function of the higher-order CRF model is represented as

$$\psi(y^a, y^h \mid x, \omega) = \sum_{i \in V^a} \omega_{\nu, y_i^a}^{a\,T} \phi_\nu(x_i^a, y_i^a) + \sum_{ij \in E^a} \omega_{\varepsilon, y_i^a, y_j^a}^{a\,T} \phi_\varepsilon(x_i^a, x_j^a, y_i^a, y_j^a) + \sum_{c \in C^{ah}} \psi_c(y_c^a \mid x, \omega), \qquad (4)$$

where $E^a$ denotes the set of connected action pairs in the new model, and $C^{ah}$ is the set of action-activity cliques; each action-activity clique in $C^{ah}$ corresponds to an action clique $a_c$ in the action layer and its associated activity c in the activity layer. Let $L = \{0, 1, \ldots, M\}$ be the activity label set in the action layer, from which the action variables take values. The activity variable $y_c^h$ takes values from an extended label set $L^h = L \cup \{l_f\}$, where L is the set of variable values in the action layer. When an activity variable takes the value $l_f$, it allows its child variables to take different labels in L without additional penalty upon label inconsistency. We define $\phi_{c,l}(y_c^a, y_c^h)$ as the action-activity consistency feature of activity c, and $\omega_{c,l,y_c^h}^{ah}$ as the weight vector of the action-activity consistency feature for class $y_c^h$. Define $\phi_{c,f}(x_c^a, y_c^h)$ as the intra-activity feature for activity c, and $\omega_{c,f,y_c^h}^{ah}$ as the weight vector of the intra-activity feature for class $y_c^h$. The corresponding action-activity higher-order potential can be defined as

$$\psi_c(y_c^a \mid x, \omega) = \max_{y_c^h} \omega_{c, y_c^h}^{ah\,T} \phi_c(x_c^a, x_c^h, y_c^a, y_c^h) = \max_{y_c^h} \big[\omega_{c,l,y_c^h}^{ah\,T} \phi_{c,l}(y_c^a, y_c^h) + \omega_{c,f,y_c^h}^{ah\,T} \phi_{c,f}(x_c^a, y_c^h)\big], \qquad (5)$$

where $\omega_{c,l,y_c^h}^{ah\,T} \phi_{c,l}(y_c^a, y_c^h)$ measures the labeling consistency within the activity c. Intuitively, the higher-order potentials are constructed such that a latent variable tends to take a label from L if the majority of its child nodes take the same value, and to take the label $l_f$ if its child nodes take diversified values. $\omega_{c,f,y_c^h}^{ah\,T} \phi_{c,f}(x_c^a, y_c^h)$ is the intra-activity potential that measures the compatibility between the activity label of clique c and its activity features.
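The higher-order potential of Eq. (5) can be evaluated by maximizing over the extended label set L ∪ {l_f}. The sketch below illustrates this; the dictionary weight layout and the callable feature functions are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

# Sketch of the action-activity higher-order potential of Eq. (5): the latent activity
# label is maximised over L plus the free label l_f. phi_cl and phi_cf mirror the
# consistency and intra-activity features of Section 3.3; w_cl and w_cf are assumed
# dicts keyed by label (including FREE_LABEL).
FREE_LABEL = "l_f"   # stands for the free label l_f

def higher_order_potential(y_clique, x_act, labels, w_cl, w_cf, phi_cl, phi_cf):
    """y_clique: labels of the action segments in clique c; x_act: activity-level observation.
    phi_cl(y_clique, y_h) -> consistency feature; phi_cf(x_act, y_h) -> intra-activity feature."""
    best_score, best_label = -np.inf, None
    for y_h in list(labels) + [FREE_LABEL]:
        s = float(np.dot(w_cl[y_h], phi_cl(y_clique, y_h)) +
                  np.dot(w_cf[y_h], phi_cf(x_act, y_h)))
        if s > best_score:
            best_score, best_label = s, y_h
    return best_score, best_label
```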

3.2.3 Incorporating Inter-Activity Potentials

As stated before, it would be helpful to model the spatial and temporal relationships between activities. For this reason, we connect activity nodes in the higher-order CRF model. The resulting CRF is shown in Fig. 3(c). We define $\phi_s(x_s^h, x_d^h, y_s^h, y_d^h)$ as the inter-activity spatial feature that encodes the spatial relationship between activities s and d, and $\omega_{s, y_s^h, y_d^h}^{h}$ as the weight vector of the inter-activity spatial feature for the class pair $(y_s^h, y_d^h)$. Define $\phi_t(x_s^h, x_d^h, y_s^h, y_d^h)$ as the inter-activity temporal feature that encodes the temporal relationship between activities s and d, and $\omega_{t, y_s^h, y_d^h}^{h}$ as the weight vector of the inter-activity temporal feature for the class pair $(y_s^h, y_d^h)$. The pairwise activity potential between cliques s and d is defined as

$$\psi(y^h \mid x, \omega) = \sum_{sd \in E^h} \big[\omega_{s, y_s^h, y_d^h}^{h\,T} \phi_s(x_s^h, x_d^h, y_s^h, y_d^h) + \omega_{t, y_s^h, y_d^h}^{h\,T} \phi_t(x_s^h, x_d^h, y_s^h, y_d^h)\big], \qquad (6)$$

where $\omega_{s, y_s^h, y_d^h}^{h\,T} \phi_s(x_s^h, x_d^h, y_s^h, y_d^h)$ is the pairwise spatial potential between activities s and d that measures the compatibility between the candidate labels of s and d and their spatial relationship, and $\omega_{t, y_s^h, y_d^h}^{h\,T} \phi_t(x_s^h, x_d^h, y_s^h, y_d^h)$ is the pairwise temporal potential between activities s and d that measures the compatibility between the candidate labels of s and d and their temporal relationship.

Fig. 3. Illustration of CRF models for activity recognition. (a): Action-based linear-chain CRF; (b): Action-based higher-order CRF model (with latent activity variables); (c): Action-based two-layer Hierarchical-CRF. Note that all the observations for the random variables are omitted for compactness. (d): Symbols used in sub-figures (a, b, c): $y_i^a$ is the label variable for action segment i, $y_i^a \in L$ with $L = \{0, 1, \ldots, M\}$, where a denotes the action layer and i the index of the action segment; $y_c^h$ is the label variable for activity c, $y_c^h \in L^h$ with $L^h = \{0, 1, \ldots, M\} \cup \{l_f\}$, where h denotes the hidden activity layer and c the index of the hidden activity. (e): Graph representation of the model in [45] for comparison. An action segment denotes a random variable in the action layer, whose value is the activity label for the action segment. A colored circle denotes a random variable in the activity layer, whose value is the label for its connected clique. As shown in (a), in the action layer, action segments that belong to the same trajectory are modeled as a linear-chain CRF. Then, hidden activity-level variables with action-activity edges (in light blue) are added for each action clique to form the higher-order CRF shown in (b). An activity and its associated action nodes have the same color. Finally, pairwise activity edges (in red) are added to form the proposed two-layer Hierarchical-CRF model.

3.3 Feature Descriptors

We now define the concepts we use for the feature development. An activity is a 3D region consisting of one or multiple consecutive action segments. An agent is the underlying moving person(s) or a trajectory. The motion region at frame n is the region surrounding the moving objects of interest in the n-th frame of the activity. The activity region is the smallest rectangular region that encapsulates the motion regions over all frames of the activity. In general, the same type of feature for different classes or class pairs can be different. There are mainly three kinds of features in our model: action-layer features, action-activity features and activity-layer features, which can be further divided into five types of features. We now describe how to encode motion and context information into feature descriptors.
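Before detailing the individual descriptors, the sketch below illustrates how the pairwise inter-activity potential of Eq. (6) consumes the spatial and temporal features defined in this section, assuming those features and the per-class-pair weights are precomputed; the data layout is an illustrative assumption.

```python
import numpy as np

# Sketch of the pairwise inter-activity potential of Eq. (6), summed over the
# connected activity pairs E^h. w_s / w_t are assumed dicts mapping a class pair
# (y_s, y_d) to a weight vector; phi_s / phi_t are precomputed per pair.
def inter_activity_potential(y_h, activity_edges, phi_s, phi_t, w_s, w_t):
    """y_h: dict activity index -> label; phi_s[(s, d)], phi_t[(s, d)]: context features."""
    score = 0.0
    for s, d in activity_edges:
        pair = (y_h[s], y_h[d])
        score += float(np.dot(w_s[pair], phi_s[(s, d)]) + np.dot(w_t[pair], phi_t[(s, d)]))
    return score
```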
Intra-action Feature: $\phi_\nu(x_i^a, y_i^a)$ encodes the motion information of the action segment i that is extracted from low-level motion features such as STIP features. Since, in the action layer, we obtain action segments by utilizing their discriminative motion patterns, we use only motion features for the development of action-layer features. STIP histograms are generated for each action segment using the bag-of-words method [25]. We train a kernel multi-class SVM upon action segments to generate the normalized confidence scores, $s_{i,j}$, of classifying the action segment i as activity class j, where $j \in \{0, 1, \ldots, M\}$, such that $\sum_{j=0}^{M} s_{i,j} = 1$. In general, any kind of classifier and any low-level motion features can be used here. Given an action segment i, $\phi_\nu(x_i^a, y_i^a) = [s_{i,0}, \ldots, s_{i,M}]^T$ is developed as the intra-action feature descriptor of action segment i.
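A possible realization of this intra-action descriptor, assuming scikit-learn's SVC as the kernel multi-class classifier (any classifier could be substituted, as noted above); the function names are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

# Sketch of producing the intra-action feature from STIP bag-of-words histograms: a
# kernel multi-class SVM is trained on labelled action segments and its normalised class
# scores form phi_nu = [s_{i,0}, ..., s_{i,M}]. The RBF kernel is an assumption.
def train_intra_action_classifier(train_histograms, train_labels):
    clf = SVC(kernel="rbf", probability=True)
    clf.fit(np.asarray(train_histograms), np.asarray(train_labels))
    return clf

def intra_action_feature(clf, stip_histogram):
    scores = clf.predict_proba(np.asarray(stip_histogram).reshape(1, -1))[0]
    return scores / scores.sum()   # confidence scores summing to 1
```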

Inter-action Feature: $\phi_\varepsilon(x_i^a, x_j^a, y_i^a, y_j^a)$ encodes the probabilities of co-existence of action segments i and j according to their features and activity labels. $\phi_\varepsilon(x_i^a, x_j^a, y_i^a, y_j^a) = I(y_i^a) I(y_j^a)$, where $I(y_k^a)$ is the Dirac measure that equals 1 if the true label of segment k is $y_k^a$ and equals 0 otherwise, for k = i, j.

Action-Activity Consistency Feature: $\phi_{c,l}(y_c^a, y_c^h)$ encodes the labeling information within clique c as

$$\phi_{c,l}(y_c^a, y_c^h) = \begin{cases} 1, & y_c^h = l_f, \\ \frac{1}{N_c}\sum_{i} I(y_i^a = y_c^h), & y_c^h \in L, \end{cases}$$

where I(·) is the Dirac measure and $N_c$ is the number of action segments in clique c.

Intra-activity Feature: $\phi_{c,f}(x_c^a, y_c^h)$ encodes the intra-activity motion and context information of activity c. To capture the motion pattern of an activity, we use the intra-action features of the action segments which belong to the activity. Given an activity c, $[\max_{i \in \aleph_c} s_{i,0}, \ldots, \max_{i \in \aleph_c} s_{i,M}]$ is developed as the intra-activity motion feature descriptor, where $\aleph_c$ is the list of action segments in activity c. The intra-activity context feature captures the context information about the agents and the relationships between the agents, as well as the interacting objects (e.g. the object classes, interactions between agents and their surroundings). We define a set, G, of attributes that describes such context for activities of interest, using common-sense knowledge about the activities of interest (how to identify such attributes automatically is another research topic that we do not address in this paper). For a given activity, whether the defined attributes are true or not is determined from image-level detection results. The resulting feature descriptor is a normalized feature histogram. The attributes used and the development of intra-activity context features are different for different tasks (please refer to Section 5.3.1 for the details). Finally, the weighted motion and context features are used as the input to a multi-class SVM, and the output confidence scores are used to develop the intra-activity feature as $\phi_{c,f}(x_c^a, y_c^h) = [s_{c,0}, \ldots, s_{c,M}]^T$.

Inter-activity Spatial and Temporal Features: $\phi_s(x_s^h, x_d^h, y_s^h, y_d^h)$ and $\phi_t(x_s^h, x_d^h, y_s^h, y_d^h)$ capture the spatial and temporal relationships between activities s and d. Define the scaled distance between activities s and d at the n-th frame of s as

$$r_s(n) = \frac{D(O_s(n), O_d)}{R_s(n) + R_d}, \qquad (7)$$

where $O_s(n)$ and $R_s(n)$ denote the center and radius of the motion region of activity s at its n-th frame, $O_d$ and $R_d$ denote the center and radius of the activity region of activity d, and D(·) denotes the Euclidean distance. Then, the spatial relationship of s and d at the n-th frame is modeled by $s_{sd}(n) = \mathrm{bin}(r_s(n))$ as in Fig. 4(a). The normalized histogram $\mathbf{s}_{sd} = \frac{1}{N_f}\sum_{n=1}^{N_f} s_{sd}(n)$ is the inter-activity spatial feature of activities s and d. Let TC be defined by the following temporal relationships: the n-th frame of s is before d, the n-th frame of s is during d, and the n-th frame of s is after d. $t_{sd}(n)$ is the temporal relationship of s and d at the n-th frame of s, as shown in Fig. 4(b). The normalized histogram $\mathbf{t}_{sd} = \frac{1}{N_f}\sum_{n=1}^{N_f} t_{sd}(n)$ is the inter-activity temporal context feature of activity s with respect to activity d.
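A small sketch of computing these spatial and temporal histograms around Eq. (7), using the same/near/far thresholds given in Fig. 4; the input layout (per-frame centers and radii, frame time stamps) is an assumption for illustration.

```python
import numpy as np

# Sketch of the inter-activity spatial and temporal features: the scaled centre
# distance r_s(n) is binned into {same, near, far} per frame (thresholds as in Fig. 4)
# and the per-frame temporal relation into {before, during, after}; both are averaged
# into normalised histograms.
def scaled_distance(center_s, radius_s, center_d, radius_d):
    return float(np.linalg.norm(np.asarray(center_s, float) - np.asarray(center_d, float))
                 / (radius_s + radius_d))

def spatial_context_feature(frames_s, center_d, radius_d):
    """frames_s: list of (center, radius) of the motion region of s, one per frame."""
    hist = np.zeros(3)
    for center_s, radius_s in frames_s:
        r = scaled_distance(center_s, radius_s, center_d, radius_d)
        hist[0 if r <= 0.5 else (1 if r < 1.5 else 2)] += 1.0   # same / near / far
    return hist / len(frames_s)

def temporal_context_feature(frame_times_s, t_start_d, t_end_d):
    """Per-frame relation of s w.r.t. d (before / during / after), averaged over frames."""
    hist = np.zeros(3)
    for t in frame_times_s:
        hist[0 if t < t_start_d else (1 if t <= t_end_d else 2)] += 1.0
    return hist / len(frame_times_s)
```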
Fig. 4. (a) An example of an inter-activity spatial relationship. The red circle indicates the motion region of s at this frame, while the purple rectangle indicates the activity region of d. SC is defined by quantizing and grouping $r_s(n)$ into three bins: $r_s(n) \le 0.5$ (s and d are at the same spatial position at the n-th frame of s), $0.5 < r_s(n) < 1.5$ (s is near d at the n-th frame of s) and $r_s(n) \ge 1.5$ (s is far away from d at the n-th frame of s). In the image, $r_s(n) > 1.5$, so $s_{sd}(n) = [0\ 0\ 1]$. (b) An example of an inter-activity temporal relationship. The n-th frame of s occurs before d, so $t_{sd}(n) = [1\ 0\ 0]$.

4 MODEL LEARNING AND INFERENCE

The parameters of the overall potential function ψ(y | x, ω) for the two-layer Hierarchical-CRF include $\omega_\nu^a$, $\omega_\varepsilon^a$, $\omega_{c,l}^{ah}$, $\omega_{c,f}^{ah}$, $\omega_s^h$ and $\omega_t^h$. We define the weight vector as the concatenation of these parameters:

$$\omega = [\omega_\nu^a, \omega_\varepsilon^a, \omega_{c,l}^{ah}, \omega_{c,f}^{ah}, \omega_s^h, \omega_t^h]. \qquad (8)$$

Thus, the potential function ψ(y | x, ω) can be converted into a linear function with a single parameter ω as

$$\psi(y^a \mid x, \omega) = \max_{y^h} \omega^T \Gamma(x, y^a, y^h), \qquad (9)$$

where $\Gamma(x, y^a, y^h)$, called the joint feature of the activity set x, can be easily obtained by concatenating the various feature vectors in (4), (5) and (6).

4.1 Learning Model Parameters

Suppose we have P activity sets for learning. Let the training set be $(X, Y^a, Y^h) = \{(x^1, y^{1,a}, y^{1,h}), \ldots, (x^P, y^{P,a}, y^{P,h})\}$, where $x^i$ denotes the i-th activity set as well as the observed features of the set, $y^{i,a}$ is the label vector in the action layer and $y^{i,h}$ is the label vector in the hidden activity layer. While there are various ways of learning the model parameters, we choose a task-oriented discriminative approach. We would like to train the model in such a way that it increases the average precision scores on the training data and thus tends to produce the correct activity labels for each action segment. A natural way to learn the model parameter ω is to adopt the latent structural SVM.
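The following sketch illustrates the concatenation in Eqs. (8)-(9): weights and features are flattened with a shared block ordering so that every potential reduces to a dot product. The dictionary-based layout and the fixed key ordering are assumptions.

```python
import numpy as np

# Sketch of Eqs. (8)-(9): the per-class weight blocks are concatenated into one vector
# omega, and the joint feature Gamma(x, y^a, y^h) is the matching concatenation of the
# feature terms from Eqs. (3)-(6), so the total potential is a single dot product.
def concat_blocks(*groups):
    """Each group is a dict mapping a class (or class pair) to a weight/feature block."""
    blocks = []
    for group in groups:
        for key in sorted(group):          # a fixed ordering shared by weights and features
            blocks.append(np.asarray(group[key], dtype=float))
    return np.concatenate(blocks)

def potential(omega, joint_feature):
    """omega and joint_feature must be built with the same block ordering."""
    return float(np.dot(omega, joint_feature))
```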

The loss $\Delta(x^i, \hat{y}^{i,a})$ of labeling $x^i$ with $\hat{y}^{i,a}$ in the action layer equals the number of action segments that are associated with incorrect activity labels (an action segment is mislabeled if over half of the segment is mislabeled). From the construction of the higher-order potentials in Section 3.2.2, it is observed that, in order to achieve the best labeling of the action segments, the optimum latent activity label of an action clique must be the dominant ground-truth label $l_c$ of its child nodes in the action layer, or the free label $l_f$ if no dominant label exists for the action clique. Thus the loss $\Delta(x^i, \hat{y}^{i,h})$ of labeling the activity layer of $x^i$ with $\hat{y}^{i,h}$ is

$$\Delta(x^i, \hat{y}^{i,h}) = \sum_{c \in V^h} I\big(\hat{y}_c^{i,h} \notin \{l_c, l_f\}\big), \qquad (10)$$

where I(·) is the indicator function, which equals 1 if the condition inside is satisfied and 0 otherwise. (10) counts the number of activity labels in $\hat{y}^{i,h}$ that are neither a free label nor the dominant label of their child nodes. Finally, the loss of assigning $x^i$ with $(\hat{y}^{i,a}, \hat{y}^{i,h})$ is defined as the summation of the two, that is,

$$\Delta(x^i, \hat{y}^{i,a}, \hat{y}^{i,h}) = \Delta(x^i, \hat{y}^{i,a}) + \Delta(x^i, \hat{y}^{i,h}). \qquad (11)$$

Next, we define a convex function F(ω) and a concave function J(ω) as

$$F(\omega) = \frac{1}{2}\omega^T\omega + C\sum_{i=1}^{P} \max_{(\hat{y}^{i,a}, \hat{y}^{i,h})}\big[\omega^T\Gamma(x^i, \hat{y}^{i,a}, \hat{y}^{i,h}) + \Delta(x^i, \hat{y}^{i,a}, \hat{y}^{i,h})\big] \qquad (12)$$

and

$$J(\omega) = -C\sum_{i=1}^{P} \max_{y^{i,h}} \omega^T\Gamma(x^i, y^{i,a}, y^{i,h}).$$

The model learning problem is given as

$$\omega^* = \arg\min_\omega\,[F(\omega) + J(\omega)]. \qquad (13)$$

Although the objective function to be minimized in (13) is not convex, it is a combination of a convex function and a concave function [29]. Such problems can be solved using the Concave-Convex Procedure (CCCP) [40], [41]. We describe an algorithm similar to the CCCP in [40] that iteratively infers the latent variables $y^{i,h}$ for i = 1, ..., P and optimizes the weight vector ω. The inference and optimization procedures continue until convergence or until a predefined maximum number of iterations is reached.

The limitation of all learning algorithms that involve gradient optimization is that they are susceptible to local extrema and saddle points [8]. Thus, the performance of the proposed latent structural model is sensitive to initialization. There have been many works dealing with the problem of learning the parameters of hierarchical models [10], [36]. We use a coarse-to-fine scheme that separately initializes the model parameters using piecewise training, and then refines the model parameters jointly in a globally optimum manner. Specifically, the separately learned model parameters are used as the initialization values for the proposed learning algorithm. Given the weakly labeled training data with activity labels for each action segment, the dominant label $l_c$ for each action clique can be determined. We initialize the latent activity variable of c with the dominant label $l_c$ of its action clique $a_c$, and with $l_f$ if there is no dominant label for $a_c$.

In the E step, we infer the latent variables using the previously learned weight vector $\omega_t$ (or the initially assigned weight vector for the first iteration), leading to

$$y_{t+1}^{i,h} = \arg\max_{y^{i,h}} \omega_t^T \Gamma(x^i, y^{i,a}, y^{i,h}). \qquad (14)$$

Then, in the M step, with the inferred latent variable $y_{t+1}^{i,h}$, we solve a fully visible structural SVM (SSVM). Let us define the risk function at iteration t + 1, $\Lambda_{t+1}(\omega)$, as

$$\Lambda_{t+1}(\omega) = C\sum_{i=1}^{P} \max_{(\hat{y}^{i,a}, \hat{y}^{i,h})}\Big\{\Delta(x^i, \hat{y}^{i,a}, \hat{y}^{i,h}) + \omega^T\big[\Gamma(x^i, \hat{y}^{i,a}, \hat{y}^{i,h}) - \Gamma(x^i, y^{i,a}, y_{t+1}^{i,h})\big]\Big\}. \qquad (15)$$

Thus, the optimization problem in (13) is converted to a fully visible SSVM as

$$\omega_{t+1}^* = \arg\min_\omega \Big\{\frac{1}{2}\omega^T\omega + \Lambda_{t+1}(\omega)\Big\}. \qquad (16)$$

The problem in (16) can be converted to an unconstrained convex optimization problem [44] and solved by the modified bundle method in [38]. The algorithm iteratively searches for increasingly tight quadratic upper and lower cutting planes of the objective function until the gap between the two bounds reaches a predefined threshold. The algorithm is effective because of its very high convergence rate [37].
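A compact sketch of this alternation is given below; `infer_latent` and `solve_ssvm` are stand-ins for the MAP inference of Eq. (14) and the bundle-method SSVM solver of Eq. (16), and the convergence test is illustrative rather than the authors' stopping rule.

```python
import numpy as np

# Sketch of the CCCP-style alternation of Section 4.1: the E-step infers the latent
# activity labels with the current weights (Eq. (14)); the M-step solves the resulting
# fully visible SSVM (Eq. (16)).
def learn_cccp(train_sets, w0, infer_latent, solve_ssvm, max_iter=20, tol=1e-4):
    """train_sets: list of (x, y_a) pairs with weakly labeled action-layer ground truth."""
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iter):
        # E-step: best latent activity labels under the current weights
        latent = [infer_latent(w, x, y_a) for (x, y_a) in train_sets]
        # M-step: fully visible structural SVM with the inferred latent labels
        w_new = np.asarray(solve_ssvm(train_sets, latent, w), dtype=float)
        if np.linalg.norm(w_new - w) < tol:    # stop when the weights stabilise
            return w_new
        w = w_new
    return w
```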
The visible SSVM learning algorithm specified for our problem is summarized in Algorithm 1.

Algorithm 1: Learning the model parameter in (16) through the bundle method
Input: training set $S = \{(x^i, y^{i,a})\}_{i=1}^P$, $\omega_t$, $\{y_{t+1}^{i,h}\}$, C, ε
Output: optimum model parameter $\omega_{t+1}^*$
1) Initialize $\omega_{t+1}^0$ with $\omega_t$ and the cutting-plane set $G_{t+1} \leftarrow \emptyset$.
2) for k = 0, 1, 2, ... do
3) for i = 1, ..., P: find the most violated label vector for each training instance, if any, using $\omega_{t+1}^k$ (the value of $\omega_{t+1}$ at the k-th iteration);
4) end for
5) find the cutting plane $g_{\omega_{t+1}^k}(\omega) = \omega^T \partial_\omega \Lambda_{t+1}(\omega_{t+1}^k) + b_{\omega_{t+1}^k}$ of $\Lambda_{t+1}(\omega)$ at $\omega_{t+1}^k$, where $b_{\omega_{t+1}^k} = \Lambda_{t+1}(\omega_{t+1}^k) - \omega_{t+1}^{k\,T} \partial_\omega \Lambda_{t+1}(\omega_{t+1}^k)$;
6) $G_{t+1} \leftarrow G_{t+1} \cup \{g_{\omega_{t+1}^k}(\omega)\}$;
7) update $\omega_{t+1}^{k+1} \leftarrow \arg\min_\omega F_{\omega_{t+1}^k}(\omega)$, where $F_{\omega_{t+1}^k}(\omega) = \frac{1}{2}\omega^T\omega + \max\big(0, \max_{j=1,\ldots,k} g_{\omega_{t+1}^j}(\omega)\big)$;
8) $\mathrm{gap}_{k+1} = \min_{k' \le k} F_{\omega_{t+1}^{k'}}(\omega_{t+1}^{k'+1}) - F_{\omega_{t+1}^{k}}(\omega_{t+1}^{k+1})$;
9) if $\mathrm{gap}_{k+1} \le \varepsilon$, return $\omega_{t+1}^* = \omega_{t+1}^{k+1}$;
10) end for

4.2 Inference

Suppose the model parameter vector ω is given. We now describe how to identify the optimum label vector for a test instance x that maximizes (9). The inference problem is generally NP-hard for multi-class problems, thus MAP

inference algorithms, such as loopy belief propagation [29], are slow to converge. We propose an approximation method that alternately optimizes the hidden variable y^h and the label vector y^a. Such an algorithm is guaranteed to increase the objective at every iteration [29]. Let us define the activity-layer potential function as

$$\psi^h(y^h) = \sum_{c \in C^{ah}} \psi_c(y_c^a \mid x, \omega) + \psi(y^h \mid x, \omega). \qquad (17)$$

For each iteration, with the current predicted label vector y^a fixed, the inference sub-problem is to find the y^h that maximizes ψ^h(y^h). An efficient greedy search method is used to find the optimum y^h, as described in Algorithm 2. In order to simplify the inference, we force the edge weights between non-adjacent actions to be zero. With the inferred hidden variable y^h, the model is reduced to a one-layer discriminative CRF. The inference sub-problem of finding the optimum y^a can now be solved by computing the exact mixed integer solution. We initialize the process by holding the hidden variable fixed using the values obtained from automatic activity detection. The process continues until convergence or until a predefined maximum number of iterations is reached.

Algorithm 2: Greedy search algorithm for the sub-problem of finding the optimum hidden variable y^h
Input: testing instance with action-layer labels y^a
Output: hidden variable labels y^h
1) Initialize $(V_h, y^h) \leftarrow \{\emptyset, \emptyset\}$ and $\psi^h = 0$.
2) repeat: compute $\Delta\psi^h(y_c^h) = \psi^h(y^h \cup y_c^h) - \psi^h(y^h)$ for each unlabeled activity $c \notin V_h$; $(c, y_c^{h,opt}) = \arg\max_{c \notin V_h} \Delta\psi^h(y_c^h)$; $(V_h, y^h) \leftarrow (V_h, y^h) \cup (c, y_c^{h,opt})$;
3) until all activities are labeled.

4.3 Analysis of Computational Complexity

We now discuss the computational complexity of inference for a particular activity set consisting of n action segments and m activities. Assume there are M activity classes in the problem. For the graphical model in [45], the time complexity of the inference as discussed in that paper is $O(d_{max} n^2 M)$, where $d_{max}$ is the maximum number of action segments one activity may have. The inference on both the higher-order CRF and the Hierarchical-CRF is carried out layer-by-layer, and so the overall time complexity is linear in the number of layers used. Specifically, we use two-layer CRFs with an action layer and an activity layer. For the higher-order CRF model, inference on the activity layer takes O(mM) computation to obtain the activity labels for the candidate activities. With the inferred activity labels, inference on the action layer takes $O(nM^2)$, since the model is reduced to a chain-CRF. For the Hierarchical-CRF, the increase of computational complexity over the higher-order CRF lies in the inference on the activity layer, because the activities are connected with each other in this model. Using the proposed greedy search algorithm, the time complexity for inference on the activity layer is $O(m^2 M)$. Thus, the overall complexity of inference is $O(T(mM + nM^2))$ for the higher-order CRF and $O(T(m^2 M + nM^2))$ for the Hierarchical-CRF, where T is the number of iterations. Furthermore, the number of action segments n is usually several times the number of activities, that is n = αm, where α is a small positive value larger than one; $d_{max}$ and T are also small positive values larger than one. Assuming n, m and M are of the same order, which is a reasonable assumption for our case, the asymptotic computational complexity of the model in [45] and of the compared higher-order CRF and Hierarchical-CRF models is of the same order.
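For concreteness, the sketch below mirrors the greedy search of Algorithm 2, assuming a `psi_h` function that evaluates Eq. (17) on a partial labeling (e.g. by scoring only the terms whose activities are already labeled); it is an illustration rather than the exact implementation.

```python
# Sketch of the greedy search of Algorithm 2: activities are labelled one at a time,
# always committing the (activity, label) pair that raises the activity-layer potential
# psi_h of Eq. (17) the most.
def greedy_activity_inference(activity_ids, labels, psi_h):
    assigned = {}                                 # activity index -> chosen label
    while len(assigned) < len(activity_ids):
        base = psi_h(assigned)
        best_gain, best_pick = float("-inf"), None
        for c in activity_ids:
            if c in assigned:
                continue
            for y in labels:
                gain = psi_h({**assigned, c: y}) - base
                if gain > best_gain:
                    best_gain, best_pick = gain, (c, y)
        assigned[best_pick[0]] = best_pick[1]     # commit the best single extension
    return assigned
```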
5 EXPERIMENTAL RESULTS

The goal of our framework is to locate and recognize activities of interest in continuous videos using both motion and context information about the activities; therefore, datasets with segmented video clips or independent activities like Weizmann [], KTH [3], the UT-Interaction Dataset [30] and the Collective Activity Dataset [7] do not fit our evaluation goal. To assess the effectiveness of our framework in activity modeling and recognition, we perform experiments on two challenging datasets containing long duration videos: the UCLA Office Dataset [32] and the VIRAT Ground Dataset [9].

5.1 Motion Segmentation and Activity Localization

We first develop an automatic motion segmentation algorithm by detecting boundaries where the statistics of motion features change dramatically, and thus obtain the action segments. Let two NDMs be denoted as $M_1$ and $M_2$, and let $d_s$ be the dimension of the hidden states. The distance between the models can be measured by the normalized geodesic distance $\mathrm{dist}(M_1, M_2) = \frac{4}{d_s \pi^2} \sum_{i=1}^{d_s} \theta_i^2$, where $\theta_i$ is the i-th principal subspace angle (please refer to [5] for details on the distance computation). A sliding window of size $T_s$, where $T_s$ is the number of temporal bins in the window, is applied to each detected motion region along time. An NDM M(t) is built for the time window centered at the t-th temporal bin. Since an action can be modeled as one dynamic model, the model distances between subsequences from the same action should be small compared to those of subsequences from a different action. Suppose an activity starts from temporal bin k; the average model distance between temporal bin j > k and k is defined as the weighted average distance between model j and neighboring models of k as

$$DE_k(j) = \sum_{i=0}^{T_d} \gamma_i \, \mathrm{dist}(M(k+i), M(j)), \qquad (18)$$

where $T_d$ is the number of neighboring bins used, and $\gamma_i$ is the smoothing weight for model k + i that decreases along time. When the average model distance grows above a predefined threshold $d_{th}$, an action boundary is detected. Action segments along tracks are thus obtained.
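A sketch of this boundary test is given below; `model_distance` stands for the normalized geodesic NDM distance, and the window, weight and threshold values (T_d, gamma, d_th) are illustrative assumptions rather than the settings used in the experiments.

```python
import numpy as np

# Sketch of the boundary test around Eq. (18): one dynamical model is fit per sliding
# window, and a boundary is declared when the weighted average distance between the
# current window's model and the models near the start of the current action exceeds d_th.
def detect_action_boundaries(models, model_distance, T_d=3, gamma=0.8, d_th=0.5):
    """models[t]: dynamical model fitted to the window centred at temporal bin t."""
    weights = np.array([gamma ** i for i in range(T_d)])
    weights /= weights.sum()                      # smoothing weights decreasing along time
    boundaries, k = [0], 0                        # k: start bin of the current action
    for j in range(1, len(models)):
        ref = [models[min(k + i, j)] for i in range(T_d)]
        avg_dist = float(np.dot(weights, [model_distance(m, models[j]) for m in ref]))
        if avg_dist > d_th:
            boundaries.append(j)
            k = j
    return boundaries
```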

A multi-class SVM is trained upon the intra-activity features (as described in Section 3.3) of activities of different classes. After obtaining the action segments, we use the sliding window method with the trained multi-class SVM to group adjacent action segments into candidate activities. To speed up, we only work on candidate activities with confidence scores larger than a predefined threshold, indicating that they are likely to belong to the activity classes of interest.

5.2 UCLA Dataset

The UCLA Office Dataset [32] consists of indoor and outdoor videos of single activities and person-person interactions. Here, we perform experiments on the videos of the office scene, containing about 35 minutes of activities in an office room captured with a single fixed camera. We identify 10 frequent activities as the activities of interest: 1 - enter room, 2 - exit room, 3 - sit down, 4 - stand up, 5 - work on laptop, 6 - work on paper, 7 - throw trash, 8 - pour drink, 9 - pick phone, 10 - place phone down. Each activity occurs 9 to 26 times in the dataset. Since the dataset contains only single-person activities, it is natural to model the activities in one sequence together. The dataset is divided into 8 sets; each set contains 2 sequences of activities and each sequence contains 2 to 9 activities of interest, as well as a varying number of background activities. We use leave-one-set-out cross validation for the evaluation: 7 sets are used for training and 1 set for testing.

5.2.1 Preprocessing

The intra-activity context feature is based on interactions between the agent and the surroundings. In the office dataset, there are 7 classes of objects that are frequently involved in the activities of interest: laptop, garbage can, papers, phone, coffee maker and cup. Fig. 5 shows the detected objects of interest in the office room.

Fig. 5. Detected objects of interest in the UCLA office scene.

Since the UCLA Dataset consists of single-person activities, the intra-activity attributes considered include agent-object interactions and their relative locations. We identify $N_G = 10$ subsets of attributes for the development of intra-activity context features in the experiment, as shown in Fig. 6.

Fig. 6. Subsets of context attributes used for the development of intra-activity context features for the UCLA Dataset (the superscripts indicate the correspondence between the subsets and the objects): the agent is touching / not touching the laptop (G1), paper (G2), phone (G3); the agent is occluding / not occluding the garbage can (G4), coffee maker (G5); the agent is near / far away from the garbage can (G6), coffee maker (G7), door (G8); the agent disappears / does not disappear at the door (G9); the agent appears / does not appear at the door (G10).

For a given activity, the above attributes are determined from image-level detection results. The locations of objects are automatically tracked. Similar to [32], if enough skin color is detected within the areas of the laptop, paper or phone, the corresponding attributes are considered as true. Fig. 7 shows examples of detected agent-object interactions.

Fig. 7. Examples of agent-object interactions detected from images: touch laptop, touch paper, occlude garbage can, touch phone.

Whether the agent is near or far away from an object is determined by the distance between the two, based on normal distributions of the distances in the two scenarios. Probabilities indicating how likely the agent is near or far away from an object are thus obtained. For frame n of an activity, we obtain $g_i(n) = I(G_i(n))$, where I(·) is the indicator function. $g_i(n)$ is then normalized so that its elements sum to 1. Related candidate activities are connected. Whether two activities are related can be naturally determined by their temporal distances.
One way to decide whether the relationship between two candidate activities should be modeled is to check whether they are in the α-neighborhood of each other in time. Two activities are said to be in the α-neighborhood of each other if there are fewer than α other activities occurring between the two.

5.2.2 Experimental Results

Although the UCLA Dataset has been used in [32], the recognition accuracy for the office dataset has not been provided in that paper. We compare the performance of the popular BOW+SVM classifier and our model. The experimental results in precision and recall are shown in Fig. 8. In order to show the effects of incorporating different kinds of motion and context features, we also show results of using the action-based linear-chain CRF approach and the action-based higher-order CRF approach (Fig. 3(a) and 3(b)). It can be seen that the use of intra-activity context increases the recognition accuracy of activities with obvious context patterns.

For example, enter room is characterized by the context that the agent appears at the door. The increased recognition accuracy of enter room from using intra-activity context features indicates that our model successfully captures this characteristic. From the performance of the higher-order CRF approach and the Hierarchical-CRF approach, we can see that for activities with strong spatio-temporal patterns, such as pick phone and place phone down, modeling the inter-activity spatio-temporal relationships increases the recognition accuracy significantly.

Fig. 8. Precision (a) and recall (b) for the ten activities in the UCLA Office Dataset, for BOW+SVM, linear-chain CRF, higher-order CRF and HCRF (α = 2). The activities are defined in Section 5.2. HCRF is short for Hierarchical-CRF.

Next, we change the value of α to see how it influences the recognition accuracy of the Hierarchical-CRF approach. Fig. 9 compares the overall accuracy of different methods and of the Hierarchical-CRF approach with different α values.

Fig. 9. Overall and average per-class accuracy for different methods on the UCLA Office Dataset: BOW+SVM, linear-chain CRF, higher-order CRF, HCRF (α = 1), HCRF (α = 2) and HCRF (fully connected). The BOW+SVM method is tested on video clips, while the other results are in the framework of our proposed action-based CRF models upon automatically detected action segments. HCRF is short for Hierarchical-CRF.

From the results, we can see that the Hierarchical-CRF approach with α = 2 outperforms the other models. This is expected. When α is too small, the spatio-temporal relationships of related activities are not fully utilized, while the Hierarchical-CRF with a fully connected activity layer models the spatio-temporal relationships of unrelated activities. For instance, in the UCLA Office Dataset, one typical temporal pattern of activities is that a person sits down to work on the laptop, then the same person stands up to do other things, and then sits down to work on the laptop again. All these activities are conducted sequentially. Thus, the Hierarchical-CRF model with a fully connected activity layer captures the false temporal pattern of stand up followed by work on laptop. The optimum value of α can be obtained using cross validation on the training data.

5.3 VIRAT Ground Dataset

The VIRAT Ground Dataset is a state-of-the-art activity dataset with many challenging characteristics, such as wide variation in the activities and clutter in the scene. The dataset consists of surveillance videos of realistic scenes with different scales and resolutions, each lasting 2 to 5 minutes and containing up to 30 events. The activities defined in Release 1 include: 1 - person loading an object to a vehicle; 2 - person unloading an object from a vehicle; 3 - person opening a vehicle trunk; 4 - person closing a vehicle trunk; 5 - person getting into a vehicle; 6 - person getting out of a vehicle. We work on all the scenes in Release 1 except scene 0002 and use half of the data for training and the rest for testing. Five more activities are defined in VIRAT Release 2: 7 - person gesturing; 8 - person carrying an object; 9 - person running; 10 - person entering a facility; 11 - person exiting a facility. We work on all the scenes in Release 2 except scenes 0002 and 002, and use two-thirds of the data for training and the rest for testing.

5.3.1 Preprocessing

Motion regions that do not involve people are excluded from the experiments since we are only interested in person activities and person-vehicle interactions. For the development of STIP histograms, the nearest neighbor soft-weighting scheme [25] is used.
Since we work on the VIRAT Dataset with individual person activities and person-object interactions, we use the following $N_G = 7$ subsets of attributes for the development of intra-activity context features in the experiments, as shown in Fig. 10. Persons and vehicles are detected based on the part-based object detection method in [9]. Opening/closing entrance/exit doors of facilities, boxes and bags are detected using the method in [6] with a binary linear SVM as the classifier. Using these high-level image features, we follow the description in Section 5.2.1 to develop the feature descriptors for each activity set. The first three sets of attributes in Fig. 10 are used for the experiments on Release 1, and all are used for the experiments on Release 2. Fig. 11 shows examples of $g_i(n)$, defined as in Section 5.2.1, for different activities in VIRAT. Since, in VIRAT, activities are naturally related to each other, the activity-layer nodes are fully connected to utilize the spatio-temporal relationships of activities occurring in the same local space-time volume.

5.4 Recognition Results on VIRAT Release 1

Fig. 12 compares the precision and recall for the six activities defined in VIRAT Release 1 using the BOW+SVM method and our approach with different kinds of features. The results show, as expected, that the recognition accuracy

increases by encoding the various context features. For instance, the higher-order CRF approach encodes intra-activity context patterns of activities of interest.

Fig. 10. Subsets of context attributes used for the development of intra-activity context features on VIRAT: G1 - the moving object is a person; the moving object is a vehicle trunk; the moving object is of another kind. G2 - the agent is at the body of the interacting vehicle; the agent is at the rear/head of the interacting vehicle; the agent is far away from the vehicles. G3 - the agent disappears at the body of the interacting vehicle; the agent appears at the body of the interacting vehicle; none of the two. G4 - the agent disappears at the entrance of a facility; the agent appears at the exit of a facility; none of the two. G5 - the velocity of the agent (in pixels) is larger than a predefined threshold; the velocity of the object of interest is smaller than a predefined threshold. G6 - the activity occurs at parking areas; the activity occurs at other areas. G7 - an object (e.g. bag/box) is detected on the agent; no object is detected on the agent.

Fig. 11. Examples of detected intra-activity context features (the attribute vectors $g_1(n)$, $g_2(n)$, $g_5(n)$, $g_6(n)$ and $g_7(n)$ for activities such as person loading, person unloading, opening trunk, closing trunk, getting into a vehicle, getting out of a vehicle, gesturing and carrying an object). The example images are shown with detected high-level image features: an object in a red bounding box is a moving person; an object in a blue bounding box is a static vehicle; an object in an orange bounding box is a moving object of another kind; an object in a black bounding box is a bag/box on the agent.

Fig. 12. Precision (a) and recall (b) for the six activities defined in VIRAT Release 1, for BOW+SVM, linear-chain CRF, higher-order CRF and HCRF.

Fig. 13. Example activities (defined in VIRAT Release 1) showing the effect of context features: activities correctly recognized by the action-based linear-chain CRF (top); activities incorrectly recognized by the linear-chain CRF but corrected using the higher-order CRF with intra-activity context (middle); and activities incorrectly recognized by the higher-order CRF but rectified using the action-based hierarchical CRF with inter-activity context (bottom).
Thus, activities with a strong intra-activity context pattern, such as person getting into vehicle, are better recognized by the higher-order CRF approach than by the linear-chain CRF approach, which does not model the intra-activity context of activities. The Hierarchical-CRF approach further encodes inter-activity context patterns of activities. Thus, activities with strong spatio-temporal relationships with each other are better recognized by the Hierarchical-CRF approach. For instance, the higher-order CRF approach often confuses open a vehicle trunk and close a vehicle trunk with each other. However, if the two activities happen closely in time at the same place, the first activity in time is probably open a vehicle trunk. This kind of contextual information within and across activity classes is captured by the Hierarchical-CRF approach and used to improve the recognition performance. Fig. 13 shows examples that demonstrate the significance of context in activity recognition.

We also show the results on VIRAT Release 1 for different methods using overall and average accuracy in Fig. 14. We have compared our results with the popular BOW+SVM approach, the more recently proposed String-

of-Feature-Graphs approach [2], [43] and the structural model in [45].

Fig. 14. Average accuracy for the six activities defined in VIRAT Release 1. Note that SVM+BOW works on video clips, while the other methods work on continuous videos.
  Method                 Average accuracy
  BOW+SVM [25]           45.8
  SFG [44]               57.6
  Structural Model [45]  62.9
  Linear-chain CRF       42.6
  Higher-order CRF       60.4
  Hierarchical-CRF       66.2

Note that BOW+SVM works on video clips while the others work on continuous video. The Hierarchical-CRF approach outperforms the other methods. The results are expected, since the intra-activity and inter-activity context within and between actions and activities gives the model additional information about the activities of interest beyond the motion information encoded in low-level features. The SFG approach models the spatial and temporal relationships between the low-level features and thus takes into account the local structure of the scene; however, it does not consider the relationships between various activities, and thus our method outperforms the SFGs. The structural model in [45] models the intra and inter context within and between activities; however, it does not model the action layer and the interactions between actions and activities.

5.5 Recognition Results on VIRAT Release 2

VIRAT Release 2 defines additional activities of interest. We work on VIRAT Release 2 to further evaluate the effectiveness of the proposed approach. We follow the method defined above to get the recognition results on this dataset. Fig. 15 compares the precision and recall for the eleven activities defined in VIRAT Release 2 for the BOW+SVM method, the structural model in [45], and our method. We see that by modeling the relationships between activities, those with strong context patterns, such as person closing a vehicle trunk (4) and person running (9), achieve a larger performance gain compared to activities with weak context patterns such as person gesturing (7). Fig. 16 shows example results on activities in Release 2. Fig. 17 compares the recognition accuracy using recall for different methods. We can see that the performance of our Hierarchical-CRF approach is comparable to the recently proposed method in []. In [], an SPN on BOW is learned to explore the context among motion features. However, [] works on video clips, each containing an activity of interest with an additional 10 seconds occurring randomly before or after the target activity instance, while we work on continuous video.

Fig. 15. Precision (a) and recall (b) for the eleven activities defined in VIRAT Release 2, for BOW+SVM, linear-chain CRF, higher-order CRF and HCRF.

Fig. 16. Examples of recognition results (from VIRAT Release 2): opening trunk, getting out of vehicle, entering a facility, exiting a facility, person running, carrying an object. For each pair of rows, examples in the bottom row show the effect of context features in correctly recognizing activities that were incorrectly recognized by the linear-chain CRF approach, while other examples of the same activities correctly recognized by the linear-chain CRF are shown in the top row.

6 CONCLUSION

In this paper, we design a framework for modeling and detection of activities in continuous videos. The proposed framework jointly models a variable number of activities in continuous videos, with action segments as the basic motion elements. The model explicitly learns the activity durations and motion patterns for each activity class, as well as the context patterns within and across actions and activities of different classes, from training activity sets.
6 CONCLUSION

In this paper, we design a framework for the modeling and detection of activities in continuous videos. The proposed framework jointly models a variable number of activities in continuous videos, with action segments as the basic motion elements. The model explicitly learns the activity durations and the motion patterns for each activity class, as well as the context patterns within and across actions and activities of different classes, from the training activity sets. It has been demonstrated that joint modeling of activities, by encapsulating object interactions and spatial and temporal relationships among them, improves the recognition of the individual activities.
