Weighted Finite Automatas using Spectral Methods for Computer Vision


A Thesis Presented by Zulqarnain Qayyum Khan to The Department of Electrical and Computer Engineering in partial fulfillment of the requirements for the degree of Master of Science in Electrical and Computer Engineering

Northeastern University, Boston, Massachusetts

April 2016
To Abbu Jaan, wish you were still here!
Contents

List of Figures
List of Tables
Acknowledgments
Abstract of the Thesis

1 Introduction
    Background
    Problem Statement
    Related Work
    Overview
2 Weighted Finite Automatas
    Introduction
    Definition
    Transformations
    WFA Hankels
    Spectral Learning
        Empirical Hankel
        Recovering WFA
3 PreProcessing
    Introduction
    Posebits
        Posebit Selection
    Clusters of Velocities and Acceleration
    Hankel Matrices
        Clustering Gram Matrices
4 Synthetic Experiments
    WFA Generation
    String Generation from WFAs
    Evaluation Functions, Empirical Hankels, and Spectral Learning
        Evaluation Functions
        Empirical Hankels
        Spectral Learning
    Experiments to Evaluate Estimated WFAs
        Frobenius Norm
        Perplexity
        KL Divergence
        Word Prediction Error Rate
5 Experiments
    Experimental Setup
    MHAD Dataset
        Description
        Evaluation
    MSR3D Dataset
        Description
        Evaluation
    Composable Activities Dataset
        Description
        Evaluation
    HDM05 Dataset
        Description
        Evaluation
    UTKinect Dataset
        Description
        Evaluation
    Some Other Experiments
        Experiments on PbDb Dataset
        Experiments with Hankels
Conclusion and Future Work
Bibliography
List of Figures

Graphical Representation of a WFA
The General Flow sheet of the learning algorithm
Examples of posebits, and some poses conditioned on different posebits
Different relationships in body parts
Posebit Binary Tree
Description of structure of a Hankel matrix
Snapshots from MHAD Dataset
Snapshots throwing action from MHAD Dataset
Confusion matrix for MHAD Dataset with s=90%
Confusion matrix for MHAD Dataset with s=60%
Confusion matrix for MHAD Dataset with s=99%
Confusion matrix for MSR3D Dataset with s=90%
Confusion matrix for MSR3D Dataset with s=95%
Confusion matrix for MSR3D Dataset with s=75%
Snapshots from Composable Activities Dataset
Confusion matrix for Composable Activities Dataset with s=95%
Confusion matrix for Composable Activities Dataset with s=99%
Confusion matrix for Composable Activities Dataset with s=75%
Confusion matrix for Composable Activities Dataset from [38]
Confusion matrix for HDM05 Dataset with s=94%
Confusion matrix for HDM05 Dataset with s=99%
Confusion matrix for HDM05 Dataset with s=85%
Confusion matrix for HDM05 Dataset following protocol of [2]
Snapshots from UTKinect Dataset
Confusion matrix for UTKinect Dataset with s=95%
Confusion matrix for UTKinect Dataset with s=99%
Confusion matrix for UTKinect Dataset with s=75%
Scores when WFA is trained on walk
Scores when WFA is trained on jogging
Scores when WFA is trained on boxing
List of Tables

Perplexity comparison for estimated WFAs
KLD comparison for estimated WFAs
Comparison of accuracies with other methods on MHAD Dataset
Comparison of accuracies with other methods on UTKinect Dataset
Acknowledgments

Here I wish to thank everyone who has supported me during the process of the thesis work, especially Prof. Camps for advising and supervising my work, and Prof. Sznaier and Prof. Dy for agreeing to be on my thesis committee. I would also like to thank my lab fellows, especially Caglayan and Xikang, for guiding and helping me. Last but not least, I'd like to acknowledge the support of my family back home and my support base here in Boston, my Minions.
Abstract of the Thesis

Weighted Finite Automatas using Spectral Methods for Computer Vision

By Zulqarnain Qayyum Khan
Master of Science in Electrical and Computer Engineering
Northeastern University, April 2016
Dr. Octavia I. Camps, Adviser

There are many possible ways to model the machine that generates a set of sequences. Weighted Finite Automata (WFAs) have been demonstrated to be a powerful tool in this regard by the Natural Language Processing community. Spectral techniques for recovering WFAs from empirically constructed Hankel matrices have also been demonstrated to work very well, with theoretical backing, and thus make the task of recovering the underlying machine feasible. Our focus here is to port WFAs and the spectral recovery techniques to the field of Computer Vision, implementing every technique from scratch to gain a more in-depth understanding. More specifically, we look at activity videos (simple and complex) as string sequences, where the goal is to recover the underlying machines that generate similar activities. Different features are used to convert the videos into strings, and spectral methods are then applied to demonstrate the viability of WFAs in tasks such as Action Classification on multiple datasets. The results are encouraging but indicate that further refinement of the approach and more data are needed.
Chapter 1 Introduction

1.1. Background

Recognizing, classifying, or segmenting sequences plays a major role in any field that deals with pattern recognition, be it text-based Natural Language Processing or image-based Computer Vision. There are multiple ways to identify sequences. One is to differentiate between sequences based on appearance, motion, or other features. Another, explored in this thesis, is to model the underlying system that generates the sequences instead of comparing the sequences directly. For example, [1] and [2] identify and compare the dynamical systems that generate activities, and then use different metrics for the task of activity recognition. Other approaches can be broadly classified as generative, such as HMM-based modelling, which has been in use from as early as [3] to more recent work such as [4], or discriminative, which have seen more use recently, such as SVMs and Artificial Neural Networks (ANNs), as utilized by [53][23][30][54]. Keeping these in mind, along with the work done by Borja Balle et al. [5] in the Natural Language Processing community, the intention is to introduce another generative model, namely Weighted Finite Automata, to the Computer Vision community.
1.2. Problem Statement

We start with a from-scratch implementation of [5], to develop a more in-depth understanding of the workings of Weighted Finite Automata (WFAs), and also to make it easier to adapt them to tasks more specific to us. The next step is to test the implementation on synthetic data; for this we need to implement a synthetic WFA generator, as well as a generator that produces strings from WFAs. After testing the discriminative ability of the WFAs on synthetic examples and arriving at a satisfactory implementation, we move on to applying WFAs and the associated spectral techniques to the Computer Vision tasks of Activity Recognition and Action Segmentation. The goal is to demonstrate the usability of WFAs to the community and provide them as a tool. To use WFAs, activity recognition videos need to be preprocessed in different ways, which is also tackled here, with a related issue being what kind of videos to use. For now we deal with videos that provide skeletal joint locations.

1.3. Related Work

The body of work related to this thesis can be broadly divided into two parts, which are touched upon separately below.

Weighted Finite Automata: To a large extent this is the main focus of the thesis, implementing and following the lead of [5], who in turn are motivated by more detailed work on automata, such as [6] on spectral learning and Quadratic Weighted Automata. Fundamental work on automata and the theorems that form the backbone of this work can be found in [7].

Activity Recognition: Activity recognition is one of the most intuitive and commonly tackled problems in Computer Vision, yet it remains one of the most complicated. This interest and complexity have spawned a number of ways to attack the problem. The approaches vary inherently as well as with the kind of data they deal with: some are more efficient on data that has skeletal joint information, some deal with dynamics, and yet others are motivated more by appearance-based features. The body of work in the area is extensive, and for brevity we only point to approaches that differ from each other, to give the reader an idea of the work that has been done.
Recent work includes approaches based on grammars, such as those using segmental grammars to parse videos, for example [8], which uses a latent structural SVM to train grammar parameters, learning the hidden subactions in the process. Other similar approaches make use of Context Free Grammars (CFGs), such as [9][10][11][12]. Looking at actions as a set of subactions is very natural and intuitive and hence is often utilized, for example by [13][14], which use decomposable motion segments and learn temporal structures for the task. Yet another way is to make use of spatiotemporal features such as optical flow [15][16][17] and Bag of Features [18]. Longer video sequences that contain multiple activities tend to be dealt with by probabilistic models, from earlier Finite State Machines [19][20] and Hidden Markov Models (HMMs) [4] to more recent models such as Conditional Random Fields (CRFs) [18]. Further variability in the length of video sequences is tackled by approaches such as Hierarchical HMMs [21][22][23] or segmental HMMs [24][25][26][27]. A very different way of approaching the problem is to assume that there are underlying systems that generate a particular activity, and then to use Hankelets and dynamical distance metrics to identify those systems indirectly, as done by [1] and [2].

1.4. Overview

The rest of the thesis is divided into chapters dealing individually with the different steps and methods involved. Chapter 2 is an in-depth discussion and explanation of Weighted Finite Automata, their implementation, their generation, and the spectral techniques used to recover them. Chapter 3 deals with the preprocessing step, that is, how to convert available videos into strings that can be processed by the WFAs. Chapter 4 explains our implementation of the whole process and the synthetic experiments done to establish confidence in the method moving forward. Finally, Chapter 5 provides results on multiple real-world datasets with skeletal joint information. This is followed by a conclusion of the whole work and a brief discussion of what the future holds in this direction.
Chapter 2 Weighted Finite Automatas

2.1. Introduction

Weighted Finite Automata (WFAs), also referred to as Observable Operator Models (OOMs) [28], are a generalization of HMMs [29][28]. WFAs can be viewed as a more expressive form of HMM, with the advantage that this expressiveness doesn't come at the cost of increased complexity in learning; in fact, as [28] points out, they are often easier to learn. WFAs, like HMMs, are inherently random models and hence are best suited to model systems that are intrinsically random themselves. Moreover, WFAs can be probabilistic as well as non-probabilistic. Keeping this in mind, we now move on to a formal definition of WFAs.

2.2. Definition

From an application point of view, WFAs are functions that map strings to real numbers. More formally, as defined in [5], a WFA $W$ with $n$ states is completely defined by the tuple $W = \langle \alpha_1, \alpha_\infty, \{A_\sigma\} \rangle$ over a set of symbols $\Sigma$, where

- $\alpha_1 \in \mathbb{R}^n$ is the initial state probability vector,
- $\alpha_\infty \in \mathbb{R}^n$ is the termination probability vector,
- $A_\sigma \in \mathbb{R}^{n \times n}$ are the transition probability matrices, one for each symbol $\sigma \in \Sigma$.
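As a concrete illustration of the definition above, the following minimal Python sketch stores a WFA as an initial vector, a termination vector, and one matrix per symbol, and scores a string by multiplying the initial vector through the symbol matrices in order and finishing with the termination vector. The numbers describe a hypothetical 2-state WFA over {a, b} chosen only for illustration, and the names `score` and `vec_mat` are assumptions of this sketch, not part of the thesis implementation.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def vec_mat(v: Vector, m: Matrix) -> Vector:
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def score(alpha_1: Vector, alpha_inf: Vector, A: Dict[str, Matrix], x: str) -> float:
    """Score of string x: alpha_1^T A_{x_1} ... A_{x_t} alpha_inf."""
    state = list(alpha_1)
    for symbol in x:
        state = vec_mat(state, A[symbol])
    return sum(s * t for s, t in zip(state, alpha_inf))


# Hypothetical 2-state WFA over {a, b}; the numbers are illustrative only.
# Each row of [A_a | A_b] plus the matching entry of ALPHA_INF sums to 1,
# so this particular example happens to be probabilistic.
ALPHA_1 = [1.0, 0.0]
ALPHA_INF = [0.2, 0.4]
A = {
    "a": [[0.3, 0.2], [0.1, 0.2]],
    "b": [[0.1, 0.2], [0.1, 0.2]],
}

for string in ("", "a", "ab", "aba"):
    print(repr(string), score(ALPHA_1, ALPHA_INF, A, string))
```

Keeping the running row vector `state` instead of forming the full matrix product means each extra symbol costs one vector-matrix multiplication, which is convenient when many strings share the same WFA.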
Figure 1. (a) Graphical representation of a WFA with 2 states ($n = 2$) and $\Sigma = \{a, b\}$; (b) operator or matrix representation of the same WFA.

Given this form of WFA and a string $x$, the WFA can be used to model the probability (or score) of $x$ being generated by $W$ as follows:

$$f(x) = \alpha_1^T A_x \alpha_\infty \tag{1}$$

where $A_x$ is the product of the matrices associated with the symbols of $x$, taken in order. For example, given the WFA of Figure 1 and the string $x = aba$, we have

$$f(x) = f(aba) = \alpha_1^T A_a A_b A_a \alpha_\infty \tag{2}$$

Since this is not a probabilistic WFA, a higher score indicates a higher likelihood of the string coming from this WFA.

2.3. Transformations

Another useful characteristic of a WFA is its ability to model different scoring functions through equivalent transformations. Two transformations pointed out by both [5] and [28] make it easier to manipulate and use WFAs. Given the WFA $W$ as defined in the previous section, it is possible to transform it into two equivalent WFAs, $W_s$ and $W_p$, as follows.

Transformation 1: A WFA $W = \langle \alpha_1, \alpha_\infty, \{A_\sigma\} \rangle$ can be transformed into $W_s = \langle \alpha_{1s}, \alpha_{\infty s}, \{A_\sigma\} \rangle$, where, with $X = \sum_{\sigma} A_\sigma$,

$$\alpha_{1s}^T = \alpha_1^T (I - X)^{-1}, \qquad \alpha_{\infty s} = (I - X)^{-1} \alpha_\infty$$

Given this representation we can evaluate another scoring function,

$$f_s(x) = E[|w|_x] \tag{3}$$

that is, the expected number of times the string $x$ appears as a substring of a string $w$, which is now given by

$$f_s(x) = E[|w|_x] = \alpha_{1s}^T A_x \alpha_{\infty s} \tag{4}$$

This is a critical transformation and one that we make use of often in this work.

Transformation 2: A third representation, $W_p$, can be obtained by applying the transformation above only to the termination vector, so that the transformed WFA is defined by the tuple $W_p = \langle \alpha_1, \alpha_{\infty s}, \{A_\sigma\} \rangle$. This realizes another scoring function, which gives the score (or probability) of $x$ being a prefix of a string in the sample space $\Sigma^*$, i.e.

$$f_p(x) = P[x\Sigma^*] = \alpha_1^T A_x \alpha_{\infty s} \tag{5}$$

We did not find any useful application of this transformation, and it is mentioned here only for completeness. The proofs of both transformations are discussed in detail in [5].

2.4. WFA Hankels

Now we introduce another important building block in this discussion: creating Hankel matrices from which WFAs can be recovered. The idea is to construct a large matrix $H_f \in \mathbb{R}^{P \times S}$ such that $H_f(p, s) = f_s(p \cdot s)$, where $p \in P$ and $s \in S$, with $P$ and $S$ being the sets of all possible prefixes and suffixes respectively. Constructed this way, the Hankel matrix is theoretically of infinite size, and hence impossible to work with. To circumvent this problem, a basis is defined by restricting the sets of prefixes and suffixes beforehand, so that the Hankel matrix becomes finite. The choice of basis depends on the problem at hand; the important part is that the values of this Hankel matrix correspond to the scores obtained from an underlying WFA.

Example: Assume we have the set of sample strings

$$X = \{aa, b, bab, a, b, a, ab, aa, ba, b, aa, a, aa, bab, b, aa\}$$

If we want to create a Hankel matrix that realizes substring expectations such as those given by (4), we can define the bases $P = \{\lambda, a, b, ba\}$ and $S = \{a, b\}$, where $\lambda$ denotes the empty string, and fill in the Hankel matrix empirically such that

$$\hat{H}_S(p, s) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{I}[x^i = x], \qquad x = p \cdot s \tag{6}$$

giving an empirical Hankel $\hat{H}_S$ whose rows are indexed by the prefixes $\lambda, a, b, ba$ and whose columns are indexed by the suffixes $a, b$.

In our case we generally define the Hankel matrix with equal prefix and suffix bases, since this is easier to deal with and much more intuitive.

2.5. Spectral Learning

Figure 2. The general flow sheet of the learning algorithm: the training data is used to create the empirical Hankel matrix, which is then factorized to recover the underlying WFA.

The spectral learning of a WFA from data can be divided into two parts.

Empirical Hankel

Now we get to an integral part of the method, the spectral learning of the underlying WFA responsible for generating a set of sequences. Let $X$ be an available training set of $N$ strings, and assume the strings consist of the symbols $a$ and $b$ appearing in differing orders. The first step in learning the underlying WFA from this sample is the creation of the empirical Hankel matrix. The critical property of this matrix, stated in Theorem 1 below, is that its rank gives the number of states of the WFA; the theorem of course holds for the theoretical case of an infinite matrix.

Theorem 1 [30][31]:
1. If $f = f_A$ for some WFA $A$ with $n$ states, then $\mathrm{rank}(H_f) \le n$.
2. If $\mathrm{rank}(H_f) = n$, then there exists a WFA $A$ with $n$ states such that $f = f_A$.

This is an important theorem in the context of this work. However, working with infinite matrices is not possible in practice, and hence, as pointed out earlier, we need to define a basis in advance. The big Hankel $H$ is a concatenation of empirically constructed Hankels for each symbol of the alphabet and the empty symbol, i.e. $H = [H_\lambda \; H_\sigma]$, where each of the sub-Hankels is in $\mathbb{R}^{|P| \times |S|}$, with $|P|$ and $|S|$ the number of prefix and suffix bases. Two more Hankel vectors are needed for the learning, which are
$$h_{P,\lambda} \in \mathbb{R}^{|P| \times 1}, \qquad h_{\lambda,S} \in \mathbb{R}^{1 \times |S|}$$

Moreover, each $H_\sigma$ is a sub-block of the big $H$, with $H_\sigma(p, s) = H(p \sigma s)$.

Example: Consider a set of sequences over 2 symbols $\{a, b\}$ and a basis of the form $P = S = \{\lambda, a, b, aa, ab, ba, bb\}$. Then the matrices and vectors discussed above look like

$$H = \begin{bmatrix}
f_s(\lambda) & f_s(a) & f_s(b) & f_s(aa) & f_s(ab) & f_s(ba) & f_s(bb) \\
f_s(a) & f_s(aa) & f_s(ab) & f_s(aaa) & f_s(aab) & f_s(aba) & f_s(abb) \\
f_s(b) & f_s(ba) & f_s(bb) & f_s(baa) & f_s(bab) & f_s(bba) & f_s(bbb) \\
f_s(aa) & f_s(aaa) & f_s(aab) & f_s(aaaa) & f_s(aaab) & f_s(aaba) & f_s(aabb) \\
f_s(ab) & f_s(aba) & f_s(abb) & f_s(abaa) & f_s(abab) & f_s(abba) & f_s(abbb) \\
f_s(ba) & f_s(baa) & f_s(bab) & f_s(baaa) & f_s(baab) & f_s(baba) & f_s(babb) \\
f_s(bb) & f_s(bba) & f_s(bbb) & f_s(bbaa) & f_s(bbab) & f_s(bbba) & f_s(bbbb)
\end{bmatrix}$$

$$H_a = \begin{bmatrix}
f_s(a) & f_s(aa) & f_s(ab) & f_s(aaa) & f_s(aab) & f_s(aba) & f_s(abb) \\
f_s(aa) & f_s(aaa) & f_s(aab) & f_s(aaaa) & f_s(aaab) & f_s(aaba) & f_s(aabb) \\
f_s(ba) & f_s(baa) & f_s(bab) & f_s(baaa) & f_s(baab) & f_s(baba) & f_s(babb) \\
f_s(aaa) & f_s(aaaa) & f_s(aaab) & f_s(aaaaa) & f_s(aaaab) & f_s(aaaba) & f_s(aaabb) \\
f_s(aba) & f_s(abaa) & f_s(abab) & f_s(abaaa) & f_s(abaab) & f_s(ababa) & f_s(ababb) \\
f_s(baa) & f_s(baaa) & f_s(baab) & f_s(baaaa) & f_s(baaab) & f_s(baaba) & f_s(baabb) \\
f_s(bba) & f_s(bbaa) & f_s(bbab) & f_s(bbaaa) & f_s(bbaab) & f_s(bbaba) & f_s(bbabb)
\end{bmatrix}$$

$$H_b = \begin{bmatrix}
f_s(b) & f_s(ba) & f_s(bb) & f_s(baa) & f_s(bab) & f_s(bba) & f_s(bbb) \\
f_s(ab) & f_s(aba) & f_s(abb) & f_s(abaa) & f_s(abab) & f_s(abba) & f_s(abbb) \\
f_s(bb) & f_s(bba) & f_s(bbb) & f_s(bbaa) & f_s(bbab) & f_s(bbba) & f_s(bbbb) \\
f_s(aab) & f_s(aaba) & f_s(aabb) & f_s(aabaa) & f_s(aabab) & f_s(aabba) & f_s(aabbb) \\
f_s(abb) & f_s(abba) & f_s(abbb) & f_s(abbaa) & f_s(abbab) & f_s(abbba) & f_s(abbbb) \\
f_s(bab) & f_s(baba) & f_s(babb) & f_s(babaa) & f_s(babab) & f_s(babba) & f_s(babbb) \\
f_s(bbb) & f_s(bbba) & f_s(bbbb) & f_s(bbbaa) & f_s(bbbab) & f_s(bbbba) & f_s(bbbbb)
\end{bmatrix}$$

$$h_{P,\lambda} = h_{\lambda,S}^T = \begin{bmatrix} f_s(\lambda) & f_s(a) & f_s(b) & f_s(aa) & f_s(ab) & f_s(ba) & f_s(bb) \end{bmatrix}^T$$

Recovering WFA

Once the above Hankels have been built, the recovery part is straightforward and involves taking an SVD and doing some matrix multiplications and inversions. The step-by-step algorithm is as follows:

1. Start from the Hankel matrices $H$ and $H_\sigma$.
2. Take a reduced SVD $H = U D V^T$, truncated to the desired number of states $n$.
3. Let $X = U D$ and $Y = V$.
4. Then $\alpha_{1s}^T = h_{\lambda,S}^T Y$, $A_\sigma = X^{+} H_\sigma Y$, and $\alpha_{\infty s} = X^{+} h_{P,\lambda}$, where $X^{+}$ is the pseudo-inverse of $X$.

We have found substring counting to be more intuitive, and hence the empirical Hankels are created from substring expectation calculations. That is why the recovered WFA is defined by the tuple $W_s = \langle \alpha_{1s}, \alpha_{\infty s}, \{A_\sigma\} \rangle$, which can be transformed into $W = \langle \alpha_1, \alpha_\infty, \{A_\sigma\} \rangle$ using the transformations discussed in Section 2.3.
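The recovery steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the thesis implementation (which was written from scratch in a different environment): the ground-truth WFA is a hypothetical 2-state example, the Hankel blocks are filled with exact values of the plain scoring function f rather than empirical substring expectations, the basis is the small set {λ, a, b}, and a pseudo-inverse is used for the inversion step. The helper names `hankel` and `spectral_recover` are made up for this sketch.

```python
import numpy as np

# Hypothetical ground-truth WFA with 2 states over {a, b} (illustrative numbers).
alpha_1 = np.array([1.0, 0.0])
alpha_inf = np.array([0.2, 0.4])
A = {"a": np.array([[0.3, 0.2], [0.1, 0.2]]),
     "b": np.array([[0.1, 0.2], [0.1, 0.2]])}


def score(a1, ainf, ops, x):
    """f(x) = a1^T A_{x_1} ... A_{x_t} ainf."""
    v = a1.copy()
    for sym in x:
        v = v @ ops[sym]
    return float(v @ ainf)


def hankel(f, prefixes, suffixes, middle=""):
    """Hankel block with entries f(p + middle + s)."""
    return np.array([[f(p + middle + s) for s in suffixes] for p in prefixes])


def spectral_recover(H, H_sigma, h_P, h_S, n):
    """Recover an n-state WFA from Hankel blocks via a truncated SVD."""
    U, d, Vt = np.linalg.svd(H, full_matrices=False)
    X = U[:, :n] * d[:n]            # X = U D (columns of U scaled by singular values)
    Y = Vt[:n].T                    # Y = V
    X_pinv = np.linalg.pinv(X)      # assumed: pseudo-inverse for the inversion step
    a1 = h_S @ Y
    ainf = X_pinv @ h_P
    ops = {s: X_pinv @ Hs @ Y for s, Hs in H_sigma.items()}
    return a1, ainf, ops


basis = ["", "a", "b"]                          # P = S = {lambda, a, b}
f = lambda x: score(alpha_1, alpha_inf, A, x)   # exact scores from the known WFA
H = hankel(f, basis, basis)
H_sigma = {s: hankel(f, basis, basis, middle=s) for s in A}
h_P = H[:, 0]                                   # column of the empty suffix
h_S = H[0, :]                                   # row of the empty prefix

a1_hat, ainf_hat, A_hat = spectral_recover(H, H_sigma, h_P, h_S, n=2)

# The recovered matrices generally differ from the originals,
# but the two automata assign the same scores to strings.
for x in ("", "a", "ab", "abba"):
    print(repr(x), f(x), score(a1_hat, ainf_hat, A_hat, x))
```

The recovered operators are related to the originals by a change of basis, so comparing scores rather than matrix entries is the meaningful check.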
Chapter 3 PreProcessing

3.1. Introduction

The WFAs, the way they are implemented here, deal primarily with strings, while our target tasks deal with videos. Our data therefore needs to be preprocessed before it can be used to train the WFAs. The intention is to convert the available data into a set of representative symbol sequences. Several fairly simple ways are explored to this end, most of them aiming to exploit dynamical information rather than appearance. This is also one of the reasons why we deal primarily with videos that have skeleton joint information available.

3.2. Posebits

One of the first features we started off with is Posebits, as introduced in [32]. Posebits are a mid-level representation based on Boolean relationships between body parts, for example, whether the left arm is in front of the right arm. More examples are shown in Figure 3. The idea is to infer them directly from image features using a trained classifier. They are compositional by nature and hence are very flexible compared to plain action class labels. The dataset made available by [32] is known as the Posebit Dataset (PbDb) and is made up of data collected from 4 different datasets, some with MoCap data available and others consisting of 2D images. From MoCap data there are 10,000 poses taken from HumanEva [33] and HMODB [34], while for 2D images they use Fashion [35] and Parse [36].
Figure 3. Examples of posebits, and some poses conditioned on different posebits [32].

Out of these we make use of the HumanEva dataset, since it corresponds more closely to the task we are initially looking at, that is, action classification.

Posebit Selection

Figure 4.1. Different relationships between body parts that posebits intend to exploit: joint distances, relative positions, articulation angles.
Figure 4.2. Posebit Binary Tree: the poses in each leaf node, constrained by all posebits in a posebyte.

Not all posebits are created equal; [32] argue that it is important to select posebits based on the tasks you intend to perform with them. To this end they propose a simple selection mechanism, inspired by decision trees, to choose a subset of posebits from the available ones based on the task at hand. For example, for 3D pose estimation (and activity recognition) the aim is to choose a subset of posebits using the following two criteria:

- how reliably they can be inferred from image features, $r$;
- how helpful they can be in reducing uncertainty in the hidden variable, $x$.

To select a subset $S_m$ from the available posebit candidate pool $S_c$, [32] use a greedy forward selection mechanism, that is, one bit at a time, with the posebit at each step $j$ selected to maximize information gain:

$$a^* = \arg\max_{a_s} I_j^M, \qquad I_j^M = I_j^C + \lambda \, I_j^R \tag{7}$$

where

- $I_j^M$ is the mixed information gain at the $j$th level of the tree,
- $I^C$ is the clustering term,
- $I^R$ is the reliability term,
- $\lambda$ balances the two terms, generally kept at …

The clustering information gain is further defined in terms of entropies as

$$I_j^C = H_{j-1} - H_j \tag{8}$$

where $H_j$ is the sum of entropies weighted at each node of the $j$th level of the tree. It can be defined as

$$H_j = \sum_{c=1}^{2^j} \frac{|X_{S_c}|}{|X_S|} H(X_{S_c}) \tag{9}$$

with $X_S$ being the bigger set of MoCap poses, $X_{S_c}$ the subset of poses in class $c$, and the last term, $H(X_{S_c})$, the differential entropy. The reliability measure is defined as

$$Q(X \mid r, m) = \sum_{a \in A_m} p(x \mid a) \, p(a \mid r) \tag{10}$$

with $p(x \mid a)$ and $p(a \mid r)$ being the conditional pose and posterior posebyte distributions respectively. For posebit classification a structural SVM model is used:

$$\hat{a}_j = \arg\max_{a_j \in A_j} F(r, a_j, w_j) = w_j^T \Psi(a_j, r) \tag{11}$$

where $\Psi(a_j, r)$ is the joint feature map of input $r$ and output $a_j$. The experiments we did combining posebits with WFAs will be discussed in the experiments section.

3.3. Clusters of Velocities and Accelerations

The second approach utilizes the skeleton joint information available with datasets such as the Berkeley Multimodal Human Action Database (MHAD) [37], the Composable Activities Dataset [38], and the UT Kinect Dataset [39], or extracted from larger datasets such as JHMDB [40].
Given the skeleton joint positions, for example in 3D, $(x, y, z)$, we first center the skeleton around one of the joints (usually the hip joint); afterwards the mean over all frames is removed so that the sequence is centered. Then a combination of the following three simple techniques is utilized:

1. Subsampling: In most cases the joints do not move much from one frame to the next, and using all frames can result in redundantly long sequences that capture little additional information. For this reason, instead of using each frame, an average skeleton is taken over $K$ frames at a time:

$$F_{subsampled} = \frac{1}{K} \sum_{j=1}^{K} F_j \tag{12}$$

2. Velocities: Once the subsampled skeletons have been obtained, their differences are taken to account for first-order motion:

$$v_j = F_{subsampled}^{j} - F_{subsampled}^{j-1} \tag{13}$$

3. Acceleration: The acceleration is represented by taking differences of these velocities:

$$a_j = v_j - v_{j-1} \tag{14}$$

Once these have been obtained, different combinations of them are utilized, and the next step is to run K-means clustering on them, with the number of clusters $C$ serving as the number of characters in the alphabet of the WFA. Matlab's built-in Kmeans++ algorithm [41] is used for this.

3.4. Hankel Matrices

This is a more complicated approach than the ones discussed above, but it can potentially encode much more dynamical information. From control systems we know that a dynamical system can be defined by the following set of equations:

$$y_k = C x_k + w_k, \qquad x_k = A x_{k-1} \tag{15}$$

Dynamical systems play a pivotal role in recognition systems that emphasize dynamics over appearance, including systems for gait recognition, dynamic texture recognition, and activity recognition. The basic idea is gleaned from system identification methods, that is, the identification of the $A$ and $C$ matrices in eq. 15 from training data.
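Before going further into Hankel matrices, the subsampling, velocity, and acceleration steps listed earlier in this chapter can be sketched in Python as follows. The sketch assumes that each frame is a flat list of joint coordinates, that frames are averaged in non-overlapping blocks of K (with any trailing partial block dropped), and that centering has already been done; these choices, the function names, and the toy trajectory are assumptions for illustration. The clustering step that turns the resulting vectors into symbols is omitted.

```python
from typing import List

Frame = List[float]  # flattened joint coordinates of one skeleton


def subsample(frames: List[Frame], k: int) -> List[Frame]:
    """Average every block of k consecutive frames (a trailing partial block is dropped)."""
    out = []
    for start in range(0, len(frames) - k + 1, k):
        block = frames[start:start + k]
        out.append([sum(col) / k for col in zip(*block)])
    return out


def differences(seq: List[Frame]) -> List[Frame]:
    """First-order differences seq[j] - seq[j-1]."""
    return [[b - a for a, b in zip(prev, cur)] for prev, cur in zip(seq, seq[1:])]


# Toy one-joint trajectory: x = t^2 with y = z = 0, for t = 0..7.
frames = [[float(t * t), 0.0, 0.0] for t in range(8)]
sub = subsample(frames, k=2)   # 4 averaged skeletons
vel = differences(sub)         # 3 velocity vectors
acc = differences(vel)         # 2 acceleration vectors
print(sub)
print(vel)
print(acc)
```

Because the toy trajectory is quadratic in time, the velocities grow linearly and the accelerations come out constant, which is a quick sanity check on the differencing.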
However, most of the time, and especially in computer vision, identifying these matrices is not an easy task, as they are not unique and trying to recover them can lead to non-convex problems. To work around this, [1] proposed making use of the special structure of Hankel matrices [42]. It is important to mention that these Hankel matrices are different from the ones discussed in Chapter 2. To understand Hankelets (tracklets of Hankels), consider a tracklet from a video sequence with measurements $t_k$. The underlying dynamics behind this tracklet can be modelled by a linear regressor [43]:

$$t_k = \sum_{i=1}^{n} a_i t_{k-i}, \qquad k \ge n \tag{16}$$

In the absence of noise, this regressor can be represented by a Hankel matrix $H_D$ (the subscript $D$ distinguishes it from the Hankel matrices discussed previously) such that $\mathrm{rank}(H_D) \le$ order of the system:

$$H_D = \begin{bmatrix}
t_1 & t_2 & \cdots & t_s \\
t_2 & t_3 & \cdots & t_{s+1} \\
\vdots & \vdots & \ddots & \vdots \\
t_r & t_{r+1} & \cdots & t_{r+s-1}
\end{bmatrix} \tag{17}$$

Figure 5. The line represents a trajectory, with colored points representing observations; the matrix on the right shows how to create a Hankel matrix from these observations.

The important argument by [42] in favour of this Hankel matrix is that it captures the underlying dynamics of the system irrespective of the initial conditions; in other words, two Hankelets from two trajectories output by the same underlying system will span the same linear subspace. They show this by factoring $H_D$ into $\Gamma X$, where $\Gamma$ is the observability matrix and $X$ is the state matrix, that is,

$$\Gamma = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^m \end{bmatrix}, \qquad X = \begin{bmatrix} x_0 & x_1 & \cdots & x_m \end{bmatrix} \tag{18}$$

These Hankel matrices can be formed using trajectories or any other features, such as joint information. For our purposes we follow the lead of [2] and use Gram matrices of Hankels that encode the joint positions, that is, each observation $t_i$ encodes the 3D locations of the joints in frame $i$, with one $(x_i, y_i, z_i)$ triple per joint:

$$t_i = [x_i, y_i, z_i, x_i, y_i, z_i, \ldots]^T \tag{19}$$

Given the Hankel matrix defined as in (17), the corresponding Gram matrix is given by

$$\hat{G} = \frac{H_D H_D^T}{\| H_D H_D^T \|_F} \tag{20}$$

Clustering Gram Matrices

The next step in the conversion to the symbol alphabet required by the WFA is the clustering of the Gram matrices defined by (20). Clustering requires a distance-like metric in order to find the cluster centers. Since these matrices live on the Positive Semi-Definite (PSD) manifold, [2] mentions a number of metrics that can be used for the purpose, including the Affine Invariant Riemannian Metric (AIRM) [44], defined for two Gram matrices $X, Y$ as

$$d_R(X, Y) = \| \log( X^{-1/2} Y X^{-1/2} ) \|_F \tag{21}$$

The second is the Log-Euclidean Riemannian Metric (LERM) [45]:

$$d_{le}(X, Y) = \| \log(X) - \log(Y) \|_F \tag{22}$$

Another metric that they mention and argue in favour of, and which is therefore used here, is the Jensen-Bregman LogDet Divergence (JBLD) [46], defined as

$$d_J(X, Y) = \log \left| \frac{X + Y}{2} \right| - \frac{1}{2} \log | XY | \tag{23}$$
The JBLD defined in eq. 23 is what we use here for clustering, with the mean (or cluster center) defined as

$$X^* = \arg\min_{X} \sum_{i=1}^{N} J(X, X_i) \tag{24}$$

So, in summary, given a set of sequences of different activities, we chop the sequences into smaller overlapping sequences, encode them into Gram matrices, and cluster them using JBLD; the cluster labels then serve as the alphabet for training WFAs.
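The encoding half of this pipeline (Hankel matrix, normalized Gram matrix, JBLD) can be sketched in plain Python as below. Several things here are assumptions made for illustration and are not taken from the thesis: the measurements are scalars rather than stacked joint coordinates, a small multiple of the identity (`eps`) is added to each Gram matrix so that its log-determinant is finite, the log-determinant is computed through a Cholesky factorization, and all function names are hypothetical. The clustering step itself is omitted.

```python
import math
from typing import List

Matrix = List[List[float]]


def hankel(seq: List[float], rows: int) -> Matrix:
    """Hankel matrix with H[i][j] = seq[i + j]."""
    cols = len(seq) - rows + 1
    return [[seq[i + j] for j in range(cols)] for i in range(rows)]


def normalized_gram(h: Matrix, eps: float = 1e-3) -> Matrix:
    """G = H H^T / ||H H^T||_F, plus eps * I (an assumed regularizer)."""
    n = len(h)
    g = [[sum(a * b for a, b in zip(h[i], h[j])) for j in range(n)] for i in range(n)]
    norm = math.sqrt(sum(v * v for row in g for v in row))
    return [[g[i][j] / norm + (eps if i == j else 0.0) for j in range(n)] for i in range(n)]


def logdet(m: Matrix) -> float:
    """Log-determinant of a symmetric positive definite matrix via Cholesky."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return 2.0 * sum(math.log(low[i][i]) for i in range(n))


def jbld(x: Matrix, y: Matrix) -> float:
    """Jensen-Bregman LogDet divergence between two positive definite matrices."""
    n = len(x)
    mid = [[(x[i][j] + y[i][j]) / 2.0 for j in range(n)] for i in range(n)]
    return logdet(mid) - 0.5 * (logdet(x) + logdet(y))


geo_a = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]     # x_k = 2 x_{k-1}, starting at 1
geo_b = [3.0, 6.0, 12.0, 24.0, 48.0, 96.0]   # same dynamics, different start
ramp = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]        # different dynamics

g_a, g_b, g_r = (normalized_gram(hankel(s, rows=3)) for s in (geo_a, geo_b, ramp))
print(jbld(g_a, g_b))   # near zero: same underlying system
print(jbld(g_a, g_r))   # clearly larger: different systems
```

The two geometric sequences differ only in their initial condition, so their normalized Gram matrices coincide up to rounding, which matches the argument that the Hankel structure captures the dynamics irrespective of initial conditions.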
Chapter 4 Synthetic Experiments

Before moving on to the target tasks in computer vision, we felt it important to put our implementation of Weighted Finite Automata and their discriminative capabilities on firmer ground by performing a variety of synthetic experiments, the details of which are discussed here with examples. This chapter also covers the implementation details of the WFA part of the work. Topics covered include:

1. WFA Generation
2. String Generation from WFAs
3. Evaluation Functions, Hankel Construction, and Spectral Learning
4. Experiments to Evaluate Estimated WFAs

4.1. WFA Generation

The first step in performing synthetic experiments was the establishment of a ground truth, which means having the ability to create WFAs of our own with different specifications. This builds on the fact that a WFA $W$ is defined by the tuple $W = \langle \alpha_1, \alpha_\infty, \{A_\sigma\} \rangle$, where

- $\alpha_1 \in \mathbb{R}^N$ is the initial state probability vector,
- $\alpha_\infty \in \mathbb{R}^N$ is the termination probability vector,
- $A_\sigma \in \mathbb{R}^{N \times N}$ holds the transition probabilities for the symbol $\sigma$.
We also use the fact that a probabilistic WFA can be created by enforcing the following rules:

$$\sum_{i=1}^{N} \alpha_{1i} = 1, \qquad \sum_{\sigma} \sum_{j=1}^{N} \{A_\sigma\}_{ij} + \alpha_{\infty i} = 1, \quad i = 1 \ldots N \tag{25}$$

In addition to this we also provide a sparsity option to control how dense or how sparse (in terms of connections between states) we want the WFA to be.

Example: The code takes as input the number of states $N$ and the number of characters in the alphabet $S$. For example, with $N = 6$ and $S = 3$ (symbols $a, b, c$), a WFA with 80% density was generated, consisting of the vectors $\alpha_1^T$ and $\alpha_\infty^T$ and the matrices $A_a$, $A_b$, and $A_c$.

4.2. String Generation from WFAs

Now that we have created a WFA, it can be used to generate random sequences by traversing its states; these strings can be used for training as well as testing. The following steps are followed to generate a string from a WFA $W = \langle \alpha_1, \alpha_\infty, \{A_\sigma\} \rangle$:

- Select the initial state by sampling the states according to the initial probability distribution defined by $\alpha_1$.
- Select the next state, as well as the symbol to be emitted, according to the probability distribution defined by the rows of $\{A_\sigma\}$ and the termination vector $\alpha_\infty$.
- No length limit is imposed; generation stops once termination is reached.

Example: Based on the WFA from Section 4.1, one of the generated strings looks like this:

x = bcbcabbaabcbaacbcbccbbcaabbbbbaacbacca

For training purposes in the synthetic experiments we generate 10,000 strings from each WFA, with an average length of around 12 characters; the longest string generated was
30 characters, minimum being an empty string. Which shows WFAs cover wide range of sequences in terms of length Evaluation Functions, Empirical Hankels, and Spectral Learning These have already been discussed in Chapter 2, for completeness and narrative purposes we ll mention them very briefly here Evaluation Functions Given the WFA W,,{ A } 1 and its transformation W,,{ } s 1 s s A, and a string or substring x, the following two evaluation functions are implemented by simple multiplication T f () x A x i) 1 T f () x A ii) 1 s s x s In practice the scores are kept positive, and also normalized to make sure the length of the sequences has lesser effect on the scores Empirical Hankels The structure and of these Hankels and the number of Hankel Matrices calculated as well as the entries are exactly as mentioned in section 2.5. In terms of implementation, in the case of the running example of this chapter with 3 characters. We select the basis PS { all combinations of a,b,c up to length 4}, which results in a basis of length 121 thus the size of the Hankels mentioned previously would be: H, H, H, H R h h p,, p R R a b c 121x1 1x x121 The number of the basis based on number of characters S, length of the basis l is given by the following formula: 21
$$|\text{basis}| = \frac{1 - |S|^{l+1}}{1 - |S|} \qquad (26)$$

For the running example, $|S| = 3$ and $l = 4$ give $(1 - 3^5)/(1 - 3) = 121$.

Moreover, while selecting the basis we do a frequency count, and the order of the basis elements depends on their frequency as substrings in the training strings. Also, since we have the ground truth WFA available, we can directly fill in the entries of the Hankel matrices to create what we call the Theoretical Hankel. If our empirically estimated Hankels are close to these Theoretical Hankels, it means our estimation corresponds well to the theory. Recall that the entries are filled in by the evaluation function given by:

$$f_s(x) = E[|w|_x] = \sum_w |w|_x P(w)$$

where $|w|_x$ is the number of times $x$ occurs as a substring of $w$. Empirically, this means calculating the expected value of each combination of basis elements in the training set.

Spectral Learning

Once we have the empirical Hankel matrices, we can proceed with learning the underlying WFA. Since the Hankels are constructed using expected values, the WFA recovered using the methods outlined in Section 2.5 is the transformed one, $\hat{W}_s = \langle \hat{\alpha}_1^s, \hat{\alpha}_\infty^s, \{\hat{A}_\sigma^s\} \rangle$, which of course can be transformed to correspond to $W$. An important thing to remember, however, is that the spectral technique does not guarantee the recovery of a unique WFA, so in terms of the entries in the matrices the estimate $\hat{W}$ and the ground truth $W$ can, and most of the time will, be very different. What we are more interested in is their behavior, which is evaluated using the experiments outlined in the next section.

Experiments to Evaluate Estimated WFAs

To determine whether or not the estimated WFAs are a good approximation of the ground truth, we performed the following experiments.

Frobenius Norm

Frobenius norms are generally a good metric for comparing matrices. We posit that even before the spectral learning method kicks in, it is important to establish the closeness of the empirically
calculated Hankel matrices to the Theoretical Hankels created using the ground truth WFA. If $eH$ is the estimated Hankel and $H_{gt}$ is the theoretical Hankel, the normalized Frobenius norm distance can be calculated as:

$$d(eH, H_{gt}) = \frac{\|eH - H_{gt}\|_F}{\|H_{gt}\|_F} \qquad (27)$$

We ran exhaustive experiments creating WFAs with different numbers of states and alphabet sizes, and in all cases found that the distance in (27) never went above 2%. Moreover, when the same distance was calculated against different WFAs, it was found to be larger than the distance of the estimated Hankel from the ground truth. For example, with an estimation using 10,000 strings, the Frobenius norm distance (27) to the ground truth Hankels was found to be around 1%, whereas when the ground truth was compared to Hankels constructed from false WFAs the distance was much larger (on average more than 10%). Very similar behaviour was observed when the same calculations were done using subspace angles and JBLD instead of the Frobenius norm. All of this establishes the validity of our counting process and of the empirically created Hankels.

Perplexity

This is a fairly popular metric in Natural Language Processing, and is also suggested by [5]. The idea is to evaluate a number of test strings, treating them as an ensemble, and to normalize the resulting scores to sum to 1, thus producing a probability distribution over the ensemble. If the estimated WFAs are good enough, this probability distribution should be close to the distribution obtained when the scores are calculated using the ground truth WFAs. Perplexity is a measure used to compare probability distributions. Given a probability distribution $p(x)$ and its estimate $q(x)$, the perplexity $P(p, q)$ is defined as:

$$P(p, q) = 2^{-\sum_x p(x) \log(q(x))} \qquad (28)$$
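As a rough illustration of equations (27) and (28), the following Python sketch computes the normalized Frobenius distance between two small matrices and the perplexity between two normalized score vectors. It is a minimal sketch, not the code used for the experiments: the function names, the list-of-rows matrix representation, and the use of a base-2 logarithm are assumptions made here.

```python
import math


def frobenius_norm(matrix):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(v * v for row in matrix for v in row))


def normalized_frobenius_distance(est, ref):
    """||est - ref||_F / ||ref||_F, following equation (27)."""
    diff = [[e - r for e, r in zip(est_row, ref_row)]
            for est_row, ref_row in zip(est, ref)]
    return frobenius_norm(diff) / frobenius_norm(ref)


def normalize(scores):
    """Rescale positive scores so that they sum to 1 over the ensemble."""
    total = sum(scores)
    return [s / total for s in scores]


def perplexity(p, q):
    """2 ** (-sum_x p(x) * log2(q(x))), following equation (28).

    The base-2 logarithm is an assumption; q must be positive wherever
    p is non-zero, otherwise math.log2 raises an error.
    """
    cross_entropy = -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)
    return 2.0 ** cross_entropy


# Hypothetical toy data: a "theoretical" Hankel and a slightly perturbed estimate.
H_gt = [[0.5, 0.25], [0.25, 0.125]]
H_est = [[0.51, 0.24], [0.25, 0.125]]
distance = normalized_frobenius_distance(H_est, H_gt)

# Hypothetical scores from a ground truth WFA and an estimated WFA.
p = normalize([2.0, 1.0, 1.0])
q = normalize([1.0, 1.0, 1.0])
value = perplexity(p, q)
```

The scores passed to `perplexity` must be strictly positive wherever `p` is non-zero, which matches the earlier remark that scores are kept positive in practice.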
A lower perplexity means a closer approximation, but it is important to remember that there is a lower bound as well: a perplexity lower than the lower bound also indicates a poorer approximation. This lower bound is given by the self-perplexity of $p$:

$$L(p) = 2^{-\sum_x p(x) \log(p(x))} \qquad (29)$$

For example, in our case we have a test ensemble consisting of 1000 strings, i.e. $\{x_i\}_{i=1}^{1000}$, over which the probability distributions are calculated. The perplexity and the lower bound are computed as:

$$P(p, q) = 2^{-\sum_{i=1}^{1000} p(x_i) \log(q(x_i))}$$

$$L(p) = 2^{-\sum_{i=1}^{1000} p(x_i) \log(p(x_i))}$$

As can be seen, the perplexity obtained from the estimated WFA versus the ground truth WFA is very close to the lower bound, indicating that the estimate $q$ is fairly close to $p$. To drive home the point, the same calculations were done with false WFAs versus the ground truth WFA; the results are tabulated below.

Table 1. The perplexity values for false WFAs versus the ground truth WFA. They are either above the perplexity obtained with our estimate or well below the lower bound, indicating in both cases that our estimate is performing well.

N | Perplexity

KL Divergence

This is very closely related to perplexity, since both are based on entropy, with the exception that the lower bound for the KLD is zero, as it is defined as follows:

$$KLD(p, q) = \sum_x p(x) \ln \frac{p(x)}{q(x)} \qquad (30)$$

Just like perplexity, a lower KLD indicates a closer estimate. The value we obtain with our estimate is
$$KLD(p, q) = \sum_{i=1}^{1000} p(x_i) \ln \frac{p(x_i)}{q(x_i)}$$

Table 2. Again, the KLD values here are higher than the KLD value obtained with our estimate, indicating that the estimate is close to the actual WFA.

N | KLD

Word Prediction Error Rate

As mentioned in [5], given a prefix, WFAs are able to predict the next symbol. This gives us another way to see how closely our estimated WFA performs relative to the ground truth WFA. We define the Word Prediction Error Rate (WPER) as the number of times the prediction of the ground truth WFA $W$ differs from the prediction of the estimated $\hat{W}$, divided by the total number of symbols predicted.

The predictions are made in the same way as suggested by [5]; that is, given a WFA $W$ and its transform $W_s$, if a prefix $w_{1,i}$ is provided, do the following:

- Compute the scores $\alpha_i^T A_\sigma \alpha_\infty^s$ for all possible symbols $\sigma$.
- Compute the score for end of sequence, $\alpha_i^T \alpha_\infty$.
- $\alpha_i^T = \alpha_1^T A_{w_{1,i}}$ is calculated iteratively as $\alpha_i^T = \alpha_{i-1}^T A_{w_i}$.
- The symbol (or end of string) that gives the highest score is predicted.

In our case the WPER remains on average around 5%; that is, for every 100 predicted symbols there are only 5 errors.
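The prediction steps and the WPER definition above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in the thesis: the dictionary layout of a WFA, the names `predict_next` and `wper`, the `END` marker, and the choice to make one prediction at every prefix of every test string (including the empty prefix) are all hypothetical. Symbol scores use the terminal vector of the transformed automaton (`alpha_inf_s`), which the caller is assumed to supply, and the end-of-string score uses the original terminal vector.

```python
END = None  # hypothetical marker standing for "end of string"


def vec_mat(v, m):
    """Row vector times matrix, with the matrix given as a list of rows."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def predict_next(wfa, prefix):
    """Predict the next symbol (or END) after `prefix`.

    `wfa` is assumed to be a dict with keys:
      'alpha1'      initial vector,
      'alpha_inf'   terminal vector of the original WFA,
      'alpha_inf_s' terminal vector of the transformed WFA,
      'A'           dict mapping each symbol to its transition matrix.
    """
    state = list(wfa["alpha1"])
    for sym in prefix:  # iterative update of the forward vector
        state = vec_mat(state, wfa["A"][sym])
    scores = {sym: dot(vec_mat(state, mat), wfa["alpha_inf_s"])
              for sym, mat in wfa["A"].items()}
    scores[END] = dot(state, wfa["alpha_inf"])
    return max(scores, key=scores.get)


def wper(wfa_true, wfa_est, strings):
    """Fraction of predictions on which the two WFAs disagree."""
    errors = total = 0
    for s in strings:
        for i in range(len(s) + 1):  # every prefix, including "" and s itself
            total += 1
            if predict_next(wfa_true, s[:i]) != predict_next(wfa_est, s[:i]):
                errors += 1
    return errors / total if total else 0.0


# Hypothetical two-state automata that differ only in the second state.
wfa_true = {
    "alpha1": [1.0, 0.0],
    "alpha_inf": [0.1, 0.2],
    "alpha_inf_s": [1.0, 1.0],
    "A": {"a": [[0.0, 0.6], [0.0, 0.1]], "b": [[0.3, 0.0], [0.7, 0.0]]},
}
wfa_est = {
    "alpha1": [1.0, 0.0],
    "alpha_inf": [0.1, 0.2],
    "alpha_inf_s": [1.0, 1.0],
    "A": {"a": [[0.0, 0.6], [0.0, 0.7]], "b": [[0.3, 0.0], [0.1, 0.0]]},
}
rate = wper(wfa_true, wfa_est, ["ab"])
```

In the toy example both automata are proper probabilistic automata, for which the transformed terminal vector is a vector of ones; for a WFA recovered by spectral learning that vector would instead come from the learned model.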
Chapter 5

Experiments

Having established the grounds for the use of WFAs with experiments on synthetic data, we proceed to apply our work to different datasets in Computer Vision, specifically in the Activity Recognition scenario. What follows in this chapter is an explanation of the datasets used, followed by the experiments performed while varying different parameters, and the results obtained on each dataset. Some of the initial experiments, for example those done on the Posebit Database, were done as a proof of concept and are thus not as detailed as those done later with other datasets.

Experimental Setup

As described earlier, there are different parameters to tune in the method, including the method for preparing the alphabet needed to train the WFAs, the number of states of the WFAs, the size of the basis, and the number of symbols. Since WFAs in general require considerably more data to train than is available in these datasets, we use a leave-one-out strategy for evaluation. The idea is to train one WFA per action, leaving out one sequence for testing, and then to check the scores assigned by all the WFAs to that sequence. If the maximum score corresponds to the ground truth label, it is a correct recognition; if not, it is counted as an error.

Instead of controlling the number of states of individual WFAs (which would leave us with too many parameters to tune), we control the number of states en masse by using the percentage rule:

$n$ = the number where the eigenvalue of $H$ [...] $s\%$ of the highest eigenvalue of $H$
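The comparison used in the percentage rule is not legible in this copy, so the following Python sketch shows only one possible reading of it and should not be taken as the rule actually used. It assumes that the singular values of $H$ have already been computed elsewhere, and it counts those that are at least $(100 - s)\%$ of the largest one. That mapping from $s$ to a threshold is an assumption, chosen because the experiments in this chapter describe a larger $s$ as giving more states; the function name `num_states` is hypothetical.

```python
def num_states(singular_values, s_percent):
    """Number of states under one assumed reading of the percentage rule.

    Counts the singular values that are at least (100 - s_percent)% of the
    largest singular value, so that a larger s_percent keeps more states.
    """
    if not singular_values:
        return 0
    largest = max(singular_values)
    threshold = (1.0 - s_percent / 100.0) * largest
    return sum(1 for value in singular_values if value >= threshold)


# Hypothetical singular values of a Hankel matrix, in decreasing order.
example_values = [10.0, 5.0, 0.8, 0.3, 0.05]
states_low = num_states(example_values, 60)
states_mid = num_states(example_values, 95)
states_high = num_states(example_values, 99)
```

Under this reading a single value of $s$ fixes a relative cut-off, and each per-action WFA still ends up with its own number of states, determined by the spectrum of its own Hankel matrix.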
Thus, varying $s$ allows us to vary the number of states of the WFAs without tuning them individually. Other parameters include the number of symbols (or clusters) $C$ and the overlap window used when considering velocities, accelerations, etc. During the course of our experiments we found that $C = 10$ seems to give good results: having too few symbols leads to more monotonous sequences, while having too many symbols slows down the process without yielding any significant improvement.

MHAD Dataset

Description

The Multimodal Human Action Database (MHAD) [37] is one of the most widely used action recognition datasets with 3D joint information in the Computer Vision community. The dataset consists of 11 actions performed by 12 actors, with each action performed 5 times. One sequence is missing, so the total number of sequences is 659. The actions are as follows; the same numbering is followed in the confusion matrices shown in this section:

1. Jumping in place
2. Jumping Jacks
3. Bending, hands up all the way down
4. Punching/boxing
5. Waving two hands
6. Waving one hand
7. Clapping hands
8. Throwing a ball
9. Sit down then stand up
10. Sit down
11. Stand up

The activities have varying levels of dynamics: some involve just the upper body, such as waving, punching, and clapping, while others involve the whole body. This dataset can therefore be considered a naturalistic dataset.
Figure 6.1. Snapshots of one of the actors performing the 11 actions in MHAD. We make use of the MoCap data, with 35 joints in 3 dimensions, leading to 105-dimensional feature vectors.

Figure 6.2. Snapshots of the throwing action as captured by the MoCap cameras from different angles.

Evaluation

For the MHAD Dataset, the results in Figure 7.1 are the best that we were able to achieve, using the parameters $C = 10$, $s = 95\%$, and basis length 2.
Figure 7.1. Confusion matrix for the activities in the MHAD Dataset, with an average accuracy of around 90%.

In contrast, if a very low $s = 60\%$ is used, the WFAs are not able to capture much of the dynamics of the sequences, and hence the performance drops drastically, as shown in Figure 7.2.

Figure 7.2. Modelling the activities using a lower number of states leads to a significant drop in performance, as shown here for the MHAD Dataset; the average accuracy has gone down to around 54%.
Similarly, increasing $s$ to a higher percentage, which means allowing a higher dependence on the number of states gleaned from the training data, can result in overfitting, and hence once again a drop in performance is observed.

Figure 7.3. Confusion matrix showing a drop in average accuracy when the WFAs are allowed to fit too closely to the data ($s = 99\%$).

Overall, the number of states plays a critical role in the performance of the system. We noticed a bell-shaped trend relative to the value of $s$: increasing $s$ yielded an improvement in the average accuracy up to a certain value (generally close to 95%), and any further increase resulted in deteriorating accuracy. The MHAD Dataset is now a solved dataset, and our accuracy is not state of the art. Recently, [2] demonstrated 100% accuracy and also listed the accuracies achieved by other methods; we reproduce their results here.

Table 3. Comparison of accuracies with existing methods shows there is room for improvement.

Method | Average Accuracy (%)
SMIJ [47] |
RBF Net [48] |
Dynemes [49] |
BioLDS [50] |
HBRNNL [51] | 100
GL [2] |
GJ [2] |
GA [2] |
GK [2] |
WFA |

MSR3D Dataset

Description

The Microsoft Research 3D (MSR3D) dataset [52] is another popular action recognition dataset that provides MoCap information. The dataset consists of 20 actions of varying similarity and dynamics, from full-body movements to partial-body movements. Each action is performed by 10 subjects, leading to a total of 557 relatively short sequences. The actions are as follows; the same numbering is followed in the confusion matrices:

1. High Arm Wave
2. Horizontal Arm Wave
3. Hammer
4. Hand Catch
5. Forward Punch
6. High Throw
7. Draw Cross
8. Draw Tick
9. Draw Circle
10. Hand Clap
11. Two Hand Wave
12. Side Boxing
13. Bend
14. Forward Kick
15. Side Kick
16. Jogging
17. Tennis Swing
18. Tennis Serve
19. Golf Swing
20. Pickup & Throw

Evaluation

A similar set of experiments was performed, and once again the best accuracy was observed with $s = 90\%$, $C = 10$, and basis length 2; the results are shown in Figure 8.1 in the form of a confusion matrix.

Figure 8.1. An average accuracy of almost 93% is achieved, with understandable confusion between the two different types of kicks (forward and side kicks).

For $s = 95\%$, the accuracy goes down, indicating overfitting.
Figure 8.2. The average accuracy goes down when $s$ is increased.

Similarly, on lowering $s$ to 75%, the accuracy again suffers drastically, indicating that the WFAs have failed to model the dynamics.

Figure 8.3. The average accuracy again suffers when a smaller $s$ is used.
5.4. Composable Activities Dataset

Description

The Composable Activities Dataset, introduced in [38], is very different from the datasets discussed so far, since it is made up of sequences of complex activities, which in turn are made up of subactivities. In all, there are 693 sequences of 16 classes performed by 14 actors. Each composable activity is made up of different combinations of 3 to 11 subactivities out of a total of 26. The dataset exhibits high variance in the complexity as well as the similarity of the sequences, and as such is a difficult dataset for action recognition. The following are the 16 action classes for classification:

1. Composable Activity 1
2. Composable Activity 2
3. Composable Activity 3
4. Composable Activity 4
5. Composable Activity 5
6. Composable Activity 6
7. Composable Activity 7
8. Composable Activity 8
9. Hand Wave and Drink
10. Talk Phone and Drink
11. Talk Phone and Pickup
12. Talk Phone and Scratch Head
13. Walk while Calling with Hands
14. Walk while Clapping
15. Walk while Hand Waving
16. Walk while Reading

The first 8 activities are composed of 3 to 11 subactions, most of the time performed sequentially but sometimes in parallel. The subactions include reading, gesticulating, erasing/writing on a board, etc. The authors provide skeleton data and annotations.
Figure 9.1. A few examples of actions from the Composable Activities dataset. Some actions are performed in parallel, as in the top left, where the subject walks while hand waving; others are sequential, as in the top right, where the subject talks on the phone and then runs.

Since this is a comparatively harder dataset, it was harder to perform well on it; the best available performance on this dataset was around 86%, by the creators of the dataset [38].

Evaluation

Our best performance is similar to the Bag of Visual Words baseline mentioned by the authors: with $s = 95\%$ and $C = 10$ we were able to get an accuracy of 67.83%.

Figure 9.2. Confusion matrix for the Composable Activities dataset. We are able to do well on the first 8 activities, which are composed of multiple subactivities.
As previously observed, increasing $s$ led to a decrease in performance.

Figure 9.3. A drastic decrease in average accuracy, to about half, with $s = 99\%$.

Similarly, a significant decrease in $s$ also led to a similarly reduced recognition accuracy.

Figure 9.4. A decrease in accuracy is seen when $s$ is reduced to 75%.

As before, we are not performing close to the best; however, we were able to match at least one baseline, which shows that the method, although not perfect, can be made viable. For reference, the confusion matrix obtained by [38] is shown here. We are actually able to outperform
them in recognizing some of the activities, such as Composable Activities 5, 6, and 7 (numbered the same in Figure 9.2).

Figure 9.5. Confusion matrix obtained by [38], with around 85% average accuracy.

HDM05 Dataset

Description

The next dataset that we experiment on is the HDM05 dataset [55]. It is also a MoCap dataset, providing 3D locations for 31 joints. However, like [2], we used just 4 joints, corresponding to the arms and legs. The results again followed the pattern seen in the previous experiments.

Evaluation

The best recognition accuracy that we were able to achieve was 90.5%, with $s = 94\%$ and $C = 10$. This is the only dataset on which we were able to outperform the state of the
art; however, as mentioned earlier, we follow a leave-one-out protocol, while [2] follows a leave-one-subject-out protocol. Keeping that in mind, our performance is still not state of the art.

Figure. An accuracy of over 90% is achieved with $s = 94\%$ and $C = 10$.

Following the pattern so far, a higher $s$ leads to a degradation in performance. At $s = 99\%$ the confusion matrix looks like this:

Figure. The average accuracy drops to around 77% at $s = 99\%$.

Similarly, lowering $s$ also leads to a drop in performance.
Figure. A drop in performance is observed when $s = 85\%$ is used.

Considering that we were able to achieve a higher performance than the one reported in [2], we redid the experiments following their protocol, with leave-one-subject-out. The performance dropped well below the state of the art, to around 71%.

Figure. Confusion matrix for the best possible performance following the protocol of [2].
5.6. UTKinect Dataset

Description

The UTKinect Dataset [39] is another popular dataset that is frequently used for evaluation in action recognition settings. It is also based on 3D skeleton joints and consists of 10 simple actions:

1. Walk
2. Sit Down
3. Stand Up
4. Pick Up
5. Carry
6. Throw
7. Push
8. Pull
9. Wave Hands
10. Clap Hands

Each action is performed twice by 10 subjects, leading to 199 sequences (with one missing sequence).

Figure. Some sample images of different actions from the UTKinect Dataset.
Evaluation

The leave-one-out protocol is itself proposed by [39] in this case, and we follow it. Table 4 is taken directly from [2] and shows the performance of different methods on the dataset. While this is once again a solved dataset, we are able to perform significantly better than [1] and close to [4].

Table 4. A comparison with different methods on the UTKinect dataset. We are able to do reasonably well on most activities except carry and throw.

Method | Walk | S.Dwn | S.Up | P.Up | Carry | Throw | Push | Pull | Wave | Clap | Avg
[1] |
[4] |
[39] |
[2] |
WFA (.95) |

The following is the confusion matrix for $s = 95\%$. We are able to perform reasonably well on all activities except carry and throw.

Figure. Confusion matrix for $s = 95\%$, showing our best performance.
Increasing $s$ to 99% results, as expected, in a drop in average accuracy.

Figure. Confusion matrix for $s = 99\%$, showing a drop in average accuracy.

Similarly, selecting a lower $s$ also leads to a drastic drop in accuracy.

Figure. Confusion matrix for $s = 75\%$, showing a drastic drop.
Computer Language Theory Chapter 4: Decidability 1 Limitations of Algorithmic Solvability In this Chapter we investigate the power of algorithms to solve problems Some can be solved algorithmically and
More informationKHALID PERVEZ (MBA+MCS) CHICHAWATNI
FAQ's about Lectures 1 to 5 QNo1.What is the difference between the strings and the words of a language? A string is any combination of the letters of an alphabet where as the words of a language are the
More informationObject and Action Detection from a Single Example
Object and Action Detection from a Single Example Peyman Milanfar* EE Department University of California, Santa Cruz *Joint work with Hae Jong Seo AFOSR Program Review, June 45, 29 Take a look at this:
More informationHandEye Calibration from Image Derivatives
HandEye Calibration from Image Derivatives Abstract In this paper it is shown how to perform handeye calibration using only the normal flow field and knowledge about the motion of the hand. The proposed
More informationDirected Graph for FiniteState Machine
Directed Graph for FiniteState Machine Tito D. Kesumo Siregar (13511018) 1 Program Studi Teknik Informatika Sekolah Teknik Elektro dan Informatika Institut Teknologi Bandung, Jl. Ganesha 10 Bandung 40132,
More informationGeneral Instructions. Questions
CS246: Mining Massive Data Sets Winter 2018 Problem Set 2 Due 11:59pm February 8, 2018 Only one late period is allowed for this homework (11:59pm 2/13). General Instructions Submission instructions: These
More informationClustering CS 550: Machine Learning
Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://wwwusers.cs.umn.edu/~kumar/dmbook/ch8.pdf http://wwwusers.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf
More informationIntroduction to Machine Learning. Xiaojin Zhu
Introduction to Machine Learning Xiaojin Zhu jerryzhu@cs.wisc.edu Read Chapter 1 of this book: Xiaojin Zhu and Andrew B. Goldberg. Introduction to Semi Supervised Learning. http://www.morganclaypool.com/doi/abs/10.2200/s00196ed1v01y200906aim006
More informationCSE 586 Final Programming Project Spring 2011 Due date: Tuesday, May 3
CSE 586 Final Programming Project Spring 2011 Due date: Tuesday, May 3 What I have in mind for our last programming project is to do something with either graphical models or random sampling. A few ideas
More informationBagging for OneClass Learning
Bagging for OneClass Learning David Kamm December 13, 2008 1 Introduction Consider the following outlier detection problem: suppose you are given an unlabeled data set and make the assumptions that one
More informationLec5HW1, TM basics
Lec5HW1, TM basics (Problem 0) Design a Turing Machine (TM), T_sub, that does unary decrement by one. Assume a legal, initial tape consists of a contiguous set of cells, each containing
More informationTrimodal Human Body Segmentation
Trimodal Human Body Segmentation Master of Science Thesis Cristina Palmero Cantariño Advisor: Sergio Escalera Guerrero February 6, 2014 Outline 1 Introduction 2 Trimodal dataset 3 Proposed baseline 4
More information7. Boosting and Bagging Bagging
Group Prof. Daniel Cremers 7. Boosting and Bagging Bagging Bagging So far: Boosting as an ensemble learning method, i.e.: a combination of (weak) learners A different way to combine classifiers is known
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING 08: Classification Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu October 24, 2017 Learnt Prediction and Classification Methods Vector Data
More informationComplex Prediction Problems
Problems A novel approach to multiple Structured Output Prediction MaxPlanck Institute ECML HLIE08 Information Extraction Extract structured information from unstructured data Typical subtasks Named Entity
More informationAccelerometer Gesture Recognition
Accelerometer Gesture Recognition Michael Xie xie@cs.stanford.edu David Pan napdivad@stanford.edu December 12, 2014 Abstract Our goal is to make gesturebased input for smartphones and smartwatches accurate
More informationComplexity Theory. Compiled By : Hari Prasad Pokhrel Page 1 of 20. ioenotes.edu.np
Chapter 1: Introduction Introduction Purpose of the Theory of Computation: Develop formal mathematical models of computation that reflect realworld computers. Nowadays, the Theory of Computation can be
More informationMusic Genre Classification
Music Genre Classification Matthew Creme, Charles Burlin, Raphael Lenain Stanford University December 15, 2016 Abstract What exactly is it that makes us, humans, able to tell apart two songs of different
More informationIndian Institute of Technology Kanpur. Visuomotor Learning Using Image Manifolds: STGK Problem
Indian Institute of Technology Kanpur Introduction to Cognitive Science SE367A Visuomotor Learning Using Image Manifolds: STGK Problem Author: Anurag Misra Department of Computer Science and Engineering
More informationDetecting Burnscar from Hyperspectral Imagery via Sparse Representation with LowRank Interference
Detecting Burnscar from Hyperspectral Imagery via Sparse Representation with LowRank Interference Minh Dao 1, Xiang Xiang 1, Bulent Ayhan 2, Chiman Kwan 2, Trac D. Tran 1 Johns Hopkins Univeristy, 3400
More informationA Performance Evaluation of HMM and DTW for Gesture Recognition
A Performance Evaluation of HMM and DTW for Gesture Recognition Josep Maria Carmona and Joan Climent Barcelona Tech (UPC), Spain Abstract. It is unclear whether Hidden Markov Models (HMMs) or Dynamic Time
More information6.867 Machine Learning
6.867 Machine Learning Problem set  solutions Thursday, October What and how to turn in? Turn in short written answers to the questions explicitly stated, and when requested to explain or prove. Do not
More informationAnnotation of Human Motion Capture Data using Conditional Random Fields
Annotation of Human Motion Capture Data using Conditional Random Fields Mert Değirmenci Department of Computer Engineering, Middle East Technical University, Turkey mert.degirmenci@ceng.metu.edu.tr Anıl
More informationTHE preceding chapters were all devoted to the analysis of images and signals which
Chapter 5 Segmentation of Color, Texture, and Orientation Images THE preceding chapters were all devoted to the analysis of images and signals which take values in IR. It is often necessary, however, to
More informationLeaveOneOut Support Vector Machines
LeaveOneOut Support Vector Machines Jason Weston Department of Computer Science Royal Holloway, University of London, Egham Hill, Egham, Surrey, TW20 OEX, UK. Abstract We present a new learning algorithm
More informationCALCULATING TRANSFORMATIONS OF KINEMATIC CHAINS USING HOMOGENEOUS COORDINATES
CALCULATING TRANSFORMATIONS OF KINEMATIC CHAINS USING HOMOGENEOUS COORDINATES YINGYING REN Abstract. In this paper, the applications of homogeneous coordinates are discussed to obtain an efficient model
More informationEFFICIENT ATTACKS ON HOMOPHONIC SUBSTITUTION CIPHERS
EFFICIENT ATTACKS ON HOMOPHONIC SUBSTITUTION CIPHERS A Project Report Presented to The faculty of the Department of Computer Science San Jose State University In Partial Fulfillment of the Requirements
More information3 NoWait Job Shops with Variable Processing Times
3 NoWait Job Shops with Variable Processing Times In this chapter we assume that, on top of the classical nowait job shop setting, we are given a set of processing times for each operation. We may select
More informationMIT Samberg Center Cambridge, MA, USA. May 30 th June 2 nd, by C. Rea, R.S. Granetz MIT Plasma Science and Fusion Center, Cambridge, MA, USA
Exploratory Machine Learning studies for disruption prediction on DIIID by C. Rea, R.S. Granetz MIT Plasma Science and Fusion Center, Cambridge, MA, USA Presented at the 2 nd IAEA Technical Meeting on
More informationCRF Based Point Cloud Segmentation Jonathan Nation
CRF Based Point Cloud Segmentation Jonathan Nation jsnation@stanford.edu 1. INTRODUCTION The goal of the project is to use the recently proposed fully connected conditional random field (CRF) model to
More informationCS 314 Principles of Programming Languages. Lecture 3
CS 314 Principles of Programming Languages Lecture 3 Zheng Zhang Department of Computer Science Rutgers University Wednesday 14 th September, 2016 Zheng Zhang 1 CS@Rutgers University Class Information
More informationBayes Net Learning. EECS 474 Fall 2016
Bayes Net Learning EECS 474 Fall 2016 Homework Remaining Homework #3 assigned Homework #4 will be about semisupervised learning and expectationmaximization Homeworks #3#4: the how of Graphical Models
More informationObject Purpose Based Grasping
Object Purpose Based Grasping Song Cao, Jijie Zhao Abstract Objects often have multiple purposes, and the way humans grasp a certain object may vary based on the different intended purposes. To enable
More informationNearest Neighbor Classification
Nearest Neighbor Classification Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms January 11, 2017 1 / 48 Outline 1 Administration 2 First learning algorithm: Nearest
More informationLecture 2: Analyzing Algorithms: The 2d Maxima Problem
Lecture 2: Analyzing Algorithms: The 2d Maxima Problem (Thursday, Jan 29, 1998) Read: Chapter 1 in CLR. Analyzing Algorithms: In order to design good algorithms, we must first agree the criteria for measuring
More informationSupervised vs unsupervised clustering
Classification Supervised vs unsupervised clustering Cluster analysis: Classes are not known a priori. Classification: Classes are defined apriori Sometimes called supervised clustering Extract useful
More informationWeka ( )
Weka ( http://www.cs.waikato.ac.nz/ml/weka/ ) The phases in which classifier s design can be divided are reflected in WEKA s Explorer structure: Data preprocessing (filtering) and representation Supervised
More informationCMSC 132: ObjectOriented Programming II
CMSC 132: ObjectOriented Programming II Regular Expressions & Automata Department of Computer Science University of Maryland, College Park 1 Regular expressions Notation Patterns Java support Automata
More information1. INTRODUCTION ABSTRACT
Weighted Fusion of Depth and Inertial Data to Improve View Invariance for Human Action Recognition Chen Chen a, Huiyan Hao a,b, Roozbeh Jafari c, Nasser Kehtarnavaz a a Center for Research in Computer
More informationEvaluation Measures. Sebastian Pölsterl. April 28, Computer Aided Medical Procedures Technische Universität München
Evaluation Measures Sebastian Pölsterl Computer Aided Medical Procedures Technische Universität München April 28, 2015 Outline 1 Classification 1. Confusion Matrix 2. Receiver operating characteristics
More informationOnline Subspace Estimation and Tracking from Missing or Corrupted Data
Online Subspace Estimation and Tracking from Missing or Corrupted Data Laura Balzano www.ece.wisc.edu/~sunbeam Work with Benjamin Recht and Robert Nowak Subspace Representations Capture Dependencies Subspace
More information3D Face and Hand Tracking for American Sign Language Recognition
3D Face and Hand Tracking for American Sign Language Recognition NSFITR (20042008) D. Metaxas, A. Elgammal, V. Pavlovic (Rutgers Univ.) C. Neidle (Boston Univ.) C. Vogler (Gallaudet) The need for automated
More informationTracking People. Tracking People: Context
Tracking People A presentation of Deva Ramanan s Finding and Tracking People from the Bottom Up and Strike a Pose: Tracking People by Finding Stylized Poses Tracking People: Context Motion Capture Surveillance
More informationGesture Spotting and Recognition Using Salience Detection and Concatenated Hidden Markov Models
Gesture Spotting and Recognition Using Salience Detection and Concatenated Hidden Markov Models Ying Yin Massachusetts Institute of Technology yingyin@csail.mit.edu Randall Davis Massachusetts Institute
More information