Learning Pigeon Behaviour Using Binary Latent Variables Matthew D. Zeiler


By Matthew D. Zeiler
Supervisor: Geoffrey E. Hinton
April 29

Abstract

In an effort to better understand the complex courtship behaviour of pigeons, we have built a model learned from motion capture data. We employ a Conditional Restricted Boltzmann Machine with binary latent features and real-valued visible units. The units are conditioned on information from previous time steps in a sequence to learn long-term effects and to infer current features. We validate a trained model by quantifying the characteristic head-bobbing present in generated pigeon motion. We also introduce a method of predicting missing data by marginalizing out the hidden variables and minimizing the free energy of the model. An alternative prediction method using forward and reverse passes over gaps of missing markers is presented as well. Lastly, the effects of head and foot motion on prediction results are analyzed.

Acknowledgements

I would like to thank Professor Geoffrey E. Hinton of the Department of Computer Science at the University of Toronto for supervising my research on this project and providing resources for the experimentation. His research in the field of machine learning and neural networks is second to none, and without it the research within this thesis would not have been possible. Further, I thank Graham W. Taylor, a PhD candidate in this department under Geoffrey E. Hinton and Sam T. Roweis, for the exceptional patience, guidance, and advice that he provided throughout this thesis work. His expertise with Conditional Restricted Boltzmann Machines and their application to motion capture and animation was the foundation of my research, and being able to work with him to further the applications of these models has been very rewarding. Lastly, I would like to thank Nikolaus F. Troje of the Psychology Department at Queen's University for providing the pigeon motion capture data and the previous research that inspired this project.

Contents

List of Symbols
List of Figures
List of Tables
1 Introduction
  1.1 Overview
  1.2 Literature Review
    1.2.1 Understanding Pigeon Behaviour
    1.2.2 Alternative Models
  1.3 The Conditional Restricted Boltzmann Machine
2 Data Gathering and Preprocessing
  2.1 Motion Capture Setup
  2.2 Body Centered Coordinates
  2.3 Ground Plane Invariance
  2.4 Data Normalization and Batch Preparation
3 Motion Generation
  3.1 Generation of Novel Motion
  3.2 Quantifying Head Motion
4 Prediction of Missing Foot Markers
  4.1 Marker Occlusion
  4.2 Method 1: Alternating Gibbs Sampling
  4.3 Method 2: Free-Energy Minimization
  4.4 N-step Prediction Results
  4.5 Bi-Directional Gap Filling
  4.6 Motion Effects on Frame Prediction
  4.7 Trained Model Substitution
5 Conclusion
6 Recommendations for Future Work
References
A Coordinate Transformations

List of Symbols

AR     Auto-Regressive
CD     Contrastive Divergence
CRBM   Conditional Restricted Boltzmann Machine
dof    degrees of freedom
fps    frames per second
HMM    Hidden Markov Model
LDS    Linear Dynamical System
MoCap  Motion Capture
RBM    Restricted Boltzmann Machine
TRBM   Temporal Restricted Boltzmann Machine

List of Figures

1  Pigeon video test setup at Queen's University
2  Boltzmann Machine with 3 visible units and 2 hidden units
3  Restricted Boltzmann Machine with 3 visible units and 2 hidden units
4  Architecture of a 3rd-order CRBM
5  The CRBM used as a building-block in deep architectures
6  Motion capture setup at Queen's University Biomotion Lab
7  Vicon MX-2 series 1.3 Megapixel motion capture camera
8  Example motion capture marker placement and derived coordinate frames
9  Stroboscopic image from a walking pigeon
10 Example hold-phases present in generated motion
11 Hold-phases present in path length and yaw angles (shaded grey)
12 2-layer CRBM showing the prediction problem
13 Effects of changing the ratio of visible influence (λ)
14 Comparison of various models for N-step prediction
15 Comparison of 2-layer CRBM predictions for right foot, left foot, and both feet
16 The effect of changing the forward model weighting (A_F)
17 Effects of motion on frame prediction errors for various prediction techniques
18 Forward/reverse 1-layer CRBMs used for bi-directional prediction
19 Queen's University Biomotion Lab virtual pigeon

List of Tables

1 Notation for Coordinate Frames
2 Comparison of Mean Standard Deviation in Hold and Thrust Phases
3 Average 1-step prediction errors for 773 test frames
4 Mean prediction errors for various methods over 773 test frames

1 Introduction

1.1 Overview

Investigating avian social perception is an interesting problem, as it presents a sandbox in which we can experiment to better understand interactive social behaviour. In this thesis, time-series neural network models are used to capture key characteristics of avian behaviour. To test the modeling capacity of these networks, various experiments are conducted which evaluate the potential of the model. The hope is to provide researchers both control over and insight into the complex factors underlying courtship behaviour. The models used are first trained on motion capture (MoCap) data collected from various pigeon specimens. Once a model of the data is learned, it can be used to generate motion, to predict positions of body segments, or to provide useful animation for determining the social interactions between real pigeons. Due to the complex data set and long sequences, a powerful model is required to capture all relevant subtleties of pigeon motion in order to truly provide suitable interactions with a real pigeon. To complete this task, a time-series extension of a Boltzmann Machine was utilized. Before the experiments can be explained, however, some background on previous work must be considered.

1.2 Literature Review

1.2.1 Understanding Pigeon Behaviour

Recent studies investigating the complex motion of the pigeon, Columba livia, demonstrate distinct patterns present throughout walking sequences [1]. The pigeon walk can be characterized by its distinct head-bobbing, which consists of a hold-phase in which the head displays no translation or rotation. Between such hold-phases are periods of rapid changes in position and orientation of the head, called the thrust-phases of the motion. By quantifying these two phases and comparing them between pigeons in courtship or in regular walking motion, differences can be noted [2].
These differences help researchers classify pigeon behaviour, and thus measuring these quantities in the output from our models could provide insight into how well our models learn the subtleties of pigeon motion. Further, studies investigating the complex courtship behaviour of pigeons demonstrate that courtship responses can occur not only with real partners, but also with video [3]. Confining a pigeon within a small box in a dark room, as shown in Figure 1, and displaying video of another pigeon exhibiting courtship behaviour immediately elicits a response in the live pigeon. More recently, social behaviour in pigeons has been elicited by a virtual pigeon, driven by motion capture (MoCap) data gathered from a real pigeon and rendered through a computer graphics engine [4]. Being able to generate novel motion onto which the computer graphics engine can be applied would allow researchers to test more precisely what criteria in the motion garner a specific response. Thus, this thesis also aims to generate realistic pigeon motion with the future possibility

Figure 1: Pigeon video test setup at Queen's University.

of displaying it to a live pigeon to determine if the model is powerful enough to create a response.

1.2.2 Alternative Models

Motion capture data requires a powerful learning algorithm to capture all its relationships, as it is very nonlinear and has multiple inherent characteristics that are related to one another (componential structure). A standard model for sequential data used by researchers is the Hidden Markov Model (HMM), which, for example, has been used by Bregler to recognize motion in video sequences [5]. The HMM utilizes a combination of hidden states and observable outputs with respective transition and output probabilities. Training these models reduces to the problem of determining the transition probabilities between hidden states and the output probabilities, given a hidden state, that produce an observable sequence (see [6] for more details). Due to their structure, Hidden Markov Models are unable to efficiently model the complex motion capture data used in these experiments. Since HMMs use a K-state multinomial to model time-series data, increasing their modeling capacity requires an exponential growth in the number of hidden states: to model N bits of past information, the number of hidden states would need to grow as 2^N. Due to this explosive growth in the number of HMM parameters required to capture features of the data, it becomes intractable to use this type of model efficiently. A distributed model capable of representing the varied information efficiently over a set of states is therefore desirable. One such distributed model is the Linear Dynamical System (LDS). While these models are distributed, they are unable to deal with highly non-linear data such as the motion capture data used here. For static data sets, the Boltzmann Machine provides

Figure 2: Boltzmann Machine with 3 visible units and 2 hidden units.

both a distributed representation of the data and non-linear modeling capabilities. Though a Boltzmann Machine can consist of only a collection of visible units, comparable in size to given data vectors, these models can also have multiple layers of hidden units. These latent variables, as they are called, can aid in modeling abstract features of the data. Shown in Figure 2 is an example Boltzmann Machine composed of a visible layer of units corresponding in size to the data and a hidden layer of units which can be of an arbitrary size. The total input z_i to each unit i from all other units, each denoted as j, is given as:

z_i = b_i + Σ_j s_j w_ij    (1)

where w_ij is the weight between units i and j. The b_i term in Equation 1 is a bias input on unit i, which is equivalent to a connection from another unit whose output is always 1. With the total input z_i passed through a sigmoid function, unit i turns on (taking the value 1) with probability:

p(s_i = 1) = 1 / (1 + e^(−z_i))    (2)

Using Equation 2, the units of the network can be updated one at a time. Doing so for a long period of time will allow the network to reach a Boltzmann distribution, where the probability of a state vector v is determined by its energy E(v):

P(v) = e^(−E(v)) / Σ_u e^(−E(u))    (3)

E(v) = −Σ_i s_i^v b_i − Σ_{i<j} s_i^v s_j^v w_ij    (4)

The update rule for this model has a convenient, simple form. The update to a weight w_ij is proportional to the difference between the expected value of the pairwise states when the visible units are clamped to the data, <s_i s_j>_data, and the model's equilibrium expectation, <s_i s_j>_model:

Δw_ij ∝ <s_i s_j>_data − <s_i s_j>_model    (5)

where <s_i s_j>_model is computed by alternating Gibbs sampling between the hidden and visible units until equilibrium is reached. Notice that the updates are local in the sense that they only depend on the values of s_i and s_j.
A similar update rule applies to the biases b_j:

Δb_j ∝ <s_j>_data − <s_j>_model    (6)

All units in a Boltzmann Machine are fully connected, which is a computational limitation of this model when it comes to learning, since reaching an equilibrium distribution becomes intractable [7]. To make the learning procedure significantly faster, two things can be done: 1) restrict the connections between units in the model, and 2) modify the learning procedure. If the connections are restricted to only occur between visible and hidden units (removing the hidden-hidden and visible-visible connections, as shown in Figure 3), the hidden units become conditionally independent given a visible vector. This type of model is known as a Restricted Boltzmann Machine (RBM). Restricting the connections allows the units to be updated in parallel to provide unbiased samples of <s_i s_j>_data in one step. Additionally, once the connections are restricted, a sample of <s_i s_j>_model can be obtained by alternating parallel updates of the visible and hidden units.

Figure 3: Restricted Boltzmann Machine with 3 visible units and 2 hidden units.

To reach a true equilibrium distribution, these updates would have to repeat for an infinitely long time. However, for practical purposes, only a few iterations of these parallel alternating updates are required to get <s_i s_j>_reconstruction, which is a good approximation to <s_i s_j>_model. This procedure can be summarized as follows:

1. Clamp a data vector to the visible units and update the hidden units in parallel.
2. Using the activations of the hidden units, update the visible units in parallel to get a reconstruction of the data.
3. Update all hidden units using the reconstructed visible units and repeat steps 2-3 for the desired number of iterations.

This procedure is an approximate gradient descent in a quantity known as contrastive divergence (CD) [7]. Another result of the connections being restricted is the ability of RBM models to be stacked, which has been shown to improve a variational bound on the probability of the data [8]. In this way, layers of hidden units can be used to model additional abstract features of the data at multiple levels.
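The three-step procedure above amounts to a single CD-1 update for a binary RBM, which can be sketched in a few lines of NumPy. The layer sizes, learning rate, and random batch below are illustrative assumptions, not values from this thesis:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(v0, W, b_v, b_h, lr=0.1):
    """One contrastive-divergence (CD-1) update for a binary RBM.

    v0: batch of binary data vectors, shape (batch, n_vis).  Clamp the
    data, sample the hiddens, reconstruct the visibles, recompute the
    hiddens, then update with <s_i s_j>_data - <s_i s_j>_reconstruction.
    """
    ph0 = sigmoid(v0 @ W + b_h)                   # 1. up-pass on the data
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    pv1 = sigmoid(h0 @ W.T + b_v)                 # 2. down-pass: reconstruction
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b_h)                   # 3. up-pass on reconstruction
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n       # pairwise statistics
    b_v += lr * (v0 - v1).mean(axis=0)            # visible bias update
    b_h += lr * (ph0 - ph1).mean(axis=0)          # hidden bias update
    return W, b_v, b_h

n_vis, n_hid = 3, 2                               # matches Figure 3's toy RBM
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b_v, b_h = np.zeros(n_vis), np.zeros(n_hid)
data = rng.integers(0, 2, size=(10, n_vis)).astype(float)
W, b_v, b_h = cd1_step(data, W, b_v, b_h)
```

In practice the positive statistics use the hidden probabilities rather than samples, which reduces the variance of the update without changing its expectation.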
When using multiple levels, a further increase in training speed results from training each layer separately. Training one layer of hidden units using data vectors applied to the visible units is simple. Using the activations of hidden units in the trained layer, while a given data vector is clamped to the visible units of this layer, gives a training vector for the layer above. A complete set of these activation vectors can be formed using all the data vectors from the training set, individually clamped at the visible units. This process of using activations of hidden units to train the layers above can be repeated for any number of layers in the model. Once trained, the RBM model can be used to both discriminate and generate by clamping specific units [9]. By adding labels in higher hidden layers and smoothing the gradients with backpropagation once initialized with the results of the training procedure outlined above, a clamped

visible vector can be classified by the labels through sampling. Alternatively, clamping a label can provide adequate information for the model to generate visible vectors that represent data related to that label. This generative ability is what makes RBMs particularly interesting for determining the amount and generality of information learned by the model, as we wish to do throughout this thesis. While the Restricted Boltzmann Machine is capable of efficiently learning non-linear distributed representations of the data and generating visible vectors based on the learned parameters, it lacks the temporal information which would make it suitable for modeling time-series data such as motion capture. Since much knowledge of pigeon behaviour can be learned from motion capture data of pigeons participating in courtship routines, a model that captures temporal dependencies is desired. One capable model, used throughout this research, is discussed in the following section.

1.3 The Conditional Restricted Boltzmann Machine

The success of a temporal extension of Restricted Boltzmann Machines in modeling human motion [1] has prompted us to consider this powerful class of models for learning on MoCap data captured from both single pigeons and pairs of pigeons in courtship. Unlike other neural network models that have been used to learn the physical constraints of a system [11], the power of this model can be used to learn the dynamics as opposed to defining them analytically [1]. This has previously been done by training a Conditional Restricted Boltzmann Machine (CRBM) on motion capture data of human walking, running, and other motions [1]. A similar approach is applied here to model pigeon behaviour. The CRBM is a non-linear generative model for time-series data and is a special case of the Temporal Restricted Boltzmann Machine (TRBM) [8] in which there are no temporal connections between hidden units.
This makes filtering in the CRBM exact, and mini-batch learning possible, as training does not have to be done sequentially. This latter property can greatly speed up learning as well as smooth the learning signal, as the order of data vectors presented to the network can be randomized. The CRBM model is based on the RBM model at the current time step, but has directed connections from visible units at previous time steps to both the hidden and visible units of the current time step, as shown in Figure 4. Various connections are indicated in the figure, including: directed visible-hidden weights (red), directed visible-visible weights (green), undirected visible-hidden weights (blue), and static biases (purple).

Figure 4: Architecture of a 3rd-order CRBM.

The directed connections to the hidden and visible units act as dynamic biases input into each set of units. Unlike the static biases present in the model, which have a constant input of 1, the dynamic biases carry the value of the previous visible unit to which they are attached. The undirected connections at the current time step connect the binary latent variables, h, and the visible variables, v. Typically for RBMs and CRBMs, the latent variables have stochastic binary states instantiated by sampling the inputs to each hidden unit. Alternatively, a mean-field setting could be used, in which case the result of Equation 2 becomes the output of the unit. In this thesis, both approaches are used in select instances to improve results. Instead of binary units for the visible variables, these models can use any function that is part of the exponential family [12]. These exponential functions have a linear effect when considering log probabilities throughout the model. In modeling motion capture for this thesis, Gaussian units were chosen, which evaluate to the total input plus additive Gaussian noise at the output. At each time step, v and h receive directed connections from the visible variables at a certain number of previous time steps; this number defines N, the order of the model. The CRBM model defines a joint probability distribution over v and h, conditional on the past N observations {v}_{t−N}^{t−1} and model parameters θ:

p(v, h | {v}_{t−N}^{t−1}, θ) = exp(−E(v, h | {v}_{t−N}^{t−1}, θ)) / Z

E(v, h | {v}_{t−N}^{t−1}, θ) = Σ_i (v_i − b_i)² / (2σ_i²) − Σ_j h_j b_j − Σ_{ij} w_ij (v_i / σ_i) h_j    (7)

where Z is a constant called the partition function, which is exponentially expensive to compute exactly. The dynamic biases, b_i and b_j, are affine functions of the past N observations. Such an architecture makes on-line inference efficient and allows us to train by minimizing contrastive divergence [7]. To train a CRBM model, the weight updates for the symmetric connections and the static biases remain the same as in the RBM training procedure (Equations 5 and 6).
The directed connections from the previous time steps, t−1, t−2, ..., t−N, to the current hidden units have a slightly different weight update rule:

Δd_ij^(t−q) ∝ <v_i^(t−q) h_j^t>_data − <v_i^(t−q) h_j^t>_reconstruction    (8)

where d_ij^(t−q) is the weight from visible unit i at time t−q to hidden unit j, for q = 1...N, where N is the order of the model. Additionally, there is a new weight update rule for the visible-unit dynamic bias weights, often referred to as the auto-regressive weights:

Δa_ki^(t−q) ∝ <v_k^(t−q) v_i^t>_data − <v_k^(t−q) v_i^t>_reconstruction    (9)

where a_ki^(t−q) connects visible unit k at time t−q to visible unit i at the current time t, for q = 1...N. Each weight update is scaled by a small learning rate to prevent the training from overshooting local (or perhaps global) optima. Typically the auto-regressive weights have a smaller learning rate than the rest of the connections, because the correlations between the visible units at previous time steps and those at the current time step are much stronger than other pairwise correlations. Just as with the static RBM model, an important feature of the CRBM is that once it is trained,
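The directed-weight updates in Equations 8 and 9 are just outer products of past frames with current-frame statistics, accumulated per time lag. A minimal sketch, where the array names, shapes, and learning rate are illustrative assumptions:

```python
import numpy as np

def directed_weight_updates(past, v0, h0, v1, h1, lr=1e-3):
    """CD-style updates for the directed CRBM connections (Eqs. 8 and 9).

    past: (N, n_vis) previous frames, lag q = 1..N along axis 0.
    v0, h0: data-phase visibles/hidden probabilities at time t.
    v1, h1: reconstruction-phase visibles/hidden probabilities at time t.
    Returns updates for the visible-to-hidden weights d and the
    auto-regressive weights a, one slice per time lag.
    """
    # <v_i^(t-q) h_j^t>_data - <v_i^(t-q) h_j^t>_reconstruction
    delta_d = lr * (np.einsum('qi,j->qij', past, h0)
                    - np.einsum('qi,j->qij', past, h1))
    # <v_k^(t-q) v_i^t>_data - <v_k^(t-q) v_i^t>_reconstruction
    delta_a = lr * (np.einsum('qk,i->qki', past, v0)
                    - np.einsum('qk,i->qki', past, v1))
    return delta_d, delta_a

N, n_vis, n_hid = 3, 24, 20              # 24 dof per frame as in this thesis
rng = np.random.default_rng(2)
past = rng.standard_normal((N, n_vis))
dd, da = directed_weight_updates(past,
                                 rng.standard_normal(n_vis), rng.random(n_hid),
                                 rng.standard_normal(n_vis), rng.random(n_hid))
```

Following the text, `delta_a` would be applied with a smaller learning rate than the symmetric and directed visible-hidden weights.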

we can add layers, as in a Deep Belief Network [9]. The previous layer's CRBM is kept, and the sequence of hidden state vectors, while driven by the data, is treated as a new kind of fully observed data. The next-level CRBM has the same architecture as the first, but now has binary visible units to match the hidden units in the previous layer, while the number of hidden units in this next layer can be set arbitrarily (see Figure 5). This next layer is trained in the same way as the previous one.

Figure 5: The CRBM used as a building-block in deep architectures.

This greedy procedure is justified using a variational bound [9]. Following greedy learning, both the weights and the simple inference procedure are suboptimal in all but the top layer of the network, as the weights have not changed in the lower layers since their respective stage of greedy training. However, a contrastive form of the wake-sleep algorithm [13], called the up-down algorithm [9], can be used to fine-tune the generative model. Fine-tuning has been observed to improve the quality of generated sequences. Once trained, the CRBM model can generate motion when supplied a few initialization frames of MoCap, by alternating updates of the units in the model. More layers can also aid in capturing multiple styles of motion and in permitting transitions between these styles, as was shown by Taylor et al. [1] when training a single CRBM on MoCap data of different human walking styles. The applicability of these models for representing pigeon motion lies in their ability to generate time-series data. As with RBMs, the CRBM could potentially be used to classify labeled data if the layout were modified slightly to include labeled units. However, it is the generation of novel motion from a trained model that we can use to analyze pigeon behaviour. In this thesis, one- and two-layer models are tested in their ability to act as generative models of pigeon motion.
They are compared against other approaches in their ability to capture the subtleties of the motion, notably the characteristic head-bobbing [1], and in their capability of predicting the location of feet, which are frequently occluded during motion capture.
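The generation procedure described above (seed with N frames, then alternate Gibbs updates per new frame) can be sketched as follows. The mean-field Gaussian visibles with unit variance, the random weights, and all shapes are illustrative assumptions, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def generate(n_new, seed, A, D, W, b_v, b_h, gibbs_iters=30):
    """Generate frames from an order-N CRBM (a sketch).

    seed: (N, n_vis) initialization frames, oldest first.  Each new frame
    is produced by alternating updates between sampled binary hiddens and
    mean-field Gaussian visibles (sigma = 1), conditioned on the previous
    N frames through the dynamic biases.
    """
    N, n_vis = seed.shape
    frames = [f.copy() for f in seed]
    for _ in range(n_new):
        past = np.stack(frames[-N:][::-1])            # most recent first
        bv_t = b_v + np.einsum('qi,qij->j', past, A)  # dynamic visible bias
        bh_t = b_h + np.einsum('qi,qij->j', past, D)  # dynamic hidden bias
        v = frames[-1].copy()                         # start from last frame
        for _ in range(gibbs_iters):
            ph = sigmoid(v @ W + bh_t)
            h = (rng.random(ph.shape) < ph).astype(float)
            v = bv_t + h @ W.T                        # mean-field visibles
        frames.append(v)
    return np.stack(frames[N:])

N, n_vis, n_hid = 3, 24, 20
A = 0.01 * rng.standard_normal((N, n_vis, n_vis))     # autoregressive weights
D = 0.01 * rng.standard_normal((N, n_vis, n_hid))     # directed vis-to-hid
W = 0.01 * rng.standard_normal((n_vis, n_hid))        # undirected weights
motion = generate(5, rng.standard_normal((N, n_vis)),
                  A, D, W, np.zeros(n_vis), np.zeros(n_hid))
```

With real trained parameters, the generated frames would then be de-normalized and converted back to global marker positions for playback.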

2 Data Gathering and Preprocessing

2.1 Motion Capture Setup

The data set used for training and testing was motion capture provided by Nikolaus Troje from the Biomotion Lab at Queen's University. The data was captured using markers at various locations on four segments of the pigeon: the head, torso, and both feet. To ensure the marker positions on each segment of the pigeon did not vary with respect to each other, cardboard was placed on the head and back to hold those markers in place. Each pigeon was then recorded with an array of 12 synchronized cameras while allowed to walk in an enclosed area. The setup is shown below in Figure 6, along with one of the 12 Vicon MX Megapixel cameras used to capture the data (Figure 7). Available data sets contained various standing, walking, and courtship sequences of either one or two pigeons. All experiments were done using the data from a single pigeon. This data was either derived from motion capture data of a courtship sequence between two pigeons or from that of a single pigeon reacting to another. The collected data was cleaned to account for sensor noise and occlusion, providing (x, y, z) positions of each marker in mm with respect to a global coordinate system.

Figure 6: Motion capture setup at Queen's University Biomotion Lab.
Figure 7: Vicon MX-2 series 1.3 Megapixel motion capture camera.

2.2 Body Centered Coordinates

As a first attempt to preprocess the data into a form suitable for learning, the pigeon segments were converted into a hierarchy of coordinate frames. From the (x, y, z) positions of each marker output from the camera system, it is desired to model the body, head, and each foot as separate coordinate frames, each with six degrees of freedom (dof).

Figure 8: Example motion capture marker placement and derived coordinate frames.

An example of this conversion is shown in Figure 8. In this figure, black circles represent motion capture markers used to define each origin (black dots), and the red circles are the remaining markers. Coordinate frame origins are connected with black lines, and the unit vectors for the x, y, and z directions are represented by red, green, and blue lines respectively. To begin this process of conversion for the head and body, three markers were selected on each segment: two near the back on the left and right sides of the pigeon, and one near the front (beak or chest, depending on the segment). A vector was assigned from the right marker p_R towards the front marker p_F to give the vector v_RF. A second vector for each segment was formed from the right marker p_R to the left marker p_L to get v_RL. In order to place the origin along v_RL such that the vector connecting it to the front marker intersects v_RL perpendicularly, a projection onto v_RL was used:

p_origin = p_R + ((v_RL · v_RF) / (v_RL · v_RL)) v_RL    (10)

Also, the normalized cross product between the two vectors v_RF and v_RL gave a vector pointing up from the plane of the three chosen points:

ẑ = (v_RF × v_RL) / |v_RF × v_RL|    (11)

This ẑ vector for each of these two segments formed the first basis vector of their respective coordinate systems.
The ˆx (front-facing) vector was found using the difference of the origin and front locations of each segment:

ˆx = (p_F − p_origin) / |p_F − p_origin|    (12)

The final unit vector to be calculated for each coordinate system was ŷ, which represents a left-facing vector found using the left marker location p_L and the origin p_origin of the head and body segments respectively:

ŷ = (p_L − p_origin) / |p_L − p_origin|    (13)

Using the location of each origin, along with the three unit vectors of each coordinate system, both translations and rotations were obtained as described shortly. The feet required a slightly different
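The projection and cross-product construction above can be sketched directly in NumPy; the marker positions in the example are made up for illustration:

```python
import numpy as np

def segment_frame(p_R, p_L, p_F):
    """Coordinate frame of a head/body segment from its three markers.

    The origin is the foot of the perpendicular from the front marker
    onto the right-to-left line; z is the normalized cross product of
    v_RF and v_RL (up from the marker plane); x points front, y left.
    """
    v_RL = p_L - p_R
    v_RF = p_F - p_R
    origin = p_R + (v_RL @ v_RF) / (v_RL @ v_RL) * v_RL
    z = np.cross(v_RF, v_RL); z /= np.linalg.norm(z)
    x = p_F - origin;         x /= np.linalg.norm(x)
    y = p_L - origin;         y /= np.linalg.norm(y)
    return origin, x, y, z

# Hypothetical right, left, and front (beak) marker positions in mm.
o, x, y, z = segment_frame(np.array([1.0, -1.0, 0.0]),
                           np.array([1.0,  1.0, 0.0]),
                           np.array([3.0,  0.0, 0.0]))
```

Because the origin is the perpendicular foot of the front marker on the right-to-left line, x and y come out orthogonal by construction, and z is orthogonal to both.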

procedure to derive their coordinate systems, since there were only two markers placed on each foot. The average of these two markers, p_F and p_B, on each foot was taken to be the position of the origin p_origin for the respective foot:

p_origin = (p_F + p_B) / 2    (14)

The front-facing vector ˆx was again the direction from the origin p_origin to the front marker p_F:

ˆx = (p_F − p_origin) / |p_F − p_origin|    (15)

The next step was slightly different for the feet, since an up-vector had to be chosen arbitrarily to determine the coordinate frame. Since the pigeon was always assumed to be walking upright, the up-vector was chosen to be:

v_up = [0 0 −1]^T    (16)

Using this vector put a constraint on the system, because there could no longer be any rotations about the ˆx axis of either foot. This reduced the degrees of freedom of each foot to 5, with 3 for translations and 2 for rotations. Despite this reduction in degrees of freedom, the representation used for each foot still stored 6 variables for all possible degrees of freedom in the model. Using the v_up vector, the left-facing direction was determined using the following cross product:

ŷ = (v_up × ˆx) / |v_up × ˆx|    (17)

Finally, using both ˆx and ŷ, the vector ẑ was found:

ẑ = (ˆx × ŷ) / |ˆx × ŷ|    (18)

Since the process of deriving the coordinate systems was the same for both the body and head segments, and identical for each foot, the calculations above were not shown with any subscripts indicating which segment each variable represents. Shown below as a post-subscript are the markings for the body (B), head (H), left foot (LF), and right foot (RF) segments.
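The two-marker foot construction can be sketched as below. The sign of the fixed up-vector is an assumption taken from the ground-plane vertical mentioned in Section 2.3, and the marker positions are made up:

```python
import numpy as np

def foot_frame(p_F, p_B):
    """Foot coordinate frame from its two markers (front and back).

    The origin is the marker midpoint; the fixed up-vector removes any
    rotation about the foot's x axis, leaving 5 effective dof.
    """
    origin = (p_F + p_B) / 2.0
    x = p_F - origin; x /= np.linalg.norm(x)   # front-facing direction
    v_up = np.array([0.0, 0.0, -1.0])          # assumed ground-plane vertical
    y = np.cross(v_up, x); y /= np.linalg.norm(y)
    z = np.cross(x, y);    z /= np.linalg.norm(z)
    return origin, x, y, z

# Hypothetical front and back foot markers in mm.
o, x, y, z = foot_frame(np.array([2.0, 0.0, 0.0]),
                        np.array([0.0, 0.0, 0.0]))
```

Note that the construction degenerates when the foot's front-facing direction is parallel to the up-vector; the walking-upright assumption in the text rules that out.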
Also, up to this point, all the calculations were in the global coordinate system, which shall be indicated with a pre-superscript G, giving the following vectors:

                     Body        Head        Left Foot     Right Foot
origin (O)           G p_O(B)    G p_O(H)    G p_O(LF)     G p_O(RF)
forward direction    G ˆx_(B)    G ˆx_(H)    G ˆx_(LF)     G ˆx_(RF)
left direction       G ŷ_(B)    G ŷ_(H)    G ŷ_(LF)     G ŷ_(RF)
up direction         G ẑ_(B)    G ẑ_(H)    G ẑ_(LF)     G ẑ_(RF)

Table 1: Notation for Coordinate Frames.

Once the coordinate frames for each segment were determined, it was possible to find the rotation matrices relating each frame. This was done using dot products between the two frames, as shown in Equation 41 of Appendix A. The pre-superscript of a rotation matrix indicates the coordinate frame that results from left-multiplying a vector by the rotation matrix. The post-subscript indicates what coordinate frame the original vector should be in. To convert to a body-centered representation, the G R_H, G R_B, G R_LF, and G R_RF rotation matrices were computed first, from which the others were derived. Since a rotation matrix is orthogonal, its inverse is its transpose, and thus the following compound rotation matrices were formed:

B R_G = (G R_B)^−1    (19)
H R_B = (G R_H)^−1 G R_B    (20)
LF R_B = (G R_LF)^−1 G R_B    (21)
RF R_B = (G R_RF)^−1 G R_B    (22)

The first of these rotation matrices represents the rotation of the body frame with respect to the global coordinate system. The remaining three represent the rotations of the head, left foot, and right foot with respect to the body coordinate frame respectively. All four of these matrices were converted to the exponential map representation [14], giving three values which define the direction of an axis about which the rotation occurs and whose combined magnitude indicates the size of the angle through which the segment rotates about this axis. These three values for each segment were therefore the three rotational dof of that segment. The translation of the body segment was simply its location in global coordinates, but the translations of the other three segments were calculated with respect to the body origin.
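The compound rotations and the exponential-map conversion can be sketched as below; the log-map formula used is the standard axis-angle extraction, not necessarily the exact algorithm of [14], and the example rotation is made up:

```python
import numpy as np

def exp_map(R):
    """Rotation matrix -> axis-angle vector: the direction is the rotation
    axis and the magnitude is the rotation angle (standard matrix log map;
    the angle = pi edge case is not handled in this sketch)."""
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if np.isclose(angle, 0.0):
        return np.zeros(3)
    axis = np.array([R[2, 1] - R[1, 2],
                     R[0, 2] - R[2, 0],
                     R[1, 0] - R[0, 1]]) / (2.0 * np.sin(angle))
    return angle * axis

# Compound rotation as in the text, using the transpose as the inverse
# of an orthogonal rotation matrix: H_R_B = (G_R_H)^T @ G_R_B.
G_R_H = np.array([[0.0, -1.0, 0.0],     # example: 90 degrees about z
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
G_R_B = np.eye(3)
H_R_B = G_R_H.T @ G_R_B
rot_dof = exp_map(G_R_H)                # three rotational dof of the segment
```

For the example rotation, the three stored values come out as an axis along z scaled by the 90-degree angle.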
To do this, the origin of each segment had to be multiplied by the B T_G transformation matrix which, as defined in Appendix A, is:

B T_G = [ B R_G   B p_BG ]    (23)

where B p_BG is the position of the global origin expressed in body coordinates:

B p_BG = −(G R_B)^−1 (G p_GB) = −(G R_B)^−1 G p_O(B)    (24)

Then each origin in body-centered coordinates was given by:

B p_O(H) = B T_G G p_O(H)    (25)
B p_O(LF) = B T_G G p_O(LF)    (26)
B p_O(RF) = B T_G G p_O(RF)    (27)

The four origin positions (the three directly above and G p_O(B)) gave the twelve translational degrees of freedom of the hierarchical system. Once the data was converted into body-centered coordinates, it was then made invariant to rotations and translations in the ground plane.

2.3 Ground Plane Invariance

As in [1], to make the model as general as possible, it was required to standardize the starting position and direction of the pigeon in each data sequence. Since we were interested in learning pigeon behaviour and not specific locations of the pigeon, rotations about the ground-plane vertical and movement in the ground plane itself were treated as velocities. Thus, instead of modeling absolute values of position and rotation, the data was converted into differences of these quantities between frames. This still allowed the model to capture the walking behaviour and to generate new motion from any initial position and direction. The ground-plane vertical used was the negative z-axis of the global coordinate system, v_up. Rotating the x-axis and y-axis unit vectors of the global coordinate system into body coordinates using B R_G gave two vectors that could be compared to the vertical to get the tilt and bend angles of the torso:

θ_tilt = cos^−1( v_up · (B R_G [1 0 0]^T) / |B R_G [1 0 0]^T| )    (28)

θ_bend = cos^−1( v_up · (B R_G [0 1 0]^T) / |B R_G [0 1 0]^T| )    (29)

The above two angles represent the first two of the six degrees of freedom stored for the body segment. By projecting the forward-facing vector of the body segment into the ground plane and calculating its x and y offsets, the ground-plane rotation was computed as:

θ_ground = tan^−1( y_ground / x_ground )    (30)

Note that all rotations were phase-unwrapped so that angles were not restricted to [−π, π]. This was done because the pigeon could potentially rotate more than 2π radians throughout a sequence.
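The heading extraction and phase unwrapping can be sketched as follows; atan2 is used as the robust form of the arctangent in Equation 30, and the turning trajectory is synthetic:

```python
import numpy as np

def ground_heading(fwd):
    """Per-frame ground-plane heading from forward-facing vectors.

    fwd: (T, 3) forward vectors in global coordinates.  Projects into the
    ground plane, takes the angle of the (x, y) offsets, and phase-unwraps
    so the result is not confined to [-pi, pi].
    """
    theta = np.arctan2(fwd[:, 1], fwd[:, 0])
    return np.unwrap(theta)

# Synthetic pigeon turning smoothly through 1.5 full revolutions.
angles = np.linspace(0.0, 3.0 * np.pi, 60)
fwd = np.column_stack([np.cos(angles), np.sin(angles), np.zeros_like(angles)])
heading = ground_heading(fwd)
```

Without the unwrap step, the heading would jump by 2π whenever the pigeon's direction crossed the ±π boundary, which would corrupt the first differences taken next.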
Since we wanted the velocities of that rotation and the translations in the ground plane x and y directions, first differences were taken for each of the quantities: ) ) (28) (29) (3) G θ ground (i) = G θ ground (i + 1) G θ ground (i) (31) G x O(B) (i) = G x O(B) (i + 1) G x O(B) (i) (32) G y O(B) (i) = G y O(B) (i + 1) G y O(B) (i) (33) G θ ground (i) was stored as the third rotational dof for the body segment. The final step was to convert the translational velocities to body centered representation by projecting the movements onto the forward and left facing vectors Gˆx (B) and G ŷ (B) of the body. This involved computing the magnitude of the velocities, computing the angles φ these velocities made, and projecting onto 12

20 each axis using the θ and φ angles as follows: vel(i) = ( G x O(B) (i)) 2 + ( G y O(B) (i)) 2 (34) ( ) G φ(i) = tan 1 y O(B) (i) G (35) x O(B) (i) trans1 = G x O(B) (i) = vel(i) cos( θ ground (i) φ(i)) (36) trans2 = G y O(B) (i) = vel(i) sin( θ ground (i) φ(i)) (37) These final two quantities, along with the z-component of G p O(B) (expected not to vary significantly throughout the motion and therefore not represented as a velocity), are the three translational dof stored for the body segment. 2.4 Data Normalization and Batch Preparation In the final representation there were 6 degrees of freedom (dof) per frame for each of the 4 segments, giving a total of 24 real values. Also, unlike [1], all translational dof were included to account for the articulated, multi-segment nature of the neck and legs. As a final step, after all velocities had been computed, the resulting data was scaled to have zero mean and unit variance for each of the 24 dof. This provides a smaller range centered at zero for the gaussian visible units to model. The data was re-scaled before measurement or playback. When a long sequence of motion was preprocessed, it was split into mini-batches of 1 frames for training. The order in which these batches were used during training was also randomized. These two techniques vastly improved the training results. The mini-batch learning provided more frequent updates than full batch learning and smoother updates than doing online gradient descent while the randomized order prevented the model from settling into a poor local minima based on the order of the sequences presented during training. After a new sequence was generated in the normalized space, all the preceding steps were carried out in reverse to get back to the global coordinates of the marker positions. The detailed steps have not been outlined here as this procedure is straightforward using the above information. 13
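The difference-and-project steps above can be sketched in a few lines of numpy. This is a minimal illustration rather than the thesis code; the function name is my own, and `arctan2` is used in place of $\tan^{-1}$ since it resolves the quadrant ambiguity:

```python
import numpy as np

def ground_plane_velocities(theta_ground, x, y):
    """First differences of ground-plane heading and position (Eqs. 31-33),
    with the translational velocity projected onto the body's forward and
    lateral axes (Eqs. 34-37)."""
    dtheta = np.diff(theta_ground)                     # Eq. 31
    dx, dy = np.diff(x), np.diff(y)                    # Eqs. 32-33
    vel = np.hypot(dx, dy)                             # Eq. 34: speed in the ground plane
    phi = np.arctan2(dy, dx)                           # Eq. 35: direction of travel
    trans1 = vel * np.cos(theta_ground[:-1] - phi)     # Eq. 36: forward component
    trans2 = vel * np.sin(theta_ground[:-1] - phi)     # Eq. 37: lateral component
    return dtheta, trans1, trans2
```

For a pigeon walking straight along its heading, `trans1` recovers the per-frame step length while `trans2` stays near zero.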

3 Motion Generation

3.1 Generation of Novel Motion

Both one and two layer models were trained using each of the available data sets to visually determine which generated sequences tended to most accurately represent pigeon behaviour. For each data set, every frame was analyzed to ensure the data was free of discontinuities. Due to the large torso of the pigeon, the foot markers are often occluded during motion capture, so this further analysis of every frame was necessary to ensure frames with occluded markers were not used in training. If they were used, the model would learn the large gaps in the dof values that occur when a marker is occluded, giving a poor, noisy representation of the pigeon motion.

Once trained, it was visually evident that two layer models produced better results than single layer models. The second layer tended to reduce noise in the generated sequences while adding more long-term variety to the walking pattern. This comparison between models is further quantified later in this report. Additionally, sequences which were split often due to occluded markers tended to produce poor models. With smaller sequences of continuous motion, the model was unable to learn the long term dependencies needed to generate good quality motion. One long sequence, however, was uninterrupted by occlusion and provided a great training set with a large variety of standing and walking in all directions.

Another aspect of the data tested heavily while training models was the frames per second (fps) at which the data was presented to the CRBM. Originally the data was 12 fps, but it was thought that limiting the data to 3 fps would result in more significant differences in each dof between frames, thus providing a better training signal for the model. This allowed CRBMs of order 3 to be used and to accurately reproduce motion (at 3 fps).
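The downsampling described above amounts to keeping every fourth frame. A trivial sketch, with the factor of 4 corresponding to reducing 12 fps data to 3 fps:

```python
import numpy as np

def downsample(frames, factor=4):
    """Keep every `factor`-th frame, e.g. factor=4 takes 12 fps data to 3 fps,
    enlarging the per-frame change in each dof."""
    return np.asarray(frames)[::factor]
```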
However, after much comparison between models of order 12 at 12 fps and models of order 3 at 3 fps, it was determined that the former produced better results. Using the single continuous data set, the training parameters were adjusted to find an optimal model. After tuning the parameters, the best model was found to be a Gaussian-binary CRBM trained on pigeon mocap data at 12 fps by following the procedure described in Section 1.3. Once this first layer was trained, a second binary-binary CRBM was trained on the real-valued probabilities of the hidden layer while driven by the training data. Each CRBM had 6 hidden units. The learning rates for all parameters were $1 \times 10^{-3}$, except for the autoregressive weights between visible units ($1 \times 10^{-5}$). During training, a momentum term was also used in which 0.9 of the previous accumulated gradient was added to the current gradient. Alternating Gibbs sampling was conducted for one step (i.e. CD-1) during training. Each of the two layers was conditioned on 12 previous frames.

Once trained, this CRBM was used to generate novel pigeon motion. Generation from a trained 2-layer model proceeds as follows: initialize with 24 frames of training data at the visible layer and perform a single up-pass to arrive at activations at the 1st hidden layer (H1), providing 12 frames of

H1 initialization. The current time step at H1 is then initialized with the previous frame plus some Gaussian noise. Since the hidden units are binary, the random noise is not added to the resulting activations, but instead added to the logit (the inverse of the sigmoid function), which is then passed through the sigmoid function to get updated activations. Alternating Gibbs sampling is then performed between the two hidden layers, conditioning on the 12 frames of H1 initialization. Gibbs sampling provides activations at H1 and H2. The former is used to perform a single down-pass to the visible layer using the weights of the 1st level CRBM, conditioning on the last 12 frames of visible initialization data. The above procedure is repeated, beginning with initialization of the next frame of H1 using the current activations plus a small amount of Gaussian noise.

Figure 9: Stroboscopic image from a walking pigeon. Figure reproduced from [15].

Figure 10: Example hold-phases present in generated motion.
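The logit-noise initialization described above can be sketched as follows. This is an illustrative reading of the procedure, not the thesis code; the noise scale `sigma` and the function names are my own assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(p, eps=1e-6):
    """Inverse of the sigmoid, clipped away from 0 and 1 for numerical stability."""
    p = np.clip(p, eps, 1.0 - eps)
    return np.log(p / (1.0 - p))

def init_h1_frame(h_prev, sigma=0.1, rng=None):
    """Initialize the current H1 frame from the previous one: Gaussian noise is
    added to the logit of the activations, then squashed back through the
    sigmoid, so the result remains a valid probability in (0, 1)."""
    rng = rng or np.random.default_rng(0)
    return sigmoid(logit(h_prev) + sigma * rng.normal(size=h_prev.shape))
```

Adding noise on the logit scale (rather than to the activations directly) guarantees the perturbed values can still be interpreted as Bernoulli probabilities for the binary hidden units.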

In order to carry out experiments that elicit responses in real pigeons from a virtual pigeon driven by our model, we needed to first determine whether our generated motion captured the subtle characteristics of pigeon motion. Fixed foot plants while walking and standing were among the subtleties present in generated human motion [1] that are also present in pigeon motion. There is an additional characteristic in pigeon motion that must also be modeled, namely the complex head-bobbing, which is defined by distinct alternations between a thrust-phase and a hold-phase. In the hold-phase, the head remains stationary in both position and rotation, while in the thrust-phase, the head quickly translates and rotates to the next hold-phase, as seen in a stroboscopic image of a live pigeon (Figure 9) [1]. In generating from our model, we could immediately see head motion that closely resembled this complex head-bobbing behaviour (Figure 10). The collection of head coordinate frames in this generated motion shows the distinct hold-phase, while the collection of foot coordinate frames demonstrates the concrete foot plants. For various videos of generated motion, see the accompanying website.

3.2 Quantifying Head Motion

Although the hold-phase was visually present in the generated motion, we sought a quantitative comparison to the real motion capture data. To quantify the hold-phase, we measured the head frame's yaw axis and the path length of the head in the horizontal plane with respect to the global coordinate system [1]. To assess the performance of our model, we also implemented a 3rd order autoregressive (AR) model fit by regularized least squares. Figure 11 shows a comparison between the training data, generated data from a 2-layer CRBM model, and the AR-3 baseline method. The second derivative of each line in the plot was calculated in order to detect the hold phases automatically.
Where this second derivative was at a maximum or a minimum corresponded to the levelling of the hold-phase regions (grey areas in Figure 11). Within these regions the standard deviations of each measurement were calculated, and the mean over all the regions within each plot was computed. These calculations were also done for the thrust-phase regions of the plots and are shown, along with the hold-phase results, in Table 2.

Table 2: Comparison of Mean Standard Deviation in Hold and Thrust Phases (path length in mm and yaw angle in degrees, for the training data, the 2-layer CRBM, and the AR(3) model, each in its hold and thrust phases).

Figure 11: Hold-phases present in path length and yaw angles (shaded grey), comparing the training data, 2-layer CRBM generation, and AR(12) generation.

As seen in Figure 11, the training data displayed a step-like pattern in both the yaw angle and path length plots, where the flat portions of each plot represent the hold-phase of the head motion. Similarly, the data generated from the 2-layer CRBM model clearly exhibited this behaviour. This is also evident from the large difference in standard deviations in Table 2. Due to the stochastic nature of the CRBM model, the standard deviations in the hold phase of the generated motion were greater than those of the original motion capture data. Since the CRBMs used here model ground-plane velocities instead of positions, any noise introduced by sampling was integrated throughout the output sequence during post-processing.

Another aspect to note about the generated data is the smaller standard deviations of the hold-phase for rotations compared to those for translations. This is likely due to the normalization step done during pre-processing of the data. The translations are typically larger numbers (expressed in mm) than the rotations (expressed in exponential map representation), so any comparable noise is magnified in the translational dimensions when the data is re-scaled. One way to combat this would be to adjust the precision of the Gaussian units, $1/\sigma_i$. From previous experience (see [1]), the precision of the Gaussian units works well when fixed at 1, which was done throughout our experiments. However, adjusting this parameter to a larger fixed value, or learning it during training, could improve the results.
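A simplified variant of this hold-phase analysis can be sketched as follows. Instead of locating the second-derivative extrema used in the thesis, this sketch labels hold frames by thresholding the per-frame change (the `frac` threshold is my own assumption), splits them into contiguous regions, and averages the within-region standard deviation:

```python
import numpy as np

def split_runs(mask):
    """Index ranges of contiguous True runs in a boolean mask."""
    runs, start = [], None
    for i, m in enumerate(list(mask) + [False]):
        if m and start is None:
            start = i
        elif not m and start is not None:
            runs.append(range(start, i))
            start = None
    return runs

def phase_stds(signal, frac=0.25):
    """Mean within-region standard deviation of `signal` in hold regions
    (small per-frame change) versus thrust regions (large per-frame change)."""
    signal = np.asarray(signal, dtype=float)
    speed = np.abs(np.diff(signal))
    hold = speed <= frac * speed.max()
    mean_std = lambda runs: float(np.mean([signal[list(r)].std() for r in runs]))
    return mean_std(split_runs(hold)), mean_std(split_runs(~hold))
```

On a step-like trace such as the training-data path length, the hold-phase standard deviation comes out much smaller than the thrust-phase one, matching the qualitative pattern reported in Table 2.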

4 Prediction of Missing Foot Markers

4.1 Marker Occlusion

Due to the fixed positions of the motion capture cameras, the feet of the pigeons were often occluded by the feathers and the relatively large torso of the pigeon. Frames with missing foot data could be ignored when training, but that limits the number of good frames with which we can train. As discussed earlier, training on data without the occluded frames generally gave poor results due to the shorter continuous sequence lengths. Thus it became a goal to remedy this marker occlusion problem.

One method to solve this problem, as shown in Figure 12, is to predict the 6 degrees of freedom of a missing foot (shown with black background) based on the 18 remaining dof from the other foot, the head, and the body, as well as the previous visible vectors and the current hidden units. This was done using two methods: alternating Gibbs sampling, much like the generation algorithm, or by minimizing the free energy over the units. The latter method was only used for 1-layer models, as it should provide an optimal result regardless of additional layers. More details of these methods appear in the following two sections, but one other consideration was made before either procedure was used.

Figure 12: 1-layer CRBM showing the prediction problem.

A special normalization technique was used before the prediction took place, due to the presence of bad frames in the data. If the entire data set (containing both good and bad frames) were to be used for normalization, then the calculated mean and standard deviation of the data would have been skewed heavily by the bad frames. When markers were occluded during capture, they were automatically set by the motion capture system to (0, 0, 0) in global coordinates. This caused a large discontinuity in the data between good and bad frames. Therefore, to normalize properly, all frames with missing marker data were removed.
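The occlusion-aware normalization above can be sketched as below, under two assumptions of mine: the data is arranged as frames by dof with coordinates in x, y, z triples, and an occluded marker is flagged as exactly (0, 0, 0):

```python
import numpy as np

def normalize_skipping_occlusions(data):
    """Z-score the data using statistics computed only from frames in which
    no marker sits exactly at the (0, 0, 0) occlusion flag."""
    markers = data.reshape(len(data), -1, 3)
    # Per frame: does any marker triple equal exactly (0, 0, 0)?
    occluded = np.all(markers == 0.0, axis=2).any(axis=1)
    good = data[~occluded]
    mu, sigma = good.mean(axis=0), good.std(axis=0)
    return (data - mu) / sigma, ~occluded
```

Computing `mu` and `sigma` only over the good frames avoids the heavy skew that the large (0, 0, 0) discontinuities would otherwise introduce.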
Although this decreased the amount of valid data used for normalization, it prevented skewing of the normalized values and thus was used throughout the prediction experiments.

4.2 Method 1: Alternating Gibbs Sampling

The first of the two methods for predicting missing information used alternating Gibbs sampling. This method was identical to 1-step generation, except that only the missing markers were reconstructed during the alternating Gibbs sampling. To begin, the unknown markers were initialized to their values at the previous frame plus a small amount of Gaussian noise. The alternating Gibbs sampling then proceeded for 3 iterations, utilizing all the connections of the CRBM.

For a 1-layer model, this sampling updated all the hidden units using the initialization of the missing information in the current frame, as well as the real data for the remaining dof of the current frame and all dof of the previous frames. Since the CRBM models were found to give best

results at order 12, at least 12 good frames at the beginning of a gap were needed to initialize the 1-layer model. On each downward pass of the Gibbs sampling, only the missing dimensions were filled in, providing a reasonable ratio between the valid data to condition on and the missing data to be filled in.

For a 2-layer model, we used a slightly different procedure than the traditional Gibbs sampling used for 2-layer CRBM motion generation discussed in Section 3.1. Aside from the obvious change of only updating the missing dof, the alternating Gibbs sampling used for prediction implemented a linear blending between the visible and hidden layers. As before, the first hidden layer at the current time step was initialized to the values from the previous time step plus Gaussian noise added to the logits. The next step consisted of simultaneously conducting an upward pass to update the second hidden layer in parallel as well as updating the missing visible dof with a downward pass. Then, the second step of the alternating Gibbs sampling blended a downward pass from the second hidden layer with an upward pass from the visibles to get activations for the first hidden layer. The ratio of blending was controlled by a parameter, λ, which scaled the H2 inputs while (1 − λ) scaled the visible inputs. Thus, setting λ = 0 gave a 1-layer representation, while setting λ = 1 limited the Gibbs sampling to be only between the hidden layers before doing a single downward pass. In practice, setting λ = 0.1 worked best (see Figure 13). This is likely due to the heavier reliance on the visible information at both the current and previous time steps. Since the goal of this procedure was to fill in some dof of this visible data, knowledge of the remaining dof at the current time step had a much greater impact on prediction performance than the long-term information contributed by the second hidden layer.
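The λ-blended update of the first hidden layer can be sketched as below. The weight shapes are my own assumptions (W1 maps visibles to H1, W2 maps H1 to H2), and the conditioning terms from past frames are folded into the bias for brevity:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def blended_h1_update(v, h2, W1, W2, b1, lam=0.1):
    """Blend the upward pass from the visibles with the downward pass from H2:
    lam scales the H2 drive, (1 - lam) the visible drive. lam = 0 reduces to a
    1-layer update; lam = 1 ignores the current visibles entirely."""
    bottom_up = v @ W1          # visible -> H1
    top_down = h2 @ W2.T        # H2 -> H1
    return sigmoid((1.0 - lam) * bottom_up + lam * top_down + b1)
```

With lam = 0.1, the update leans heavily on the partially observed visible frame, which is exactly the information the prediction task is trying to respect.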
Figure 13: Effect of changing the ratio of visible influence, λ, on the average prediction error (mm).

In all the cases above, regardless of the λ value, the order of the model, or how many layers were used, the alternating Gibbs sampling was subject to noise. This noise was the reason why we looked for an alternative procedure for filling in the missing marker data, one which marginalizes out the hidden units.

4.3 Method 2: Free-Energy Minimization

Since the hidden variables are binary, they can be integrated out, giving the free energy formulation of the system given the past observations and model parameters. In the expression (Eq. 7)


More information

An object in 3D space

An object in 3D space An object in 3D space An object's viewpoint Every Alice object has a viewpoint. The viewpoint of an object is determined by: The position of the object in 3D space. The orientation of the object relative

More information

Motion Capture & Simulation

Motion Capture & Simulation Motion Capture & Simulation Motion Capture Character Reconstructions Joint Angles Need 3 points to compute a rigid body coordinate frame 1 st point gives 3D translation, 2 nd point gives 2 angles, 3 rd

More information

10703 Deep Reinforcement Learning and Control

10703 Deep Reinforcement Learning and Control 10703 Deep Reinforcement Learning and Control Russ Salakhutdinov Machine Learning Department rsalakhu@cs.cmu.edu Policy Gradient I Used Materials Disclaimer: Much of the material and slides for this lecture

More information

Hartley - Zisserman reading club. Part I: Hartley and Zisserman Appendix 6: Part II: Zhengyou Zhang: Presented by Daniel Fontijne

Hartley - Zisserman reading club. Part I: Hartley and Zisserman Appendix 6: Part II: Zhengyou Zhang: Presented by Daniel Fontijne Hartley - Zisserman reading club Part I: Hartley and Zisserman Appendix 6: Iterative estimation methods Part II: Zhengyou Zhang: A Flexible New Technique for Camera Calibration Presented by Daniel Fontijne

More information

A Short SVM (Support Vector Machine) Tutorial

A Short SVM (Support Vector Machine) Tutorial A Short SVM (Support Vector Machine) Tutorial j.p.lewis CGIT Lab / IMSC U. Southern California version 0.zz dec 004 This tutorial assumes you are familiar with linear algebra and equality-constrained optimization/lagrange

More information

NEURAL NETWORK VISUALIZATION

NEURAL NETWORK VISUALIZATION Neural Network Visualization 465 NEURAL NETWORK VISUALIZATION Jakub Wejchert Gerald Tesauro IB M Research T.J. Watson Research Center Yorktown Heights NY 10598 ABSTRACT We have developed graphics to visualize

More information

CS231A Course Notes 4: Stereo Systems and Structure from Motion

CS231A Course Notes 4: Stereo Systems and Structure from Motion CS231A Course Notes 4: Stereo Systems and Structure from Motion Kenji Hata and Silvio Savarese 1 Introduction In the previous notes, we covered how adding additional viewpoints of a scene can greatly enhance

More information

Fast Automated Estimation of Variance in Discrete Quantitative Stochastic Simulation

Fast Automated Estimation of Variance in Discrete Quantitative Stochastic Simulation Fast Automated Estimation of Variance in Discrete Quantitative Stochastic Simulation November 2010 Nelson Shaw njd50@uclive.ac.nz Department of Computer Science and Software Engineering University of Canterbury,

More information

CHAPTER 5 MOTION DETECTION AND ANALYSIS

CHAPTER 5 MOTION DETECTION AND ANALYSIS CHAPTER 5 MOTION DETECTION AND ANALYSIS 5.1. Introduction: Motion processing is gaining an intense attention from the researchers with the progress in motion studies and processing competence. A series

More information

CS 231A Computer Vision (Fall 2012) Problem Set 3

CS 231A Computer Vision (Fall 2012) Problem Set 3 CS 231A Computer Vision (Fall 2012) Problem Set 3 Due: Nov. 13 th, 2012 (2:15pm) 1 Probabilistic Recursion for Tracking (20 points) In this problem you will derive a method for tracking a point of interest

More information

Training Restricted Boltzmann Machines using Approximations to the Likelihood Gradient. Ali Mirzapour Paper Presentation - Deep Learning March 7 th

Training Restricted Boltzmann Machines using Approximations to the Likelihood Gradient. Ali Mirzapour Paper Presentation - Deep Learning March 7 th Training Restricted Boltzmann Machines using Approximations to the Likelihood Gradient Ali Mirzapour Paper Presentation - Deep Learning March 7 th 1 Outline of the Presentation Restricted Boltzmann Machine

More information

Integers & Absolute Value Properties of Addition Add Integers Subtract Integers. Add & Subtract Like Fractions Add & Subtract Unlike Fractions

Integers & Absolute Value Properties of Addition Add Integers Subtract Integers. Add & Subtract Like Fractions Add & Subtract Unlike Fractions Unit 1: Rational Numbers & Exponents M07.A-N & M08.A-N, M08.B-E Essential Questions Standards Content Skills Vocabulary What happens when you add, subtract, multiply and divide integers? What happens when

More information

Graph-based High Level Motion Segmentation using Normalized Cuts

Graph-based High Level Motion Segmentation using Normalized Cuts Graph-based High Level Motion Segmentation using Normalized Cuts Sungju Yun, Anjin Park and Keechul Jung Abstract Motion capture devices have been utilized in producing several contents, such as movies

More information

Radial Basis Function Networks: Algorithms

Radial Basis Function Networks: Algorithms Radial Basis Function Networks: Algorithms Neural Computation : Lecture 14 John A. Bullinaria, 2015 1. The RBF Mapping 2. The RBF Network Architecture 3. Computational Power of RBF Networks 4. Training

More information

Rectification and Distortion Correction

Rectification and Distortion Correction Rectification and Distortion Correction Hagen Spies March 12, 2003 Computer Vision Laboratory Department of Electrical Engineering Linköping University, Sweden Contents Distortion Correction Rectification

More information

Revision of the SolidWorks Variable Pressure Simulation Tutorial J.E. Akin, Rice University, Mechanical Engineering. Introduction

Revision of the SolidWorks Variable Pressure Simulation Tutorial J.E. Akin, Rice University, Mechanical Engineering. Introduction Revision of the SolidWorks Variable Pressure Simulation Tutorial J.E. Akin, Rice University, Mechanical Engineering Introduction A SolidWorks simulation tutorial is just intended to illustrate where to

More information

Chapter 4. The Classification of Species and Colors of Finished Wooden Parts Using RBFNs

Chapter 4. The Classification of Species and Colors of Finished Wooden Parts Using RBFNs Chapter 4. The Classification of Species and Colors of Finished Wooden Parts Using RBFNs 4.1 Introduction In Chapter 1, an introduction was given to the species and color classification problem of kitchen

More information

CS 664 Segmentation. Daniel Huttenlocher

CS 664 Segmentation. Daniel Huttenlocher CS 664 Segmentation Daniel Huttenlocher Grouping Perceptual Organization Structural relationships between tokens Parallelism, symmetry, alignment Similarity of token properties Often strong psychophysical

More information

Character Recognition

Character Recognition Character Recognition 5.1 INTRODUCTION Recognition is one of the important steps in image processing. There are different methods such as Histogram method, Hough transformation, Neural computing approaches

More information

Geometry and Gravitation

Geometry and Gravitation Chapter 15 Geometry and Gravitation 15.1 Introduction to Geometry Geometry is one of the oldest branches of mathematics, competing with number theory for historical primacy. Like all good science, its

More information

Autoencoders, denoising autoencoders, and learning deep networks

Autoencoders, denoising autoencoders, and learning deep networks 4 th CiFAR Summer School on Learning and Vision in Biology and Engineering Toronto, August 5-9 2008 Autoencoders, denoising autoencoders, and learning deep networks Part II joint work with Hugo Larochelle,

More information

An Evolutionary Approximation to Contrastive Divergence in Convolutional Restricted Boltzmann Machines

An Evolutionary Approximation to Contrastive Divergence in Convolutional Restricted Boltzmann Machines Wright State University CORE Scholar Browse all Theses and Dissertations Theses and Dissertations 2014 An Evolutionary Approximation to Contrastive Divergence in Convolutional Restricted Boltzmann Machines

More information

FAB verses tradition camera-based motion capture systems

FAB verses tradition camera-based motion capture systems FAB verses tradition camera-based motion capture systems The advent of micromachined inertial sensors, such as rate gyroscopes and accelerometers, has made new navigation and tracking technologies possible.

More information

Visual Recognition: Image Formation

Visual Recognition: Image Formation Visual Recognition: Image Formation Raquel Urtasun TTI Chicago Jan 5, 2012 Raquel Urtasun (TTI-C) Visual Recognition Jan 5, 2012 1 / 61 Today s lecture... Fundamentals of image formation You should know

More information

8 th Grade Pre Algebra Pacing Guide 1 st Nine Weeks

8 th Grade Pre Algebra Pacing Guide 1 st Nine Weeks 8 th Grade Pre Algebra Pacing Guide 1 st Nine Weeks MS Objective CCSS Standard I Can Statements Included in MS Framework + Included in Phase 1 infusion Included in Phase 2 infusion 1a. Define, classify,

More information

CS 223B Computer Vision Problem Set 3

CS 223B Computer Vision Problem Set 3 CS 223B Computer Vision Problem Set 3 Due: Feb. 22 nd, 2011 1 Probabilistic Recursion for Tracking In this problem you will derive a method for tracking a point of interest through a sequence of images.

More information

MINI-PAPER A Gentle Introduction to the Analysis of Sequential Data

MINI-PAPER A Gentle Introduction to the Analysis of Sequential Data MINI-PAPER by Rong Pan, Ph.D., Assistant Professor of Industrial Engineering, Arizona State University We, applied statisticians and manufacturing engineers, often need to deal with sequential data, which

More information

Lecture 13. Deep Belief Networks. Michael Picheny, Bhuvana Ramabhadran, Stanley F. Chen

Lecture 13. Deep Belief Networks. Michael Picheny, Bhuvana Ramabhadran, Stanley F. Chen Lecture 13 Deep Belief Networks Michael Picheny, Bhuvana Ramabhadran, Stanley F. Chen IBM T.J. Watson Research Center Yorktown Heights, New York, USA {picheny,bhuvana,stanchen}@us.ibm.com 12 December 2012

More information

Recurrent Neural Network (RNN) Industrial AI Lab.

Recurrent Neural Network (RNN) Industrial AI Lab. Recurrent Neural Network (RNN) Industrial AI Lab. For example (Deterministic) Time Series Data Closed- form Linear difference equation (LDE) and initial condition High order LDEs 2 (Stochastic) Time Series

More information

An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation

An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation Hugo Larochelle, Dumitru Erhan, Aaron Courville, James Bergstra, and Yoshua Bengio Université de Montréal 13/06/2007

More information

Improving the way neural networks learn Srikumar Ramalingam School of Computing University of Utah

Improving the way neural networks learn Srikumar Ramalingam School of Computing University of Utah Improving the way neural networks learn Srikumar Ramalingam School of Computing University of Utah Reference Most of the slides are taken from the third chapter of the online book by Michael Nielson: neuralnetworksanddeeplearning.com

More information

Experiments with Edge Detection using One-dimensional Surface Fitting

Experiments with Edge Detection using One-dimensional Surface Fitting Experiments with Edge Detection using One-dimensional Surface Fitting Gabor Terei, Jorge Luis Nunes e Silva Brito The Ohio State University, Department of Geodetic Science and Surveying 1958 Neil Avenue,

More information

Numenta Node Algorithms Guide NuPIC 1.7

Numenta Node Algorithms Guide NuPIC 1.7 1 NuPIC 1.7 includes early implementations of the second generation of the Numenta HTM learning algorithms. These algorithms are available as two node types: SpatialPoolerNode and TemporalPoolerNode. This

More information

Recent Developments in Model-based Derivative-free Optimization

Recent Developments in Model-based Derivative-free Optimization Recent Developments in Model-based Derivative-free Optimization Seppo Pulkkinen April 23, 2010 Introduction Problem definition The problem we are considering is a nonlinear optimization problem with constraints:

More information

Chapter 1: Introduction

Chapter 1: Introduction Chapter 1: Introduction This dissertation will describe the mathematical modeling and development of an innovative, three degree-of-freedom robotic manipulator. The new device, which has been named the

More information

Supplementary Materials for DVQA: Understanding Data Visualizations via Question Answering

Supplementary Materials for DVQA: Understanding Data Visualizations via Question Answering Supplementary Materials for DVQA: Understanding Data Visualizations via Question Answering Kushal Kafle 1, Brian Price 2 Scott Cohen 2 Christopher Kanan 1 1 Rochester Institute of Technology 2 Adobe Research

More information

8 th Grade Mathematics Unpacked Content For the new Common Core standards that will be effective in all North Carolina schools in the

8 th Grade Mathematics Unpacked Content For the new Common Core standards that will be effective in all North Carolina schools in the 8 th Grade Mathematics Unpacked Content For the new Common Core standards that will be effective in all North Carolina schools in the 2012-13. This document is designed to help North Carolina educators

More information

Supplementary Information. Design of Hierarchical Structures for Synchronized Deformations

Supplementary Information. Design of Hierarchical Structures for Synchronized Deformations Supplementary Information Design of Hierarchical Structures for Synchronized Deformations Hamed Seifi 1, Anooshe Rezaee Javan 1, Arash Ghaedizadeh 1, Jianhu Shen 1, Shanqing Xu 1, and Yi Min Xie 1,2,*

More information

Occluded Facial Expression Tracking

Occluded Facial Expression Tracking Occluded Facial Expression Tracking Hugo Mercier 1, Julien Peyras 2, and Patrice Dalle 1 1 Institut de Recherche en Informatique de Toulouse 118, route de Narbonne, F-31062 Toulouse Cedex 9 2 Dipartimento

More information

Logistic Regression and Gradient Ascent

Logistic Regression and Gradient Ascent Logistic Regression and Gradient Ascent CS 349-02 (Machine Learning) April 0, 207 The perceptron algorithm has a couple of issues: () the predictions have no probabilistic interpretation or confidence

More information

Using Subspace Constraints to Improve Feature Tracking Presented by Bryan Poling. Based on work by Bryan Poling, Gilad Lerman, and Arthur Szlam

Using Subspace Constraints to Improve Feature Tracking Presented by Bryan Poling. Based on work by Bryan Poling, Gilad Lerman, and Arthur Szlam Presented by Based on work by, Gilad Lerman, and Arthur Szlam What is Tracking? Broad Definition Tracking, or Object tracking, is a general term for following some thing through multiple frames of a video

More information

Keywords: clustering, construction, machine vision

Keywords: clustering, construction, machine vision CS4758: Robot Construction Worker Alycia Gailey, biomedical engineering, graduate student: asg47@cornell.edu Alex Slover, computer science, junior: ais46@cornell.edu Abstract: Progress has been made in

More information

10-701/15-781, Fall 2006, Final

10-701/15-781, Fall 2006, Final -7/-78, Fall 6, Final Dec, :pm-8:pm There are 9 questions in this exam ( pages including this cover sheet). If you need more room to work out your answer to a question, use the back of the page and clearly

More information

/10/$ IEEE 4048

/10/$ IEEE 4048 21 IEEE International onference on Robotics and Automation Anchorage onvention District May 3-8, 21, Anchorage, Alaska, USA 978-1-4244-54-4/1/$26. 21 IEEE 448 Fig. 2: Example keyframes of the teabox object.

More information

Ray Tracing through Viewing Portals

Ray Tracing through Viewing Portals Ray Tracing through Viewing Portals Introduction Chris Young Igor Stolarsky April 23, 2008 This paper presents a method for ray tracing scenes containing viewing portals circular planes that act as windows

More information

Artificial Neural Network-Based Prediction of Human Posture

Artificial Neural Network-Based Prediction of Human Posture Artificial Neural Network-Based Prediction of Human Posture Abstract The use of an artificial neural network (ANN) in many practical complicated problems encourages its implementation in the digital human

More information

An Efficient Method for Solving the Direct Kinematics of Parallel Manipulators Following a Trajectory

An Efficient Method for Solving the Direct Kinematics of Parallel Manipulators Following a Trajectory An Efficient Method for Solving the Direct Kinematics of Parallel Manipulators Following a Trajectory Roshdy Foaad Abo-Shanab Kafr Elsheikh University/Department of Mechanical Engineering, Kafr Elsheikh,

More information

Irradiance Gradients. Media & Occlusions

Irradiance Gradients. Media & Occlusions Irradiance Gradients in the Presence of Media & Occlusions Wojciech Jarosz in collaboration with Matthias Zwicker and Henrik Wann Jensen University of California, San Diego June 23, 2008 Wojciech Jarosz

More information

A Visualization Tool to Improve the Performance of a Classifier Based on Hidden Markov Models

A Visualization Tool to Improve the Performance of a Classifier Based on Hidden Markov Models A Visualization Tool to Improve the Performance of a Classifier Based on Hidden Markov Models Gleidson Pegoretti da Silva, Masaki Nakagawa Department of Computer and Information Sciences Tokyo University

More information

Slides credited from Dr. David Silver & Hung-Yi Lee

Slides credited from Dr. David Silver & Hung-Yi Lee Slides credited from Dr. David Silver & Hung-Yi Lee Review Reinforcement Learning 2 Reinforcement Learning RL is a general purpose framework for decision making RL is for an agent with the capacity to

More information

Logical Templates for Feature Extraction in Fingerprint Images

Logical Templates for Feature Extraction in Fingerprint Images Logical Templates for Feature Extraction in Fingerprint Images Bir Bhanu, Michael Boshra and Xuejun Tan Center for Research in Intelligent Systems University of Califomia, Riverside, CA 9252 1, USA Email:

More information