Grounded Compositional Semantics for Finding and Describing Images with Sentences
Paper authors: R. Socher, A. Karpathy, Q. V. Le, C. D. Manning, A. Y. Ng

Presented by Ali Gharaee (1) and Alireza Keshavarzi (2)
(1) Department of Computational Linguistics, University of Tuebingen
(2) Department of Computer Science, University of Tuebingen
July 13, 2017

Outline
1 Introduction
2 Related Work
3 DT-RNN
  - Inputs
  - Forward Propagation
4 Learning Images
  - Setup Deep Neural Network
  - Training
5 Multimodal Mapping
6 Experiment
7 Conclusion

Introduction
- Single-word vector spaces can capture the meaning of individual words.
- BUT words rarely appear in isolation: compare "Play" with "Two children are playing in a park".
- The paper introduces a model that learns to map sentences and images into a common embedding space, so that one can be retrieved from the other.
- The model for mapping sentences into this space is based on ideas from Recursive Neural Networks (RNNs): it computes compositional vector representations inside dependency trees.
- Given a sentence, find images that show the described scene, e.g. "A man wearing a helmet jumps on his bike near a beach."
- Conversely, given a query image, we would like to find a description that goes beyond a single label by providing a correct sentence describing it, a task that has recently garnered a lot of attention.

Related Work
The presented model is connected to several areas of NLP and vision research:
1 Semantic Vector Spaces and Their Compositionality
  - Most compositionality algorithms and related datasets capture two-word compositions (Mitchell and Lapata, 2010).
2 Multimodal Embeddings
  - Multimodal embedding methods project data from multiple sources, such as sound and video or images and text, into a common space.
  - Recently, single-word vector embeddings have been used for zero-shot learning.
  - Mapping images to word vectors enabled the system of Socher et al. (2013c) to classify images as depicting objects such as "cat" without seeing any examples of this class.
3 Detailed Image Annotation
  - Early work in this area includes generating single words or fixed phrases from images.
  - Other work describes images with more detailed, longer textual descriptions (Yao et al., 2010).
  - The model of this paper is based on compositional sentence vector representations and does not require specific language generation techniques or sophisticated inference methods.
  - Since it is based on neural network inference, it is fast and simple.

DT-RNN: Inputs
Word vectors: how do we build a representation for longer phrases?
- Single word vectors: simple, by averaging.
- Bag of words: good performance, but cannot distinguish important visual differences:
    The car crashed into the bike.
    The bike crashed into the car.
- Constituency tree: very good, but encodes too much syntactic structure:
    The child was hugged by its mother.
    The mother hugged her child.
Figure: Different constituency trees for the passive and active forms of a sentence.
- Dependency tree: focuses more on recognizing actions and agents.
Figure: The agent and the action remain the same in the dependency tree.

Word vectors
- A sentence or phrase has m words.
- Each word is represented by a d-dimensional feature vector (d = 50).
- The vectors are learned by constructing a neural network that outputs high scores for windows and documents that occur in a large unlabeled corpus, and low scores for window-document pairs in which one word is replaced by a random word.
- The neural network is optimized with gradient descent; the derivatives backpropagate into a word embedding matrix $A$, which stores the word vectors as columns.
- We use an embedding matrix $X$ that contains the column vectors of $A$ for each word in our sentences.
- The input sentence is then represented as an ordered list of (word, vector) pairs: $s = ((w_1, x_{w_1}), \ldots, (w_m, x_{w_m}))$.
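
The (word, vector) input representation can be sketched in a few lines of Python. This is only an illustration: the function names, the example sentence, and the random vectors (standing in for trained embeddings) are assumptions, not part of the slides.

```python
import random

def build_embeddings(vocab, d=50, seed=0):
    """Return a dict word -> d-dimensional vector.

    Random values stand in for the trained columns of the embedding matrix.
    """
    rng = random.Random(seed)
    return {w: [rng.gauss(0.0, 0.1) for _ in range(d)] for w in vocab}

def sentence_to_pairs(words, embeddings):
    """Represent a sentence as an ordered list of (word, vector) pairs."""
    return [(w, embeddings[w]) for w in words]

# Hypothetical five-word sentence, chosen only for illustration.
words = ["students", "ride", "bikes", "at", "night"]
A = build_embeddings(sorted(set(words)))
s = sentence_to_pairs(words, A)
```

Each entry of `s` keeps the word next to its vector, so word order is preserved for the later tree construction.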

Dependency tree
- A dependency parser parses the sentence $s = (w_1, \ldots, w_m)$.
- Define $d(s)$ as an ordered list of (child, parent) indices: $d(s) = \{(i, j)\}$ with $i = 1, \ldots, m$ and $j \in \{1, \ldots, m\} \cup \{0\}$, where parent index 0 marks the root.
- Example: $d = \{(1, 2), (2, 0), (3, 2), (4, 2), (5, 4)\}$.
- The final input is the pair $(s, d)$: the word vectors of the sentence and its dependency tree.
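
As a small sketch, the (child, parent) list above can be turned into a parent-to-children map, which is the form the forward pass needs. The helper names are hypothetical, and ranking same-side children by distance from the parent to obtain labels such as l1 or r2 is an assumption about how relative positions are assigned.

```python
def children_by_parent(d):
    """Map each parent index to its child indices in sentence order.

    Index 0 is the artificial root, so children[0] holds the root word.
    """
    children = {}
    for child, parent in sorted(d):
        children.setdefault(parent, []).append(child)
    return children

def relative_position(parent, child, siblings):
    """Label a child as 'l1', 'r1', 'r2', ... relative to its parent.

    Assumption: children on the same side are ranked by distance from the parent.
    """
    same_side = [c for c in siblings if (c < parent) == (child < parent)]
    same_side.sort(key=lambda c: abs(c - parent))
    side = "l" if child < parent else "r"
    return f"{side}{same_side.index(child) + 1}"

d = [(1, 2), (2, 0), (3, 2), (4, 2), (5, 4)]
children = children_by_parent(d)
labels = {c: relative_position(2, c, children[2]) for c in children[2]}
```

For the example tree, word 2 is the root with children 1, 3 and 4, and word 5 hangs under word 4.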

DT-RNN: Forward Propagation
- Define a compositionality function: $h_c = g_\theta(x_c) = f(W_v x_c)$, with $W_v \in \mathbb{R}^{n \times d}$.
- Use $\tanh$ as the activation function $f$.
- In our example, first compute the leaf nodes ($c = 1, 3, 5$), e.g. $h_1 = g_\theta(x_1) = f(W_v x_1)$.
- The final sentence representation is $h_2$, but we first need to compute $h_4$.
- For $h_4$ we have a sum over child nodes: $h_4 = g_\theta(x_4, h_5) = f(W_v x_4 + W_{r1} h_5)$, with $W_{r1} \in \mathbb{R}^{n \times n}$.
- In general, there are multiple matrices for composing with hidden child vectors: $W_r = (W_{r1}, \ldots, W_{rk_r})$ and $W_l = (W_{l1}, \ldots, W_{lk_l})$.
- $k$ is the maximum number of matrices needed in the training data.
- What if a test sentence needs a child matrix with an index greater than $k$? Options: use the identity matrix, divide the sentence, or trim the sentence.
- Now we can compute the root $h_2$: $h_2 = g_\theta(x_2, h_1, h_3, h_4) = f(W_v x_2 + W_{l1} h_1 + W_{r1} h_3 + W_{r2} h_4)$.
- This gives different results for short and long sentences, which motivates normalization!

Normalization
- Normalize the hidden nodes:
  $$h_i = f\left(\frac{1}{\ell(i)}\left(W_v x_i + \sum_{j \in C(i)} \ell(j)\, W_{pos(i,j)}\, h_j\right)\right)$$
- $\ell(i)$ is the number of leaf nodes (words) under node $i$; it can be computed as $\ell(i) = 1 + \sum_{j \in C(i)} \ell(j)$.
- $C(i) = C(i, y)$ is the set of child nodes of node $i$ in dependency tree $y$.
- $pos(i, j)$ is the relative position of child $j$ with respect to node $i$, e.g. $l1$ or $r3$.
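
A minimal sketch of the normalized forward pass, using only the Python standard library, may make the recursion concrete. The function and variable names are hypothetical, the weights are random rather than trained, the dimensions are tiny for readability, and the rule for assigning l1, r1, r2 (rank by distance on each side) is an assumption.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def dt_rnn(x, d, W_v, W_pos):
    """Return (root hidden vector, number of words under the root).

    x: dict word index -> word vector; d: list of (child, parent) pairs;
    W_pos: dict mapping labels such as 'l1' or 'r2' to n x n matrices.
    """
    children, root = {}, None
    for child, parent in sorted(d):
        if parent == 0:
            root = child
        else:
            children.setdefault(parent, []).append(child)

    def pos(i, j):
        # Rank child j among the children on the same side of parent i.
        side = [c for c in children[i] if (c < i) == (j < i)]
        side.sort(key=lambda c: abs(c - i))
        return ("l" if j < i else "r") + str(side.index(j) + 1)

    def node(i):
        total = matvec(W_v, x[i])
        count = 1  # the word at node i itself
        for j in children.get(i, []):
            h_j, l_j = node(j)
            contrib = matvec(W_pos[pos(i, j)], h_j)
            total = [t + l_j * c for t, c in zip(total, contrib)]
            count += l_j
        return [math.tanh(t / count) for t in total], count

    return node(root)

rng = random.Random(0)
n, dim = 4, 3  # small sizes for illustration only
x = {i: [rng.gauss(0.0, 1.0) for _ in range(dim)] for i in range(1, 6)}
W_v = random_matrix(n, dim, rng)
W_pos = {k: random_matrix(n, n, rng) for k in ("l1", "r1", "r2")}
d = [(1, 2), (2, 0), (3, 2), (4, 2), (5, 4)]
h_root, count = dt_rnn(x, d, W_v, W_pos)
```

For the example tree, `count` is 5, matching $\ell(2) = 1 + \ell(1) + \ell(3) + \ell(4)$ with $\ell(4) = 2$. The sketch does not handle positions that were unseen in training.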

Comparison to Constituency Tree RNN
Why doesn't the constituency tree RNN (CT-RNN) work well?
- It is a binary tree: each node has only two child nodes $(c_1, c_2)$.
- Its composition function is $h = f(W_{l1} c_1 + W_{r1} c_2)$, with $W = [W_{l1}\; W_{r1}] \in \mathbb{R}^{d \times 2d}$.
- The DT-RNN allows n-ary nodes in the tree.
- In the CT-RNN, whatever is merged last gets a larger weight, so the last words are more important.
- The CT-RNN captures the syntax of sentences more than the DT-RNN does.
- But to describe an image, we need agents and actions.
- The dependency tree structure pushes the central content words, such as the main action (verb) and its subject and object, to be merged last.
- The final sentence representation of the DT-RNN is more robust to less important adjectival modifiers, word order changes, etc.

Learning Images: Setup Deep Neural Network
Data representation
- Two datasets:
  - 20 million random web images (unsupervised learning)
  - 14 million labeled images, classified into 22,000 categories (supervised learning)
- Input image: resized and rescaled to a fixed pixel size.

Layer architecture
- 3 layers per stage, 3 stages (9 layers in total):
  - Filtering: learnable parameters!
  - L2 pooling: take the square of the filtering units, sum them over a small area of the image, and take the square root.
  - Local contrast normalization: take the inputs in a small area of the lower layer, subtract the mean, and divide by the standard deviation.
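
The two parameter-free layers can be sketched for a single neighbourhood of values. This is a simplified illustration of the operations as described on the slide, not the network's actual implementation; the function names and the small `eps` term that guards against division by zero are assumptions.

```python
import math

def l2_pool(values):
    """L2 pooling over one neighbourhood: square, sum, square root."""
    return math.sqrt(sum(v * v for v in values))

def local_contrast_normalize(values, eps=1e-8):
    """Subtract the neighbourhood mean and divide by its standard deviation."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    return [(v - mean) / (std + eps) for v in values]

pooled = l2_pool([3.0, 4.0])
normalized = local_contrast_normalize([1.0, 2.0, 3.0])
```

In a full network these functions would be applied to every small window of a feature map rather than to one flat list.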

Figure: Filtering. The values of the filters (in the first layer) after training.

Learning Images: Training
- Unsupervised objective: try to reconstruct the input while keeping the neurons sparse.
- Supervised objective: adjust the features in the entire network.
- Add a bottleneck layer (d = 4096) between the last layer and the classifier to reduce the number of connections.
- Perform a feedforward computation to compute the values of the bottleneck layer.
- Use a linear activation for this layer.
- Use the features at the bottleneck layer as the feature vector $z$ of an image in the multimodal space.

Some facts!
- In total, this network has approximately 1.36 billion connections.
- How long does it take to train this network? 8 days on a large cluster of 169 machines (each with 16 CPU cores).
- How about the number of connections in the human visual cortex? It is $10^6$ times larger in terms of the number of neurons and synapses.
- How long does it take a person to recognize objects? The training time of an infant is between 3 and 5 months!

Multimodal Mapping
- The previous two sections described how to map sentences into a d = 50-dimensional space and how to extract high-quality image feature vectors of 4096 dimensions.
- We now define the final multimodal objective function for learning joint image-sentence representations with these models.
- The training set consists of N images with feature vectors $z_i$; each image has 5 sentence descriptions $s_{i1}, \ldots, s_{i5}$, for which we use the DT-RNN to compute vector representations.
Figure: Sentence length varies greatly and different objects can be mentioned first. Hence, models have to be invariant to word ordering.
- For training, we use a max-margin objective function. The ranking cost function to minimize is:
  $$J(W_I, \theta) = \sum_{(i,j) \in P} \sum_{c \in S \setminus S(i)} \max(0, \Delta - v_i^T y_j + v_i^T y_c) + \sum_{(i,j) \in P} \sum_{c \in I \setminus I(j)} \max(0, \Delta - v_i^T y_j + v_c^T y_j)$$
- Here $v_i = W_I z_i$ is the mapped image vector, $y_j$ is the DT-RNN vector of sentence $j$, $P$ is the set of matching image-sentence pairs, $S(i)$ is the set of sentences describing image $i$, $I(j)$ is the image described by sentence $j$, and $\Delta$ is the margin.
- The objective function is very similar to the hinge loss function in SVMs.
- The final objective also includes the regularization term $\lambda(\lVert\theta\rVert_2^2 + \lVert W_I\rVert_F)$ (here the Frobenius norm is the $\ell_2$ norm of the matrix entries).
- A modified version of AdaGrad is used to optimize both $W_I$ and the DT-RNN, as well as the other baselines.
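
The ranking cost can be sketched directly from its two sums. This is an illustrative sketch without the regularization term; the function names and the toy vectors are assumptions, and the image vectors are taken as already mapped into the joint space.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def ranking_cost(v, y, pairs, margin=3.0):
    """Max-margin ranking cost over correct (image, sentence) index pairs.

    v: list of mapped image vectors; y: list of sentence vectors;
    pairs: iterable of (i, j) meaning sentence j describes image i.
    """
    pairs = set(pairs)
    cost = 0.0
    for i, j in pairs:
        s_true = dot(v[i], y[j])
        # Contrast sentences that do not describe image i.
        for c in range(len(y)):
            if (i, c) not in pairs:
                cost += max(0.0, margin - s_true + dot(v[i], y[c]))
        # Contrast images that are not described by sentence j.
        for c in range(len(v)):
            if (c, j) not in pairs:
                cost += max(0.0, margin - s_true + dot(v[c], y[j]))
    return cost

# Toy data: two images, two sentences, one correct sentence per image.
v = [[1.0, 0.0], [0.0, 1.0]]
y = [[1.0, 0.0], [0.0, 1.0]]
pairs = {(0, 0), (1, 1)}
cost_small_margin = ranking_cost(v, y, pairs, margin=1.0)
cost_large_margin = ranking_cost(v, y, pairs, margin=3.0)
```

With a margin of 1 the correct pairs already beat every contrast pair by the required amount, so the cost is zero; with a margin of 3 each of the four contrast terms contributes 2.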
- An alternative objective function is based on the squared loss: $J(W_I, \theta) = \sum_{(i,j) \in P} \lVert v_i - y_j \rVert_2^2$.

Experiment
- The authors use the dataset of Rashtchian et al. (2010), which consists of 1000 images, each with 5 sentences (Figure 1).
- Three different experiments evaluate and compare the DT-RNN:
  1 Analyzing how well the sentence vectors capture similarity in visual meaning.
  2 Analyzing image search with query sentences.
  3 Describing images by finding suitable sentences.
- In the experiments, the data is split into 800 training, 100 development and 100 test images. Since there are 5 sentences describing each image, there are 4000 training sentences and 500 test sentences.
- The dataset has 3020 unique words.
- For both DT-RNNs, the weight matrices are initialized to block identity matrices plus Gaussian noise.
- Length of word vectors and hidden vectors = 50.
- $\lambda = 0.08$ and the AdaGrad learning rate are chosen on the development split.
- The best model uses a margin of $\Delta = 3$.
- Similarity of sentences describing the same image: first, all 500 sentences from the test set are mapped into the multimodal space. Then, for each sentence, we find the nearest neighbor sentences in terms of inner products.

Table 1: Left: Comparison of methods for sentence similarity judgments. Lower numbers are better. Center: Comparison of methods for image search with query sentences. Shown is the average rank of the single correct image that is being described. Right: Average rank of a correct sentence description for a query image.

Image search with query sentences
- This experiment evaluates how well we can find images that display the visual meaning of a given sentence.
- First, a query sentence is mapped into the vector space; images are then found in the same space using simple inner products.
- As shown in Table 1 (center), the new DT-RNN outperforms all other models.

Describing images by finding suitable sentences
- For an image, suitable textual descriptions are again found simply by searching for nearby sentence vectors in the multimodal embedding space.
- The average rank of 25.3 for a correct sentence description is out of 500 possible sentences.
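
The retrieval step and the mean-rank metric can be sketched as follows. This is a toy illustration with hypothetical names and made-up vectors; it assumes exactly one correct candidate per query, as in the image search setting.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def rank_of_correct(query, candidates, correct_index):
    """Rank (1 = best) of the correct candidate when sorting by inner product."""
    scores = [dot(query, c) for c in candidates]
    order = sorted(range(len(candidates)), key=lambda k: scores[k], reverse=True)
    return order.index(correct_index) + 1

def mean_rank(queries, candidates, correct):
    """Average rank of the correct candidate over all queries."""
    ranks = [rank_of_correct(q, candidates, correct[k]) for k, q in enumerate(queries)]
    return sum(ranks) / len(ranks)

# Toy joint space: three image vectors and three query sentence vectors.
images = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
sentences = [[2.0, 0.0], [0.0, 2.0], [1.0, 3.0]]
correct = [0, 1, 2]  # sentence k describes image correct[k]
avg = mean_rank(sentences, images, correct)
```

Here the first two queries rank their image first and the third ranks its image second, so the mean rank is 4/3. Lower is better, which is how the table columns are read.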
Experiment: Multimodal Mapping
Figure: Images and their sentence descriptions assigned by the DT-RNN.
Experiment
The main failure mode of the SDT-RNN occurs when one sentence describing an image contains no verb while the other sentences for that image do. For example, the following two sentences have vectors that are very far apart, even though they are supposed to describe the same image:
1. A blue and yellow airplane flying straight down while emitting white smoke
2. Airplane in dive position
Conclusion
Our new model outperforms baselines and other commonly used models that can compute continuous vector representations for sentences. In comparison to related models, the DT-RNN is more invariant and robust to surface changes such as word order.