Word2vec and Beyond. Presented by Eleni Triantafillou. March 1, 2016


The Big Picture
There is a long history of word representations:
Techniques from information retrieval: Latent Semantic Analysis (LSA)
Self-Organizing Maps (SOM)
Distributional count-based methods
Neural Language Models
Important take-aways:
1. You don't need deep models to get good embeddings.
2. Count-based models and neural-net predictive models are not qualitatively different.
source: http://gavagai.se/blog/2015/09/30/a-brief-history-of-word-embeddings/

Continuous Word Representations
Contrast with simple n-gram models (words as atomic units). Simple models have the potential to perform very well... if we had enough data. Hence the need for more complicated models: continuous representations take better advantage of the data by modelling the similarity between words.

Continuous Representations
[Figure: t-SNE visualization of high-dimensional data.]
source: http://www.codeproject.com/tips/788739/Visualization-of-High-Dimensional-Data-using-t-SNE

Skip-Gram
Learn to predict surrounding words. Use a large training corpus to maximize:
\[ \frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\ j \ne 0} \log p(w_{t+j} \mid w_t) \]
where:
T: training set size
c: context size
w_j: vector representation of the j-th word
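As a concrete illustration of this objective, here is a minimal NumPy sketch that evaluates the averaged log-likelihood with a full softmax on a toy corpus; the vocabulary size, dimensionality, and variable names are illustrative placeholders, not the paper's code.

# Minimal sketch (not the original implementation) of the skip-gram objective
# with a full softmax over a toy corpus of random word ids.
import numpy as np

rng = np.random.default_rng(0)
V, N, c = 10, 5, 2                           # vocab size, embedding dim, context size
W_in = rng.normal(scale=0.1, size=(V, N))    # input (word) vectors
W_out = rng.normal(scale=0.1, size=(V, N))   # output (context) vectors
corpus = rng.integers(0, V, size=50)         # toy corpus of word ids

def log_p(context_id, center_id):
    """log p(w_context | w_center) with a full softmax."""
    scores = W_out @ W_in[center_id]         # (V,)
    scores -= scores.max()                   # numerical stability
    return scores[context_id] - np.log(np.exp(scores).sum())

T = len(corpus)
objective = 0.0
for t, center in enumerate(corpus):
    for j in range(-c, c + 1):
        if j == 0 or not (0 <= t + j < T):
            continue
        objective += log_p(corpus[t + j], center)
objective /= T
print("average log-likelihood:", objective)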

Skip-Gram: Think of it as a Neural Network
Learn W and W' in order to maximize the previous objective.
[Figure: skip-gram network with a V-dim input layer x_k, an N-dim hidden layer h (weights W, V x N), and C V-dim output layers y_1 ... y_C (weights W', N x V).]
source: word2vec parameter learning explained ([4])

CBOW
[Figure: CBOW network with C V-dim input layers x_1 ... x_C (shared weights W, V x N), an N-dim hidden layer h, and a V-dim output layer y (weights W', N x V).]
source: word2vec parameter learning explained ([4])
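For intuition about the diagram, here is a sketch of a single CBOW forward pass, assuming toy weight matrices W (V x N) and W' (N x V); it is purely illustrative and not gensim or the original C implementation.

# Sketch of one CBOW forward pass matching the diagram: C context words are
# embedded by W, averaged into the hidden layer h, and scored by W'.
import numpy as np

rng = np.random.default_rng(1)
V, N = 10, 5
W = rng.normal(scale=0.1, size=(V, N))        # input->hidden weights (V x N)
W_prime = rng.normal(scale=0.1, size=(N, V))  # hidden->output weights (N x V)

context_ids = [2, 7, 4, 9]                # the C context word ids
h = W[context_ids].mean(axis=0)           # hidden layer: average of context vectors (N,)
scores = h @ W_prime                      # output layer scores (V,)
p = np.exp(scores - scores.max())
p /= p.sum()                              # softmax: p(center word | context)
print("predicted center word id:", p.argmax())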

word2vec Experiments
Evaluate how well syntactic/semantic word relationships are captured.
Understand the effect of increasing training-set size / dimensionality.
Microsoft Research Sentence Completion Challenge.

Semantic / Syntactic Word Relationships Task

Semantic / Syntactic Word Relationships Results

Learned Relationships

Microsoft Research Sentence Completion

Linguistic Regularities
king - man + woman = queen!
Demo: check out gensim (Python library for topic modelling): https://radimrehurek.com/gensim/models/word2vec.html
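A quick way to try this regularity yourself, assuming a recent gensim install; the pretrained Google News vectors are a large download fetched by gensim's downloader on first use.

# Demo of the king - man + woman ~ queen regularity with gensim.
import gensim.downloader as api

wv = api.load("word2vec-google-news-300")   # returns KeyedVectors
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
# Expected to rank "queen" at or near the top.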

Multimodal Word Embeddings: Motivation Are these two objects similar?

Multimodal Word Embeddings: Motivation And these?

Multimodal Word Embeddings: Motivation
What do you think should be the case: sim(·, ·) < sim(·, ·), or sim(·, ·) > sim(·, ·)? [The slide compares the similarity of pairs of images.]

When do we need image features?
It's surely task-specific, but many tasks can benefit from visual features:
Text-based Image Retrieval
Visual Paraphrasing
Common Sense Assertion Classification
Multimodal embeddings are also better suited for zero-shot learning (learning a mapping between text and images).

Two Multimodal Word Embedding Approaches
1. Combining Language and Vision with a Multimodal Skip-gram Model (Lazaridou et al., 2015)
2. Visual Word2Vec (vis-w2v): Learning Visually Grounded Word Embeddings Using Abstract Scenes (Kottur et al., 2015)

Multimodal Skip-Gram
The main idea: use visual features for the (very) small subset of the training data for which images are available. Visual vectors are obtained from a CNN and are fixed during training!
Recall the skip-gram objective: \( \frac{1}{T} \sum_{t=1}^{T} L_{ling}(w_t) \), where \( L_{ling}(w_t) = \sum_{-c \le j \le c,\ j \ne 0} \log p(w_{t+j} \mid w_t) \).
New multimodal skip-gram objective:
\[ L = \frac{1}{T} \sum_{t=1}^{T} \big( L_{ling}(w_t) + L_{vision}(w_t) \big), \]
where \( L_{vision}(w_t) = 0 \) if \( w_t \) does not have an entry in ImageNet, and otherwise
\[ L_{vision}(w_t) = -\sum_{w' \sim P(w)} \max\big(0,\ \gamma - \cos(u_{w_t}, v_{w_t}) + \cos(u_{w_t}, v_{w'})\big). \]
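To make the visual term concrete, here is a minimal sketch of L_vision for a single word under my reading of the formula above; the margin gamma, the vector dimensionality, and the sampling of negatives are placeholder choices, not the paper's settings.

# Sketch of the max-margin visual term: push the word vector u_wt closer to its
# own (fixed) CNN visual vector v_wt than to visual vectors of sampled words.
import numpy as np

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def l_vision(u_wt, v_wt, negative_visual_vecs, gamma=0.5):
    """u_wt: word embedding; v_wt: its visual vector;
    negative_visual_vecs: visual vectors of words sampled from P(w)."""
    hinge = sum(max(0.0, gamma - cos(u_wt, v_wt) + cos(u_wt, v_neg))
                for v_neg in negative_visual_vecs)
    return -hinge   # the overall objective is maximized, hence the minus sign

# toy example
rng = np.random.default_rng(2)
u = rng.normal(size=300)
v = u + 0.1 * rng.normal(size=300)          # visual vector roughly aligned with u
negs = [rng.normal(size=300) for _ in range(5)]
print(l_vision(u, v, negs))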

Multimodal Skip-Gram: An example

Multimodal Skip-Gram: Comparing to Human Judgements
MEN: general relatedness ('pickles', 'hamburgers')
SimLex-999: taxonomic similarity ('pickles', 'food')
SemSim: semantic similarity ('pickles', 'onions')
VisSim: visual similarity ('pen', 'screwdriver')

Multimodal Skip-Gram: Examples of Nearest Neighbors
Only 'donut' and 'owl' were trained with direct visual information.

Multimodal Skip-Gram: Zero-shot image labelling and image retrieval

Multimodal Skip-Gram: Survey to Evaluate on Abstract Words
Metric: proportion (percentage) of words for which the number of votes in favour of the neighbour image is significantly above chance.
Unseen: discard words for which visual information was accessible during training.

Multimodal Skip-Gram: Survey to Evaluate on Abstract Words
Left: subjects preferred the nearest-neighbour image to the random image. Example words: freedom, theory, wrong, god, together, place.

Two Multimodal Word Embedding Approaches
1. Combining Language and Vision with a Multimodal Skip-gram Model (Lazaridou et al., 2015)
2. Visual Word2Vec (vis-w2v): Learning Visually Grounded Word Embeddings Using Abstract Scenes (Kottur et al., 2015)

Visual Word2Vec (vis-w2v): Motivation
In the w2v embedding, 'eating' and 'stares at' are farther apart; in vis-w2v they are closer.
Example captions: 'girl eating ice cream' vs. 'girl stares at ice cream'.

Visual Word2Vec (vis-w2v): Approach
Multimodal training set: tuples of (description, abstract scene).
Fine-tune word2vec to add visual information obtained from abstract scenes (clipart).
Obtain surrogate (visual) classes by clustering the abstract-scene features.
W_I: initialized from word2vec
N_K: number of clusters of abstract-scene features
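As a rough sketch of the surrogate-class step (not the authors' code), one could cluster abstract-scene feature vectors with k-means and use the cluster ids as visual targets; the feature dimensionality and N_K below are arbitrary placeholders.

# Cluster abstract-scene features into N_K surrogate classes; each cluster id
# then serves as the "visual" target for CBOW-style fine-tuning of W_I.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
scene_features = rng.normal(size=(1000, 64))   # one feature vector per abstract scene
N_K = 25                                       # number of surrogate classes

kmeans = KMeans(n_clusters=N_K, n_init=10, random_state=0).fit(scene_features)
surrogate_labels = kmeans.labels_              # visual "class" for each scene

# Each (description, scene) tuple becomes (description words -> surrogate label),
# and word vectors initialized from word2vec (W_I) are fine-tuned to predict
# the surrogate label from the description's words.
print(surrogate_labels[:10])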

Clustering Abstract Scenes
Interestingly, 'prepare to cut', 'hold', 'give' are clustered together with 'stare at', etc. It would be hard to infer these semantic relationships from text alone.
Example relations shown for one cluster: 'lay next to', 'stand near', 'enjoy', 'stare at'.

Visual Word2Vec (vis-w2v): Relationship to CBOW (word2vec) Surrogate labels play the role of visual context.

Visual Word2Vec (vis-w2v): Visual Paraphrasing Results

Visual Word2Vec (vis-w2v): Visual Paraphrasing Results

Approach        Visual Paraphrasing AP (%)
w2v-wiki        94.1
w2v-wiki        94.4
w2v-coco        94.6
vis-w2v-wiki    95.1
vis-w2v-coco    95.3

Table: Performance on the visual paraphrasing task

Visual Word2Vec (vis-w2v): Common Sense Assertion Classification Results
Given a tuple (Primary Object, Relation, Secondary Object), decide whether it is plausible.

Approach                           Common sense AP (%)
w2v-coco                           72.2
w2v-wiki                           68.1
w2v-coco + vision                  73.6
vis-w2v-coco (shared)              74.5
vis-w2v-coco (shared) + vision     74.2
vis-w2v-coco (separate)            74.8
vis-w2v-coco (separate) + vision   75.2
vis-w2v-wiki (shared)              72.2
vis-w2v-wiki (separate)            74.2

Table: Performance on the common sense task

Thank you! [-0.0665592-0.0431451... -0.05182673-0.07418852-0.04472357 0.02315103-0.04419742-0.01104935] [ 0.08773034 0.00566679... 0.03735885-0.04323553 0.02130294-0.09108844-0.05708769 0.04659363]

Bibliography
[1] Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space." arXiv preprint arXiv:1301.3781 (2013).
[2] Kottur, Satwik, et al. "Visual Word2Vec (vis-w2v): Learning Visually Grounded Word Embeddings Using Abstract Scenes." arXiv preprint arXiv:1511.07067 (2015).
[3] Lazaridou, Angeliki, Nghia The Pham, and Marco Baroni. "Combining language and vision with a multimodal skip-gram model." arXiv preprint arXiv:1501.02598 (2015).
[4] Rong, Xin. "word2vec parameter learning explained." arXiv preprint arXiv:1411.2738 (2014).
[5] Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in Neural Information Processing Systems (2013).