A Joint Model of Language and Perception for Grounded Attribute Learning

Size: px

Start display at page:

Download "A Joint Model of Language and Perception for Grounded Attribute Learning"

Maximillian Hawkins
5 years ago
Views:

1 A Joint Model of Language and Perception for Grounded Attribute Learning Cynthia Matuszek*, Nicholas FitzGerald*, Liefeng Bo, Luke Zettlemoyer, Dieter Fox"

Learning Attributes 2 u Physically grounded systems à opportunities for learning u Motivation: robots learning from interaction u With people u With

2 Learning Attributes 2 u Physically grounded systems à opportunities for learning u Motivation: robots learning from interaction u With people u With environment u Need a model to learn from real-world input u Language and sensory data about novel, physical things u Goal: learn new ideas from interacting with the world.

3 Some Related Work 3 u Vision u Object recognition (Felzenszwalb et al., TPAMI 2009; et al., CVPR 2009) u Visual attributes of objects (CVPR Farhadi et al., 2009; Parikh & Grauman, ICCV 2011) u Kernel descriptors (Bo et al., NIPS 2010; IROS 2011) u Semantic Parsing u Inducing semantic parsers (Liang et al., ACL 2011; Wong & Mooney, ACL 2007; ) u CCG parsers (Zettlemoyer & Collins, ACL 2009; Kwiatkowski et al., EMNLP 2011; ) u Grounded Language Acquisition and Parsing for Robotics u Parsing NL in known world and action models (Matuszek et al., ISER 2012; Tellex et al. AAAI 2011; Kollar et al., HRI 2010; ) u Parsing NL for RoboCup and navigation (Mooney et al, Chen & Mooney, AAAI 2011) u Language grounding for semantic mapping (Kruijff & Zender, HRI 2006) u And many more!

4 Outline 4 u Introduction & Motivation u Related Work u Task Description u Background u Joint Model & Model Learning u Experimental Results u Discussion and Future Work

5 Task Description 5 u Learning to select objects described by attribute u Learn previously unknown attributes u Yellow: new word describing new idea u NLP-style semantic parsing: mapping NL to formal representation u Grounded in real-world perception Which are the yellow objects?

6 Task Description 6 Language yellow Formal Expression color-new-1(x) Classifier C color-new-1 (x) = {0,1} Ground Truth

7 Outline 7 u Introduction & Motivation u Task Description u Related Work u Background u Semantic parsing model; perceptual model u Joint Model & Model Learning u Experimental Results u Discussion and Future Work

8 [Steedman (book) 2000, Kwiatkowski et al EMNLP 2010, 2011] Categorial Combinatory Grammars 8 u Capture syntax and semantics of language u Parse sentences to expressions in λ-calculus u Space of possible parses defined by: lexical entries along with combinatory rules. u Probabilistic CCGs define a log-linear model over: sentence x parse y logical form z red block N : λx. color-red(x) N \ N : λx. x p( y,z x;θ,λ) = eθ φ(x,y,z) e θ φ(x,y',z') y',z'

9 Perceptual Model [Bo et al IROS 2011, CVPR 2011] 9 u Visual model is a set of binary classifiers, one/attribute u Each perceptual classifier is applied independently c color-yellow (x) u Kernel descriptors u Trained using linear SVM

10 Outline 10 u Introduction & Motivation u Task Description u Related Work u Background u Joint Model & Model Learning u Experimental Results u Discussion and Future Work

11 Joint Model 11 u Goal: compute the probability of an indicated set u World model (product of possible classifier assignments to all objects o in O): u Joint Model: Joint Probability Parsing Model World Model Grounding Query

12 Joint Model 12 u Goal: compute the probability of an indicated set x O u World model (product of possible z classifier assignments to all objects o in O): G w u Joint Model: Joint Probability Parsing Model World Model Grounding Query

13 Inference 13 These are the ones that are blue Semantic Parsing λx.color-blue(x) Blue Green Binary Attribute Classifiers Round Broccoli. Grounded Query P(G,z,w x,o) = P(z x) Joint Probability Parsing Model o O c C P(w o, c o) Vision Model P(G z,w) Grounding Query

Why Joint Learning? 14 u Language helps determine attribute relations u New language can be ambiguous: This is <new word>. u New color attribute? This is red.

14 Why Joint Learning? 14 u Language helps determine attribute relations u New language can be ambiguous: This is <new word>. u New color attribute? This is red. u New shape attribute? This is square. u Synonym? This is saffron. u No attribute at all This is toy. u Vision helps decide among these possible classifiers

Supervised Learning 15 u With labeled sentence meaning, object groups, alignment u Decomposes into two independent learning problems: Semantic Parsing These are the ones that are blue λx.

15 Supervised Learning 15 u With labeled sentence meaning, object groups, alignment u Decomposes into two independent learning problems: Semantic Parsing These are the ones that are blue λx.color-blue(x) This is a round toy λx.shape-spheroid(x) conjugate gradient descent Attribute Classification Blue Green Round SVM on kernel descriptors P(z x) supervised semantic parser o O c C P(w o, c o) classification of all attributes

16 Unsupervised Learning 16 u Labeling is expensive can it be avoided? ( It s the,, ) blue one u Initialization u Train an initial supervised model from labeled scenes (sentence/logic and object/attributes) u Add N new, unknown attribute classifiers u Initialize to a small, near-uniform distribution u Pair with every unknown word/phrase u Objective:

17 Unsupervised Learning 17 u Using an EM-style algorithm: 1. Compute latent P(z x) and P(w c,o o) 2. Re-estimate parameters of parsing and vision models

18 Outline 18 u Introduction & Motivation u Task Description u Related Work u Background u Joint Model & Model Learning u Experimental Results u Discussion and Future Work

1: Initialization λx.color-orange(x) shape-spheroid(x) This is an orange ball. 2: Training 3: Testing It s the yellow block. All of these toys are yellow. colornew(x)?

19 1: Initialization λx.color-orange(x) shape-spheroid(x) This is an orange ball. 2: Training 3: Testing It s the yellow block. All of these toys are yellow. colornew(x)?? color-red(x) color-blue(x)? shapenew(x)?? shape-cube(x) shape-rect(x)? color-red(x) NULL?? shape-cyl.(x)? color-blue(x) shape-rect(x) color-red(x) NULL?? shape-cyl.(x) color-new(x)

20 Your answer should be the sentence(s) the parent said while poin5ng to these things. This one s an orange ball. λx. orange(x) spheroid(x)

Experimental Evaluation 21 u 142 scenes, 6 color and 6 shape attributes u ~1,000 NL sentences from Mechanical Turk u Ground truth formulas and classifier assignments u 20 splits into u 30%

21 Experimental Evaluation 21 u 142 scenes, 6 color and 6 shape attributes u ~1,000 NL sentences from Mechanical Turk u Ground truth formulas and classifier assignments u 20 splits into u 30% training items for initialization (3 colors, 3 shapes) u 55% training items for training (3 new colors, 3 new shapes) u 10% test cases with new colors+shapes Precision = 82%, Recall = 71%, F1 = 76%

22 Object ClassiJier Results 22 x-axis: Novel NL words y-axis: Selected classifiers (including null)

23 Initialization Data 23 u Primarily initializing language model u Training focused on learning new attributes/words

24 Failure cases 24 u Bad parses: This is a red, toy rectangle. λx.rect(x) triangle(x) u Bad classification: Cylinders (lengthwise) look like rectangular solids Humans made the same errors data is noisy!

25 Failure cases 25 u Incorrect human input u u u Typos Visual errors This is a blue toy shaped like a half-pipe. Unexpected human input It s brown, with blue specular highlights. (This confused the classifiers, too) This object is a fake piece of green lettuce. Do not try to eat!

26 Discussion 26 u Accurate language and attribute models can be learned from data: u Language, raw percepts, and target objects u Language and vision combine to learn previously-unrepresented ideas u extending the world model by interaction, not programming. u Future Work u Scale complexity of language, sensory streams u Extend to learning tasks, goals, and more attributes u Implement in interactive setting

27 Thanks! 27 u Questions?

A Two-Phased Approach for Natural Language Parsing into Formal Logic. Giancarlo Sturla

A Two-Phased Approach for Natural Language Parsing into Formal Logic by Giancarlo Sturla B.S., Massachusetts Institute of Technology, 2016 Submitted to the Department of Electrical Engineering and Computer