Machine Learning for Natural Language Processing. Alice Oh January 17, 2018

1 Machine Learning for Natural Language Processing Alice Oh January 17, 2018

2 Overview: distributed representation; temporal neural networks (RNN, LSTM, GRU); sequence-to-sequence models (machine translation, response generation, question answering).

3 Vector Representation: Word Embedding (figures by Sungjoon Park). Each word is mapped to a dense, real-valued vector, and these vectors feed into downstream NLP tasks. Examples: word2vec (Mikolov et al., 2013) and GloVe (Pennington et al., 2014).

4 Vector Representation: Word Embedding (figures by Sungjoon Park). CBOW (Mikolov et al., 2013). The figure shows input, projection, and output layers: the context words w(t-2), w(t-1), w(t+1), w(t+2) are summed at the projection layer to predict the center word w(t). Objective: given a sequence of training words $w_1, \dots, w_T$, maximize $\frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\, j \ne 0} \log p(w_t \mid w_{t+j})$, where $p(w_O \mid w_I)$ is a softmax, $p(w_O \mid w_I) = \frac{\exp(u_{w_O}^{\top} v_{w_I})}{\sum_{w=1}^{V} \exp(u_w^{\top} v_{w_I})}$, computed in practice with hierarchical softmax or negative sampling.

5 Vector Representation: Word Embedding (figures by Sungjoon Park). Skip-Gram (Mikolov et al., 2013). The figure shows input, projection, and output layers: the center word w(t) is used to predict each of its context words w(t-2), w(t-1), w(t+1), w(t+2). Objective: given a sequence of training words $w_1, \dots, w_T$, maximize $\frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\, j \ne 0} \log p(w_{t+j} \mid w_t)$, with the same softmax form $p(w_O \mid w_I) = \frac{\exp(u_{w_O}^{\top} v_{w_I})}{\sum_{w=1}^{V} \exp(u_w^{\top} v_{w_I})}$, computed by hierarchical softmax or negative sampling.
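A minimal NumPy sketch may help make these quantities concrete: the full-softmax probability $p(w_O \mid w_I)$ and the negative-sampling surrogate mentioned on the slides. All names, sizes, and the random initialization are illustrative assumptions, not the slides' code.

```python
import numpy as np

# Minimal sketch (not the authors' code): skip-gram probability p(w_O | w_I)
# with a full softmax, plus the negative-sampling approximation.
# U holds "output" vectors u_w, W holds "input" vectors v_w; sizes are toy values.

rng = np.random.default_rng(0)
V, d = 10_000, 300
W = rng.normal(scale=0.1, size=(V, d))   # input (center-word) vectors v_w
U = rng.normal(scale=0.1, size=(V, d))   # output (context-word) vectors u_w

def softmax_prob(center, context):
    """p(w_context | w_center) with the full softmax over the vocabulary."""
    scores = U @ W[center]                      # u_w^T v_center for every w
    scores -= scores.max()                      # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[context]

def negative_sampling_loss(center, context, k=5):
    """Negative-sampling objective for one (center, context) pair."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    negatives = rng.integers(0, V, size=k)      # in practice: smoothed unigram noise
    pos = np.log(sigmoid(U[context] @ W[center]))
    neg = np.log(sigmoid(-U[negatives] @ W[center])).sum()
    return -(pos + neg)

print(softmax_prob(center=42, context=7), negative_sampling_loss(42, 7))
```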

6 Vector Representation: Word Embedding (figures by Sungjoon Park). GloVe (Pennington et al., 2014). Motivation: use global information (co-occurrence counts over the whole corpus) while learning word vectors. Objective: the dot product between two word vectors should equal the log of the words' probability of co-occurrence with given context words. The model is built from ratios of co-occurrence probabilities, $F(w_i, w_j, \tilde{w}_k) = \frac{P_{ik}}{P_{jk}}$, and the resulting weighted least-squares objective is $J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2$.
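A small sketch of evaluating the GloVe objective $J$ on a toy co-occurrence matrix; the weighting function and all sizes follow the published GloVe formulation and are assumptions, not taken from the slides.

```python
import numpy as np

# Minimal sketch of the GloVe weighted least-squares objective (Pennington et al., 2014).
# X is a toy word-word co-occurrence matrix; W, W_tilde, b, b_tilde are the parameters.

rng = np.random.default_rng(0)
V, d = 50, 10
X = rng.poisson(1.0, size=(V, V)).astype(float)          # toy co-occurrence counts
W, W_tilde = rng.normal(size=(V, d)), rng.normal(size=(V, d))
b, b_tilde = np.zeros(V), np.zeros(V)

def weight(x, x_max=100.0, alpha=0.75):
    """f(x): down-weights rare and caps very frequent co-occurrences."""
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

def glove_loss(W, W_tilde, b, b_tilde, X):
    """J = sum_ij f(X_ij) (w_i . w~_j + b_i + b~_j - log X_ij)^2 over nonzero X_ij."""
    i, j = np.nonzero(X)
    residual = (W[i] * W_tilde[j]).sum(axis=1) + b[i] + b_tilde[j] - np.log(X[i, j])
    return (weight(X[i, j]) * residual ** 2).sum()

print(glove_loss(W, W_tilde, b, b_tilde, X))
```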

7 Vector Representation: Word Embedding (figures by Sungjoon Park). What do the individual dimensions of a word vector mean? For a standard embedding, it is not clear what they mean.

8 Vector Representation: Word Embedding (figures by Sungjoon Park). Interpretable vector representation: dimensions that correspond to concepts such as company, fruits, nouns, sports, and numbers, so that a word like "Apple" loads on the company and fruit dimensions. One way to get there is sparsity (Faruqui et al., 2015). Motivations: understanding the semantic and syntactic compositionality of words, increasing storage efficiency, and reducing the complexity of higher-level models.

9 Vector Representation: Word Embedding (figures by Sungjoon Park). An alternative way to obtain interpretable dimensions (company, fruits, nouns, sports, numbers): rotate the dimensions of an already-trained embedding, as a post-processing method.

10 Vector Representation: Word Embedding (figures by Sungjoon Park). [Scatter plot of word vectors on two dimensions: words such as Superintendent, Chairman, President, Commissioner, Minister and Russia, China, Italy, Germany, France are spread across both axes before rotation.]

11 Vector Representation: Word Embedding (figures by Sungjoon Park). [The same words after rotating the dimensions: the job-title group and the country group each align more closely with a single axis.]

12 Vector Representation: Word Embedding (figures by Sungjoon Park). To compute the rotated representation $\Lambda$: $\Lambda = A T$, where the rotation matrix $T$ satisfies $T^{\top} T = I$ (orthogonal) or $\mathrm{diag}\!\left(T^{-1} (T^{-1})^{\top}\right) = I$ (oblique). Minimize $f(\Lambda) = (1 - \kappa) \sum_{i=1}^{p} \sum_{j=1}^{m} \sum_{l \ne j} \lambda_{ij}^2 \lambda_{il}^2$ (row complexity) $+\ \kappa \sum_{j=1}^{m} \sum_{i=1}^{p} \sum_{l \ne i} \lambda_{ij}^2 \lambda_{lj}^2$ (column complexity), where $\kappa$ is a weighting parameter: Quartimax $\kappa = 0$, Varimax $\kappa = 1/p$, Parsimax $\kappa = (m-1)/(p+m-2)$, Factor Parsimony $\kappa = 1$.
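The criterion above can be written compactly; below is a sketch of evaluating it for the four named settings of $\kappa$. The actual rotation in the slides is found with the Gradient Projection algorithm (Jennrich, 2001), which is not reimplemented here, and the loading matrix below is a random placeholder.

```python
import numpy as np

# Sketch of the rotation criterion on a loading matrix Lambda (p x m):
# f(Lambda) = (1 - kappa) * row complexity + kappa * column complexity.

def rotation_criterion(L, kappa):
    L2 = L ** 2                                                   # lambda_ij^2
    row_complexity = ((L2.sum(axis=1) ** 2) - (L2 ** 2).sum(axis=1)).sum()
    col_complexity = ((L2.sum(axis=0) ** 2) - (L2 ** 2).sum(axis=0)).sum()
    return (1 - kappa) * row_complexity + kappa * col_complexity

p, m = 1000, 300                                   # toy: vocabulary rows x dimensions
L = np.random.default_rng(0).normal(size=(p, m))   # placeholder "word loading" matrix
for name, kappa in [("Quartimax", 0.0), ("Varimax", 1 / p),
                    ("Parsimax", (m - 1) / (p + m - 2)), ("Factor Parsimony", 1.0)]:
    print(name, rotation_criterion(L, kappa))
```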

13 Vector Representation: Word Embedding (figures by Sungjoon Park). Experimental settings. Training corpus: English Wikipedia (5.3M articles, 83M sentences, 1,676M tokens). Base embeddings: 2 vector representations, word2vec and GloVe, each with 300 dimensions. Rotated embeddings: 16 rotated representations, one for each kappa (4) x embedding (2) x constraint (2). Implementation: the Gradient Projection algorithm (Jennrich, 2001), with a TensorFlow implementation released on GitHub.

14 Vector Representation: Word Embedding (figures by Sungjoon Park). Task: word intrusion, to measure the semantic coherence of the words associated with each dimension. Example item: {daughter, wife, sister, mother, son, bigram}, where "bigram" is the intruder. Choosing an intruder for a dimension: take the top k = 5 words of that dimension (here daughter, wife, sister, mother, son) and add a word that ranks below the top half of that dimension but is in the top 10% of some other dimension (here "bigram").
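A rough sketch, following the selection rule stated on the slide, of how one intrusion item could be built for a dimension: the top-k words plus an intruder that sits in the bottom half of that dimension but in the top 10% of another. The embedding matrix and vocabulary are toy placeholders, and the authors' exact sampling procedure may differ.

```python
import numpy as np

def intrusion_item(E, vocab, dim, k=5):
    """Top-k words of `dim` plus an intruder that ranks in the bottom half of
    `dim` but in the top 10% of some other dimension."""
    order = np.argsort(-E[:, dim])                 # word indices, best first
    top_words = [vocab[i] for i in order[:k]]
    bottom_half = set(order[len(vocab) // 2:])
    n_top = max(1, len(vocab) // 10)
    for other in range(E.shape[1]):                # scan the other dimensions
        if other == dim:
            continue
        for cand in np.argsort(-E[:, other])[:n_top]:
            if cand in bottom_half:
                return top_words, vocab[cand]
    return top_words, None                         # no valid intruder found

E = np.random.default_rng(0).normal(size=(200, 10))   # toy embedding: 200 words x 10 dims
vocab = [f"word{i}" for i in range(200)]
print(intrusion_item(E, vocab, dim=0))
```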

15 Vector Representation: Word Embedding (figures by Sungjoon Park). Measure: distance ratio. $\mathrm{DistRatio} = \frac{1}{d} \sum_{p=1}^{d} \frac{\mathrm{InterDist}_p}{\mathrm{IntraDist}_p}$, where for each dimension $p$, $\mathrm{IntraDist}_p = \sum_{w_i \in \mathrm{top}_k(p)} \sum_{w_j \in \mathrm{top}_k(p),\, j \ne i} \frac{\mathrm{dist}(w_i, w_j)}{k(k-1)}$ is the average pairwise distance among the top-k words, and $\mathrm{InterDist}_p = \sum_{w_i \in \mathrm{top}_k(p)} \frac{\mathrm{dist}(w_i, w_{\mathrm{intruder}(p)})}{k}$ is the average distance from the top-k words to the intruder. Example (figure): the five top words cluster together while the intruder lies far away, so a higher distance ratio indicates a more coherent dimension.
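A matching sketch of the distance ratio for a single dimension, using Euclidean distance as an assumed choice of dist; the vectors are toy data.

```python
import numpy as np

# Distance ratio for one dimension: average distance from the top-k words to the
# intruder (inter) divided by the average pairwise distance among the top-k words (intra).

def dist(a, b):
    return np.linalg.norm(a - b)

def distance_ratio_one_dim(top_vecs, intruder_vec):
    k = len(top_vecs)
    intra = sum(dist(top_vecs[i], top_vecs[j])
                for i in range(k) for j in range(k) if i != j) / (k * (k - 1))
    inter = sum(dist(v, intruder_vec) for v in top_vecs) / k
    return inter / intra

rng = np.random.default_rng(0)
top = list(rng.normal(size=(5, 300)))        # top-5 words of one dimension (toy)
intruder = rng.normal(loc=3.0, size=300)     # intruder, far from the cluster (toy)
print(distance_ratio_one_dim(top, intruder)) # > 1 suggests a coherent dimension
```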

16 Vector Representation: Word Embedding (figures by Sungjoon Park). Results. [Table: distance ratio for skip-gram (SG) and GloVe vectors, comparing the original embeddings, SOV, SOV (non-negative), and the Quartimax, Varimax, Parsimax, and Factor Parsimony rotations under orthogonal and oblique constraints; numeric values not preserved.] Qualitative examples of the rotated dimensions were also shown.

17 Temporal Models: Recurrent Neural Network. There are many types of data that are sequential: language is a sequence of words, DNA is a sequence of nucleotides, and the stock market is a sequence of gains and losses. A recurrent neural network is a flexible way to represent sequential data of arbitrary length.
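As a concrete reference for the recurrent idea, here is a minimal vanilla (Elman) RNN step in NumPy: the same parameters are applied at every position, which is what lets the model consume sequences of arbitrary length. Sizes and initialization are illustrative.

```python
import numpy as np

# Minimal vanilla RNN step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b).

d_in, d_h = 300, 128
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(d_h, d_in))
W_hh = rng.normal(scale=0.1, size=(d_h, d_h))
b_h = np.zeros(d_h)

def rnn_step(h_prev, x_t):
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

h = np.zeros(d_h)
for x_t in rng.normal(size=(10, d_in)):   # a toy sequence of 10 word vectors
    h = rnn_step(h, x_t)                  # the same parameters at every step
print(h.shape)
```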

18-21 Temporal Models: Recurrent Neural Network (figures by Chris Dyer; diagram-only slides)

22-25 Temporal Models: RNN Language Model (figures by Chris Dyer; diagram-only slides)
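The RNN language model slides above are figure-only, so here is a rough sketch of the standard formulation (not necessarily the exact model drawn in the figures): each step feeds the previous word's embedding into the recurrent cell, and a softmax over the vocabulary gives the probability of the next word.

```python
import numpy as np

# Sketch of an RNN language model step and sentence log-likelihood. Toy sizes.

rng = np.random.default_rng(0)
V, d_e, d_h = 10_000, 100, 200
E = rng.normal(scale=0.1, size=(V, d_e))        # word embeddings
W_xh = rng.normal(scale=0.1, size=(d_h, d_e))
W_hh = rng.normal(scale=0.1, size=(d_h, d_h))
W_hy = rng.normal(scale=0.1, size=(V, d_h))

def lm_step(h_prev, word_id):
    h = np.tanh(W_xh @ E[word_id] + W_hh @ h_prev)
    logits = W_hy @ h
    probs = np.exp(logits - logits.max()); probs /= probs.sum()
    return h, probs                              # p(next word | history)

h, log_prob = np.zeros(d_h), 0.0
sentence = [12, 845, 3, 77]                      # toy word ids
for prev, nxt in zip(sentence[:-1], sentence[1:]):
    h, probs = lm_step(h, prev)
    log_prob += np.log(probs[nxt])               # log-likelihood of the sentence
print(log_prob)
```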

26-29 Temporal Models: Long Short-Term Memory (figures by Chris Dyer; diagram-only slides)
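The LSTM slides are likewise figure-only; the following is a hedged sketch of the standard LSTM cell equations, with input, forget, and output gates controlling a separate cell state. Parameter names and sizes are illustrative.

```python
import numpy as np

# Standard LSTM cell step (illustrative sketch, not the slides' figures).

d_in, d_h = 300, 128
rng = np.random.default_rng(0)
def mat(): return rng.normal(scale=0.1, size=(d_h, d_in + d_h))
W_i, W_f, W_o, W_c = mat(), mat(), mat(), mat()
b_i = b_f = b_o = b_c = np.zeros(d_h)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def lstm_step(h_prev, c_prev, x_t):
    z = np.concatenate([x_t, h_prev])
    i = sigmoid(W_i @ z + b_i)            # input gate
    f = sigmoid(W_f @ z + b_f)            # forget gate
    o = sigmoid(W_o @ z + b_o)            # output gate
    c_tilde = np.tanh(W_c @ z + b_c)      # candidate cell state
    c = f * c_prev + i * c_tilde          # new cell state
    h = o * np.tanh(c)                    # new hidden state
    return h, c

h, c = np.zeros(d_h), np.zeros(d_h)
for x_t in rng.normal(size=(10, d_in)):
    h, c = lstm_step(h, c, x_t)
print(h.shape, c.shape)
```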

30-31 Temporal Models: Gated Recurrent Unit (figures by Chris Dyer; diagram-only slides)
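Similarly, a sketch of the standard GRU cell (Cho et al., 2014): an update gate and a reset gate, with no separate cell state. Names and sizes are again illustrative assumptions.

```python
import numpy as np

# Standard GRU cell step (illustrative sketch).

d_in, d_h = 300, 128
rng = np.random.default_rng(0)
def mat(): return rng.normal(scale=0.1, size=(d_h, d_in + d_h))
W_z, W_r, W_h = mat(), mat(), mat()
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def gru_step(h_prev, x_t):
    z = sigmoid(W_z @ np.concatenate([x_t, h_prev]))             # update gate
    r = sigmoid(W_r @ np.concatenate([x_t, h_prev]))             # reset gate
    h_tilde = np.tanh(W_h @ np.concatenate([x_t, r * h_prev]))   # candidate state
    return (1 - z) * h_prev + z * h_tilde                        # interpolate old/new

h = np.zeros(d_h)
for x_t in rng.normal(size=(10, d_in)):
    h = gru_step(h, x_t)
print(h.shape)
```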

32-44 Sequence to Sequence: Machine Translation (figures by Kyunghyun Cho; diagram-only slides)

45 Sequence to Sequence: Response Generation (slide from Bill MacCartney)

46 Sequence to Sequence: Response Generation (figures by Jiwei Li; references: Sutskever et al., 2014; Jean et al., 2014; Luong et al., 2015). Source: the input message; target: the response. The encoder reads the source ("how are you? eos") and the decoder then generates the target one token at a time ("I'm fine. EOS").
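A minimal sketch of the encode/decode loop just described: an encoder RNN compresses the source message into a vector, and a decoder RNN emits the response token by token until EOS. This is an illustrative greedy decoder with untrained toy weights, not the models evaluated on the following slides.

```python
import numpy as np

# Sketch of a seq2seq response generator: encoder RNN -> summary vector -> decoder RNN.

rng = np.random.default_rng(0)
V, d_e, d_h = 5000, 64, 128                            # toy vocabulary / embedding / hidden sizes
E = rng.normal(scale=0.1, size=(V, d_e))               # shared token embeddings
enc_Wx, enc_Wh = rng.normal(scale=0.1, size=(d_h, d_e)), rng.normal(scale=0.1, size=(d_h, d_h))
dec_Wx, dec_Wh = rng.normal(scale=0.1, size=(d_h, d_e)), rng.normal(scale=0.1, size=(d_h, d_h))
W_out = rng.normal(scale=0.1, size=(V, d_h))           # hidden state -> vocabulary logits
EOS = 0

def encode(src_ids):
    h = np.zeros(d_h)
    for t in src_ids:                                  # e.g. ids for "how are you ? eos"
        h = np.tanh(enc_Wx @ E[t] + enc_Wh @ h)
    return h                                           # summary of the source message

def decode(h, max_len=20):
    out, t = [], EOS                                   # start decoding from the EOS/start token
    for _ in range(max_len):
        h = np.tanh(dec_Wx @ E[t] + dec_Wh @ h)
        t = int(np.argmax(W_out @ h))                  # greedy choice of the next token
        if t == EOS:
            break
        out.append(t)
    return out                                         # ids of the generated response

print(decode(encode([17, 42, 99, 3, EOS])))            # untrained weights: arbitrary ids
```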

47 Sequence to Sequence: Response Generation (figures by Jiwei Li). Input: "How old are you?" Response: "I don't know."

48 Sequence to Sequence: Response Generation (figures by Jiwei Li). Input: "How is life?" Response: "I don't know what you are talking about."

49 Sequence to Sequence: Response Generation (figures by Jiwei Li). Input: "Do you love me?" Response: "I don't know what you are talking about." Such generic responses account for about 30% of all generated responses.

50 Sequence to Sequence: Response Generation (figures by Jiwei Li). Whatever one asks, the model tends to answer "I don't know."

51 Sequence to Sequence: Response Generation (figures by Jiwei Li; diagram-only slide)

52 Sequence to Sequence: Response Generation (figures by Jiwei Li). Bayes' rule applied to the standard seq2seq model (equations shown in the figure).

53 Sequence to Sequence: Response Generation (figures by Jiwei Li). Bayes' rule (continued in the figure).

54 Sequence to Sequence: Response Generation (figures by Jiwei Li). BLEU on the Twitter dataset (results figure).

55 Sequence to Sequence: Response Generation (figures by Jiwei Li). Number of distinct tokens in generated targets (divided by the total number of tokens) on the OpenSubtitles dataset: relative improvements of +385% and +122%.

56 Sequence to Sequence: Response Generation (figures by Jiwei Li). Example responses: standard seq2seq p(T | S) vs. mutual information.
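For reference, the contrast drawn on this slide matches the maximum mutual information (MMI) criterion associated with these figures (Li et al.); a hedged summary, where $\lambda$ is a tunable weight not given on the slides:

$$\hat{T}_{\text{seq2seq}} = \arg\max_{T} \log p(T \mid S), \qquad \hat{T}_{\text{MMI}} = \arg\max_{T} \big[ \log p(T \mid S) - \lambda \log p(T) \big].$$

By Bayes' rule, with $\lambda = 1$ the MMI score equals $\log \frac{p(S, T)}{p(S)\,p(T)}$, the pointwise mutual information between source and target; the $-\lambda \log p(T)$ term penalizes generic, high-frequency responses such as "I don't know."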

57 Attention Mechanism: Sentiment Classification

58 Memory Networks: Question Answering. bAbI tasks (Weston et al., ICLR 2016). Example story: "John dropped the milk. John took the milk there. Sandra went to the bathroom. John moved to the hallway. Mary went to the bedroom." Question: "Where is the milk?" Answer: "Hallway." (Task 3: two supporting facts.)
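A minimal single-hop sketch in the spirit of the end-to-end memory network whose figures follow (Sukhbaatar et al.): sentences are embedded as memories, the question as a query, and softmax attention over the memories produces a summary used to predict the answer. All sizes and the bag-of-words embedding are illustrative assumptions.

```python
import numpy as np

# Single-hop memory network sketch: attention over story sentences, then answer prediction.

rng = np.random.default_rng(0)
V, d = 100, 20                           # toy vocabulary and embedding size
A = rng.normal(scale=0.1, size=(V, d))   # memory embedding matrix
C = rng.normal(scale=0.1, size=(V, d))   # output embedding matrix
B = rng.normal(scale=0.1, size=(V, d))   # question embedding matrix
W = rng.normal(scale=0.1, size=(V, d))   # answer prediction matrix

def embed(ids, M):
    return M[ids].sum(axis=0)            # bag-of-words sentence embedding

def memn2n_answer(story, question):
    m = np.stack([embed(s, A) for s in story])   # memory vectors m_i
    c = np.stack([embed(s, C) for s in story])   # output vectors c_i
    u = embed(question, B)                       # query vector
    p = np.exp(m @ u); p /= p.sum()              # attention over memories
    o = p @ c                                    # attended summary
    return int(np.argmax(W @ (o + u)))           # predicted answer word id

story = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]        # token ids of the story sentences
print(memn2n_answer(story, question=[10, 11]))   # untrained: arbitrary answer
```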

59-63 Memory Networks: Question Answering (figures by Sukhbaatar; diagram-only slides)

64 Summary: distributed representation; temporal neural networks (RNN, LSTM, GRU); sequence-to-sequence models (machine translation, response generation, question answering).
