Speech Recognition with Quaternion Neural Networks
1 Speech Recognition with Quaternion Neural Networks
Titouan Parcollet, Mohamed Morchid, Georges Linarès
LIA, University of Avignon, France; ORKIS, France
2 Summary
I. Problem definition
II. Quaternion numbers
III. Quaternion neural networks
IV. Experiments and discussions
6 Problem definition
Established fact #1: the bigger the model, the better the results (given a good training procedure and enough data). But is the model really efficient?
Established fact #2: input features are often multidimensional. Is the usual flat real-valued representation a good one?
11 Problem definition
Can we define a more natural representation of multidimensional input features than the real-valued one, one that helps neural networks to be more efficient?
12 Quaternion numbers
Q = r1 + xi + yj + zk, with real part r1 and imaginary part xi + yj + zk.
Quaternions solve the multidimensionality problem!
17 Acoustic quaternion for speech processing: a purely imaginary quaternion built from MFCC or mel-filter-bank coefficients e(f,t) and their first and second order time derivatives:
Q(f,t) = 0 + e(f,t)i + (∂e(f,t)/∂t)j + (∂²e(f,t)/∂t²)k
21 Pixel quaternion for image processing: a purely imaginary quaternion
Q(p) = 0 + Red(p)i + Green(p)j + Blue(p)k
22 Quaternion numbers: the Hamilton product
The Hamilton product relates the components of the two operands to each other.
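As a concrete illustration, the Hamilton product can be written out in a few lines of plain Python (a minimal sketch, not the authors' implementation; the function name is mine):

```python
def hamilton_product(q, p):
    """Hamilton product of two quaternions given as (r, x, y, z) tuples.

    Unlike real or complex multiplication, the result depends on operand
    order: q*p != p*q in general."""
    qr, qx, qy, qz = q
    pr, px, py, pz = p
    return (qr*pr - qx*px - qy*py - qz*pz,
            qr*px + qx*pr + qy*pz - qz*py,
            qr*py - qx*pz + qy*pr + qz*px,
            qr*pz + qx*py - qy*px + qz*pr)

# The defining identities i^2 = j^2 = k^2 = -1 and ij = k follow:
i = (0.0, 1.0, 0.0, 0.0)
j = (0.0, 0.0, 1.0, 0.0)
print(hamilton_product(i, i))  # i*i = -1
print(hamilton_product(i, j))  # i*j = k
print(hamilton_product(j, i))  # j*i = -k: non-commutative
```

The cross terms (e.g. qy*pz appearing in the i component) are what tie every component of one operand to every component of the other.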
24 Quaternion numbers: Hamilton product in neural networks
Real-valued layer: fully connecting 4 inputs (r, i, j, k) to 4 outputs (r, i, j, k) takes 4 x 4 = 16 weights.
Quaternion-valued layer: a single quaternion weight W connects a quaternion input X to a quaternion output through the Hamilton product:
W ⊗ X = (w_r x_r - w_x x_x - w_y x_y - w_z x_z)
      + (w_r x_x + w_x x_r + w_y x_z - w_z x_y)i
      + (w_r x_y - w_x x_z + w_y x_r + w_z x_x)j
      + (w_r x_z + w_x x_y - w_y x_x + w_z x_r)k
1 weight = 4 parameters
31 Quaternion numbers: Hamilton product in neural networks
Quaternions let the network learn internal relations within input features, and they reduce the number of neural parameters!
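The parameter saving is easy to verify by counting. The sketch below (illustrative only; variable names are mine) writes out the real 4x4 matrix that one quaternion weight induces through the Hamilton product: all 16 entries reuse the same 4 parameters, up to sign.

```python
# A real-valued dense connection from 4 inputs to 4 outputs stores 16 weights.
real_params = 4 * 4

# A quaternion connection stores one quaternion weight W = (w_r, w_x, w_y, w_z):
# 4 parameters, reused by the Hamilton product across the whole 4x4 mapping.
w_r, w_x, w_y, w_z = 0.5, -0.1, 0.3, 0.2
quaternion_params = 4

# Real matrix equivalent to "multiply by W on the left", acting on the
# input vector (x_r, x_x, x_y, x_z): every entry is one of the same
# 4 parameters, up to sign.
W_matrix = [
    [w_r, -w_x, -w_y, -w_z],
    [w_x,  w_r, -w_z,  w_y],
    [w_y,  w_z,  w_r, -w_x],
    [w_z, -w_y,  w_x,  w_r],
]

print(real_params // quaternion_params)  # 4x fewer parameters
```

Row 1 of `W_matrix` times the input reproduces the real part of W ⊗ X, row 2 the i part, and so on, which is where the "1 weight = 4 parameters" saving comes from.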
32 Quaternion Neural Networks (QNN)
QNN = NN with all parameters being quaternions
QNN = NN with the Hamilton product replacing the real-valued dot product
QNN backpropagation and weight updates differ from the real-valued NN ones [1]
[1] P. Arena, L. Fortuna, L. Occhipinti, and M. G. Xibilia, "Neural networks for quaternion-valued function approximation," in Circuits and Systems, ISCAS '94, 1994 IEEE International Symposium on, vol. 6, IEEE, 1994.
36 Quaternion Neural Networks (QNN)
Activation function: the "split" approach [1]
Q = r1 + xi + yj + zk
f(Q) = f(r) + f(x)i + f(y)j + f(z)k
The function f can be any real-valued activation function: Sigmoid, TanH, ReLU, ELU.
[1] P. Arena, L. Fortuna, L. Occhipinti, and M. G. Xibilia, "Neural networks for quaternion-valued function approximation," in Circuits and Systems, ISCAS '94, 1994 IEEE International Symposium on, vol. 6, IEEE, 1994.
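The split approach is a one-liner to sketch: apply the same real activation to each of the four components independently (illustrative plain Python; the names are mine):

```python
import math

def split_activation(q, f):
    """Apply a real-valued activation f component-wise to a quaternion (r, x, y, z)."""
    r, x, y, z = q
    return (f(r), f(x), f(y), f(z))

relu = lambda v: max(0.0, v)
sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))

q = (1.5, -2.0, 0.0, 3.0)
print(split_activation(q, relu))  # (1.5, 0.0, 0.0, 3.0)
```

Any of the listed real activations (Sigmoid, TanH, ReLU, ELU) can be dropped in as `f` unchanged.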
38 Quaternion Neural Networks (QNN)
Neural parameters initialization
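A quaternion-aware initialization, in the Glorot-style polar-form scheme described in the authors' "Quaternion Recurrent Neural Networks" paper, can be sketched as follows (a sketch under my reading of that paper, not the reference code; variable names are mine):

```python
import math, random

def quaternion_init(n_in, n_out, rng=random.Random(0)):
    """One quaternion weight: draw a magnitude, an angle, and a random unit
    purely imaginary quaternion, then assemble the four components."""
    sigma = 1.0 / math.sqrt(2.0 * (n_in + n_out))
    phi = rng.uniform(-sigma, sigma)        # magnitude
    theta = rng.uniform(-math.pi, math.pi)  # angle
    # random unit purely imaginary quaternion u = u_x i + u_y j + u_z k
    ux, uy, uz = (rng.uniform(-1.0, 1.0) for _ in range(3))
    norm = math.sqrt(ux*ux + uy*uy + uz*uz) or 1.0
    ux, uy, uz = ux/norm, uy/norm, uz/norm
    return (phi * math.cos(theta),
            phi * ux * math.sin(theta),
            phi * uy * math.sin(theta),
            phi * uz * math.sin(theta))

w = quaternion_init(256, 256)
# |w| equals |phi| by construction, so it stays within the Glorot bound sigma.
print(math.sqrt(sum(c * c for c in w)))
```

The point of initializing in polar form is that the four components of a weight are drawn as one quaternion rather than as four independent real numbers.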
41 Experiments and discussions
42 Experiments and discussions: neural networks reminder
- Convolutional neural networks (CNN)
- Recurrent neural networks (RNN)
- Long short-term memory recurrent neural networks (LSTM)
46 Speech recognition tasks: where are we using neural networks?
An (overly simplified) automatic speech recognition (ASR) system.
49 Acoustic modelling: speech recognition tasks

End-to-end
- TIMIT: 462-speaker training set, 50-speaker validation set, 192 sentences as a core test set; the SA records are removed from the training
- Model: Q-Convolutional Neural Network + CTC [2]

Traditional HMM
- TIMIT: same split as above
- Wall Street Journal (WSJ): 14h and 81h training sets; test-dev93 used as a validation set; test-eval92 used as a test set
- Models: Q-Recurrent Neural Networks (QRNN), Q-Long Short-Term Memory NN (QLSTM)

[2] A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, "Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks," in Proceedings of the 23rd International Conference on Machine Learning, ACM, 2006.
54 Acoustic features
- Real-valued baseline: 40 mel-filter-banks + Δ + ΔΔ + ΔΔΔ = 160 real-valued inputs
- Acoustic quaternions: 40 MFCC + Δ + ΔΔ (real part set to 0) = 40 quaternion-valued inputs
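Packing the coefficients into purely imaginary acoustic quaternions can be sketched as follows (plain Python with hypothetical names; a real pipeline would compute the MFCCs and their derivatives with a toolkit such as Kaldi):

```python
def pack_acoustic_quaternions(mfcc, delta, delta2):
    """Turn per-frame MFCCs and their first/second derivatives into purely
    imaginary quaternions Q(f) = 0 + mfcc[f] i + delta[f] j + delta2[f] k."""
    assert len(mfcc) == len(delta) == len(delta2)
    return [(0.0, m, d, d2) for m, d, d2 in zip(mfcc, delta, delta2)]

# 3 streams of 40 real coefficients -> 40 quaternion-valued inputs per frame
mfcc, delta, delta2 = [0.1] * 40, [0.2] * 40, [0.3] * 40
quats = pack_acoustic_quaternions(mfcc, delta, delta2)
print(len(quats))  # 40
print(quats[0])    # (0.0, 0.1, 0.2, 0.3)
```

Each quaternion thus carries one frequency bin and its temporal context together, instead of scattering them across a flat 120- or 160-dimensional real vector.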
56 End-to-end results on TIMIT: QCNN + CTC (model architectures will be discussed during the questions! ;))
Error expressed in Phoneme Error Rate (PER %); FM = feature maps. 4x fewer parameters!
60 No more end-to-end results
61 HMM on TIMIT with PyTorch-Kaldi: QRNN
Error expressed in Phoneme Error Rate (PER %). 2.5x fewer parameters!
65 HMM on TIMIT with PyTorch-Kaldi: QLSTM
Error expressed in Phoneme Error Rate (PER %). 3.2x fewer parameters!
69 HMM on WSJ with PyTorch-Kaldi: QLSTM
Error expressed in Word Error Rate (WER %).
73 Conclusion
Can we define a more natural representation of multidimensional input features than the real-valued one, one that helps neural networks to be more efficient?
- Quaternions give a better and more natural representation of multidimensional features
- The Hamilton product within neural networks lets QNNs learn both internal and contextual dependencies well
- They reduce the number of free parameters
Y E S W E C A N
78 Resources
Related to this presentation:
- "Quaternion Recurrent Neural Networks", ICLR 2019, Titouan Parcollet, Mirco Ravanelli, Mohamed Morchid, Chiheb Trabelsi, Georges Linarès, Renato De Mori, Yoshua Bengio
- "Speech Recognition with Quaternion Neural Networks", NIPS (NeurIPS) IRASL, Titouan Parcollet, Mirco Ravanelli, Mohamed Morchid, Georges Linarès, Renato De Mori
- "Quaternion Convolutional Neural Networks for End-to-End Speech Recognition", Interspeech 2018 oral session on "End-to-End ASR", Titouan Parcollet, Ying Zhang, Mohamed Morchid, Chiheb Trabelsi, Georges Linarès, Renato De Mori, Yoshua Bengio
- "Bidirectional Quaternion Long Short-Term Memory Recurrent Neural Networks for Speech Recognition", submitted to ICASSP 2019, Titouan Parcollet, Mohamed Morchid, Georges Linarès, Renato De Mori
- "The Pytorch-Kaldi Speech Recognition Toolkit", submitted to ICASSP 2019, Mirco Ravanelli, Titouan Parcollet, Yoshua Bengio
QNN with PyTorch and Keras:
PyTorch-Kaldi:
79 Thank you! Questions?
*Eve follows a rotation described by a unit quaternion around Wall-e
80 Quaternion numbers Hamilton product in neural networks
81 Quaternion convolution
82 Computations
W ⊗ X = (w_r x_r - w_x x_x - w_y x_y - w_z x_z)
      + (w_r x_x + w_x x_r + w_y x_z - w_z x_y)i
      + (w_r x_y - w_x x_z + w_y x_r + w_z x_x)j
      + (w_r x_z + w_x x_y - w_y x_x + w_z x_r)k
28 operations that should be computed in parallel with CUDA and GPUs
83 Quaternion equations
Matrix representation:
    [ r  -x  -y  -z ]
Q = [ x   r  -z   y ]
    [ y   z   r  -x ]
    [ z  -y   x   r ]
Conjugate: Q* = r1 - xi - yj - zk
Norm: |Q| = sqrt(r² + x² + y² + z²); normalized quaternion: Q / |Q|
Polar form: Q = |Q| e^{nθ} = |Q| (cos(θ) + n sin(θ)), with n = (xi + yj + zk) / (|Q| sin(θ))
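These identities are easy to check numerically: multiplying Q by its conjugate under the Hamilton product lands on the real axis at |Q|². A plain Python sketch (function names are mine):

```python
import math

def conjugate(q):
    r, x, y, z = q
    return (r, -x, -y, -z)

def norm(q):
    return math.sqrt(sum(c * c for c in q))

def hamilton(q, p):
    qr, qx, qy, qz = q
    pr, px, py, pz = p
    return (qr*pr - qx*px - qy*py - qz*pz,
            qr*px + qx*pr + qy*pz - qz*py,
            qr*py - qx*pz + qy*pr + qz*px,
            qr*pz + qx*py - qy*px + qz*pr)

q = (1.0, 2.0, 3.0, 4.0)
print(hamilton(q, conjugate(q)))  # (30.0, 0.0, 0.0, 0.0): purely real, = |Q|^2
print(norm(q) ** 2)               # ~30.0
```

The same conjugate is what implements rotations (as in the Wall-e example): a unit quaternion u rotates a purely imaginary p via u ⊗ p ⊗ u*.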
84 Connectionist Temporal Classification [2]
A. Hannun, "Sequence Modeling with CTC", Distill.
[2] A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, "Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks," in Proceedings of the 23rd International Conference on Machine Learning, ACM, 2006.
85 Learning Internal relations with QCNN
More informationComparison of Bernoulli and Gaussian HMMs using a vertical repositioning technique for off-line handwriting recognition
2012 International Conference on Frontiers in Handwriting Recognition Comparison of Bernoulli and Gaussian HMMs using a vertical repositioning technique for off-line handwriting recognition Patrick Doetsch,
More informationSPEECH FEATURE EXTRACTION USING WEIGHTED HIGHER-ORDER LOCAL AUTO-CORRELATION
Far East Journal of Electronics and Communications Volume 3, Number 2, 2009, Pages 125-140 Published Online: September 14, 2009 This paper is available online at http://www.pphmj.com 2009 Pushpa Publishing
More informationPair-wise Distance Metric Learning of Neural Network Model for Spoken Language Identification
INTERSPEECH 2016 September 8 12, 2016, San Francisco, USA Pair-wise Distance Metric Learning of Neural Network Model for Spoken Language Identification 2 1 Xugang Lu 1, Peng Shen 1, Yu Tsao 2, Hisashi
More informationTemporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks
Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks Alberto Montes al.montes.gomez@gmail.com Santiago Pascual TALP Research Center santiago.pascual@tsc.upc.edu Amaia Salvador
More informationESE: Efficient Speech Recognition Engine for Sparse LSTM on FPGA
ESE: Efficient Speech Recognition Engine for Sparse LSTM on FPGA Song Han 1,2, Junlong Kang 2, Huizi Mao 1, Yiming Hu 3, Xin Li 2, Yubin Li 2, Dongliang Xie 2, Hong Luo 2, Song Yao 2, Yu Wang 2,3, Huazhong
More informationA Deep Learning primer
A Deep Learning primer Riccardo Zanella r.zanella@cineca.it SuperComputing Applications and Innovation Department 1/21 Table of Contents Deep Learning: a review Representation Learning methods DL Applications
More informationSlide credit from Hung-Yi Lee & Richard Socher
Slide credit from Hung-Yi Lee & Richard Socher 1 Review Word Vector 2 Word2Vec Variants Skip-gram: predicting surrounding words given the target word (Mikolov+, 2013) CBOW (continuous bag-of-words): predicting
More informationBidirectional Recurrent Convolutional Networks for Video Super-Resolution
Bidirectional Recurrent Convolutional Networks for Video Super-Resolution Qi Zhang & Yan Huang Center for Research on Intelligent Perception and Computing (CRIPAC) National Laboratory of Pattern Recognition
More informationAcoustic to Articulatory Mapping using Memory Based Regression and Trajectory Smoothing
Acoustic to Articulatory Mapping using Memory Based Regression and Trajectory Smoothing Samer Al Moubayed Center for Speech Technology, Department of Speech, Music, and Hearing, KTH, Sweden. sameram@kth.se
More informationOffline Handwriting Recognition with Multidimensional Recurrent Neural Networks
Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks Alex Graves TU Munich, Germany graves@in.tum.de Jürgen Schmidhuber IDSIA, Switzerland and TU Munich, Germany juergen@idsia.ch
More informationSemantic Word Embedding Neural Network Language Models for Automatic Speech Recognition
Semantic Word Embedding Neural Network Language Models for Automatic Speech Recognition Kartik Audhkhasi, Abhinav Sethy Bhuvana Ramabhadran Watson Multimodal Group IBM T. J. Watson Research Center Motivation
More informationRecurrent Neural Networks with Attention for Genre Classification
Recurrent Neural Networks with Attention for Genre Classification Jeremy Irvin Stanford University jirvin16@stanford.edu Elliott Chartock Stanford University elboy@stanford.edu Nadav Hollander Stanford
More informationManifold Constrained Deep Neural Networks for ASR
1 Manifold Constrained Deep Neural Networks for ASR Department of Electrical and Computer Engineering, McGill University Richard Rose and Vikrant Tomar Motivation Speech features can be characterized as
More informationGating Neural Network for Large Vocabulary Audiovisual Speech Recognition
IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. XX, NO. X, JUNE 2017 1 Gating Neural Network for Large Vocabulary Audiovisual Speech Recognition Fei Tao, Student Member, IEEE, and
More informationChapter 3. Speech segmentation. 3.1 Preprocessing
, as done in this dissertation, refers to the process of determining the boundaries between phonemes in the speech signal. No higher-level lexical information is used to accomplish this. This chapter presents
More informationACOUSTIC MODELING WITH NEURAL GRAPH EMBEDDINGS. Yuzong Liu, Katrin Kirchhoff
ACOUSTIC MODELING WITH NEURAL GRAPH EMBEDDINGS Yuzong Liu, Katrin Kirchhoff Department of Electrical Engineering University of Washington, Seattle, WA 98195 ABSTRACT Graph-based learning (GBL) is a form
More informationObject Detection Lecture Introduction to deep learning (CNN) Idar Dyrdal
Object Detection Lecture 10.3 - Introduction to deep learning (CNN) Idar Dyrdal Deep Learning Labels Computational models composed of multiple processing layers (non-linear transformations) Used to learn
More informationReverberant Speech Recognition Based on Denoising Autoencoder
INTERSPEECH 2013 Reverberant Speech Recognition Based on Denoising Autoencoder Takaaki Ishii 1, Hiroki Komiyama 1, Takahiro Shinozaki 2, Yasuo Horiuchi 1, Shingo Kuroiwa 1 1 Division of Information Sciences,
More informationMaximum Likelihood Beamforming for Robust Automatic Speech Recognition
Maximum Likelihood Beamforming for Robust Automatic Speech Recognition Barbara Rauch barbara@lsv.uni-saarland.de IGK Colloquium, Saarbrücken, 16 February 2006 Agenda Background: Standard ASR Robust ASR
More informationCombining Neural Networks and Log-linear Models to Improve Relation Extraction
Combining Neural Networks and Log-linear Models to Improve Relation Extraction Thien Huu Nguyen and Ralph Grishman Computer Science Department, New York University {thien,grishman}@cs.nyu.edu Outline Relation
More informationarxiv: v5 [cs.lg] 2 Feb 2017
ONLINE SEQUENCE TRAINING OF RECURRENT NEURAL NETWORKS WITH CONNECTIONIST TEMPORAL CLASSIFICATION arxiv:5.0684v5 [cs.lg] 2 Feb 207 Kyuyeon Hwang & Wonyong Sung Department of Electrical and Computer Engineering
More informationLecture 2 Notes. Outline. Neural Networks. The Big Idea. Architecture. Instructors: Parth Shah, Riju Pahwa
Instructors: Parth Shah, Riju Pahwa Lecture 2 Notes Outline 1. Neural Networks The Big Idea Architecture SGD and Backpropagation 2. Convolutional Neural Networks Intuition Architecture 3. Recurrent Neural
More informationarxiv: v1 [cs.cl] 30 Jan 2018
ACCELERATING RECURRENT NEURAL NETWORK LANGUAGE MODEL BASED ONLINE SPEECH RECOGNITION SYSTEM Kyungmin Lee, Chiyoun Park, Namhoon Kim, and Jaewon Lee DMC R&D Center, Samsung Electronics, Seoul, Korea {k.m.lee,
More informationHow to Build Optimized ML Applications with Arm Software
How to Build Optimized ML Applications with Arm Software Arm Technical Symposia 2018 ML Group Overview Today we will talk about applied machine learning (ML) on Arm. My aim for today is to show you just
More informationEmpirical Evaluation of RNN Architectures on Sentence Classification Task
Empirical Evaluation of RNN Architectures on Sentence Classification Task Lei Shen, Junlin Zhang Chanjet Information Technology lorashen@126.com, zhangjlh@chanjet.com Abstract. Recurrent Neural Networks
More informationApplications of Berkeley s Dwarfs on Nvidia GPUs
Applications of Berkeley s Dwarfs on Nvidia GPUs Seminar: Topics in High-Performance and Scientific Computing Team N2: Yang Zhang, Haiqing Wang 05.02.2015 Overview CUDA The Dwarfs Dynamic Programming Sparse
More informationAdversarial Feature-Mapping for Speech Enhancement
Interspeech 2018 2-6 September 2018, Hyderabad Adversarial Feature-Mapping for Speech Enhancement Zhong Meng 1,2, Jinyu Li 1, Yifan Gong 1, Biing-Hwang (Fred) Juang 2 1 Microsoft AI and Research, Redmond,
More informationAn Efficient End-to-End Neural Model for Handwritten Text Recognition
CHOWDHURY, VIG: AN EFFICIENT END-TO-END NEURAL MODEL FOR HANDWRITTEN 1 An Efficient End-to-End Neural Model for Handwritten Text Recognition Arindam Chowdhury chowdhury.arindam1@tcs.com Lovekesh Vig lovekesh.vig@tcs.com
More informationPixel-level Generative Model
Pixel-level Generative Model Generative Image Modeling Using Spatial LSTMs (2015NIPS) L. Theis and M. Bethge University of Tübingen, Germany Pixel Recurrent Neural Networks (2016ICML) A. van den Oord,
More informationDeep Belief Networks for phone recognition
Deep Belief Networks for phone recognition Abdel-rahman Mohamed, George Dahl, and Geoffrey Hinton Department of Computer Science University of Toronto {asamir,gdahl,hinton}@cs.toronto.edu Abstract Hidden
More informationEncoding RNNs, 48 End of sentence (EOS) token, 207 Exploding gradient, 131 Exponential function, 42 Exponential Linear Unit (ELU), 44
A Activation potential, 40 Annotated corpus add padding, 162 check versions, 158 create checkpoints, 164, 166 create input, 160 create train and validation datasets, 163 dropout, 163 DRUG-AE.rel file,
More informationWhy DNN Works for Speech and How to Make it More Efficient?
Why DNN Works for Speech and How to Make it More Efficient? Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering, York University, CANADA Joint work with Y.
More informationDeep Learning Applications
October 20, 2017 Overview Supervised Learning Feedforward neural network Convolution neural network Recurrent neural network Recursive neural network (Recursive neural tensor network) Unsupervised Learning
More information2015 The MathWorks, Inc. 1
2015 The MathWorks, Inc. 1 개발에서구현까지 MATLAB 환경에서의딥러닝 김종남 Application Engineer 2015 The MathWorks, Inc. 2 3 Why MATLAB for Deep Learning? MATLAB is Productive MATLAB is Fast MATLAB Integrates with Open Source
More informationAsynchronous Parallel Learning for Neural Networks and Structured Models with Dense Features
Asynchronous Parallel Learning for Neural Networks and Structured Models with Dense Features Xu SUN ( 孙栩 ) Peking University xusun@pku.edu.cn Motivation Neural networks -> Good Performance CNN, RNN, LSTM
More informationLSTM: An Image Classification Model Based on Fashion-MNIST Dataset
LSTM: An Image Classification Model Based on Fashion-MNIST Dataset Kexin Zhang, Research School of Computer Science, Australian National University Kexin Zhang, U6342657@anu.edu.au Abstract. The application
More informationA PRIORITIZED GRID LONG SHORT-TERM MEMORY RNN FOR SPEECH RECOGNITION. Wei-Ning Hsu, Yu Zhang, and James Glass
A PRIORITIZED GRID LONG SHORT-TERM MEMORY RNN FOR SPEECH RECOGNITION Wei-Ning Hsu, Yu Zhang, and James Glass Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology
More informationarxiv: v1 [cs.lg] 20 Apr 2018
Modelling customer online behaviours with neural networks: applications to conversion prediction and advertising retargeting Yanwei Cui, Rogatien Tobossi, and Olivia Vigouroux GMF Assurances, Groupe Covéa
More information