A Survey on Speech Recognition Using HCI

Size: px

Start display at page:

Download "A Survey on Speech Recognition Using HCI"

Marshall Hart
5 years ago
Views:

1 A Survey on Speech Recognition Using HCI Done By- Rohith Pagadala Department Of Data Science Michigan Technological University CS5760 Topic Assignment 2 rpagadal@mtu.edu

2 ABSTRACT In this paper, we will have a survey on Speech recognition using Human Computer Interface(HCI). Speech recognition software uses different methodologies and technologies that convert speech in to text. There are different models and algorithms to convert speech in to text and there are being used in many ways and a new method to use the speech recognition is to use it in the Automatic driving car. So, In this application, the user can use his voice to control his car such as pulling out of the car from the parking area to the location of the user and it can be done by using the key as a primary mode of communication which is connected to the car. INDEX TERMS Human Computer Interaction(HCI), Automatic Speech Recognition(ASR),Finger Print Censor, Hidden Markov Model(HMM), Deep Belief Networks(DBN). INTRODUCTION The new concept is that the car uses the concept of speech recognition algorithms which contains both electrical and mechanical domains and even the finger print sensor is also being used. Voice recognition system is used in car keys which is connected to car navigation system. The user will give instructions through the car keys which contains the finger print sensor and a microphone through which the command is received. The car navigation system which is a fully digital system consists of all the words which are used to control the car and the navigation system should consist off all the commands that is used to control the car such as when the user is far distance from the car and if he/she uses the car keys to unlock the voice recognition system by using the finger print censor and this can be done through

3 End-to-End Automatic Speech Recognition (ASR), it is a recognition system which mainly used for diction. Through this application, the user can get his car to his/her location without walking up to the car or for searching their car in a big parking lot. FUNCTION OF SPEECH RECOGNITION Speech recognition is used in many electronic devices such as phone and laptop, it is converted from speech to text using some specific software and algorithms. One such model is known as Hidden Markov Model(HMM), which is widely used because they are trained automatically and they are also very simple to use. Each word in the Hidden Markov model will have a different output. The speech can be decoded by using various algorithms to find the best output and it can be used to get accurate output. The other type of model for speech recognition would be Automatic Speech Recognition model, in the model the input is converted into a sequence of fixed size vectors and this process is known as feature extraction. In this model, it extracts a set of features from the given input speech and another type of model which comes under ASR is Deep Belief Networks(DBN). Deep Belief Networks come under neural network which consists of Restricted Boltzmann Machine(RBM) that consists of an input layer and a hidden layer which are connected to weight matrix and the matrix of the same weight is used for recognition and reconstruction. The model for voice recognition uses the complex relation between inputs and outputs to find patterns in data and the data input training can be done through MATLAB and the output that are in the form of waves can be decoded using MATLAB.

4 Training Data Applying Constraints Acoustic models Language Model Speech Feature Extraction Decoder Text Components of the flow chart diagram: Training Data: In a dataset, a training set is implemented to build up a model, while a test set is to validate the model built. Data points in the training set are excluded from the test set. Usually, a dataset is divided into a training set divided into a training set, a validation set and a test set in each iteration. Acoustic Models: It is a model that is used in ASR that represents the relation between an audio signal and other units that make up speech. It is created by taking audio recording of speech and the text.

Feature Extraction: It involves in the analysis of the speech signal and there are two types of techniques: 1.Temporal Analysis: In this, the speech waveform itself is used for analysis. 2.

5 Feature Extraction: It involves in the analysis of the speech signal and there are two types of techniques: 1.Temporal Analysis: In this, the speech waveform itself is used for analysis. 2. Spectral Analysis: In this, the speech signal is used for Analysis. Decoder: In this step, it decodes the speech and gives the output in the form of text. PROPOSED SYSTEM Our proposed system would be an application based on voice recognition and finger print censor which are used on car keys. Once the user parks his car and when tries to get back to the car in a big parking lot, the user can take his keys and unlock the speech recognition system by using the finger print censor. The finger print censor is being used for safety purpose so that no other person other than the owner can use the keys and control the car. The speech that is identified by giving some set of commands is being passed to car automatic navigation system and once the car gets the command the automatic

driving system starts and then the car reaches the user through the GPS navigation system and by using the system the user can control various option in car from a far distance such as switching on

6 driving system starts and then the car reaches the user through the GPS navigation system and by using the system the user can control various option in car from a far distance such as switching on the heater system and switching on the headlights. COMPARISION OF EXISTING SYSTEM WITH PROPOSED SYSTEM In this section, we will be comparing the existing system with the proposed system. In the existing system, the user can control various option by operating them in the car navigation system such as using of wipers and opening/closing of windows. Based on the input given to the car navigation system, the application displays the outcome of it. The design of the existing application allows the user to control the car with his car keys where the user can give instructions to the car using the voice commands and after the instruction is given to the car navigation system, the application will try to process the command suchas if the user is using a command named Come To My Location the automatic driving car will get to the location of the user using the GPS coordinates. Another existing system is that the keys of the car does not contain any safety feature to identify the owner of the car. By including the Finger Print Censor, we can identify that the owner of the car is using the keys because if the keys are lost anyone could control the car and by using this safety feature we can use the voice recognition system without any problem. The idea to use voice recognition system along with finger print censor could help the owner of the car because the user need not search for his car in big parking places and he could just use a voice command and then the car would be coming to him.

7 CHANLLENGES Accuracy: Sometimes the input that is the audio may not be clear and it could impact the recognition accuracy. Talk-over problem: It occurs when two or people speak at the same time and it may also impact the recognition accuracy. User Responsiveness: The user may issue a command before the start of the voice recognition device. So the user must be careful and start issuing a command once the device is ready. Performance: Sometimes the device may not understand the user s grammar and it could misunderstand in a different way. Reliability: It is the ability of the system to operate accurately over a long period of time. CONCLUSION: The proposed system intends to develop a method and model that helps users to communicate with the vehicle in a better way. The proposed system is cost-effective and it works effectively. It is implemented by using voice recognition and finger print censor. This application would be more interactive with the user, as the ultimate aim of this application is to improve communication with the vehicle and with the help of finger print censor, the protection of the vehicle can also be increased.

8 REFERENCES: [1] C. Nikias and J. Mendel, Signal processing with higher-order spectra, Signal Processing Magazine, vol. 10, no. 3, pp , [2] Z. Koldovsky, J. M ` alek, J. Nouza, and M. Balık, Chime data separation based on target signal cancellation and noise masking, Proc. CHiME, [3] K. Wilson, B. Raj, P. Smaragdis, and A. Divakaran, Speech denoising using nonnegative matrix factorization with priors, in Proc. ICASSP, [4] F. Weninger, M. Wollmer, J. Geiger, B. Schuller, J. Gemmeke, A. Hurmalainen, T. Virtanen, and G. Rigoll, Non-negative matrix factorization for highly noiserobust asr: To enhance or to recognize? [5] G. E. Hinton, S. Osindero, and Y. W. Teh, A fast learning algorithm for deep belief nets [6] Mark JF Gales, Maximum likelihood linear transformations for hmm-based speech recognition, Computer speech & language [7] Spyros Matsoukas, Rich Schwartz, Hubert Jin, and Long Nguyen, Practical implementations of speakeradaptive training [8]Michael L Seltzer, Dong Yu, and Yongqiang Wang, An investigation of deep neural networks for noise robust speech recognition [9] Arun Narayanan and DeLiang Wang, Joint noise adaptive training for robust automatic speech recognition [10] T. T., Hewett, R., Baecker, S., Card and et. al., Report of the ACM Special Interest Group (SIG) on Computer-Human Interaction Curriculum Development Group: Curricula for Human-Computer Interaction [11] A.J., Dix, J., Finlay, G., Abowd, and R., Beale, Human-Computer Interaction. [12] J.P., Martens, Continuous Speech Recognition over the Telephone. Electronics and Information Systems, Ghent University, Belgium [13] M., Forsberg, Why is Speech Recognition Difficult?. [14] D., Jurafsky and J.H., Martin, Speech and Language Processing an Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition.

SPEECH FEATURE DENOISING AND DEREVERBERATION VIA DEEP AUTOENCODERS FOR NOISY REVERBERANT SPEECH RECOGNITION. Xue Feng, Yaodong Zhang, James Glass

2014 IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP) SPEECH FEATURE DENOISING AND DEREVERBERATION VIA DEEP AUTOENCODERS FOR NOISY REVERBERANT SPEECH RECOGNITION Xue Feng,