Human Computer Interaction Using Speech Recognition Technology

International Bulletin of Mathematical Research, Volume 2, Issue 1, March 2015, Pages , ISSN:

Madhu Joshi 1 and Saurabh Ranjan Srivastava 2
Department of Computer Engineering, SKIT, Jaipur
madhujoshi896@gmail.com, srs@skit.ac.in

Abstract

Speech recognition technology provides a means of communication between a person and a machine. In this paper we describe speech recognition technology and implement an application that compares the recognized input string with a built-in dictionary. As a security feature, the system asks random questions at login. We create a database dictionary through which the user interacts with the machine; it consists of an Input database and an Output database. The Input database contains the various interfaces of the machine. When the user provides input by speaking, the system matches the recognized string against the system dictionary. If the string matches, the machine recognizes the input and performs the corresponding output task. The user interacts with the machine by asking queries or requesting tasks, and the machine recognizes the user's voice and returns the output to the user.

1 INTRODUCTION

Speech recognition technology allows a machine to identify the words that a person speaks into a microphone and convert them to written text. Speech recognition is thus sometimes referred to as speech-to-text [1]. It allows us to provide input to an application with our voice: just as clicking a mouse, typing on a keyboard, or pressing a key on a phone keypad provides input to an application, speech recognition lets us provide input by talking. On a desktop computer, a microphone is needed to do this. The speech recognition process is performed by a software component known as the speech recognition engine [2].
The main goal of the speech recognition engine is to process spoken input and translate it into text that an application understands. In this paper we describe the main work of the speech recognition engine. First the user logs in, during which the machine asks various random questions. After that, when the speech recognition engine recognizes an input, the application interprets the recognition result as a command. The application is therefore a command-and-control application.

[Figure 1.1: The basic working principle of the speech recognition engine. Blocks: Audio Input, Grammar(s), Acoustic Model, Recognition Engine, Recognized Text.]

Received: February 20, 2015
Keywords: component; formatting; recognition; API; User voice; Technology application

2 DATA MODEL

The basic data model of the internal processing of the speech recognition engine is defined by a graphical representation, shown in Figure 2.1.

[Figure 2.1: The data modules of our speech recognition application. Blocks: Signal Processing Unit, Comparison Unit, Models, Word Sequence Search, Language Models, Recognized Word Sequence, Perform Output Command.]

The speech is first processed by the signal processing module, which translates the speech waveform into a speech pattern representation. The speech pattern consists of a sequence of feature vectors. It is compared with the reference patterns, which are stored together with their class identities. When the speech pattern matches a reference pattern, the input voice is recognized, the corresponding command is executed, and the application performs the output command. The output may be an external interface or a spoken response. The whole processing of our application is shown in Figure 2.2.

[Figure 2.2: The whole execution procedure of our speech recognition application. Blocks: Analog to Digital Converter (Sound Card), Recognition Engine, User defined Dictionary, Security for login procedure, User takes Action, Action Performed by the Machine, Add Database, External Interface, Speaking.]

The voice is converted into a digital signal by the analog-to-digital converter. First the project is loaded; the user's voice is then converted into a digital signal, which passes through the speech recognition engine. The engine recognizes the voice, and the user logs in by answering various random questions. The user then performs tasks by taking actions or speaking commands, and the machine responds according to the user's command.
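The comparison step, in which a sequence of feature vectors is matched against stored reference patterns with class identities, can be illustrated with a small sketch. The paper does not state which comparison method the engine uses; dynamic time warping is used here only as one common way to compare sequences of different lengths. The function names, the two-dimensional feature vectors, and the reference values are all hypothetical.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(pattern, references):
    """Return the class identity of the reference pattern closest to `pattern`."""
    return min(references, key=lambda label: dtw_distance(pattern, references[label]))


# Hypothetical reference patterns: class identity -> sequence of feature vectors.
REFERENCES = {
    "open notepad": [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],
    "open calculator": [(2.0, 0.0), (1.0, 0.0), (0.0, 0.0)],
}

# Hypothetical speech pattern produced by the signal processing module.
spoken = [(0.1, 0.0), (0.9, 1.1), (1.0, 1.0), (2.1, 1.9)]
print(recognize(spoken, REFERENCES))  # prints "open notepad"
```

The spoken pattern has four frames and the references have three, which is why an alignment-based distance is used here rather than a frame-by-frame comparison.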

The user takes action by speaking commands. Various commands are available, such as open notepad, open command prompt, open Google, and many more. When the user says open command prompt, the user's voice is recognized by the speech recognition engine and the command prompt opens [2]. We create a large recognition vocabulary for interaction between the user and the machine, and define a user dictionary that contains all the commands through which the user can interact with the machine.

We now give the mathematical formulation of the speech-to-text conversion performed in our implementation. It is assumed that each utterance consists of a sequence of linguistically meaningful and structured words, and our main goal is to convert the spoken signal into the word sequence as accurately as possible [8]. The output for an utterance depends on the recognized sequence of words. This task is sometimes known as speech-to-text conversion. Word decoding is based on Bayes decision theory [8]. We state it for our architecture as follows.

Let $Q = (q_1, q_2, \ldots, q_T)$ be a set of observations and $W = (w_1, w_2, \ldots, w_N)$ be a sequence of words. The observation $Q$ is the realization of the word sequence $W$, where each $w_i \in V$ and $V$ is the defined dictionary [8]. We determine $W_R$, the recognized phrase. To find $W_R$, the speech recognizer applies the maximum a posteriori rule:

$$W_R = \arg\max_{W} P(W \mid Q)$$

By Bayes' theorem this can be written as

$$W_R = \arg\max_{W} \frac{P(Q \mid W)\, P(W)}{P(Q)}$$

The quantities $P(Q \mid W)$ and $P(W)$ are the decision-making parameters that determine the recognized word sequence. $P(Q)$ does not depend on $W$ and is therefore not involved in the optimization [8].
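A small numerical sketch may help make the maximum a posteriori rule concrete. It treats each command in the user dictionary as a candidate word sequence W and picks the one that maximizes P(Q|W) P(W), dropping P(Q) because it is the same for every candidate. All probabilities below are invented for illustration, and `map_decode` is a hypothetical helper, not part of the speech API used in the paper.

```python
import math

# Hypothetical prior P(W) for each command in the user dictionary.
PRIOR = {
    "open notepad": 0.5,
    "open command prompt": 0.3,
    "open google": 0.2,
}


def map_decode(likelihood, prior):
    """Return the W that maximizes P(Q|W) * P(W).

    P(Q) is constant across candidates, so it is left out of the score.
    Log-probabilities are summed to avoid underflow with small values;
    this assumes every probability is strictly positive.
    """
    def log_score(w):
        return math.log(likelihood[w]) + math.log(prior[w])

    return max(prior, key=log_score)


# Hypothetical acoustic likelihoods P(Q|W) for a single utterance Q.
likelihood = {
    "open notepad": 0.02,
    "open command prompt": 0.10,
    "open google": 0.05,
}

print(map_decode(likelihood, PRIOR))  # prints "open command prompt"
```

Here "open notepad" has the largest prior, but the acoustic likelihood of "open command prompt" is high enough that its product 0.10 × 0.3 = 0.03 exceeds the 0.01 obtained for each of the other two commands.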
The first benefit is that speech offers a way of issuing commands while leaving the hands and eyes free. Operations normally carried out through direct manipulation, such as opening Calculator or WordPad, can instead be requested by voice, so multiple actions can be carried out simultaneously. This is particularly useful when the hands and eyes are already busy but other tasks need attention from time to time; for example, when direct manipulation is used to drive a car [1], speech can be used to control the radio, car phone, and other on-board systems. The second benefit is that users can refer to objects that are not present in their current view of the virtual world; in a direct manipulation interface, actions can only be applied to objects that are visually present. The most observable benefit of speech is naturalness or, more precisely, familiarity: users are familiar with using the English language to act in the world. A central issue in developing a speech interface to virtual worlds is the nature of the relationship between the system and the user. We define a speech interface through which the user and the system interact with each other [6].

Performance

Most speech recognition engines try very hard to find a match and are usually very forgiving. It is important to note, however, that the engine always returns its best guess for what was said. The performance of our speech recognition application is shown in Fig 2.3.

[Fig 2.3: Performance evaluation of our application.]

The performance graph shows that recognition speed, or fluency of recognition, increases as the size of the dictionary increases. In our application the whole string is recognized within a few seconds. There is a trade-off between coverage and accuracy in speech recognition systems: the larger the user vocabulary and grammar, the greater the potential for recognition errors.

API

In our application we use the speech application programming interface. The version of the speech API used by our application is speech API

IMPLEMENTATION

We implement human-computer interaction using the speech recognition engine. The user interacts with the machine and can perform various commands. The snapshots describe the implementation part of our work. The first shows the user login: users log in by speaking their name. After the login process, the user can interact with the machine through the various commands and use them to perform various tasks. In the next snapshot the user executes one command, opening the calculator by speaking to the computer. In the same way, the user can interact with the machine through other voice commands.

4 CONCLUSION

We conclude that, using our project, any person can interact with a machine through their voice. We have described a human-computer interaction application built on a speech recognition engine. We provide various commands through which the user can interact with the machine and perform tasks by speaking.

ACKNOWLEDGMENT

I am very thankful to my co-author, who helped me in researching this project.

REFERENCES

[1] Youhao Yu (2012), Research on Speech Recognition Technology and Its Application, International Conference on Computer Science and Electronics Engineering.
[2] Jianliang Meng, Junwei Zhang, Haoquan Zhao (2012), Overview of the Speech Recognition Technology, Fourth International Conference on Computational and Information Sciences.
[3] Mohammad A. M. Abu Shariah, Raja N. Ainon, Roziati Zainuddin, Othman O. Khalifa (2007), Human Computer Interaction Using Isolated-Words Speech Recognition Technology, International Conference on Intelligent and Advanced Systems.
[4] Kazuyo Tanaka (1998), Next Major Application Systems and Key Techniques in Speech Recognition Technology, /9.
[5] K. H. Davis, R. Biddulph, and S. Balashek (1952), Automatic Recognition of Spoken Digits, J. Acoust. Soc. Amer., vol. 24, no. 6.
[6] L. R. Rabiner, B. H. Juang (1993), Fundamentals of Speech Recognition, Englewood Cliffs: Prentice Hall.
[7] International Workshop on Robot and Human Interactive Communication, IEEE Press, Sept
[8] Biing-Hwang Juang and Sadaoki Furui (2000), Automatic Recognition and Understanding of Spoken Language: A First Step Toward Natural Human-Machine Communication, IEEE.
