
IJESR / May 2015 / Vol-5 / Issue-5 — Kodela Divya et al. / International Journal of Engineering & Science Research

EVENT VERIFICATION THROUGH VOICE PROCESS USING ANDROID

Kodela Divya*1, J. Pratibha2
1 M.Tech Scholar, Vignan University, Vadlamudi, Guntur (A.P), India.
2 Asst. Prof., Vignan University, Vadlamudi, Guntur (A.P), India.
*Corresponding Author

ABSTRACT

In this paper, we present the problem of mixed sound event verification in a wireless sensor network for home automation systems. In home automation systems, the sound recognized by the system becomes the basis for performing certain tasks. However, if a target source is mixed with another sound due to simultaneous occurrence, the system generates poor recognition results, subsequently leading to inappropriate responses. Home automation needs to be simple, easy to use and cost effective to be widely acceptable. Wireless home automation today needs to use the latest technology advances in order to be user friendly and powerful, and a lot of work has already been done in this area. In this project, current technology components will be used and home automation will be implemented using communication technologies such as the Internet and speech recognition. The project will research and evaluate different user interface possibilities for home automation on mobile devices. The automation centers on using relatively cheap wireless communication modules. The intended home automation system will control the lights and electrical appliances in a home using voice commands.

Keywords: WSN, VAD, VHD.

INTRODUCTION

In this project, we present the problem of mixed sound event verification in a wireless sensor network for home automation systems. In home automation systems, the sound recognized by the system becomes the basis for performing certain tasks. However, if a target source is mixed with another sound due to simultaneous occurrence, the system generates poor recognition results, subsequently leading to inappropriate responses. To handle such problems, this study proposes a framework, consisting of sound separation and sound verification techniques based on a wireless sensor network (WSN), to realize sound-triggered automation. In the sound separation phase, we present a convolutive blind source separation system with source number estimation using time-frequency clustering. An accurate mixing matrix can be estimated by the proposed phase compensation technique and used for reconstructing the separated sound sources. In the verification phase, Mel frequency cepstral coefficients (MFCCs) and Fisher scores derived from the wavelet packet decomposition of signals are used as features for support vector machines (SVMs). Finally, a sound of interest can be selected for triggering automated services according to the verification result. The experimental results demonstrate the robustness and feasibility of the proposed system for mixed sound verification in WSN-based home environments.

EXISTING SYSTEM

In the existing system, if a target source is mixed with another sound due to simultaneous occurrence, the system generates poor recognition results, subsequently leading to inappropriate responses. We consider the capturing and processing of sounds of interest that are mixed with other sounds. For example, the sound of a doorbell ringing or door knocking captured by sensor nodes is usually mixed with sounds of human speech occurring simultaneously in the same environment.

Limitations
- The existing system generates poor recognition results.
- It does not improve the sound verification performance significantly.
- It is less efficient.
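Before turning to the proposed system, the verification phase outlined in the Introduction (MFCC features classified by a support vector machine) can be made concrete with a short sketch. This is only a minimal illustration, assuming the librosa and scikit-learn Python libraries and a pre-labelled set of audio clips; the paper does not prescribe these tools, and the Fisher-score/wavelet features are omitted for brevity.

```python
# Minimal sketch: decide whether an audio clip contains a sound of interest
# (e.g. a doorbell) using MFCC features and an SVM. Library choices
# (librosa, scikit-learn) are illustrative assumptions, not from the paper.
import numpy as np
import librosa
from sklearn.svm import SVC

def mfcc_features(path, sr=16000, n_mfcc=13):
    """Load a clip and summarise it as the mean and std of its MFCCs."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def train_verifier(clips, labels):
    """clips: list of wav paths; labels: 1 = sound of interest, 0 = other."""
    X = np.vstack([mfcc_features(p) for p in clips])
    clf = SVC(kernel="rbf", probability=True)
    clf.fit(X, labels)
    return clf

def verify(clf, path, threshold=0.5):
    """Return True if the clip is accepted as the target sound event."""
    prob = clf.predict_proba(mfcc_features(path).reshape(1, -1))[0, 1]
    return prob >= threshold
```

In the full system, the separated source rather than the raw mixture would be passed to such a verifier, and a positive result would be what triggers the corresponding automation service.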

PROPOSED SYSTEM

Embodied Conversational Agents (ECAs) are animated virtual characters that emulate human behavior and communication. We consider the development of wireless-enabled smart systems that can achieve seamless monitoring and control of localized devices or device networks with a smart phone, through secure two-way communication between the smart phone and the managed devices. ECA-based mobile applications rely on an external server that performs the processor-intensive tasks, such as speech recognition, language understanding and text-to-speech. The proposed platform is based on free and open source libraries. We developed a prototype installed on a tablet for controlling a home automation system.

Advantages
- The proposed system comprises six components: Voice Activity Detector, Automatic Speech Recognition, Conversational Engine, Control Interface, Text-To-Speech and Virtual Head Animation.
- It provides mobile-based device monitoring and control, which can be applied in both fixed and moving LAN scenarios, such as vehicle electronics and power and energy systems.

ARCHITECTURE

[Architecture diagram: Microphone → Voice Activity Detector → Automatic Speech Recognition → Conversational Engine → Control Interface / Text-To-Speech → Virtual Head Animation]
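To make this architecture concrete, the sketch below wires the six components into a single processing loop. All class and method names here are hypothetical placeholders for the modules described in the following sections; only the data flow (audio → text → action/response → speech and animation) is taken from the paper.

```python
# Hypothetical skeleton of the ECA pipeline shown in the ARCHITECTURE diagram.
# Each class stands in for one module; real implementations would wrap
# SphinxBase (VAD), PocketSphinx (ASR), eSpeak (TTS), and so on.
class VoiceActivityDetector:
    def filter(self, raw_audio):
        """Return only the frames that contain the user's voice."""
        return raw_audio  # placeholder

class SpeechRecognizer:
    def transcribe(self, voiced_audio):
        return "turn on the living room light"  # placeholder hypothesis

class ConversationalEngine:
    def respond(self, text, history):
        history.append(text)
        # Produce a domain action plus a spoken reply and a mood label.
        return {"action": text, "reply": "Okay, done.", "mood": "happy"}

class ControlInterface:
    def execute(self, action):
        print(f"[home automation] executing: {action}")

class TextToSpeech:
    def synthesize(self, reply):
        return reply, [("OU", 90), ("k", 60)]  # (text, phoneme durations in ms)

class VirtualHeadAnimation:
    def render(self, mood, phoneme_durations):
        print(f"[avatar] mood={mood}, visemes for {len(phoneme_durations)} phonemes")

def handle_utterance(raw_audio, history, vad, asr, ce, ci, tts, vha):
    text = asr.transcribe(vad.filter(raw_audio))
    result = ce.respond(text, history)
    ci.execute(result["action"])
    _, durations = tts.synthesize(result["reply"])
    vha.render(result["mood"], durations)
```

On Android, each of these stages would either run on the device or be delegated to an external server, as noted in the Proposed System section.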

LITERATURE SURVEY

1) Humor and Embodied Conversational Agents
This report surveys the role of humor in human-to-human interaction and the possible role of humor in human-computer interaction. The aim is to see whether it is useful for embodied conversational agents to integrate humor capabilities into their internal model of intelligence, emotions and interaction (verbal and nonverbal) capabilities. A current state of the art of research in embodied conversational agents, affective computing and verbal and nonverbal interaction is presented.

2) An Intelligent TV Interface Based on Statistical Dialogue Management
In this paper, we propose an intelligent TV interface using a voice-enabled dialogue system. The paper contributes in two directions: a new type of dialogue management model and its use in practical systems to be commercialized. We devise a practical dialogue management model based on statistical learning methods. To analyze discourse context, we utilize statistical learning techniques for anaphora resolution and discourse history management. Contrary to rule-based systems, we develop an incremental learning method to construct dialogue strategies from the training corpus.

3) Grounded Language Modelling for Automatic Speech Recognition of Sports Video (Michael Fleischman)
This paper describes how grounded language models are learned from large corpora of unlabeled video and applied to the task of automatic speech recognition of sports video. Results show that grounded language models improve perplexity and word error rate over text-based language models and, further, support video information retrieval better than human-generated speech transcriptions.

4) How Was Your Day? An Affective Companion ECA Prototype (Marc Cavazza)
This paper presents a dialogue system in the form of an ECA that acts as a sociable and emotionally intelligent companion for the user. The system dialogue is not task-driven but is social conversation in which the user talks about his/her day at the office. During conversations the system monitors the emotional state of the user and uses that information to inform its dialogue turns. The system is able to respond to spoken interruptions by the user.

MODULES
- Voice Activity Detector
- Automatic Speech Recognition
- Conversational Engine
- Control Interface
- Text-To-Speech
- Virtual Head Animation

Voice Activity Detector
The Voice Activity Detector's (VAD) role is to discriminate the user's voice frames from those containing noise. This module reads the digitized audio samples acquired from a microphone and sends the filtered raw audio to the ASR. The actual implementation of the VAD module is based on the SphinxBase library, which was modified so it can work with the OpenSL ES native audio libraries present on Android.

[Figure: Microphone → VAD → (filtered raw audio) → ASR]

Automatic Speech Recognition
The Automatic Speech Recognition (ASR) module performs speech-to-text conversion. It takes as input the utterances with the user's speech that come from the VAD and sends the resulting text to the CE. In the proposed platform, the ASR module is based on the PocketSphinx speech recognition library.

[Figure: VAD → (speech) → ASR → (text) → CE]
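As an illustration of what the VAD hands to the ASR, the following is a minimal energy-based voice activity detector. It is a simplified stand-in for the modified SphinxBase implementation mentioned above, assuming 16 kHz mono input normalized to [-1, 1]; the frame length and threshold values are arbitrary.

```python
# Simplified energy-based VAD: keep only frames whose short-term energy
# exceeds a noise threshold and hand the surviving audio to the ASR.
# This is an illustrative stand-in for the SphinxBase-based VAD.
import numpy as np

FRAME_MS = 20          # frame length in milliseconds
SAMPLE_RATE = 16000    # assumed microphone sample rate

def voiced_frames(samples, threshold=0.02):
    """samples: float mono signal in [-1, 1]. Yields frames judged as voice."""
    frame_len = SAMPLE_RATE * FRAME_MS // 1000
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = np.sqrt(np.mean(frame ** 2))  # RMS energy of the frame
        if energy > threshold:
            yield frame

def filtered_raw_audio(samples):
    """Concatenate the voiced frames, i.e. the 'filtered raw audio'
    that the VAD module forwards to the ASR."""
    frames = list(voiced_frames(samples))
    return np.concatenate(frames) if frames else np.empty(0, dtype=float)
```

A production VAD would typically add hangover smoothing and an adaptive noise estimate rather than a fixed threshold.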

Conversational Engine
The Conversational Engine (CE) extracts the meaning of the utterance, manages the dialog flow and produces the actions appropriate for the target domain. It generates a response based on the input, the current state of the conversation and the dialog history. Support was also added for an object-oriented database, which can decrease dynamic memory usage at the expense of an increase in response time.

[Figure: ASR → (text) → CE → CI]

Control Interface
The Control Interface (CI) translates the commands said by the user into a format that can be understood by the target applications or services running on the same device or accessible remotely. This module is domain-specific and has to be reimplemented or adapted for every new target application (a minimal command-mapping sketch is given after the Text-To-Speech sketch below).

[Figure: User commands → CI → Target application/service]

Text-To-Speech
The Text-To-Speech (TTS) subsystem carries out the generation of the synthetic output voice from the text that comes as a response from the CE. It also sends the VHA module a list of phonemes with their durations, so that the animation and the artificial speech match up. The TTS module implementation is based on the eSpeak library.

[Figure: CE → (text) → TTS → (synthetic voice, phoneme durations) → VHA]
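A minimal sketch of the TTS step, assuming the eSpeak command-line tool is installed and invoked from Python via subprocess (the platform uses the eSpeak library directly; the exact binding is not specified in the paper). The -x flag prints eSpeak's phoneme mnemonics, which a real implementation would pair with durations for the VHA.

```python
# Illustrative wrapper around the espeak command-line tool.
# A real integration would call the eSpeak library and use its phoneme
# events to obtain per-phoneme durations for lip synchronisation.
import subprocess

def speak(text, voice="en", words_per_minute=150):
    """Synthesize `text` aloud with espeak (blocking call)."""
    subprocess.run(
        ["espeak", "-v", voice, "-s", str(words_per_minute), text],
        check=True,
    )

def phoneme_mnemonics(text, voice="en"):
    """Return espeak's phoneme mnemonics for `text` without producing audio
    (-q suppresses the audio, -x writes the phoneme string to stdout)."""
    result = subprocess.run(
        ["espeak", "-q", "-x", "-v", voice, text],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()

if __name__ == "__main__":
    reply = "The living room light is now on"
    print(phoneme_mnemonics(reply))
    speak(reply)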

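The Control Interface described earlier can, in its simplest domain-specific form, be a lookup from recognized phrases to device actions. The device names, the command table and the dispatch function below are all hypothetical; they only illustrate how CE output might be turned into home automation commands such as switching lights and appliances.

```python
# Hypothetical Control Interface: map phrases recognized by the ASR/CE to
# concrete home automation actions. Device identifiers and the send_command
# transport are placeholders for whatever the target system exposes.
COMMAND_TABLE = {
    "turn on the living room light":  ("living_room_light", "ON"),
    "turn off the living room light": ("living_room_light", "OFF"),
    "turn on the fan":                ("bedroom_fan", "ON"),
    "turn off the fan":               ("bedroom_fan", "OFF"),
}

def send_command(device, state):
    """Placeholder transport: print instead of talking to real hardware."""
    print(f"-> {device} set to {state}")

def control_interface(utterance):
    """Translate a recognized utterance into a device command, if any."""
    key = utterance.strip().lower()
    if key in COMMAND_TABLE:
        device, state = COMMAND_TABLE[key]
        send_command(device, state)
        return True
    return False  # unknown command; the CE could ask for clarification

control_interface("Turn on the living room light")
```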
Virtual Head Animation
This module receives as inputs both the mood information from the CE and the list of phoneme durations from the TTS module. By processing these inputs, it generates the visemes (the visual representation of the phonemes) and the facial expression that will be rendered along with the synthetic voice.

[Figure: CE → (mood information) → VHA ← (phoneme durations) ← TTS]

CONCLUSION

The main goal of this work was to describe a platform aimed at developing ECA-based interfaces on hand-held devices equipped with Android. We therefore proposed a possible architecture and gave implementation details for such a platform. The whole platform is based on free and open source libraries, and a first prototype was developed for controlling a home automation system. Future work consists of carrying out experiments with real users to measure the quality, usability and performance of the platform.

REFERENCES

[1] Louwerse MM, Graesser AC, McNamara DS, Lu S. Embodied conversational agents as conversational partners. Applied Cognitive Psychology 2009; 23(9).
[2] De Carolis B, Mazzotta I, Novielli N, Pizzutilo S. Social robots and ECAs for accessing smart environments services. Proceedings of the International Conference on Advanced Visual Interfaces (AVI '10). New York, NY, USA: ACM, 2010.
[3] Cavazza M, Santos de la Cámara R, Turunen M, Relaño Gil J, Hakulinen J, Crook N, Field D. How was your day? An affective companion ECA prototype. Proceedings of the SIGDIAL 2010 Conference. Tokyo, Japan: Association for Computational Linguistics, September 2010.
[4] Oh HJ, Lee CH, Jang MG, Lee KY. An intelligent TV interface based on statistical dialogue management. IEEE Transactions on Consumer Electronics 2007; 53(4).
[5] Santos-Perez M, Gonzalez-Parada E, Cano-García JM. Efficient use of voice activity detector and automatic speech recognition in embedded platforms for natural language interaction. Highlights in Practical Applications of Agents and Multiagent Systems, Advances in Intelligent and Soft Computing. Springer Berlin Heidelberg, 2011; 89.
[6] Santos-Perez M, Gonzalez-Parada E, Cano-García JM. Topic dependent language model switching for embedded automatic speech recognition. Ambient Intelligence - Software and Applications, Advances in Intelligent and Soft Computing. Springer Berlin Heidelberg, 2012; 153.
[7] Santos-Perez M, Gonzalez-Parada E, Cano-García JM. Embedded conversational engine for natural language interaction in Spanish. Proceedings of the Paralinguistic Information and its Integration in Spoken Dialogue Systems Workshop. Springer New York, 2011.
