Implementation of Speech Based Stress Level Monitoring System


4th International Conference on Computing, Communication and Sensor Network, CCSN2015

Implementation of Speech Based Stress Level Monitoring System

V. Naveen Kumar (1), Dr. Y. Padma Sai (2), K. Sonali Swaroop (3)
Department of Electronics and Communication Engineering, VNRVJIET, Hyderabad, India
naveenkumar_v@vnrvjiet.in

ABSTRACT

A human voice has many features, of which the pathological voice is one. A pathological voice is the presence of an abnormality in speech. The variations in pathological voices differentiate hypertension from hypotension: a speech sample from a speaker with higher blood pressure shows higher variation in frequency, and vice versa. Hypertension (high blood pressure) is a state in which the heart pumps blood through the arteries at a higher rate, which can increase the accumulation of fatty plaques and raise the risk of heart attack. Hypotension (low blood pressure), on the other hand, can drive a person into an unconscious state. This paper proposes a model for a non-contact method of indicating the stress level of a person. Feature extraction consists of calculating MFCCs, and the k-means algorithm is used for clustering the data set and for codebook generation. The model achieves an overall accuracy of 83%.

Keywords: Hypertension, Hypotension, Mel-frequency cepstrum coefficients, k-means algorithm, pathological voices.

I. INTRODUCTION

Voice production is the result of interactions of body parts that resonate. When a person breathes in, air passes through the vocal tract into the vocal cords and vibrates them at different frequencies during expiration [8]. All parts of the body play specific roles in voice production and may be responsible for variations of the speech. The human voice can be affected by various parameters; the pressure exerted by the blood on the walls of the blood vessels also affects the voice. The larynx consists of muscles that are surrounded by blood vessels, and the pressure exerted on the vessel walls has a finite effect on these muscles. The resulting micro muscle tremors (MMT) in turn produce variations in the speech [4].

The main idea behind this paper is to distinguish pathological voices from normal voices based on Mel frequency cepstrum coefficient feature vectors, and then to detect the stress level by classifying the pathological voice. With the increasing demand for non-contact monitoring tools, this research helps in designing a simple and potentially economical tool that detects the stress level from speech recorded with a microphone. Speech analysis offers unique advantages, including user convenience, that make it an attractive technology to develop.

A human voice has many features, such as jitter, shimmer, noise-to-harmonic ratio (N/H), autocorrelation (A/C) and Mel frequency cepstrum coefficients (MFCC). The speech for each stress level differs in parameters such as frequency and MFCCs, and this variation can be clearly differentiated. For instance, a sample with a high stress level has higher variation in frequency than normal speech. The difference is shown in Figure 1.
After the features are extracted, they are classified using the different algorithms available, such as dynamic time warping (DTW), hidden Markov models (HMM), artificial neural networks (ANN) and k-means classification [1]. Hence this paper uses Mel frequency cepstrum coefficients for feature extraction and the k-means algorithm for classification.

The paper is organized as follows. The previous research work is described in Section II, and the voice database of healthy and pathological voices used for classification is explained in Section III. Section IV covers the methodology of the feature extraction technique and the k-means algorithm, Section V describes the results, and Section VI concludes the paper.

Fig 1: The variations in the frequency of speech samples having (a) low stress, (b) high stress and (c) normal speech

II. RELATED WORK

Previous research work successfully demonstrated the usefulness of some basic features, namely harmonic-to-noise ratio, mean pitch, shimmer and jitter, in classifying normal and high blood pressure patients; a k-means classifier gave an efficiency of 79% in separating the two categories. MFCC features can likewise be used for monitoring the minute variations in speech samples, and our study indicates that MFCCs do help in distinguishing the stress level.

III. DATA COLLECTION

The voice recordings were collected from people of different age groups, varying from 30 to 65 years, suffering from high blood pressure as well as low blood pressure. The subjects were free to choose the language, and each subject spoke for a duration of 20 seconds. The recordings were made using the Easy Voice Recorder mobile application available on the Google Play store, and the samples were recorded at a frequency of 16 kHz. Voice recordings from normal (healthy) people were also collected and analyzed.

IV. METHODOLOGY

The complete process of the classification system is depicted in Figure 2. The entire process is divided into a training phase and a testing phase.

Fig 2: Block diagram of the system

In the training phase, the input sample is read into the system buffer and Mel frequency cepstrum coefficients (MFCCs) are extracted as features.

A. Mel frequency cepstrum coefficients

Mel Frequency Cepstral Coefficients (MFCCs) are features widely used in speech processing applications and have proved efficient. The calculation of the MFCCs includes the following steps.

1. Windowing

A speech signal changes continuously, so to simplify the analysis it is assumed that on short time scales the signal does not change much. The input signal is therefore divided into frames of 20-40 ms duration [3]. If a frame is much shorter, there are not enough samples to obtain a reliable spectral estimate; if it is longer, the signal changes too much throughout the frame.
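As a concrete illustration of this framing step, here is a minimal Python/numpy sketch (the paper's own implementation is in MATLAB; the 25 ms frame and 10 ms hop are assumed values within the range quoted above, and frame_signal is a hypothetical helper, not code from the paper):

```python
import numpy as np

def frame_signal(signal, fs=16000, frame_ms=25, hop_ms=10):
    """Split a 1-D speech signal into overlapping short-time frames.

    frame_ms lies in the 20-40 ms range suggested in the text; the
    10 ms hop is a common choice, not a value given in the paper.
    """
    frame_len = int(fs * frame_ms / 1000)   # samples per frame
    hop_len = int(fs * hop_ms / 1000)       # samples between frame starts
    assert len(signal) >= frame_len, "signal shorter than one frame"
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    # Build an index matrix so that row i selects the samples of frame i
    idx = (np.arange(frame_len)[None, :]
           + hop_len * np.arange(n_frames)[:, None])
    return signal[idx]                      # shape: (n_frames, frame_len)
```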

The speech sample is multiplied by the Hamming window given in equation (1):

W(n) = 0.54 - 0.46 \cos(2\pi n / N)    (1)

where n is the index of the sample, ranging from 0 to N, and N is the width of the frame in samples.

2. Discrete Fourier transform (DFT)

The speech signal is analysed more accurately in the frequency domain, so the windowed signal is converted to the frequency domain. Because of the mathematical computations involved in a direct DFT, the equivalent but faster approach known as the Fast Fourier Transform (FFT) is used, given by equation (2); the term defined in equation (3) is called the twiddle factor.

X(k) = \sum_{n=0}^{N-1} x(n) \, W_N^{nk}    (2)

W_N = e^{-j 2\pi / N}    (3)

3. Mel scale wrapping

MFCCs are a set of coefficients obtained from a particular cosine transformation of the real logarithmic spectrum mapped onto the Mel frequency scale. The approximate formula to compute mels for a given frequency f in Hz is given by equation (4):

mel(f) = 2595 \log_{10}(1 + f / 700)    (4)

where f is the frequency to be approximated on the Mel scale.

Fig 3: Mel filter bank

4. Discrete cosine transform (DCT)

Finally, the log Mel spectrum is transformed back using the DCT, and the resulting coefficients form the MFCC feature vector. The complete MFCC process (windowing, FFT, Mel-scale frequency wrapping, log, DCT, MFCC feature vector) is depicted in Figure 4.

Fig 4: MFCC feature extraction flow graph

After the feature vectors are extracted for an input speech sample, the MFCC feature vectors are subjected to k-means clustering and codebook generation using vector quantization.

B. Classification Using the K-means Algorithm

K-means is one of the simplest unsupervised learning algorithms [8]. The procedure follows a simple and easy way to classify a given data set through a certain number of predefined clusters k. The first step is to define k centroids. The next step is to take each point of the data set and associate it with the nearest centroid. When no point is pending, this first pass is complete and an early clustering is obtained. At this point, k new centroids are re-calculated as the centroids of the clusters resulting from the previous step, and a new binding is established between the data set points and the nearest new centroids. This is repeated until no more changes occur in the allocation of the data points, i.e., until the centroids no longer move. In this experimentation, the number of centroids used is K = 3. The Euclidean distance is calculated using equation (5):

d = \|x - y\| = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}    (5)
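Putting the four extraction steps of subsection A together, the following Python/numpy sketch illustrates the computation. This is an illustration only, since the paper's system runs in MATLAB; it reuses the hypothetical frame_signal helper above, and the 26-filter Mel bank and 13 retained coefficients are common choices consistent with the 13 MFC coefficients reported in Section V rather than parameters stated here.

```python
import numpy as np

def hz_to_mel(f):
    # Equation (4): mel(f) = 2595 * log10(1 + f/700)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse of equation (4)
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(frames, fs=16000, n_filters=26, n_coeffs=13):
    """MFCCs for pre-framed speech: window, FFT, Mel filter bank, log, DCT."""
    n = frames.shape[1]
    nfft = int(2 ** np.ceil(np.log2(n)))
    # Step 1 - windowing, equation (1): W(n) = 0.54 - 0.46 cos(2*pi*n/N)
    window = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(n) / n)
    # Step 2 - FFT and power spectrum, equations (2)-(3)
    spectrum = np.abs(np.fft.rfft(frames * window, nfft)) ** 2
    # Step 3 - triangular filters spaced evenly on the Mel scale (Fig 3)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_filters, nfft // 2 + 1))
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, lo:mid] = (np.arange(lo, mid) - lo) / max(mid - lo, 1)
        fbank[i, mid:hi] = (hi - np.arange(mid, hi)) / max(hi - mid, 1)
    log_energies = np.log(spectrum @ fbank.T + 1e-10)
    # Step 4 - DCT of the log Mel energies; keep the first n_coeffs values
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * k * (2 * m + 1) / (2 * n_filters))
    return log_energies @ dct.T             # shape: (n_frames, n_coeffs)
```

The delta and delta-delta coefficients mentioned in Section V would be appended as frame-to-frame differences of these vectors.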

Fig 5: Flow chart of the k-means algorithm (initialize the number of clusters; randomly select initial centers as the centroids of the K clusters; generate a new partition by assigning each data point to the closest cluster; calculate the centroids of the new groups; repeat while the partition keeps changing)

V. SIMULATION RESULTS

A monitoring system that uses speech as input for indicating the stress level of a person has been developed. The system uses the MATLAB environment for analysing the speech samples. Figure 6 shows the simulation results for the stress level monitoring system.

Fig 6: MATLAB simulation of the stress level classification

In the training phase, the MFCC feature vectors were extracted with 39 parameters, comprising 13 MFC coefficients, 13 delta coefficients and 13 delta-delta coefficients. The features thus extracted are shown in Figure 7: subplot 1 shows the speech waveform; subplot 2 shows the cepstrum of a speech sample indicating a normal stress level; and subplots 3 and 4 show the cepstra of speech samples with low and high stress levels respectively. The variation among these cepstra can be seen clearly. The feature vectors thus obtained were clustered using the k-means algorithm.

Fig 7: Cepstrum representation of the input speech samples

In the testing phase, the test sample is applied to the system and its feature vector is calculated. The minimum distance between the test features and the codebook generated in the training phase gives the stress level of the input speech sample. Figure 8 shows the k-means clustering of the training data together with the test sample, and Figure 9 shows the simulation results for a speech sample identified as having a high stress level.

Fig 8: K-means clustering of the training data and the test sample

Fig 9: Simulation results showing the test sample identified as high stress
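For completeness, here is a minimal sketch of the training-phase clustering (Section IV-B, K = 3) and the testing-phase minimum-distance matching described above. This is again Python/numpy rather than the paper's MATLAB; the random initialization, the per-utterance mean vector and the mapping of clusters to stress levels are illustrative assumptions that the paper does not spell out.

```python
import numpy as np

def kmeans(points, k=3, n_iter=100, seed=0):
    """Plain k-means (Section IV-B): returns the K centroids (the codebook)."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(n_iter):
        # Assign each feature vector to the nearest centroid, equation (5)
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its cluster
        new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):    # centroids stopped moving: done
            break
        centroids = new
    return centroids

def classify(test_features, centroids, cluster_labels):
    """Label a test utterance with the stress level of the nearest centroid."""
    test_vec = test_features.mean(axis=0)  # one vector per utterance (assumption)
    nearest = np.linalg.norm(centroids - test_vec, axis=1).argmin()
    return cluster_labels[nearest]

# Hypothetical usage: train_feats stacks the training MFCC vectors (13 per
# frame here; the paper also appends delta and delta-delta terms for 39),
# and the three clusters are mapped to stress levels after training.
# codebook = kmeans(train_feats, k=3)
# level = classify(mfcc(frame_signal(test_signal)), codebook,
#                  ["low", "normal", "high"])
```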

VI. CONCLUSION

The paper brings out an approach to identify the stress level of a person using speech. The MFCC feature vectors are calculated and clustered using the k-means algorithm, and an accuracy of 83% was obtained. Furthermore, different algorithms can be used to increase the accuracy of the system.

REFERENCES

[1] A. A. Khulage and B. V. Pathak, "Analysis of speech under stress using linear techniques and non-linear techniques for emotion recognition system".
[2] Abdelwadood Mesleh, Dmitriy Skopin, Sergey Baglikov and Anas Quteishat, Nov. 2012, "Heart Rate Extraction from Vowel Speech Signal", Journal of Computer Science and Technology, 27(6): 1243-1251.
[3] Bageshree V. Sathe-Pathak and Ashish R. Panat, July 2012, "Extraction of Pitch and Formants and its Analysis to Identify 3 Different Emotional States of a Person", IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 4, No. 1.
[4] Clifford S. Hopkins, Roy J. Ratley, Daniel S. Benincasa and John J. Grieco, 2005, "Evaluation of Voice Stress Analysis Technology", 38th Hawaii International Conference on System Sciences.
[5] Li Dong, August 2011, "Time Series Analysis of Jitter in Sustained Vowels", ICPhS XVII, Regular Session, Hong Kong, 17-21.
[6] Mireia Farrús, Javier Hernando and Pascual Ejarque, "Jitter and Shimmer Measurements for Speaker Recognition".
[7] Guojun Zhou, John H. L. Hansen and James F. Kaiser, March 2001, "Nonlinear Feature Based Classification of Speech Under Stress", IEEE Transactions on Speech and Audio Processing, Vol. 9, No. 3.
[8] Saloni, R. K. Sharma and Anil K. Gupta, 2014, "Classification of High Blood Pressure Persons Vs Normal Blood Pressure Persons Using Voice Analysis", IIJSP, 1, 47-52.