Implementation of Speech Based Stress Level Monitoring System

4th International Conference on Computing, Communication and Sensor Network, CCSN2015

V. Naveen Kumar 1, Dr. Y. Padma Sai 2, K. Sonali Swaroop 3
Department of Electronics and Communication Engineering 1,2,3
VNRVJIET, Hyderabad, India
naveenkumar_v@vnrvjiet.in 1

ABSTRACT
The human voice has many features, and pathological voice is one of them. A pathological voice is one in which abnormalities are present in the speech. Variations in these pathological voices differentiate hypertension from hypotension: a speech sample from a person with higher blood pressure shows higher variation in frequency, and vice versa. Hypertension (high blood pressure) is a state in which the heart pumps blood through the arteries at higher pressure, which can increase the accumulation of fatty plaques and raise the risk of heart attack. Hypotension (low blood pressure), on the other hand, can drive a person into an unconscious state. This paper proposes a model for non-contact indication of a person's stress level. Feature extraction consists of calculating MFCCs, and the k-means algorithm is used to cluster the data set and to generate the codebook. The model we developed achieves an overall accuracy of 83%.

Keywords: Hypertension, Hypotension, Mel-frequency cepstrum coefficients, k-means algorithm, pathological voices.

I. INTRODUCTION
Voice production is the result of interactions between resonating parts of the body. When a person breathes in and then exhales, the air passing through the vocal tract reaches the vocal cords and makes them vibrate at different frequencies during expiration [8]. All parts of the body play specific roles in voice production and may be responsible for variations in speech. The human voice can be affected by various parameters, including the pressure exerted by the blood on the walls of the blood vessels. The larynx consists of muscles that are surrounded by blood vessels.
The amount of pressure exerted on the walls of the blood vessels has a finite effect on these muscles, producing what are known as micro muscle tremors (MMT), which in turn cause variations in speech [4]. The main idea behind this paper is to distinguish pathological voices from normal voices using Mel frequency cepstrum coefficient (MFCC) feature vectors. The paper further focuses on detecting the stress level by classifying the pathological voice. With the increasing demand for non-contact monitoring tools, this research helps in designing a simple and potentially economical tool that detects the stress level from the patient's speech, recorded with a microphone. Hence this paper tries to classify these levels and indicate the state of a person. Speech analysis offers unique advantages, including user convenience, which make it an efficient technology to develop.

The human voice has many features, such as jitter, shimmer, noise-to-harmonic ratio (N/H), autocorrelation (A/C) and Mel frequency cepstrum coefficients (MFCC). The speech at each stress level differs in many parameters, such as frequency and MFCCs, and this variation can be clearly distinguished. For instance, a sample with a high stress level shows higher variation in frequency than normal speech. The difference is shown in Fig 1.

Fig 1: The variations in the frequency of speech samples having (a) low stress, (b) high stress and (c) normal speech

After the features are extracted, they are classified using one of the available algorithms, such as dynamic time warping (DTW), hidden Markov models (HMM), artificial neural networks (ANN) and k-means classification [1]. This paper uses Mel frequency cepstrum coefficients for feature extraction and the k-means algorithm for classification.

The paper is organized as follows. Previous research work is described in Section II. The database of healthy and pathological voices used for classification is explained in Section III. Section IV covers the methodology, namely the feature extraction technique and the k-means algorithm. Section V describes the results, followed by the conclusion in Section VI.

II. RELATED WORK
Previous research work successfully demonstrated the usefulness of some basic features in classifying normal and high blood pressure patients. The features used were harmonic-to-noise ratio, mean pitch, shimmer and jitter, and a k-means classifier gave an efficiency of 79% in classifying the two categories. MFCCs are one further feature that can be used to monitor minute variations in speech samples, and our study indicates that MFCCs do help in distinguishing the stress level.

III. DATA COLLECTION
Voice recordings were collected from people of different age groups suffering from high blood pressure and from low blood pressure. Voice recordings from people with normal blood pressure were also collected and analyzed. The subjects were free to choose the language, and each subject spoke for 20 seconds. The recordings were made using Easy Voice Recorder, a mobile phone application available on the Google Play Store, at a sampling frequency of 16 kHz.

IV. METHODOLOGY
The complete classification process is depicted in Fig 2. The process is divided into a training phase and a testing phase. In the training phase, the input sample is read into the system buffer and Mel frequency cepstrum coefficients (MFCCs) are extracted as its features.

Fig 2: Block diagram of the system

A. Mel frequency cepstrum coefficients
Mel frequency cepstral coefficients (MFCCs) are widely used features in speech processing applications and have proved efficient for this purpose. Calculating the MFCCs involves the following steps.

1. Windowing
A speech signal changes continuously, so to simplify the analysis it is assumed that the audio signal does not change much over short time scales. The input signal is therefore divided into short frames [3]. If a frame is much shorter, there are not enough samples to obtain a reliable spectral estimate; if it is longer, the signal changes too much within the frame.
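As an illustration of the framing step, the following minimal Python sketch splits a signal into overlapping frames and tapers each frame with the Hamming window of equation (1). It is not the authors' MATLAB code: the function names are illustrative, and the 25 ms frame length and 10 ms hop are assumed values chosen only for the example, not values taken from the paper. The 16 kHz rate matches the recordings described in Section III.

```python
import math

def hamming(N):
    """Hamming window of length N, as in equation (1):
    w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)), 0 <= n <= N-1."""
    if N == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def frame_signal(x, frame_len, hop):
    """Split x into frames of frame_len samples, advancing by hop samples,
    and multiply each frame by a Hamming window.
    Trailing samples that do not fill a whole frame are dropped."""
    w = hamming(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        frames.append([s * wn for s, wn in zip(frame, w)])
    return frames

fs = 16000                        # sampling rate used for the recordings
frame_len = fs * 25 // 1000       # assumed 25 ms frame -> 400 samples
hop = fs * 10 // 1000             # assumed 10 ms hop   -> 160 samples
x = [math.sin(2 * math.pi * 200 * n / fs) for n in range(fs)]  # 1 s, 200 Hz test tone
frames = frame_signal(x, frame_len, hop)
print(len(frames), len(frames[0]))  # 98 400
```

Overlapping frames keep speech events near one frame's boundary close to the centre of a neighbouring frame, and tapering each frame reduces the spectral leakage that abrupt frame edges would otherwise introduce in the FFT step.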

The speech sample is multiplied by the Hamming window given by equation (1):

$$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \qquad 0 \le n \le N-1 \qquad (1)$$

where n is the index of the sample within the frame, ranging from 0 to N−1, and N is the width of the frame in samples.

2. Discrete Fourier transform (DFT)
The speech signal is analysed more accurately in the frequency domain, so each windowed frame is converted into the frequency domain using the discrete Fourier transform of equation (2), whose twiddle factor is given by equation (3). Because evaluating the DFT directly is computationally expensive, an efficient algorithm for computing it, the fast Fourier transform (FFT), is used.

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1 \qquad (2)$$

$$W_N = e^{-j2\pi/N} \qquad (3)$$

3. Mel scale warping
MFCCs are a set of coefficients obtained from a cosine transform of the real logarithm of the short-term spectrum after it has been warped onto the mel frequency scale. The warping is performed with the mel filter bank shown in Fig 3. The approximate formula for converting a frequency f in Hz to mels is given by equation (4):

$$\mathrm{mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right) \qquad (4)$$

where f is the frequency to be mapped onto the mel scale.

Fig 3: Mel filter bank

4. Discrete cosine transform (DCT)
Finally, the logarithm of the mel filter bank output is transformed back to the time (cepstral) domain using the DCT, and the resulting coefficients are the MFCCs. The complete MFCC process is described in Fig 4.

Fig 4: MFCC feature extraction flow graph (Windowing → FFT → Mel-scale frequency warping → Log → DCT → MFCC feature vector)

After the feature vectors are extracted for the input speech sample, the MFCC feature vectors are subjected to k-means clustering and codebook generation using vector quantization.

B. Classification Using K-means Algorithm
K-means is one of the simplest unsupervised learning algorithms [8]. It classifies a given data set into a predefined number of clusters, k. The first step is to define k centroids. Next, each point in the data set is associated with its nearest centroid; when no point is pending, this first step is complete and an early clustering has been obtained. The k centroids are then recalculated from the clusters produced in the previous step, and each data point is reassigned to the nearest new centroid. This is repeated until the allocation of the data points no longer changes, i.e., the centroids do not change. In this experiment, K = 3 centroids are used. The Euclidean distance is calculated using equation (5):

$$d = \lVert x - y \rVert = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \qquad (5)$$
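The clustering procedure described above can be sketched in plain Python as follows. This is a minimal sketch, not the authors' MATLAB implementation: the function names and the optional `init` argument are illustrative, and small 2-D points stand in for the MFCC feature vectors. Distances follow equation (5), and iteration stops once no point changes cluster.

```python
import math
import random

def euclidean(x, y):
    """Euclidean distance of equation (5)."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Basic k-means: start from k centroids, assign every point to its
    nearest centroid, recompute the centroids, and repeat until no point
    changes cluster. If `init` is None, k distinct points are picked at
    random as the initial centroids."""
    if init is None:
        init = random.Random(seed).sample(list(points), k)
    centroids = [list(c) for c in init]
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point.
        new_labels = [min(range(k), key=lambda j: euclidean(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

# Three well-separated toy groups, clustered with K = 3 as in the paper.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
centroids, labels = kmeans(points, 3, init=[points[0], points[3], points[6]])
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Random initialisation can converge to a poor local minimum, so it is common to run several random restarts and keep the clustering with the lowest total distance; the fixed `init` in the example only makes the outcome reproducible.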

Fig 5: K-means algorithm flow chart (Start → Initialize the number of clusters → Randomly select initial centers as the centroids of the K clusters → Generate a new partition by assigning each data point to the closest cluster → Calculate centroids for the new group → New group? Yes: repeat the partitioning step; No: End)

V. SIMULATION RESULTS
A monitoring system that uses speech as input to indicate a person's stress level has been developed. The system uses the MATLAB environment to analyse the speech samples. Fig 6 shows the simulation results of the stress level monitoring system.

Fig 6: MATLAB simulation of the stress level classification

In the training phase, MFCC feature vectors with 39 parameters were extracted: 13 MFC coefficients, 13 delta coefficients and 13 delta-delta coefficients. The extracted features are shown in Fig 7. Subplot 1 shows the speech waveform, subplot 2 shows the cepstrum of a speech sample with a normal stress level, and subplots 3 and 4 show the cepstra of speech samples with low and high stress levels, respectively. The variation among these cepstra can be seen clearly. The feature vectors thus obtained were clustered using the k-means algorithm.

Fig 7: Cepstrum representation of the input speech samples

Fig 8: K-means clustering of the training data and the test sample

In the testing phase, the test sample is applied to the system and its feature vector is calculated. The stress level of the input speech sample is determined by the minimum distance between the test feature vectors and the codebook generated in the training phase. Fig 9 shows the simulation result for a speech sample that indicates a high stress level.
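The 39-parameter feature vectors and the minimum-distance decision of the testing phase can be sketched as follows. The paper does not say how the delta coefficients were computed, so the common regression formula over ±2 frames with repeated edge frames is assumed here. The decision rule also assumes one trained codebook per stress level and picks the level whose codebook gives the smallest average distance to the test vectors; the codebook values below are hypothetical placeholders, not trained data.

```python
import math

def deltas(frames, N=2):
    """Regression deltas (an assumed, commonly used formula):
    d_t = sum_{n=1..N} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..N} n^2),
    with the first/last frame repeated at the edges.
    `frames` is a list of equal-length lists of coefficients."""
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        row = [0.0] * len(frames[t])
        for n in range(1, N + 1):
            nxt, prv = frames[min(t + n, T - 1)], frames[max(t - n, 0)]
            for i in range(len(row)):
                row[i] += n * (nxt[i] - prv[i])
        out.append([v / denom for v in row])
    return out

def stack_39(mfcc):
    """13 MFCCs + 13 deltas + 13 delta-deltas = 39 values per frame."""
    d1 = deltas(mfcc)
    d2 = deltas(d1)
    return [list(c) + a + b for c, a, b in zip(mfcc, d1, d2)]

def avg_distortion(vectors, codebook):
    """Mean Euclidean distance from each test vector to its nearest codeword."""
    return sum(min(math.dist(v, c) for c in codebook) for v in vectors) / len(vectors)

def classify(test_vectors, codebooks):
    """Assumed rule: return the stress label whose codebook is closest on average."""
    return min(codebooks, key=lambda lab: avg_distortion(test_vectors, codebooks[lab]))

# Toy usage: 5 frames of 13 "MFCCs" whose values rise by 1 per frame.
mfcc = [[float(t)] * 13 for t in range(5)]
feats = stack_39(mfcc)
print(len(feats), len(feats[0]))  # 5 39

# Hypothetical 2-D codebooks, one per stress level, standing in for trained ones.
codebooks = {"low": [[0.0, 0.0]], "normal": [[5.0, 5.0]], "high": [[10.0, 10.0]]}
print(classify([[9.0, 9.5], [10.5, 10.0]], codebooks))  # high
```

Averaging the nearest-codeword distance over all test frames, rather than deciding frame by frame, makes the decision less sensitive to a few atypical frames in the recording.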

Fig 9: Simulation results showing the test sample identified as high stress

VI. CONCLUSION
This paper presents an approach to identifying the stress level of a person from speech. MFCC feature vectors are calculated and clustered using the k-means algorithm, and an accuracy of 83% was obtained. Other algorithms could be used in future work to increase the accuracy of the system.

REFERENCES
[1] A. A. Khulage, Prof. B. V. Pathak, Analysis of speech under stress using linear techniques and non-linear techniques for emotion recognition system.
[2] Abdelwadood Mesleh, Dmitriy Skopin, Sergey Baglikov and Anas Quteishat, Nov., Heart Rate Extraction from Vowel Speech Signal, Journal of Computer Science and Technology, 27(6): 1243–1251.
[3] Bageshree V. Sathe-Pathak, Ashish R. Panat, July 2012, Extraction of Pitch and Formants and its Analysis to Identify 3 Different Emotional States of a Person, IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 4, No. 1.
[4] Clifford S. Hopkins, Roy J. Ratley, Daniel S. Benincasa, John J. Grieco, 2005, Evaluation of Voice Stress Analysis Technology, 38th Hawaii International Conference on System Sciences.
[5] Li Dong, August 2011, Time series analysis of jitter in sustained vowels, ICPhS XVII, regular session, Hong Kong.
[6] Mireia Farrús, Javier Hernando, Pascual Ejarque, Jitter and Shimmer Measurements for Speaker Recognition.
[7] Guojun Zhou, John H. L. Hansen, James F. Kaiser, March 2001, Nonlinear Feature Based Classification of Speech Under Stress, IEEE Transactions on Speech and Audio Processing, Vol. 9, No. 3.
[8] Saloni, R. K. Sharma and Anil K. Gupta, 2014, Classification of High Blood Pressure Persons Vs Normal Blood Pressure Persons Using Voice Analysis, IIJSP, 1,


More information

Incremental K-means Clustering Algorithms: A Review

Incremental K-means Clustering Algorithms: A Review Incremental K-means Clustering Algorithms: A Review Amit Yadav Department of Computer Science Engineering Prof. Gambhir Singh H.R.Institute of Engineering and Technology, Ghaziabad Abstract: Clustering

More information

Fingerprint Based Gender Classification Using Block-Based DCT

Fingerprint Based Gender Classification Using Block-Based DCT Fingerprint Based Gender Classification Using Block-Based DCT Akhil Anjikar 1, Suchita Tarare 2, M. M. Goswami 3 Dept. of IT, Rajiv Gandhi College of Engineering & Research, RTM Nagpur University, Nagpur,

More information

Judul: EleectroLarynx, Esopahgus, and Normal Speech Classsification using Grradient Disceent, Gradient discent with momentum and learningg rate, and

Judul: EleectroLarynx, Esopahgus, and Normal Speech Classsification using Grradient Disceent, Gradient discent with momentum and learningg rate, and Judul: EleectroLarynx, Esopahgus, and Normal Speech Classsification using Grradient Disceent, Gradient discent with momentum and learningg rate, and Leevenberg-Maarquardt Algorithm Proceding Seminar Internasional:

More information

Mahdi Amiri. February Sharif University of Technology

Mahdi Amiri. February Sharif University of Technology Course Presentation Multimedia Systems Speech II Mahdi Amiri February 2014 Sharif University of Technology Speech Compression Road Map Based on Time Domain analysis Differential Pulse-Code Modulation (DPCM)

More information

Speech Recognition on DSP: Algorithm Optimization and Performance Analysis

Speech Recognition on DSP: Algorithm Optimization and Performance Analysis Speech Recognition on DSP: Algorithm Optimization and Performance Analysis YUAN Meng A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Philosophy in Electronic Engineering

More information

AUDIO COMPRESSION USING WAVELET TRANSFORM

AUDIO COMPRESSION USING WAVELET TRANSFORM AUDIO COMPRESSION USING WAVELET TRANSFORM Swapnil T. Dumbre Department of electronics, Amrutvahini College of Engineering,Sangamner,India Sheetal S. Gundal Department of electronics, Amrutvahini College

More information

Classification and Vowel Recognition

Classification and Vowel Recognition Laboratory 8 Classification and Vowel Recognition 8.1 Introduction The ability to recognize and categorize things is fundamental to human cognition. A large part of our ability to understand and deal with

More information

Multimedia Systems Speech II Hmid R. Rabiee Mahdi Amiri February 2015 Sharif University of Technology

Multimedia Systems Speech II Hmid R. Rabiee Mahdi Amiri February 2015 Sharif University of Technology Course Presentation Multimedia Systems Speech II Hmid R. Rabiee Mahdi Amiri February 25 Sharif University of Technology Speech Compression Road Map Based on Time Domain analysis Differential Pulse-Code

More information

Figure (5) Kohonen Self-Organized Map

Figure (5) Kohonen Self-Organized Map 2- KOHONEN SELF-ORGANIZING MAPS (SOM) - The self-organizing neural networks assume a topological structure among the cluster units. - There are m cluster units, arranged in a one- or two-dimensional array;

More information

Principles of Audio Coding

Principles of Audio Coding Principles of Audio Coding Topics today Introduction VOCODERS Psychoacoustics Equal-Loudness Curve Frequency Masking Temporal Masking (CSIT 410) 2 Introduction Speech compression algorithm focuses on exploiting

More information

Self-Organizing Maps for Analysis of Expandable Polystyrene Batch Process

Self-Organizing Maps for Analysis of Expandable Polystyrene Batch Process International Journal of Computers, Communications & Control Vol. II (2007), No. 2, pp. 143-148 Self-Organizing Maps for Analysis of Expandable Polystyrene Batch Process Mikko Heikkinen, Ville Nurminen,

More information

Analyzing Vocal Patterns to Determine Emotion Maisy Wieman, Andy Sun

Analyzing Vocal Patterns to Determine Emotion Maisy Wieman, Andy Sun Analyzing Vocal Patterns to Determine Emotion Maisy Wieman, Andy Sun 1. Introduction The human voice is very versatile and carries a multitude of emotions. Emotion in speech carries extra insight about

More information

Detection of goal event in soccer videos

Detection of goal event in soccer videos Detection of goal event in soccer videos Hyoung-Gook Kim, Steffen Roeber, Amjad Samour, Thomas Sikora Department of Communication Systems, Technical University of Berlin, Einsteinufer 17, D-10587 Berlin,

More information

Audio Fundamentals, Compression Techniques & Standards. Hamid R. Rabiee Mostafa Salehi, Fatemeh Dabiran, Hoda Ayatollahi Spring 2011

Audio Fundamentals, Compression Techniques & Standards. Hamid R. Rabiee Mostafa Salehi, Fatemeh Dabiran, Hoda Ayatollahi Spring 2011 Audio Fundamentals, Compression Techniques & Standards Hamid R. Rabiee Mostafa Salehi, Fatemeh Dabiran, Hoda Ayatollahi Spring 2011 Outlines Audio Fundamentals Sampling, digitization, quantization μ-law

More information

SPEECH FEATURE EXTRACTION USING WEIGHTED HIGHER-ORDER LOCAL AUTO-CORRELATION

SPEECH FEATURE EXTRACTION USING WEIGHTED HIGHER-ORDER LOCAL AUTO-CORRELATION Far East Journal of Electronics and Communications Volume 3, Number 2, 2009, Pages 125-140 Published Online: September 14, 2009 This paper is available online at http://www.pphmj.com 2009 Pushpa Publishing

More information

Comparing MFCC and MPEG-7 Audio Features for Feature Extraction, Maximum Likelihood HMM and Entropic Prior HMM for Sports Audio Classification

Comparing MFCC and MPEG-7 Audio Features for Feature Extraction, Maximum Likelihood HMM and Entropic Prior HMM for Sports Audio Classification MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Comparing MFCC and MPEG-7 Audio Features for Feature Extraction, Maximum Likelihood HMM and Entropic Prior HMM for Sports Audio Classification

More information

A Texture Feature Extraction Technique Using 2D-DFT and Hamming Distance

A Texture Feature Extraction Technique Using 2D-DFT and Hamming Distance A Texture Feature Extraction Technique Using 2D-DFT and Hamming Distance Author Tao, Yu, Muthukkumarasamy, Vallipuram, Verma, Brijesh, Blumenstein, Michael Published 2003 Conference Title Fifth International

More information

Keywords Wavelet decomposition, SIFT, Unibiometrics, Multibiometrics, Histogram Equalization.

Keywords Wavelet decomposition, SIFT, Unibiometrics, Multibiometrics, Histogram Equalization. Volume 3, Issue 7, July 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Secure and Reliable

More information

: A MATLAB TOOL FOR SPEECH PROCESSING, ANALYSIS AND RECOGNITION: SAR-LAB

: A MATLAB TOOL FOR SPEECH PROCESSING, ANALYSIS AND RECOGNITION: SAR-LAB 2006-472: A MATLAB TOOL FOR SPEECH PROCESSING, ANALYSIS AND RECOGNITION: SAR-LAB Veton Kepuska, Florida Tech Kepuska has joined FIT in 2003 after past 12 years of R&D experience in high-tech industry in

More information

2.4 Audio Compression

2.4 Audio Compression 2.4 Audio Compression 2.4.1 Pulse Code Modulation Audio signals are analog waves. The acoustic perception is determined by the frequency (pitch) and the amplitude (loudness). For storage, processing and

More information

1 Introduction. 3 Data Preprocessing. 2 Literature Review

1 Introduction. 3 Data Preprocessing. 2 Literature Review Rock or not? This sure does. [Category] Audio & Music CS 229 Project Report Anand Venkatesan(anand95), Arjun Parthipan(arjun777), Lakshmi Manoharan(mlakshmi) 1 Introduction Music Genre Classification continues

More information

Source Coding Basics and Speech Coding. Yao Wang Polytechnic University, Brooklyn, NY11201

Source Coding Basics and Speech Coding. Yao Wang Polytechnic University, Brooklyn, NY11201 Source Coding Basics and Speech Coding Yao Wang Polytechnic University, Brooklyn, NY1121 http://eeweb.poly.edu/~yao Outline Why do we need to compress speech signals Basic components in a source coding

More information

New Approach for K-mean and K-medoids Algorithm

New Approach for K-mean and K-medoids Algorithm New Approach for K-mean and K-medoids Algorithm Abhishek Patel Department of Information & Technology, Parul Institute of Engineering & Technology, Vadodara, Gujarat, India Purnima Singh Department of

More information

Keywords:- Fingerprint Identification, Hong s Enhancement, Euclidian Distance, Artificial Neural Network, Segmentation, Enhancement.

Keywords:- Fingerprint Identification, Hong s Enhancement, Euclidian Distance, Artificial Neural Network, Segmentation, Enhancement. Volume 5, Issue 8, August 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Embedded Algorithm

More information

Introduction to Artificial Intelligence

Introduction to Artificial Intelligence Introduction to Artificial Intelligence COMP307 Machine Learning 2: 3-K Techniques Yi Mei yi.mei@ecs.vuw.ac.nz 1 Outline K-Nearest Neighbour method Classification (Supervised learning) Basic NN (1-NN)

More information

Audio-Visual Speech Activity Detection

Audio-Visual Speech Activity Detection Institut für Technische Informatik und Kommunikationsnetze Semester Thesis at the Department of Information Technology and Electrical Engineering Audio-Visual Speech Activity Detection Salome Mannale Advisors:

More information

Fully Automatic Methodology for Human Action Recognition Incorporating Dynamic Information

Fully Automatic Methodology for Human Action Recognition Incorporating Dynamic Information Fully Automatic Methodology for Human Action Recognition Incorporating Dynamic Information Ana González, Marcos Ortega Hortas, and Manuel G. Penedo University of A Coruña, VARPA group, A Coruña 15071,

More information

Multimedia Database Systems. Retrieval by Content

Multimedia Database Systems. Retrieval by Content Multimedia Database Systems Retrieval by Content MIR Motivation Large volumes of data world-wide are not only based on text: Satellite images (oil spill), deep space images (NASA) Medical images (X-rays,

More information

Available online at ScienceDirect. Procedia Computer Science 89 (2016 )

Available online at   ScienceDirect. Procedia Computer Science 89 (2016 ) Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 89 (2016 ) 778 784 Twelfth International Multi-Conference on Information Processing-2016 (IMCIP-2016) Color Image Compression

More information

Source Coding Techniques

Source Coding Techniques Source Coding Techniques Source coding is based on changing the content of the original signal. Also called semantic-based coding. Compression rates may be higher but at a price of loss of information.

More information

Tactile Sensor System Processing Based On K-means Clustering

Tactile Sensor System Processing Based On K-means Clustering Tactile Sensor System Processing Based On K-means Clustering Harry Chan-Maestas Rochester Institute of Technology One Lomb Memorial Drive Rochester, NY 14623 USA Email: hxc1414@rit.edu Donald A. Sofge

More information

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques 24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE

More information

Biometric Security System Using Palm print

Biometric Security System Using Palm print ISSN (Online) : 2319-8753 ISSN (Print) : 2347-6710 International Journal of Innovative Research in Science, Engineering and Technology Volume 3, Special Issue 3, March 2014 2014 International Conference

More information

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant

More information

Computer Aided Diagnosis Based on Medical Image Processing and Artificial Intelligence Methods

Computer Aided Diagnosis Based on Medical Image Processing and Artificial Intelligence Methods International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 9 (2013), pp. 887-892 International Research Publications House http://www. irphouse.com /ijict.htm Computer

More information

CT516 Advanced Digital Communications Lecture 7: Speech Encoder

CT516 Advanced Digital Communications Lecture 7: Speech Encoder CT516 Advanced Digital Communications Lecture 7: Speech Encoder Yash M. Vasavada Associate Professor, DA-IICT, Gandhinagar 2nd February 2017 Yash M. Vasavada (DA-IICT) CT516: Adv. Digital Comm. 2nd February

More information

ISSN: (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies

ISSN: (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies ISSN: 2321-7782 (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online

More information

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding Perceptual Coding Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding Part II wrap up 6.082 Fall 2006 Perceptual Coding, Slide 1 Lossless vs.

More information

Seismic regionalization based on an artificial neural network

Seismic regionalization based on an artificial neural network Seismic regionalization based on an artificial neural network *Jaime García-Pérez 1) and René Riaño 2) 1), 2) Instituto de Ingeniería, UNAM, CU, Coyoacán, México D.F., 014510, Mexico 1) jgap@pumas.ii.unam.mx

More information

Classifying Building Energy Consumption Behavior Using an Ensemble of Machine Learning Methods

Classifying Building Energy Consumption Behavior Using an Ensemble of Machine Learning Methods Classifying Building Energy Consumption Behavior Using an Ensemble of Machine Learning Methods Kunal Sharma, Nov 26 th 2018 Dr. Lewe, Dr. Duncan Areospace Design Lab Georgia Institute of Technology Objective

More information

Authentication and Secret Message Transmission Technique Using Discrete Fourier Transformation

Authentication and Secret Message Transmission Technique Using Discrete Fourier Transformation , 2009, 5, 363-370 doi:10.4236/ijcns.2009.25040 Published Online August 2009 (http://www.scirp.org/journal/ijcns/). Authentication and Secret Message Transmission Technique Using Discrete Fourier Transformation

More information

Music Genre Classification

Music Genre Classification Music Genre Classification Matthew Creme, Charles Burlin, Raphael Lenain Stanford University December 15, 2016 Abstract What exactly is it that makes us, humans, able to tell apart two songs of different

More information

Agenda for supervisor meeting the 22th of March 2011

Agenda for supervisor meeting the 22th of March 2011 Agenda for supervisor meeting the 22th of March 2011 Group 11gr842 A3-219 at 14:00 1 Approval of the agenda 2 Approval of minutes from last meeting 3 Status from the group Since last time the two teams

More information