[N569] Wavelet speech enhancement based on voiced/unvoiced decision


The 32nd International Congress and Exposition on Noise Control Engineering
Jeju International Convention Center, Seogwipo, Korea, August 25-28, 2003

Jong Kwan Lee
Korea Advanced Institute of Science and Technology
2106 LG Hall, Guseong-dong, Yuseong-gu, Daejon, 305-701, Republic of Korea
Email address: c13525@mail.kaist.ac.kr

Chang D. Yoo
Korea Advanced Institute of Science and Technology
2106 LG Hall, Guseong-dong, Yuseong-gu, Daejon, 305-701, Republic of Korea

ABSTRACT

A wavelet-based speech enhancement algorithm for removing additive background noise from a single channel of noisy speech is proposed. In the algorithm, the enhancement is performed on a frame-by-frame basis, and each frame is classified as either voiced or unvoiced to determine the appropriate threshold for removing noise. The performance of the proposed algorithm is evaluated on the Aurora 2 database, which consists of noisy English connected digits, and it was determined to be better than that of traditional wavelet-based methods.

KEYWORDS: Speech enhancement, Wavelet Transform, Voiced and unvoiced speech, Threshold.

1. Introduction

Speech enhancement is an important research field with applications in voice communication and automatic speech recognition systems. The main objective is to maximally reduce noise while minimizing speech distortion. To achieve this objective, various algorithms have been reported with limited success [1, 2, 3]. Recently, a promising technique based on the wavelet transform was proposed for noise reduction [4]. It reduces noise by thresholding the wavelet coefficients so that only the coefficients with values above the threshold are retained. Since for many signals the signal energy is concentrated in a small number of wavelet coefficients, while the noise is spread over a large number of coefficients, appropriate thresholding can lead to high noise reduction with low signal distortion. Unfortunately, determining an appropriate threshold value is not clear-cut. Several factors must be considered, such as the characteristics of the signal and the signal-to-noise ratio (SNR).

Speech can be divided into numerous voiced and unvoiced regions. The energy of voiced regions is an order of magnitude larger than that of unvoiced regions. Therefore, for additive white Gaussian noise, the SNR of voiced regions is generally much higher than that of unvoiced regions, and for a fixed threshold, enhancement in voiced regions is more effective than in unvoiced regions. In the proposed speech enhancement algorithm, different threshold values are used for voiced and unvoiced frames. The experimental results show that using different threshold values can improve enhancement over traditional algorithms that use one fixed threshold value.

This paper is organized as follows. Section 2 discusses the application of the wavelet transform in speech enhancement. Section 3 discusses the classification of frames as either voiced or unvoiced. Section 4 discusses the evaluation of the proposed algorithm. Finally, Section 5 concludes.

2. Wavelet Speech Enhancement

The simplest way to perform time-frequency analysis is the short-time Fourier transform (STFT), otherwise known as the Gabor transform. However, analysis based on the STFT is limited by the use of a fixed window size: a short window is required for the analysis of fast-changing signals and a long window is required for the analysis of slowly changing signals. Unlike the bases of the Fourier transform, the bases of the wavelet transform are of different lengths and thus allow a trade-off between time and frequency resolution.

Consider the classical problem of recovering samples $s[i]$ of an unknown deterministic signal from the set of noise-corrupted samples

$$x[i] = s[i] + z[i], \quad i = 1, 2, \ldots, N \tag{1}$$

where $z[i]$ is zero-mean white Gaussian noise of variance $\sigma^2$. Let $x$, $s$ and $z$ denote the $N \times 1$ column vectors containing the samples $x[i]$, $s[i]$ and $z[i]$, respectively.
Let $W$ denote the $N \times N$ orthonormal wavelet transform matrix. In the wavelet domain, (1) becomes

$$y = \theta + n \tag{2}$$

where $y = Wx$, $\theta = Ws$ and $n = Wz$.

An important property of the wavelet transform is energy compaction. While the energy of speech is concentrated in a small number of wavelet coefficients, the noise energy is spread over a large number of coefficients. For relatively high SNR, $\theta$ has elements with both large and small values, but $n$ has only elements with small values. Hence, a considerable amount of noise can be removed by setting all coefficient values that are below a certain threshold to zero. The traditional wavelet-based speech enhancement algorithm can be summarized by the following three steps:

- Wavelet transform of the noisy signal
- Thresholding of the resulting wavelet coefficients
- Inverse transform to obtain the denoised signal

The threshold value can be determined in many ways. Donoho [4] has suggested the formula

$$T = \sigma \sqrt{2 \log(N)} \tag{3}$$

where $T$ is the threshold value and $N$ is the length of the noisy signal. The standard deviation of the noise must be estimated in order to determine the threshold value. The basic wavelet denoising algorithm assumes that the noise spectrum is white. Assuming zero-mean Gaussian noise, the wavelet coefficients of the noise are Gaussian random variables of zero mean and variance $\sigma^2$. In [4], the estimate of the standard deviation is given by

$$\sigma = \frac{1}{0.6745}\,\mathrm{Median}(|c|) \tag{4}$$

where $c$ is the set of wavelet coefficients of the noise. Noise encountered in everyday life is not white but colored; therefore, an estimate of the noise variance for colored noise is needed. A popular estimate proposed by Donoho for colored noise is given by

$$\sigma_i = \frac{1}{0.6745}\,\mathrm{Median}(|c_i|) \tag{5}$$

where $c_i$ is the set of coefficients of the $i$-th wavelet band of the noise.

The two most popular methods for wavelet thresholding are hard thresholding and soft thresholding. Soft thresholding removes coefficients below the threshold and shrinks those above it. Hard thresholding removes coefficients below the threshold but leaves those above it unaffected. Soft and hard thresholding are respectively given by

$$THR_S(Y, T) = \begin{cases} \mathrm{sgn}(Y)\,(|Y| - T), & |Y| > T \\ 0, & |Y| \le T \end{cases} \tag{6}$$

and

$$THR_H(Y, T) = \begin{cases} Y, & |Y| > T \\ 0, & |Y| \le T \end{cases} \tag{7}$$

where $Y$ and $T$ are the noisy wavelet coefficient and the threshold proposed by Donoho, respectively. Soft thresholding shrinks every retained coefficient and therefore attenuates the entire signal, but the resulting signal has no large discontinuities. Hard thresholding, on the other hand, does not significantly affect the energy of the signal, but it may introduce large jumps and discontinuities.
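The threshold selection and the two thresholding rules can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the logarithm in Equation (3) is assumed to be the natural logarithm, and the coefficient list in the demo is made up.

```python
import math
import statistics


def mad_sigma(coeffs):
    """Noise standard deviation estimate of Eq. (4): median(|c|) / 0.6745."""
    return statistics.median([abs(c) for c in coeffs]) / 0.6745


def universal_threshold(sigma, n):
    """Threshold of Eq. (3): sigma * sqrt(2 * log(n)), natural log assumed."""
    return sigma * math.sqrt(2.0 * math.log(n))


def soft_threshold(y, t):
    """Eq. (6): zero small coefficients, shrink the others toward zero by t."""
    if abs(y) <= t:
        return 0.0
    return math.copysign(abs(y) - t, y)


def hard_threshold(y, t):
    """Eq. (7): zero small coefficients, keep the others unchanged."""
    return y if abs(y) > t else 0.0


# Made-up coefficients: mostly small (noise-like) values plus two large ones.
coeffs = [0.1, -0.2, 3.0, 0.05, -2.5, 0.15, -0.1, 0.2]
t = universal_threshold(mad_sigma(coeffs), len(coeffs))
print([soft_threshold(c, t) for c in coeffs])
print([hard_threshold(c, t) for c in coeffs])
```

For the per-band estimate of Equation (5), the same `mad_sigma` helper would simply be called once per wavelet band.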
Some researchers have tried to design the threshold function to improve the quality of the processed speech signal. Breiman [8] applied non-negative garrote shrinkage to wavelet-based denoising to remedy the drawbacks of hard and soft thresholding. The garrote shrinkage function is defined as

$$THR_G(Y, T) = \begin{cases} Y - T^2 / Y, & |Y| > T \\ 0, & |Y| \le T \end{cases} \tag{8}$$
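A small sketch may help show why the garrote rule sits between the hard and soft rules. The function name and the sample values below are illustrative assumptions, not part of the paper.

```python
def garrote_threshold(y, t):
    """Garrote shrinkage: 0 for |y| <= t, otherwise y - t**2 / y."""
    if abs(y) <= t:
        return 0.0
    return y - (t * t) / y


# The amount removed from a retained coefficient is t**2 / |y|, so it
# shrinks as |y| grows, unlike soft thresholding, which always removes t.
t = 1.0
for y in (1.5, 3.0, 10.0):
    print(y, garrote_threshold(y, t), y - garrote_threshold(y, t))
```

At $|Y| = T$ the expression $Y - T^2/Y$ evaluates to zero, so the function has no jump at the threshold, and for large $|Y|$ the correction term $T^2/Y$ becomes negligible.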

Figure 1: The various threshold algorithms: (a) Hard, (b) Soft, (c) Garrote and (d) Firm.

A garrote shrinkage function is shown in Figure 1 (c). The shrinkage function is continuous and approaches the identity line as $|Y|$ gets large. Another interesting threshold, introduced by Gao and Bruce [12], is the firm threshold, which is a modification of the garrote threshold. This threshold requires two threshold values. Plots of the various thresholds are shown in Figure 1. We have found that garrote shrinkage performed best among the four thresholds mentioned above.

3. Voiced and Unvoiced Speech Detection Using Wavelet

Various techniques for detecting voiced/unvoiced speech regions have been proposed; however, their performance degrades dramatically in noise. Johnson [7] proposed an algorithm that uses the wavelet transform to classify speech into voiced, unvoiced and mixed frames. The algorithm computes the discrete wavelet transform (DWT) and the energy at level 1. Let the approximation and the detail denote the low-pass filter output and the high-pass filter output, respectively. If the percentage of energy concentrated in the level 1 approximation is less than 40%, the frame is classified as unvoiced. If the percentage of energy in the level 1 approximation is between 40% and 90%, the frame is a mixed voiced/unvoiced segment. Above 90%, the segment is regarded as voiced. This DWT-based classification performs well only in clean conditions and fails in noisy environments.

Figure 2: The classification of speech frames degraded by additive white Gaussian noise, with SNR ranging from 0 to 20 dB, into either voiced or unvoiced regions.

A simple modification of Johnson's algorithm [7] can achieve adequate performance even in noisy environments. The noise energy is estimated from regions where only noise is present. These regions can be detected using a voice activity detector (VAD). Using the noise energy estimate, a region can be classified as voiced or unvoiced by calculating the measure

$$\text{frame} = \begin{cases} \text{voiced}, & \dfrac{E_{LS} - E_{LN}}{E_S - E_N} > 0.9 \\ \text{unvoiced}, & \text{otherwise} \end{cases} \tag{9}$$

where $E_{LS}$, $E_{LN}$, $E_S$ and $E_N$ are, at level 1, the approximation energy of the noisy speech, the approximation energy of the estimated noise, the energy of the noisy speech and the energy of the estimated noise, respectively. Figure 2 shows the classification of noisy speech into voiced and unvoiced regions using the above measure. The noisy speech is obtained by degrading female speech sampled at 8 kHz with additive white Gaussian noise so that the SNR ranges from 0 to 20 dB. We classify each frame as either voiced or unvoiced; mixed frames are classified as unvoiced. If an analysis frame is classified as unvoiced, the threshold value obtained by Equation (3) is multiplied by a constant $\alpha$ (> 1), and for a frame classified as voiced, it is multiplied by a constant $\beta$ (< 1).
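The classification rule and the per-frame threshold scaling can be sketched as follows. Several things here are assumptions made only for illustration: the paper does not name the wavelet, so a one-level orthonormal Haar transform is used; the total level 1 energy is taken as approximation plus detail energy; the noise estimate is passed in as a noise-only frame of the same length; and all function names are hypothetical. The default values 2 and 0.5 for the two constants are the ones reported in the experiments.

```python
import math


def haar_level1(frame):
    """One level of an orthonormal Haar DWT: returns (approximation, detail)."""
    s = math.sqrt(2.0)
    pairs = range(0, len(frame) - 1, 2)
    approx = [(frame[i] + frame[i + 1]) / s for i in pairs]
    detail = [(frame[i] - frame[i + 1]) / s for i in pairs]
    return approx, detail


def energy(values):
    return sum(v * v for v in values)


def is_voiced(frame, noise_frame, ratio=0.9):
    """Noise-compensated approximation-to-total energy ratio test."""
    a_s, d_s = haar_level1(frame)
    a_n, d_n = haar_level1(noise_frame)
    e_ls, e_ln = energy(a_s), energy(a_n)
    e_s = e_ls + energy(d_s)      # level 1 energy of the noisy speech
    e_n = e_ln + energy(d_n)      # level 1 energy of the noise estimate
    denom = e_s - e_n
    if denom <= 0.0:              # nothing above the noise estimate
        return False
    return (e_ls - e_ln) / denom > ratio


def frame_threshold(base_t, voiced, alpha=2.0, beta=0.5):
    """Scale the base threshold: beta for voiced frames, alpha otherwise."""
    return base_t * (beta if voiced else alpha)
```

A slowly varying frame puts almost all of its energy in the approximation and is labeled voiced, while a rapidly alternating frame puts it in the detail and is labeled unvoiced, so its threshold is raised.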

Figure 3: The block diagram of the proposed speech enhancement algorithm.

4. Experiments

We tested this method on the AURORA 2 database, which consists of noisy English connected digits. The speech signals are sampled at 8 kHz. The voiced/unvoiced decision procedure was applied to analysis frames of 120 ms with 50% overlap. The DWT decomposed each frame into 8 bands. We modified the threshold value based on the classification of the analysis frames. If the analysis frame is classified as unvoiced, the threshold value is multiplied by the constant α = 2. If it is classified as voiced, the threshold value is multiplied by the constant β = 0.5. To remedy the drawbacks of hard and soft thresholding, garrote shrinkage is used as the threshold function. Figure 3 illustrates the procedure of the proposed method.

The utterance "nine three five oh three two four" by a female speaker is degraded by white Gaussian noise at 5 dB SNR. The noisy utterance is enhanced both by WT using a single threshold and by WT using two thresholds. Figure 4 (a), (b), (c) and (d) show, respectively, the noisy speech, the clean speech, the speech enhanced by WT using a single threshold and the speech enhanced by WT using two thresholds. Tables 1 and 2 show that the proposed method is well suited to removing noise and, in most conditions, performs better than spectral subtraction (SS) and WT with a single threshold value. The exceptions are the 15 dB and 20 dB white-noise cases in Table 1, where spectral subtraction yields the highest output SNR.

Table 1: SNR tests for white noise corrupted speech

Unprocessed (dB)   SS (dB)   WT (dB)   Proposed (dB)
0                  4.67      6.02      6.82
5                  8.58      9.78      10.73
10                 13.27     13.87     14.69
15                 17.84     17.01     17.77
20                 22.67     20.77     20.59
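For concreteness, the framing used in the experiments (120 ms analysis frames with 50% overlap at 8 kHz) corresponds to 960-sample frames advanced by 480 samples. The sketch below only illustrates that arithmetic; the function names are hypothetical and the handling of a trailing partial frame (dropped here) is an assumption, since the paper does not describe it.

```python
def frame_params(sample_rate_hz=8000, frame_ms=120, overlap=0.5):
    """Return (frame length, hop size) in samples."""
    frame_len = sample_rate_hz * frame_ms // 1000
    hop = int(frame_len * (1.0 - overlap))
    return frame_len, hop


def split_frames(signal, frame_len, hop):
    """Split a sequence into overlapping frames, dropping any partial tail."""
    last_start = len(signal) - frame_len
    return [signal[start:start + frame_len]
            for start in range(0, last_start + 1, hop)]


frame_len, hop = frame_params()
frames = split_frames(list(range(2400)), frame_len, hop)
print(frame_len, hop, len(frames))
```

Each frame would then be classified as voiced or unvoiced, transformed, thresholded with its scaled threshold, and inverse-transformed before the overlapping frames are recombined.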

Figure 4: (a) speech degraded by 5 dB white Gaussian noise, (b) clean speech, (c) speech enhancement based on WT using a single threshold value, (d) speech enhancement based on WT using two threshold values. The utterance is "nine three five oh three two four".

Table 2: SNR tests for subway noise corrupted speech

Unprocessed (dB)   SS (dB)   WT (dB)   Proposed (dB)
0                  0.34      2.36      2.59
5                  6.09      7.39      7.69
10                 11.78     11.78     12.23
15                 14.52     16.07     16.95
20                 19.08     19.68     20.27

5. Conclusion

In this paper, the problem of wavelet-based speech enhancement was addressed. Although each analysis frame has a different SNR, traditional WT methods use a single threshold value. In the proposed method, two different threshold values and garrote shrinkage are used to improve the performance of traditional WT speech enhancement methods. Each frame is classified by computing the ratio of the approximation energy to the total energy at level 1. To reliably classify noisy speech into voiced and unvoiced regions, the speech classification method proposed by Johnson [7] is modified. In unvoiced speech regions, the threshold value obtained by Equation (3) is multiplied by a constant larger than one, and in voiced speech regions it is multiplied by a constant smaller than one. The experimental results show that the proposed algorithm outperforms traditional WT and SS methods in most of the tested conditions.

REFERENCES

[1] S. F. Boll, Suppression of acoustic noise in speech using spectral subtraction, IEEE Trans. ASSP, vol. 27, no. 2, pp. 113-120, 1978.
[2] R. Martin, Spectral Subtraction Based on Minimum Statistics, Proc. Seventh European Signal Processing Conference, pp. 1182-1185, 1994.
[3] W. Jiang and H. S. Malvar, Adaptive Noise Reduction of Speech Signals, Technical Report MSR-TR-2000-86, July, 1994.
[4] D. L. Donoho, De-noising by Soft-Thresholding, IEEE Trans. Inform. Theory, vol. 41, no. 3, pp. 613-627, May 1995.
[5] S. Mallat, A Wavelet Tour of Signal Processing, Academic Press, 1998.
[6] D. L. Donoho and I. M. Johnstone, Ideal Spatial Adaptation via Wavelet Shrinkage, Biometrika, vol. 81, pp. 425-455, 1994.
[7] J. I. Johnson, Discrete Wavelet Transform Techniques in Speech Processing, IEEE TENCON, pp. 514-519, 1996.
[8] L. Breiman, Better Subset Regression using the Non-negative Garrote, Technometrics, vol. 37, pp. 327-384, 1995.
[9] D. L. Donoho and I. M. Johnstone, Adapting to Unknown Smoothness via Wavelet Shrinkage, J. Amer. Stat. Assoc., pp. 1200-1224, Dec. 1995.
[10] E. Ambikairajah, G. Tattersall and A. Davis, Wavelet Transform-based Speech Enhancement, Proc. ICSLP, vol. 3, 1998.
[11] I. Daubechies, Ten Lectures on Wavelets, SIAM, New York, 1992.
[12] H. Y. Gao and A. G. Bruce, WaveShrink with Firm Shrinkage, Statistica Sinica, vol. 7, pp. 855-874, 1997.