An auditory localization model based on high frequency spectral cues. Dibyendu Nandy and Jezekiel Ben-Arie


An auditory localization model based on high frequency spectral cues

Dibyendu Nandy and Jezekiel Ben-Arie
Department of Electrical Engineering and Computer Science
The University of Illinois at Chicago

Running Head: Localization using spectral cues
MS: 943   Revision: 3

Contact Address:
Dr. Jezekiel Ben-Arie
The University of Illinois at Chicago
Department of Electrical Engineering and Computer Science (M/C 154)
851 South Morgan Street
Chicago, IL
Phone: (312)   Fax: (312)
benarie@eecs.uic.edu

Abstract

We present in this paper a connectionist model that extracts interaural intensity differences (IID) from head related transfer functions (HRTF) in the form of spectral cues to localize broadband high frequency auditory stimuli, both in azimuth and elevation. A novel discriminative matching measure (DMM) is defined and optimized to characterize the matching of this IID spectrum. The optimal DMM approach, as well as a novel back-propagation based fuzzy model of localization, are shown to be capable of localizing sources in azimuth using only spectral IID cues. The fuzzy neural network model is extended to include localization in elevation. The use of training data with additive noise provides robustness to input errors. Outputs are modeled as two dimensional Gaussians which act as membership functions for the fuzzy sets of sound locations. Error back-propagation is used to train the network to correlate input patterns with the desired output patterns. The fuzzy outputs are used to estimate the location of the source by detecting Gaussians using the max-energy paradigm. The proposed model shows that HRTF based spectral IID patterns can provide sufficient information for extracting localization cues using a connectionist paradigm. Successful recognition in the presence of additive noise in the inputs indicates that the computational framework of this model is robust to errors made in estimating the IID patterns. The localization errors for such noisy patterns at various elevations and azimuths are compared and found to be within the limits of localization blurs observed in humans.

INDEX TERMS: auditory localization, pattern recognition, discriminative matching measure (DMM), fuzzy neural networks, back-propagation

1 Introduction

The physics of sound propagation and the anatomy of the head produce interaural time difference (ITD) cues and interaural intensity difference (IID) cues at the two ears. The auditory system has been shown to be sensitive to the presence of these cues. It is well known that these ITDs and IIDs are the two primary cues which help in localization [3]. Most models of localization have been based on the processing of ITD cues, by extracting a measure of similarity between the inputs at the two ears, with minor significance attached to the variation of the IID with frequency. These models include the approaches based on signal correlation [3, ] and its extensions. Other approaches include auditory nerve based models, Durlach's equalization and cancellation model, and count comparison models (suggested by von Bekesy), which are elaborated in [5]. The ITD cues have been shown to be of major importance in the localization process [3, 5]. However, the complete localization phenomenon cannot be explained solely by ITDs. Sound sources located in the medial sagittal plane relative to the head, or on the cone of confusion, cannot be localized using only ITDs. In addition, the gradual loss of synchrony of auditory nerve fibers for signal frequencies above about 1.3 kHz [26], together with a declining ability to detect envelope interaural phase differences (EIPDs) for modulation frequencies above 500 Hz and carrier frequencies above 4 kHz [4], suggests that ITD cues cannot be reliably estimated for high frequency sound sources. Furthermore, the ability of humans to adapt to conditions of localizing monaurally [] and the strong influence of the spectral content of narrowband noise sources on localization [5] imply that the localization mechanism must also use spectral information. It is well known that primarily the ITD cues, and to some extent the head shadow effect on IID cues, dominate azimuthal localization at low frequencies. Consequently, it has been suggested that spectral cues are extracted monaurally for localization only in elevation [4]. This is an oversimplification of a complex localization task. It is now accepted that the pinna, head and torso provide important localization cues by spectrally modifying the sound

signal received at the tympanic membrane according to the direction of origin [4, 2, 23]. Psychoacoustic experiments have also indicated that high frequency spectral cues are essential for a genuine spatial sensation of sound, which includes the outside-the-head sensation or externalization [8]. The ability to localize high frequency broadband sound, as well as externalization, is most effective for binaural audition and becomes less so for monaural audition. This indicates that there exist means to process binaural spectral cues for localization in the human auditory system. The spectral modifications imposed on the free-field source emanating from a particular direction can be estimated by identifying the transfer function of the contributing effects of the pinnae, head and torso [2, 23]. This transfer function has been called the head related transfer function (HRTF). The HRTF is able to encode directional spectral cues, which can be detected by the human localization system [22, 23]. The cochlea acts as a set of bandpass filters and channels incoming stimuli to excite auditory nerve fibers which have bandlimited transfer characteristics [25, pp. 65, 86]. This has been modeled by Lyon [2], Yang et al. [24] and others. The nerve fibers of the inner hair cells pick up the displacement of the basilar membrane. These fibers have been shown to have discharge rates which are proportional to the stimulus intensity (in dB) over a sizeable range [2] [25, pp. 84-86]. These nerve fibers are tonotopically arranged [7] [25, pp. 99-2], i.e., they are spatially arranged in the order of the characteristic frequencies (CFs) to which they are tuned to respond maximally. This indicates that frequency dependent intensities of stimuli are encoded in the nerve discharge rates of the peripheral auditory system, and are thus available for localization. Neti et al. [7] demonstrated the use of neural network models using spectral IID cues in both monaural and binaural form. However, their approach of using HRTF amplitude spectra assumed a source with a flat spectrum. Localization when the sound spectrum was not flat was not explained. Reported results [7] indicated that azimuthal and elevation coding in the HRTF could not be independent. A random training set of HRTFs was used to try and show that it is possible to generalize the localization phenomenon by using the smoothness

property of the HRTFs as they vary in elevation and azimuth. Neti et al. were able to show that response maps of hidden and output layer units were similar to observed responses of neurons in the superior colliculus (SC). However, actual results of individual localization experiments were not reported. The mean errors in localizing using the test set were shown to be rather large. In this paper, we present a connectionist approach that can successfully estimate the direction of a source from spectral IID cues. We do not claim this model to be physiologically exact, but rather that it can represent a systematic approach to localization using spectral IID cues. It has long been known that human perception of intensity is approximately logarithmic [25, pp. 63-64] and therefore perceived loudness is measured in decibels (dB). This perceptual relation has been backed by physiological observations. Measurements made in the auditory (VIIIth) nerve have shown that discharge rates in auditory nerve fibers are proportional to input stimulus intensity levels [2] [25, pp. 84-86]. Sachs et al. [2] showed that the discharge rates of nerve fibers for single tone stimuli are approximately proportional to the sound intensity (in dB). This relationship was shown to hold over a sizeable range, from the threshold levels of the individual neurons to saturation levels at least 30 dB above these threshold levels. Sound intensities were measured on a decibel scale. Thus firing rates of the auditory nerve fibers were shown to be proportional to the logarithm of the actual sound stimulus (pressure) level. The difference of the intensities of the stimuli at the two ears results in an IID characterized by frequency. This hypothesis is motivated by the tonotopic arrangement of nerve fibers in the auditory nervous system [7] [25, pp. 99-2] and the response of the cells in the lateral superior olive (LSO) [4, 7]. The majority of the LSO units are most sensitive to mid-to-high frequency stimuli, having characteristic frequencies (CFs) greater than about 2 kHz. They are excited by stimuli at the ipsilateral ear and inhibited by stimuli at the contralateral ear [4, 7]. This response characteristic is termed E/I, and can be modeled as the difference in intensities, i.e., the IID, at a given frequency. As shown in Section 2, such a model effectively removes the dependence on the source spectrum while retaining spectral

features due to pinnae filtering. The result is a source invariant internal representation of IIDs as a function of frequency and location. In Section 3, the process by which the internal IID spectral patterns are computed from the HRTF data is discussed. We show in Section 5 that, by using a training set generated from HRTFs uniformly distributed over the auditory space, it is possible to achieve adequate generalization and thus accurately localize sources from HRTF based IID data not used in training. An approach based on such IIDs to estimate localization was simultaneously investigated by Duda [6]. Duda used information theoretic concepts based on the maximum likelihood estimation procedure. An implementation of that scheme by means of a neural network is not obvious. We use optimization techniques to shape the localization response to be unimodal and sharp even in the presence of noise. The approach is simple and is implemented using connectionist models with typical correlative (multiply with weights and add) neuron models. We use several algorithms for calculating suitable weights (correlation, optimization of a novel discriminative matching measure (DMM), error back-propagation) for networks with single or multiple layers, with and without output nonlinearities (Section 4). We show that azimuthal localization is also possible using high frequency IID data. In later sections, localization in both azimuth and elevation is demonstrated using the back-propagation based fuzzy neural network model. The results of the simulations are presented in Section 6 and are compared with the observations of Makous et al. [3] on their experiments with human localization. Section 7 concludes this manuscript with a discussion of our experiments, results and the inferences that can be drawn from the model presented.

2 A Localization Model based on HRTFs

The anatomical structures of the pinna, head and torso affect incident acoustic signals by modifying their phase and amplitude spectra. Differences in the acoustic path lengths at all frequencies lead to ITDs between the perceived auditory signals at the two ears. Head shadow effects and multiple reflections off the pinnae lead to IIDs and interaural phase

differences (IPD) which vary as a function of frequency. The effects of such head related filtering are captured by modeling the transfer characteristics of the incoming signal path as a linear filter from the source to the ear canal. The filter thus obtained is a direction dependent linear filter called the head related transfer function (HRTF). The HRTF model implicitly includes IIDs in its amplitude spectrum. As outlined by Wightman et al. [23], Wenzel [22] and others, the HRTF depends upon the directional parameters of the azimuth and elevation of the sound source relative to the head location. In the following discussion, localization of a broadband signal source in anechoic environments is considered. It is assumed that the source is stationary relative to the head. The stimulus is assumed to be a signal having non-zero components over a broad band of frequencies \omega \in [2\,\mathrm{kHz}, 20\,\mathrm{kHz}], located at an azimuth \theta and elevation \phi and a constant distance r from the center of the head. With an assumed constant distance r, the elevation and azimuth are akin to latitudinal and longitudinal mappings of a spherical auditory space, as described by Wightman and Kistler [23]. The stimulus spectrum at the left tympanic membrane can be expressed as

X_L(\omega, \theta, \phi, t) = H_L(\omega, \theta, \phi)\, X(\omega, t),    (1)

where X(\omega, t) is the short time Fourier transform of the free-field sound source from the direction (\theta, \phi), and H_L(\omega, \theta, \phi) is the HRTF of the left ear associated with the direction (\theta, \phi). The spectrum at the right tympanic membrane has a similar form using the corresponding HRTF H_R(\omega, \theta, \phi) for the right ear. The vibrations of the tympanic membrane are transduced through the middle ear bones to the cochlea. The stimulus energy is transferred to the basilar membrane, which carries the stimulus as traveling waves. The basilar membrane can be modeled as a set of bandpass filters [2, 24] [25, pp. 6-66]. The center frequencies of these bandpass filters are approximately logarithmically spaced on the basilar membrane. The logarithmic spacing of the basilar membrane filters is modeled by a space-frequency variable s corresponding to the stimulus frequency \omega_s. Thus if the response of the basilar membrane is

sampled at n frequencies spaced logarithmically between \omega_0 and \omega_{n-1} and indexed by s,

\omega_s = \omega_0 \, e^{\frac{s}{n-1}\ln\!\left[\frac{\omega_{n-1}}{\omega_0}\right]}, \quad s = 0, 1, \ldots, n-1; \quad \omega_0 = 2\,\mathrm{kHz}, \;\; \omega_{n-1} = 20\,\mathrm{kHz}.    (2)

The mechanical motion of the basilar membrane at a particular location is picked up by the inner hair cells (IHC) of the organ of Corti. Auditory nerve fibers which end in the IHC fire in proportion to the neurotransmitter release caused by the depolarization of the IHC. These fibers have been shown to have firing rates which are proportional to the intensity of the stimulus (in dB) [2][25, pp. 84-86]. The proportional range of response varies from the threshold levels of the individual neurons to their saturation levels, which are on average about 30 dB above the threshold levels. The intensity due to a bandpass (basilar membrane) filter H_{BM}(s, \omega) with characteristic frequency \omega_s can be given by

I_L(s, \theta, \phi, t) = \log \int \left| X_L(\omega, \theta, \phi, t)\, H_{BM}(s, \omega) \right|^2 d\omega.    (3)

When the filters H_{BM} are sufficiently narrow, we can assume that H_{BM}(s, \omega) \approx \delta(\omega - \omega_s) and the spectrum of the intensity level carried by these fibers is given by

I_L(s, \theta, \phi, t) \approx \log \left| H_L(\omega_s, \theta, \phi)\, X(\omega_s, t) \right|^2.    (4)

The modeled intensity spectrum given by Eq. (4) forms the auditory representation in the auditory nerve fibers which synapse with the left cochlear nucleus. A more sophisticated treatment, as done by Yang et al. [24], also gives rise to a similar form of representation in the auditory nerve, which they call the auditory spectrum. As the frequency content of the stimulus changes over time, so will the actual discharge pattern of the nerve fibers. Let us consider a specific time frame in the localization process. Some of the nerve fibers from the ipsilateral and contralateral cochlear nuclei synapse at the superior olivary complex (while others may pass without synapse to the inferior colliculus). The majority of the units in the lateral superior olive (LSO) have been observed to exhibit an E/I response [4, 7]. The E/I response can be modeled as a binaural difference operator which evaluates the difference between ipsilateral and contralateral stimuli at the input of the

LSO units. From Eq. (4) corresponding to the ipsilateral cochlea and its equivalent for the contralateral cochlea, the output of an LSO unit can be modeled as being dependent on the IID at its characteristic frequency. Here, we consider \omega \in [2\,\mathrm{kHz}, 20\,\mathrm{kHz}], the approximate range of the frequency response of the LSO units. We drop the variable t as we model a specific time frame during the localization process. Thus,

h(s, \theta, \phi) = I_I(s, \theta, \phi) - I_C(s, \theta, \phi),    (5)

where I_I \equiv I_L and I_C \equiv I_R. h(s, \theta, \phi) is the difference in the intensities of the ipsilateral and contralateral stimuli at the frequency \omega_s (Eq. (2)). h(s, \theta, \phi) is an internal representation of the IID, processed through the cochlea, the auditory nerve, the cochlear nucleus and the LSO, at that frequency. From Eq. (5), Eq. (4) and Eq. (1) we get

h(s, \theta, \phi) = \log \left[ \frac{|H_I(\omega_s, \theta, \phi)\, X(\omega_s)|}{|H_C(\omega_s, \theta, \phi)\, X(\omega_s)|} \right]^2 = 2 \log \frac{|H_I(\omega_s, \theta, \phi)|}{|H_C(\omega_s, \theta, \phi)|}.    (6)

The internal IIDs h(s, \theta, \phi), over the range of the space-frequency index s, form the vector

h(\theta, \phi) = [h(0, \theta, \phi),\; h(1, \theta, \phi),\; \ldots,\; h(s, \theta, \phi),\; \ldots,\; h(n-1, \theta, \phi)]^T,    (7)

which we shall refer to as the internal IID spectrum. In the process outlined above we have assumed the bandpass filtering effects of the peripheral auditory system to have very narrow pass bands. We have also ignored saturation effects of the nerve discharge rates. Thus the internal IID spectrum represents an idealization of the modeled nerve responses. To compensate partially for this idealization, we model variations from the input stimulus as estimation errors. The estimate of the internal IID spectrum may be expected to have variations with changes in the input stimulus or in the firing rates of the nerve fibers or due to other factors. In Appendix A, we show that small additive distortions in the intensity encoding lead to corresponding additive distortions in the internal IID (Eq. (8)). Thus the estimated internal IID spectrum may be modeled as

h_e(\theta, \phi) = h(\theta, \phi) + \eta.    (8)
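The internal IID spectrum defined by Eqs. (2), (6) and (7) can be sketched numerically. The following Python fragment is only an illustration under assumptions: the helper names, the use of a 20 log10 (dB) scale, and the simple linear interpolation of the HRTF magnitudes onto the cochlear grid are ours, not the authors'.

```python
import numpy as np

def log_frequency_grid(n=128, f_lo=2e3, f_hi=20e3):
    """Eq. (2): n frequencies spaced logarithmically between 2 kHz and 20 kHz."""
    s = np.arange(n)
    return f_lo * np.exp(s / (n - 1) * np.log(f_hi / f_lo))

def internal_iid_spectrum(freqs, H_ipsi_mag, H_contra_mag, grid):
    """Eqs. (4)-(7): intensity difference of the two ears on the log-frequency grid.

    freqs       : frequencies (Hz) at which the HRTF magnitudes are known
    H_*_mag     : |H(omega)| for the ipsilateral / contralateral ear
    grid        : log-spaced frequencies omega_s from Eq. (2)
    """
    # interpolate the magnitude responses onto the cochlear grid
    Hi = np.interp(grid, freqs, H_ipsi_mag)
    Hc = np.interp(grid, freqs, H_contra_mag)
    # Eq. (6): h(s) = 2 log |H_I| / |H_C|, written here on a dB scale (assumption)
    return 20.0 * np.log10(Hi / Hc)
```

Applying the function to the left and right HRTF magnitude responses of one direction yields one internal IID spectrum, i.e., one column of the template library used in the following sections.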

It has been observed that the HRTF magnitude spectrum varies in a relatively smooth manner in both azimuth and elevation [22, 6, 16]. The smoothly varying nature of the HRTFs constrains the internal IID spectrum to be a continuous smooth function of azimuth and elevation for a particular frequency. We assume that the internal IID spectrum for different azimuths and elevations has the information to define the auditory space for a human. Our model is based on this hypothesis. Thus the auditory system is modeled as being able to extract an internal representation of the direction-dependent IID spectrum in real time from incoming audio signal spectra. The internal IID spectrum is invariant to the source signal spectrum, but depends on the HRTF encoded directionally varying spectral cues. If x(t) is a broadband signal with nonzero frequency components, then the above formulation applies over the entire frequency band under consideration, and spectral pattern recognition (and thus localization) is possible, independent of the free-field signal x(t).

Figure 1: The proposed binaural localization model using an internal representation of spectral IID cues encoded in the head related transfer function. Ipsilateral and contralateral intensity-sensitive units feed IID responsive neural units, which project through a hidden layer of the localization network onto an auditory space map organized by azimuth and elevation.

The internal IID spectrum h_e(\theta, \phi) has unique characteristics for every direction (\theta, \phi) [6, 16].

A feed forward network which projects such an HRTF dependent internal IID spectrum onto a set of connection weights can extract the direction of a sound source. These weights correspond to a set of localization filters. Such localization filters may be computed using several optimization techniques discussed below. We hypothesize that in the biological localization systems of humans and other animals, connection strengths (weights) may be adaptively formed by a combination of reinforcement and supervised training induced by auditory and visual cues as well as other sensory data. Such feedback learning would be optimal in the context of minimizing localization error. Thus our optimization techniques are also geared towards characterizing and minimizing the localization error. Figure 1 shows a diagram of an artificial neural network model with one hidden layer. The output response of such a network is modeled as activity in the units corresponding to the estimated azimuth and elevation. This output array thus forms a map which models the auditory space.

3 Computing the internal IID spectrum

The internal IID spectral patterns are derived from the HRTF data set (SDO MP44.DAT) of a good human localizer used in the Convolvotron spatial audio system [23, 22]. The SDO data provide HRTFs for 12 azimuthal directions 30° apart, from -180° to +150°, at each of 6 elevations 18° apart, from -36° to 54°, as finite impulse responses with 128 coefficients each. Fig. 2, corresponding to the left LSO nucleus, shows the internal IID spectral data for the elevations of 0°, -18° and -36° and indicates how it varies in azimuth and elevation. The SDO data are resampled (at 44.1 kHz) minimum phase approximations of finite impulse response data originally measured at a 50 kHz sampling rate by Wightman and Kistler [23]. The fast Fourier transform (FFT) is used to determine the amplitude spectrum of these HRTFs. The amplitude spectra are then interpolated for each frequency over 360° of azimuth and 90° of elevation with a resolution of 1°, using cubic splines. In azimuth, the splines are constrained to be continuous over the wraparound region from +150° to -180°. This results in an HRTF map for either ear which varies smoothly in azimuth and elevation for each sampled frequency.

Figure 2: The internal IID spectrum corresponding to the left LSO units from the HRTF data set SDO MP44.DAT for elevations of -36°, -18° and 0°, plotted against frequency (kHz) and azimuth (deg).

The amplitude spectra of the HRTFs are warped to a logarithmic scale (Eq. (2)) between 2 kHz and 20 kHz, the range of response of the LSO units. The warped frequency response is resampled uniformly at 128 points. Linear interpolation is used to determine the amplitude value for intermediate frequencies at which the magnitude is not known. The logarithmic warping models the cochlear response along the basilar membrane. The internal IID spectral data vectors are computed from the resampled HRTF data using Eq. (6).
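As a rough illustration of the preprocessing just described, the sketch below computes HRTF amplitude spectra from 128-tap impulse responses and interpolates a single frequency bin across azimuth with a periodic cubic spline, so the map stays continuous over the ±180° wraparound. The function names and the SciPy-based implementation are assumptions; the paper's full pipeline also interpolates over elevation and then warps each spectrum to the logarithmic grid of Eq. (2).

```python
import numpy as np
from scipy.interpolate import CubicSpline

def hrtf_magnitude(hrir, n_fft=256):
    """Amplitude spectrum of a 128-tap head related impulse response."""
    return np.abs(np.fft.rfft(hrir, n=n_fft))

def interpolate_over_azimuth(az_meas, mag_meas, az_query):
    """Cubic-spline interpolation of one frequency bin across azimuth.

    az_meas  : measured azimuths in degrees, e.g. -180, -150, ..., +150
    mag_meas : HRTF magnitude at those azimuths for a single frequency bin
    az_query : azimuths (deg) at which the smooth HRTF map is wanted
    """
    # close the circle so the spline is continuous across the +/-180 deg seam
    az = np.append(az_meas, az_meas[0] + 360.0)
    mag = np.append(mag_meas, mag_meas[0])
    spline = CubicSpline(az, mag, bc_type="periodic")
    return spline(np.mod(az_query + 180.0, 360.0) - 180.0)
```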

4 Localization in Azimuth

In previous work [6], we have analyzed the discriminative capabilities of the internal IID spectrum in providing localization cues in azimuth. Three different pattern matching techniques, including normalized correlation matching, fuzzy back-propagation matching, and an approach based on optimizing a novel discriminative matching measure (DMM), were investigated for accuracy and discriminative ability in localization. As discussed in Section 2, the internal IID spectrum of an incoming signal is projected onto a library of template patterns, each representing a particular direction. The estimate of this internal IID spectrum is assumed to be distorted by additive noise. Details of this distortion model are given in Appendix A. For simplicity in analysis, we assume this additive noise term to be zero mean Gaussian, independent and identically distributed. Normalized correlation involves projecting the incoming IID spectral vector onto the set of library IID spectral vectors. It is well known that matching using the correlation approach optimizes the signal to noise ratio (SNR). Detailed treatment of the correlation approach and its relation to Duda's maximum likelihood approach [6] is given in [6]. The fuzzy model using the back-propagation training approach is detailed in Section 5 for localization in both azimuth and elevation. A brief summary of the approach optimizing the novel discriminative matching measure (DMM) is provided here. Given an M-dimensional IID spectral vector h_e and a set of M-dimensional template vectors \psi_i, i \in \{1, \ldots, N_h\}, associated with azimuthal directions \theta_i \in [-180°, +180°), it is to be determined which of the N_h templates responds best to h_e. The effectiveness of the match can be quantified by the DMM. Let the response score of the vector h_e with the i-th template \psi_i be given by c(i) = \psi_i^T h_e, i \in \{1, \ldots, N_h\}. \psi_i is the template associated with azimuth \theta_i that is to be correlated with h_e. Let the vector h_e evoke the highest response from the l-th template \psi_l. The discriminative matching measure (DMM) is defined as the ratio

\mathrm{DMM} = \frac{(N_h - 1)\,(\psi_l^T h_e)^2}{\sum_{i=1}^{N_h} |\theta_l - \theta_i|^2_{\bmod 180°}\,(\psi_i^T h_e)^2},    (9)

where \theta_i, i \in \{1, \ldots, N_h\}, is the azimuthal angle of the i-th template and |\cdot|_{\bmod 180°} is the modulo operator. This function ensures that the maximum measured azimuthal difference is less than or equal to 180°. The DMM penalizes the matching score in proportion to the squared difference of the estimated direction from the true direction. The smooth nature of the IID spectrum as a function of location is exploited by introducing a dependence on the angular distance, which allows the response to fall gradually away from the peak.
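A small sketch of the measure follows, based on the reconstruction of Eq. (9) above; the exact normalization used by the authors is not fully recoverable from the manuscript, so the (N_h - 1) factor should be read as an assumption of this example.

```python
import numpy as np

def discriminative_matching_measure(h_e, templates, azimuths):
    """Eq. (9): DMM of an IID spectral vector against a template library.

    h_e       : estimated internal IID spectrum (length-M vector)
    templates : array of shape (N_h, M), one template per azimuth
    azimuths  : template azimuths in degrees
    """
    scores = templates @ h_e                 # c(i) = psi_i^T h_e
    l = int(np.argmax(scores))               # best-matching template
    # azimuthal distance folded so that it never exceeds 180 degrees
    d = np.abs(azimuths - azimuths[l])
    d = np.minimum(d, 360.0 - d)
    penalty = np.sum(d**2 * scores**2)       # the i = l term vanishes (d = 0)
    n_h = len(azimuths)
    return (n_h - 1) * scores[l]**2 / penalty
```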

Figure 3: Typical responses for localizing in azimuth using the optimal DMM approach, fuzzy back-propagation and normalized correlation. Shown are the responses for \theta = -99°. Note the sharply peaked responses of the optimal DMM approach and the fuzzy neural network as compared to the broad peak of the normalized correlation method.

The optimal filters \psi_i, which yield the maximal possible DMM when correlated with h_e, are formulated in Appendix B. In this formulation, additive noise and expectations of random variables are used to define the DMM in a slightly different manner. The optimal DMM filter is given by

\psi_l = \frac{u_l \left[ \frac{1}{N_h - 1} \sum_{i=1}^{N_h} |\theta_i - \theta_l|^2_{\bmod 180°}\, (R_{h_i} + R_\eta) \right]^{-1} h_l}{h_l^T \left[ \frac{1}{N_h - 1} \sum_{i=1}^{N_h} |\theta_i - \theta_l|^2_{\bmod 180°}\, (R_{h_i} + R_\eta) \right]^{-1} h_l},    (10)

where u_l is a user determined constraint, usually set to 1, R_{h_i} = h_i h_i^T is the sample autocorrelation matrix of the IID spectral data, and R_\eta = E[\eta \eta^T] is the noise autocorrelation matrix. In the limiting case of increasing additive white noise, it is easy to show that the optimal DMM filter converges to the original template h_l normalized to unit amplitude, thus indicating that the DMM method performs as well as or better than correlation matching in terms of DMM. In implementing the optimal DMM approach, an incoming IID spectral vector is correlated with optimal DMM filters for each available azimuth.
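The filter of Eq. (10) can be sketched as below; the direct linear solve and the 1/(N_h - 1) scaling follow the reading of the equation given above and are not taken from the authors' implementation.

```python
import numpy as np

def optimal_dmm_filter(l, templates, azimuths, R_noise, u_l=1.0):
    """Eq. (10): filter psi_l that maximizes the DMM for template h_l.

    templates : array (N_h, M) of internal IID spectra h_i
    R_noise   : noise autocorrelation matrix R_eta (M x M)
    """
    n_h, m = templates.shape
    d = np.abs(azimuths - azimuths[l])
    d = np.minimum(d, 360.0 - d)                       # |theta_i - theta_l| mod 180
    # distance-weighted sum of signal-plus-noise autocorrelation matrices
    A = np.zeros((m, m))
    for i in range(n_h):
        R_hi = np.outer(templates[i], templates[i])    # R_{h_i} = h_i h_i^T
        A += d[i]**2 * (R_hi + R_noise)
    A /= (n_h - 1)
    w = np.linalg.solve(A, templates[l])               # A^{-1} h_l
    return u_l * w / (templates[l] @ w)                # divide by h_l^T A^{-1} h_l
```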

Figure 4: Average DMM obtained by matching using optimal DMM filtering, fuzzy neural networks and normalized correlation, plotted against the input HRTF template SNR (dB). As is evident, the optimal DMM method has the best DMM scores, followed by the fuzzy neural network.

Figure 5: RMS errors (degrees) of optimal DMM filter matching, the fuzzy neural network and normalized correlation matching, plotted against the input HRTF template SNR (dB). Here the fuzzy network is seen to perform best in localizing, with minimum RMS error, followed by the optimal DMM approach.

The response is a vector with components corresponding to all these azimuths. Fig. 3 shows a result of localizing a sound source located at an azimuth of \theta = -99°. Three methods are compared in this figure, namely normalized correlation, fuzzy back-propagation and the optimal DMM approach. In these experiments, the response is estimated for azimuths 1° apart. Note that the optimal DMM approach and the back-propagation based fuzzy neural network have sharply peaked responses at the correct azimuth. The normalized correlation approach has a broad peak, which is less discriminative. The estimate of the azimuth is less robust to variations for such a broadly peaked response than for the narrower peaks of the optimal DMM and the fuzzy neural network approaches. Fig. 4 and Fig. 5 show the average DMM scores and the RMS errors in localization using these three approaches over a large range of additive noise levels. The DMM approach and the fuzzy neural network are seen to have similar performances, both in DMM and in localization accuracy. As expected, the DMM approach has a better average DMM score than the fuzzy neural network, but on the other hand has slightly higher RMS errors. The correlation approach is inferior to both of these methods in terms of DMM and RMS error. The optimal DMM approach is theoretically capable of being extended to localization in

both azimuth and elevation. In order to formulate this, it is required simply to consider the absolute angular distance \Delta\gamma as defined in Eq. (12), rather than just the azimuthal angular distance, together with the corresponding IID spectral pattern for the relevant direction. In the experiments described above, a resolution of 1° in azimuth was used. This is smaller than the localization acuity of humans [3], and thus is convenient for evaluating the performance bounds of the model. For localization in both azimuth and elevation, it would be necessary to construct 360 x 91 optimal filters for the desired resolution of 1°. In our experiments, the computation of each filter takes approximately 20 minutes on a 486 DX2/66. Thus it would take more than one year to compute all 360 x 91 filters with the resources available to us. The numerical complexity precludes such an implementation. It must be noted that the simulation of the localization process after the filter computations is primarily a set of parallel vector correlations and is quite fast, even on serial architectures. It can thus be expected that the performance of such a model will be similar to the results obtained for localization in azimuth only and thus be comparable to human localization acuity.

5 Localization Using Fuzzy Neural Networks

As discussed above, the final stage in the localization process, after the extraction of the IID spectrum, involves mapping this IID pattern vector to a location in the modeled auditory space of the listener. The assumption is that the modeled IID responses are mapped from a tonotopic ordering to a location oriented response at some stage in the auditory nervous system. This mapping is done by a neural network (shown in Fig. 1) based on principles of fuzzy logic []. The model is a feed forward fully connected network with one hidden layer. Each of the processing units in the network correlates the input vector with its associated weight vector. The logistic function is used as the output nonlinearity for all processing nodes. The error back-propagation algorithm is used to train the network in a supervised mode. The back-propagation algorithm has been extensively used. It is capable of extracting general characteristics of pattern classes quite readily. The training of the neural network

consists of presenting a set of training patterns or exemplars to the network and demanding a corresponding set of desired outputs from the network. The network defines a nonlinear mapping from the IID spectral pattern space to the modeled auditory space. The mapping is a function of the number of layers, the weight matrix associated with each layer and the nonlinearity parameters of each neuron. The learning algorithm modifies the weights of the network in accordance with the error between the actual and desired outputs. The topic of back-propagation is well developed [9] and we do not present any details here, other than noting that the algorithm provides a method to perform a gradient descent on the mean squared error surface in the weight space of the neural network. Neti et al. [7] used a fault tolerant learning method to model the behavior of biological neural networks, where it is observed that cognitive functions are left relatively unimpaired despite damage to individual processing units. Neural units carried evenly distributed information and were modularly fault tolerant. However, the manner in which the training data were selected did not allow for good generalization [7]. In the neural network we describe here, the problem of poor generalization has been overcome by using fuzzy variables and IID data uniformly distributed over the auditory space the model is expected to analyze. The use of fuzzy variables in the input allows the network to be robust to distortions in the input IID spectrum. The fuzzy model allows a set of outputs defined by the fuzzy membership function to have high activity values, i.e., the estimated location can have varying memberships in a range of locations. We assume that this models the actual perception of sound source location by humans. Humans are generally able to localize sounds to a small region of uncertainty. This region is akin to the localization blur or the minimum audible angle (MAA) measured by various experimenters [3, 4]. The use of fuzzy variables also reduces the burden on the training algorithm and allows faster convergence. This is because the input IID vectors corresponding to nearby locations are highly correlated. Requiring an orthogonal set of outputs would require a more complex mapping than requiring a set of fuzzy outputs as described below. Fuzzy outputs also allow for a far greater resolution than is possible

otherwise, without much greater computational burden.

5.1 Fuzzy Model of the Input Stimulus

The input to the neural network is given by the internal IID spectral pattern (Eq. (6) and Eq. (7)). It is desired that the network be robust to variances induced in the internal IID spectrum by estimation errors at various stages of auditory processing. Such errors are modeled as additive white noise \eta(s), as explained in Appendix A. \eta is assumed to be a zero mean independent and identically distributed (i.i.d.) random vector constructed from the estimation errors at each frequency. In training the network, each target direction is associated with several input templates, each one corresponding to the internal IID spectral pattern for that direction with different levels of additive noise.

Figure 6: The magnitude of the HRTF for the left and right ear for the direction \theta = -55° and \phi = +23°, shown in dB.

Figure 7: The HRTF magnitude spectra warped to a logarithmic scale between 2 kHz and 20 kHz.

The input intensity at a particular frequency for a given direction is converted to a fuzzy variable [], allowing the network to learn expected variances of the input amplitude at that frequency. The h(s, \theta, \phi) value is scaled to lie in the range [-1, 1]. Thus if h_{max} and h_{min} are the maximum and minimum gain values of h(s, \theta, \phi) over all input patterns from all directions, the input to the network is modeled as follows:

\tilde{h}_e(s, \theta, \phi) = \frac{2\,(h(s, \theta, \phi) + \eta(s)) - h_{max} - h_{min}}{h_{max} - h_{min}}.    (11)

The number of input nodes is fixed by the choice of the input vector length, in this case 128 points, spaced logarithmically between 2 kHz and 20 kHz and indexed by s (Eq. (2)).
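A minimal sketch of the input fuzzification of Eq. (11) follows; specifying the estimation noise through a target SNR, and the helper name, are assumptions made for the example.

```python
import numpy as np

def fuzzy_input(h, h_max, h_min, snr_db=20.0, rng=None):
    """Eq. (11): add estimation noise to an IID spectrum and scale it to [-1, 1].

    h            : clean internal IID spectrum h(s) for one direction
    h_max, h_min : extreme IID values over all training patterns
    snr_db       : assumed SNR of the additive estimation noise
    """
    rng = rng or np.random.default_rng(0)
    noise_power = np.mean(h**2) / 10.0**(snr_db / 10.0)
    eta = rng.normal(0.0, np.sqrt(noise_power), size=h.shape)
    return (2.0 * (h + eta) - h_max - h_min) / (h_max - h_min)
```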

Figure 8: The internal IID pattern for the lateral superior olive units corresponding to the direction \theta = -55° and \phi = +23°.

Figure 9: The internal IID pattern for the lateral superior olive units with additive noise to model estimation errors, and after scaling (SNR = 5 dB).

Fig. 6 shows the original magnitude spectra of the HRTFs for the direction (\theta = -55°, \phi = +23°), Fig. 7 the intensity spectrum as modeled in the cochlear response, Fig. 8 the modeled IID spectrum corresponding to the response of the LSO units, and Fig. 9 the result of adding noise and scaling it to lie between h_{max} and h_{min} respectively.

5.2 Fuzzy Model of the Output Response

The output response is modeled as a map of the auditory space in terms of elevation and azimuth. The output layer is a 20 x 10 array of neurons (Fig. 1), with the azimuth ranging from -180° to 162°, spaced 18° apart along the 20-neuron side, and the elevation ranging from -72° to +90°, also spaced 18° apart, along the 10-neuron side. The output activity is modeled as a Gaussian with its mean at the location of the input stimulus being localized, sampled at the locations corresponding to each neuron in the output array. This enables coding of locations which do not correspond to the locations of neurons in the output array. Each Gaussian is plotted to be circularly symmetric in terms of the absolute angular difference \Delta\gamma between the mean location of the Gaussian and the location of each neuron in the output array, by using the direction cosines of these two directions. We note that at increasing elevations farther from the horizontal plane at 0°, the azimuthal locations are sampled more

densely. Hence the relative contribution of the elevation and azimuthal angles to the absolute angular difference changes. Thus the use of \Delta\gamma is particularly relevant in estimating the localized direction.

Figure 10: The modeled output of the network at location (-55°, +23°) is a Gaussian with \sigma = 25° when mapped on a spherical surface. This target provides fuzzy memberships in a set of azimuths and elevations at and close to the source direction.

Figure 11: This figure illustrates how the desired response at the location (+55°, +22°) is modeled to be continuous in azimuth by introducing a wraparound.

The absolute angular difference \Delta\gamma between the locations (\theta_i, \phi_i) and (\theta_j, \phi_j) is given by

\Delta\gamma = \cos^{-1}(l_i l_j + m_i m_j + n_i n_j),    (12)

where

l_i = \cos(\theta_i)\cos(\phi_i), \quad m_i = \sin(\theta_i)\cos(\phi_i), \quad n_i = \sin(\phi_i)    (13)

are the direction cosines for the direction (\theta_i, \phi_i). The direction cosines for (\theta_j, \phi_j) are similarly defined. Thus the value of the Gaussian, with its mean at (\theta_i, \phi_i), at a neuron (k, l) corresponding to the location (\theta_j, \phi_j) in the output array, is given by

G_i(k, l) = 1.9\, e^{-\frac{\Delta\gamma^2}{2\sigma_d^2}} - 0.95,    (14)

where \sigma_d (= 25°) is the standard deviation of the output Gaussian. The Gaussian in Eq. (14) is limited in amplitude to [-0.95, 0.95]. This avoids demanding saturated responses from the output nonlinearity of each of the network output units. This form of output coding results in a map of the auditory space that is continuous on the sphere, and thus in elevation and azimuth. It is capable of localizing sources from all directions.
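The target map of Eqs. (12)-(14) can be sketched as follows; the grid spacing matches the 20 x 10 output array described above, while the function names are illustrative.

```python
import numpy as np

def angular_difference(az1, el1, az2, el2):
    """Eqs. (12)-(13): absolute angular difference via direction cosines (degrees)."""
    a1, e1, a2, e2 = np.radians([az1, el1, az2, el2])
    cos_dg = (np.cos(a1) * np.cos(e1) * np.cos(a2) * np.cos(e2)
              + np.sin(a1) * np.cos(e1) * np.sin(a2) * np.cos(e2)
              + np.sin(e1) * np.sin(e2))
    return np.degrees(np.arccos(np.clip(cos_dg, -1.0, 1.0)))

def gaussian_target(az0, el0, sigma_d=25.0,
                    az_grid=np.arange(-180.0, 180.0, 18.0),
                    el_grid=np.arange(-72.0, 91.0, 18.0)):
    """Eq. (14): fuzzy membership map, one value per output neuron (k, l)."""
    target = np.empty((len(az_grid), len(el_grid)))
    for k, az in enumerate(az_grid):
        for l, el in enumerate(el_grid):
            dg = angular_difference(az0, el0, az, el)
            target[k, l] = 1.9 * np.exp(-dg**2 / (2.0 * sigma_d**2)) - 0.95
    return target
```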

Figure 12: The response of the localization network to the input (SNR: 5 dB) from the direction (-55°, +23°). The response is defuzzified according to Eq. (15) and the estimated direction is (-59°, +22°), with an absolute angular error (\Delta\gamma) of 3.8283°.

Figure 13: The response of the localization network to the input (SNR: 5 dB) from the direction (+55°, +23°). The estimated location is (+54°, +22°), with an absolute angular error (\Delta\gamma) of 1.364°.

An example of the desired output for the direction corresponding to (-55°, +23°) is shown in Fig. 10. Another example, for the direction (+55°, +22°), is shown in Fig. 11. Note that the response wraps around in azimuth, and extends from +90° to -90° in elevation to cover the whole surface of the sphere. The Gaussian response model forms a fuzzy membership function [], with the output having memberships in a multiplicity of directions given by the response of each neuron. Thus the resolution of localization is not limited by the number of output neurons. The network is trained with a set of patterns which are selected to be uniformly distributed over the set of all available directions. The network is able to generalize and interpolate to responses for intermediate directions for which it has not been trained. Examples of responses to inputs with which the network has not been trained are shown in Fig. 13 and Fig. 12. Results are discussed further in Section 6.

5.3 Error Back-propagation Training

The weight assignment is done by iteratively converging to a minimum close to the global minimum on the mean squared error surface of the network mapping function. The network

is initialized with small random weights. The error associated with each output node is defined as the difference between the node's target output and the achieved node output. Errors associated with hidden layer nodes are determined by back-propagating the error associated with each node of the next layer through the output nonlinearity. The use of the logistic function, which is continuous, as the output nonlinearity enables an analytic derivation of the back-propagation algorithm as described by Rumelhart et al. [9]. The network is trained in a batch mode. The training data include IID spectral templates for 20 azimuths x 6 elevations, uniformly spaced 18° in azimuth and elevation. The actual training data correspond to directions over the range of azimuths \theta \in [-180°, +180°) and elevations \phi \in [-36°, 54°]. The inclusion of training patterns corrupted by additive noise of variance \sigma^2 (additive Gaussian noise resulting in 20 dB SNR and 25 dB SNR is used) ensures robustness and better generalization capabilities [9], by enabling the network to model the estimation errors in the IID spectral patterns. The target pattern for each direction is a Gaussian as defined in Eq. (14). The network consists of 128 inputs, a varying number of hidden units and a 20 x 10 output array. The network was trained with heuristically adjusted learning and momentum rates. The convergence criterion was that the average error per node per pattern drop to less than 0.02. Varying numbers of hidden layer units, from 40 to 120, were tried. The best error performance was observed for a network using 50 hidden units, which converged in about 450 iterations.
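A minimal NumPy sketch of the architecture described in this section follows (128 inputs, 50 hidden units, a 20 x 10 output map, logistic nonlinearities). The weight initialization scale and learning rate are placeholders, the momentum term is omitted for brevity, and the targets of Eq. (14) are assumed here to be rescaled into the (0, 1) range of the logistic output units.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

class FuzzyLocalizationNet:
    """Feed-forward net: 128 IID inputs -> 50 hidden -> 20 x 10 output map."""

    def __init__(self, n_in=128, n_hidden=50, n_out=200, rng=None):
        rng = rng or np.random.default_rng(0)
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))   # small random weights
        self.W2 = rng.normal(0.0, 0.1, (n_out, n_hidden))

    def forward(self, x):
        self.h = logistic(self.W1 @ x)        # hidden activations
        self.y = logistic(self.W2 @ self.h)   # output map, flattened 20 x 10
        return self.y

    def backprop_step(self, x, target, lr=0.1):
        """One gradient-descent step on the squared error for a single pattern."""
        y = self.forward(x)
        # error propagated back through the logistic nonlinearities
        delta_out = (y - target) * y * (1.0 - y)
        delta_hid = (self.W2.T @ delta_out) * self.h * (1.0 - self.h)
        self.W2 -= lr * np.outer(delta_out, self.h)
        self.W1 -= lr * np.outer(delta_hid, x)
        return float(np.mean((y - target) ** 2))
```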

5.4 Estimating the Auditory Source Location

The modeled output response is a Gaussian sampled on a 20 x 10 array as given by Eq. (14). As discussed above, the network is expected to interpolate responses for directions which do not correspond to a direction represented by the output nodes. The estimated location due to an output response corresponds to the mean of the Gaussian which best fits the response in some measure. Thus, the aim is to detect the location of the Gaussian at a higher resolution than afforded by the 20 x 10 output array. Ben-Arie and Rao [2] have shown that a signal may be decomposed on a set of Gaussian basis functions by recursively subtracting the Gaussian which provides the maximal amount of energy in the signal. A single Gaussian of known variance, as in the present case, can be detected by the mean of the Gaussian which provides the minimum mean squared error fit to the signal. Following the max-energy paradigm of Ben-Arie and Rao, it is attempted to resolve the output response at a resolution of 1° in both azimuth and elevation. Modeled Gaussian responses G_i of standard deviation \sigma_d = 25° are generated for i such that \theta_i \in [-180°, +179°] and \phi_i \in [-36°, +54°], sampled on a 20 x 10 array as in Eq. (14). Let the output response be approximately Gaussian in nature and be given by G_{op} of some standard deviation \sigma_d^{op}. The ordered pair (k, l) indexes the output layer neurons. The estimated location (\theta_e, \phi_e) is given by

(\theta_e, \phi_e) = \arg\min_{(\theta_i, \phi_i)} \| G_i - G_{op} \|_F,    (15)

where

\| G_i - G_{op} \|_F = \sqrt{ \sum_k \sum_l \left[ G_i(\sigma_d; k, l) - G_{op}(\sigma_d^{op}; k, l) \right]^2 }    (16)

is the Frobenius norm of the error matrix. The absolute angular error in the estimate, i.e., the absolute angular difference \Delta\gamma between the locations (\theta_i, \phi_i) and (\theta_e, \phi_e), can be computed using Eq. (12). This scheme is seen to be robust to variations in \sigma_d^{op} and in the output response amplitude. In previous work [6], described in Section 4, the back-propagation method was evaluated for azimuthal localization using both the criterion of angular error and the DMM measure (Fig. 4 and Fig. 5). In order to estimate the DMM of the response, a slightly different scheme to defuzzify the output response was implemented and is described in [6]. In brief, an interpolated response was generated at the required level of angular resolution (1°). The azimuth was estimated as the centroid of this interpolated response.
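The defuzzification of Eqs. (15)-(16) amounts to a brute-force match of the output map against Gaussian templates on a 1° grid. The sketch below reuses the hypothetical gaussian_target helper from Section 5.2 and is not an efficient implementation, only an illustration of the search.

```python
import numpy as np

def defuzzify(G_op, az_step=1.0, el_step=1.0):
    """Eqs. (15)-(16): find the Gaussian template closest to the network output.

    G_op : 20 x 10 output map produced by the localization network
    Scans candidate means on a 1-degree grid and returns the best (az, el).
    """
    best, best_err = None, np.inf
    for az0 in np.arange(-180.0, 180.0, az_step):
        for el0 in np.arange(-36.0, 54.0 + el_step, el_step):
            G_i = gaussian_target(az0, el0)        # Eq. (14), sigma_d = 25 deg
            err = np.linalg.norm(G_i - G_op)       # Frobenius norm, Eq. (16)
            if err < best_err:
                best, best_err = (az0, el0), err
    return best
```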

6 Simulation Results

The performance of the network is evaluated by running simulations over a large range of input SNR. Each test set for a given input SNR consists of randomly selected test IID spectral patterns uniformly distributed over the available range of input directions, i.e., \theta \in [-180°, +179°], \phi \in [-36°, +54°]. The response of the network for each input is defuzzified as described in Section 5.4. Samples of the network response are shown in Fig. 12 and Fig. 13. Evidently, the results (shown for noisy input at 20 dB SNR) are robust to additive noise. The defuzzified results have minimum mean absolute angular errors of about 3.3°. The absolute angular error is calculated as the angle between the true and estimated locations projected at the center of a sphere, using Eq. (12). Fig. 14 shows the distribution of the estimated elevation against the actual elevation and Fig. 15 shows the distribution of the estimated azimuth against the actual azimuth for inputs with 20 dB SNR.

Figure 14: The distribution of estimated elevation against actual elevation for inputs at 20 dB SNR. The average elevation error is 0.5° with a standard deviation of 3.5°, and the average absolute angular error is 4.37°.

Figure 15: The distribution of estimated azimuth against actual azimuth for inputs at 20 dB SNR. The average azimuthal error is 0.68° with a standard deviation of 4.38°. The average absolute angular error is 4.37°.

From Fig. 14 and Fig. 15, it is observed that the model performs very well in estimating both elevation and azimuth from IID spectral cues. Table 1 summarizes the results of localization over the range of test SNR from 5 dB to 40 dB, in terms of the means and standard deviations of the elevation and azimuth errors and the mean absolute error. All results are given in degrees. Evidently, the errors increase with greater levels of noise. Note that the standard deviation is a more accurate indicator of performance when measuring localization acuity in azimuth or elevation alone, due to the signed nature of the error. It is seen that the mean absolute error in localization varies from

about 3.3° to about 7.3° over a wide range of noise levels. The error is seen to plateau for SNRs above 20 dB.

Table 1: Average localization errors over a range of input SNR. The columns list the SNR (dB), the mean elevation error, the mean azimuth error and the mean absolute error (all in degrees).

Makous et al. [3] used broadband flat stimuli in the range from 1.8 kHz to 16 kHz to estimate localization acuity in humans, the approximate range of frequencies investigated for our model. Results were reported for open-loop and closed-loop localization experiments. The open-loop trials used short bursts of stimuli to remove the contributions due to orienting the head. Here, it is suitable to compare our model responses with the open-loop experiments. It was observed that localization acuity in humans was usually better in azimuth than in elevation. Makous et al. reported signed errors which were at a minimum of about 0.7° for azimuth and -0.3° ± 3.5° for elevation. These minima corresponded to locations directly ahead. Maximum azimuthal errors were of the order of 8.0° ± 6.2° for sources to one side and about -3° directly behind. Similarly, maximum elevation errors were observed to be about 7° for sources behind and slightly above the horizontal interaural plane. The minimum elevation errors reported by Makous et al. compare well with our results. The minimum azimuthal errors are smaller than those achieved by our model. This may seem to invalidate the approach of using IID information for azimuthal localization. However, it must be noted that ITD cues, especially onset ITD cues, provide very informative cues for azimuthal localization. It might be expected that the use of ITD cues would improve the acuity of this model. It cannot be denied that even without ITD cues, the modeled IID spectrum

provides a reliable source of information for azimuthal localization. From Fig. 14 and Fig. 15, it can be observed that our model has errors which are approximately uniformly distributed. The model cannot accurately predict the variation in localization acuity with the location of the source, nor can it emulate the front-to-back and back-to-front confusions observed in humans. This is because these phenomena are closely related to the manner in which ITD cues are combined with IID cues to form a listener's subjective auditory space.

7 Conclusions

The localization process has been modeled using the interaural intensity difference (IID) spectral vectors defined in Eq. (8). This IID spectrum is the difference of the intensities of the ipsilateral and contralateral stimuli. The intensity responses of units in the auditory system can be approximated as the power spectrum of the stimulus exciting each bandpass filter of the basilar membrane. The localization process is modeled as a spatial correlate which maps the IID spectrum to the subjective auditory space. This model attempts to explain localization in the mid-to-high frequency range, based on directional frequency cues that are imposed by the pinnae, head and torso on the IID at the two ears. The experiments described in this manuscript provide an insight into how the peripheral auditory system might be able to process spectral cues for localization. Thus it complements currently popular localization models that are based only on interaural time difference (ITD) cues. It must be noted that this model has been defined only for broadband medium and high frequency sound sources (> 1.3 kHz). The modeled IID spectrum is an idealization of the response of the units in the lateral superior olive (LSO). In obtaining this representation, the cochlear bandpass filters were assumed to be narrowband and in effect ignored. Saturation effects of the nerve fibers in the auditory (VIIIth) nerve were ignored. Stimuli were assumed to evoke responses in the approximately linear range of the nerve fibers, and nonlinearities were discounted. These limitations of the model must be kept in mind when deriving conclusions from the above model.


Richard S. Zemel 1 Georey E. Hinton North Torrey Pines Rd. Toronto, ONT M5S 1A4. Abstract

Richard S. Zemel 1 Georey E. Hinton North Torrey Pines Rd. Toronto, ONT M5S 1A4. Abstract Developing Population Codes By Minimizing Description Length Richard S Zemel 1 Georey E Hinton University oftoronto & Computer Science Department The Salk Institute, CNL University oftoronto 0 North Torrey

More information

Function approximation using RBF network. 10 basis functions and 25 data points.

Function approximation using RBF network. 10 basis functions and 25 data points. 1 Function approximation using RBF network F (x j ) = m 1 w i ϕ( x j t i ) i=1 j = 1... N, m 1 = 10, N = 25 10 basis functions and 25 data points. Basis function centers are plotted with circles and data

More information

AN ALGORITHM FOR BLIND RESTORATION OF BLURRED AND NOISY IMAGES

AN ALGORITHM FOR BLIND RESTORATION OF BLURRED AND NOISY IMAGES AN ALGORITHM FOR BLIND RESTORATION OF BLURRED AND NOISY IMAGES Nader Moayeri and Konstantinos Konstantinides Hewlett-Packard Laboratories 1501 Page Mill Road Palo Alto, CA 94304-1120 moayeri,konstant@hpl.hp.com

More information

19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007

19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007 19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007 SUBJECTIVE AND OBJECTIVE QUALITY EVALUATION FOR AUDIO WATERMARKING BASED ON SINUSOIDAL AMPLITUDE MODULATION PACS: 43.10.Pr, 43.60.Ek

More information

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal.

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal. Perceptual coding Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal. Perceptual encoders, however, have been designed for the compression of general

More information

CHAPTER 6 PERCEPTUAL ORGANIZATION BASED ON TEMPORAL DYNAMICS

CHAPTER 6 PERCEPTUAL ORGANIZATION BASED ON TEMPORAL DYNAMICS CHAPTER 6 PERCEPTUAL ORGANIZATION BASED ON TEMPORAL DYNAMICS This chapter presents a computational model for perceptual organization. A figure-ground segregation network is proposed based on a novel boundary

More information

BINAURAL SOUND LOCALIZATION FOR UNTRAINED DIRECTIONS BASED ON A GAUSSIAN MIXTURE MODEL

BINAURAL SOUND LOCALIZATION FOR UNTRAINED DIRECTIONS BASED ON A GAUSSIAN MIXTURE MODEL BINAURAL SOUND LOCALIZATION FOR UNTRAINED DIRECTIONS BASED ON A GAUSSIAN MIXTURE MODEL Takanori Nishino and Kazuya Takeda Center for Information Media Studies, Nagoya University Furo-cho, Chikusa-ku, Nagoya,

More information

EE482: Digital Signal Processing Applications

EE482: Digital Signal Processing Applications Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 13 Audio Signal Processing 14/04/01 http://www.ee.unlv.edu/~b1morris/ee482/

More information

Parametric Coding of Spatial Audio

Parametric Coding of Spatial Audio Parametric Coding of Spatial Audio Ph.D. Thesis Christof Faller, September 24, 2004 Thesis advisor: Prof. Martin Vetterli Audiovisual Communications Laboratory, EPFL Lausanne Parametric Coding of Spatial

More information

Multimedia Communications. Audio coding

Multimedia Communications. Audio coding Multimedia Communications Audio coding Introduction Lossy compression schemes can be based on source model (e.g., speech compression) or user model (audio coding) Unlike speech, audio signals can be generated

More information

Networks for Control. California Institute of Technology. Pasadena, CA Abstract

Networks for Control. California Institute of Technology. Pasadena, CA Abstract Learning Fuzzy Rule-Based Neural Networks for Control Charles M. Higgins and Rodney M. Goodman Department of Electrical Engineering, 116-81 California Institute of Technology Pasadena, CA 91125 Abstract

More information

Random Search Report An objective look at random search performance for 4 problem sets

Random Search Report An objective look at random search performance for 4 problem sets Random Search Report An objective look at random search performance for 4 problem sets Dudon Wai Georgia Institute of Technology CS 7641: Machine Learning Atlanta, GA dwai3@gatech.edu Abstract: This report

More information

1 INTRODUCTION The LMS adaptive algorithm is the most popular algorithm for adaptive ltering because of its simplicity and robustness. However, its ma

1 INTRODUCTION The LMS adaptive algorithm is the most popular algorithm for adaptive ltering because of its simplicity and robustness. However, its ma MULTIPLE SUBSPACE ULV ALGORITHM AND LMS TRACKING S. HOSUR, A. H. TEWFIK, D. BOLEY University of Minnesota 200 Union St. S.E. Minneapolis, MN 55455 U.S.A fhosur@ee,tewk@ee,boley@csg.umn.edu ABSTRACT. The

More information

Topographic Mapping with fmri

Topographic Mapping with fmri Topographic Mapping with fmri Retinotopy in visual cortex Tonotopy in auditory cortex signal processing + neuroimaging = beauty! Topographic Mapping with fmri Retinotopy in visual cortex Tonotopy in auditory

More information

CHAPTER 3. Preprocessing and Feature Extraction. Techniques

CHAPTER 3. Preprocessing and Feature Extraction. Techniques CHAPTER 3 Preprocessing and Feature Extraction Techniques CHAPTER 3 Preprocessing and Feature Extraction Techniques 3.1 Need for Preprocessing and Feature Extraction schemes for Pattern Recognition and

More information

Effects of multi-scale velocity heterogeneities on wave-equation migration Yong Ma and Paul Sava, Center for Wave Phenomena, Colorado School of Mines

Effects of multi-scale velocity heterogeneities on wave-equation migration Yong Ma and Paul Sava, Center for Wave Phenomena, Colorado School of Mines Effects of multi-scale velocity heterogeneities on wave-equation migration Yong Ma and Paul Sava, Center for Wave Phenomena, Colorado School of Mines SUMMARY Velocity models used for wavefield-based seismic

More information

Ensemble methods in machine learning. Example. Neural networks. Neural networks

Ensemble methods in machine learning. Example. Neural networks. Neural networks Ensemble methods in machine learning Bootstrap aggregating (bagging) train an ensemble of models based on randomly resampled versions of the training set, then take a majority vote Example What if you

More information

Advanced phase retrieval: maximum likelihood technique with sparse regularization of phase and amplitude

Advanced phase retrieval: maximum likelihood technique with sparse regularization of phase and amplitude Advanced phase retrieval: maximum likelihood technique with sparse regularization of phase and amplitude A. Migukin *, V. atkovnik and J. Astola Department of Signal Processing, Tampere University of Technology,

More information

Mpeg 1 layer 3 (mp3) general overview

Mpeg 1 layer 3 (mp3) general overview Mpeg 1 layer 3 (mp3) general overview 1 Digital Audio! CD Audio:! 16 bit encoding! 2 Channels (Stereo)! 44.1 khz sampling rate 2 * 44.1 khz * 16 bits = 1.41 Mb/s + Overhead (synchronization, error correction,

More information

Sontacchi A., Noisternig M., Majdak P., Höldrich R.

Sontacchi A., Noisternig M., Majdak P., Höldrich R. $Q2EMHFWLYHRGHORI/RFDOLVDWLRQLQ %LQDXUDO6RXQG5HSURGXFWLRQ6\VWHPV Sontacchi A., Noisternig M., Majdak P., Höldrich R. $(6VW,QWHUQDWLRQDO&RQIHUHQFH -XQH 6W3HWHUVEXUJ5XVVLD ,QVWLWXWHRI(OHFWURQLFXVLF DQG$FRXVWLFV

More information

m Environment Output Activation 0.8 Output Activation Input Value

m Environment Output Activation 0.8 Output Activation Input Value Learning Sensory-Motor Cortical Mappings Without Training Mike Spratling Gillian Hayes Department of Articial Intelligence University of Edinburgh mikes@dai.ed.ac.uk gmh@dai.ed.ac.uk Abstract. This paper

More information

Appendix 4. Audio coding algorithms

Appendix 4. Audio coding algorithms Appendix 4. Audio coding algorithms 1 Introduction The main application of audio compression systems is to obtain compact digital representations of high-quality (CD-quality) wideband audio signals. Typically

More information

IEEE Proof Web Version

IEEE Proof Web Version IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 8, NOVEMBER 2009 1 Model-Based Expectation-Maximization Source Separation and Localization Michael I. Mandel, Member, IEEE, Ron

More information

Digital Image Processing. Prof. P. K. Biswas. Department of Electronic & Electrical Communication Engineering

Digital Image Processing. Prof. P. K. Biswas. Department of Electronic & Electrical Communication Engineering Digital Image Processing Prof. P. K. Biswas Department of Electronic & Electrical Communication Engineering Indian Institute of Technology, Kharagpur Lecture - 21 Image Enhancement Frequency Domain Processing

More information

4.12 Generalization. In back-propagation learning, as many training examples as possible are typically used.

4.12 Generalization. In back-propagation learning, as many training examples as possible are typically used. 1 4.12 Generalization In back-propagation learning, as many training examples as possible are typically used. It is hoped that the network so designed generalizes well. A network generalizes well when

More information

A Neural Network for Real-Time Signal Processing

A Neural Network for Real-Time Signal Processing 248 MalkofT A Neural Network for Real-Time Signal Processing Donald B. Malkoff General Electric / Advanced Technology Laboratories Moorestown Corporate Center Building 145-2, Route 38 Moorestown, NJ 08057

More information

Scaling and Power Spectra of Natural Images

Scaling and Power Spectra of Natural Images Scaling and Power Spectra of Natural Images R. P. Millane, S. Alzaidi and W. H. Hsiao Department of Electrical and Computer Engineering University of Canterbury Private Bag 4800, Christchurch, New Zealand

More information

mywbut.com Diffraction

mywbut.com Diffraction Diffraction If an opaque obstacle (or aperture) is placed between a source of light and screen, a sufficiently distinct shadow of opaque (or an illuminated aperture) is obtained on the screen.this shows

More information

Face Recognition using Eigenfaces SMAI Course Project

Face Recognition using Eigenfaces SMAI Course Project Face Recognition using Eigenfaces SMAI Course Project Satarupa Guha IIIT Hyderabad 201307566 satarupa.guha@research.iiit.ac.in Ayushi Dalmia IIIT Hyderabad 201307565 ayushi.dalmia@research.iiit.ac.in Abstract

More information

Image Compression: An Artificial Neural Network Approach

Image Compression: An Artificial Neural Network Approach Image Compression: An Artificial Neural Network Approach Anjana B 1, Mrs Shreeja R 2 1 Department of Computer Science and Engineering, Calicut University, Kuttippuram 2 Department of Computer Science and

More information

Recognition. Clark F. Olson. Cornell University. work on separate feature sets can be performed in

Recognition. Clark F. Olson. Cornell University. work on separate feature sets can be performed in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 907-912, 1996. Connectionist Networks for Feature Indexing and Object Recognition Clark F. Olson Department of Computer

More information

ET4254 Communications and Networking 1

ET4254 Communications and Networking 1 Topic 2 Aims:- Communications System Model and Concepts Protocols and Architecture Analog and Digital Signal Concepts Frequency Spectrum and Bandwidth 1 A Communications Model 2 Communications Tasks Transmission

More information

Audio Compression Using Decibel chirp Wavelet in Psycho- Acoustic Model

Audio Compression Using Decibel chirp Wavelet in Psycho- Acoustic Model Audio Compression Using Decibel chirp Wavelet in Psycho- Acoustic Model 1 M. Chinna Rao M.Tech,(Ph.D) Research scholar, JNTUK,kakinada chinnarao.mortha@gmail.com 2 Dr. A.V.S.N. Murthy Professor of Mathematics,

More information

Department of. Computer Science. Remapping Subpartitions of. Hyperspace Using Iterative. Genetic Search. Keith Mathias and Darrell Whitley

Department of. Computer Science. Remapping Subpartitions of. Hyperspace Using Iterative. Genetic Search. Keith Mathias and Darrell Whitley Department of Computer Science Remapping Subpartitions of Hyperspace Using Iterative Genetic Search Keith Mathias and Darrell Whitley Technical Report CS-4-11 January 7, 14 Colorado State University Remapping

More information

Edge and local feature detection - 2. Importance of edge detection in computer vision

Edge and local feature detection - 2. Importance of edge detection in computer vision Edge and local feature detection Gradient based edge detection Edge detection by function fitting Second derivative edge detectors Edge linking and the construction of the chain graph Edge and local feature

More information

IMPROVEMENTS TO THE BACKPROPAGATION ALGORITHM

IMPROVEMENTS TO THE BACKPROPAGATION ALGORITHM Annals of the University of Petroşani, Economics, 12(4), 2012, 185-192 185 IMPROVEMENTS TO THE BACKPROPAGATION ALGORITHM MIRCEA PETRINI * ABSTACT: This paper presents some simple techniques to improve

More information

Guidelines for proper use of Plate elements

Guidelines for proper use of Plate elements Guidelines for proper use of Plate elements In structural analysis using finite element method, the analysis model is created by dividing the entire structure into finite elements. This procedure is known

More information

ALMA Memo No An Imaging Study for ACA. Min S. Yun. University of Massachusetts. April 1, Abstract

ALMA Memo No An Imaging Study for ACA. Min S. Yun. University of Massachusetts. April 1, Abstract ALMA Memo No. 368 An Imaging Study for ACA Min S. Yun University of Massachusetts April 1, 2001 Abstract 1 Introduction The ALMA Complementary Array (ACA) is one of the several new capabilities for ALMA

More information

Metrics for performance assessment of mixed-order Ambisonics spherical microphone arrays

Metrics for performance assessment of mixed-order Ambisonics spherical microphone arrays Downloaded from orbit.dtu.dk on: Oct 6, 28 Metrics for performance assessment of mixed-order Ambisonics spherical microphone arrays Favrot, Sylvain Emmanuel; Marschall, Marton Published in: Proceedings

More information

Neural Networks Based Time-Delay Estimation using DCT Coefficients

Neural Networks Based Time-Delay Estimation using DCT Coefficients American Journal of Applied Sciences 6 (4): 73-78, 9 ISSN 1546-939 9 Science Publications Neural Networks Based Time-Delay Estimation using DCT Coefficients Samir J. Shaltaf and Ahmad A. Mohammad Department

More information

1. Introduction. 2. Motivation and Problem Definition. Volume 8 Issue 2, February Susmita Mohapatra

1. Introduction. 2. Motivation and Problem Definition. Volume 8 Issue 2, February Susmita Mohapatra Pattern Recall Analysis of the Hopfield Neural Network with a Genetic Algorithm Susmita Mohapatra Department of Computer Science, Utkal University, India Abstract: This paper is focused on the implementation

More information

Image Transformation Techniques Dr. Rajeev Srivastava Dept. of Computer Engineering, ITBHU, Varanasi

Image Transformation Techniques Dr. Rajeev Srivastava Dept. of Computer Engineering, ITBHU, Varanasi Image Transformation Techniques Dr. Rajeev Srivastava Dept. of Computer Engineering, ITBHU, Varanasi 1. Introduction The choice of a particular transform in a given application depends on the amount of

More information

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)

More information

Evaluation of a new Ambisonic decoder for irregular loudspeaker arrays using interaural cues

Evaluation of a new Ambisonic decoder for irregular loudspeaker arrays using interaural cues 3rd International Symposium on Ambisonics & Spherical Acoustics@Lexington, Kentucky, USA, 2nd June 2011 Evaluation of a new Ambisonic decoder for irregular loudspeaker arrays using interaural cues J. Treviño

More information

Analysis of Directional Beam Patterns from Firefly Optimization

Analysis of Directional Beam Patterns from Firefly Optimization Analysis of Directional Beam Patterns from Firefly Optimization Nicholas Misiunas, Charles Thompson and Kavitha Chandra Center for Advanced Computation and Telecommunications Department of Electrical and

More information

Ultrasonic Multi-Skip Tomography for Pipe Inspection

Ultrasonic Multi-Skip Tomography for Pipe Inspection 18 th World Conference on Non destructive Testing, 16-2 April 212, Durban, South Africa Ultrasonic Multi-Skip Tomography for Pipe Inspection Arno VOLKER 1, Rik VOS 1 Alan HUNTER 1 1 TNO, Stieltjesweg 1,

More information

Data Mining. Neural Networks

Data Mining. Neural Networks Data Mining Neural Networks Goals for this Unit Basic understanding of Neural Networks and how they work Ability to use Neural Networks to solve real problems Understand when neural networks may be most

More information

Assignment 2. Classification and Regression using Linear Networks, Multilayer Perceptron Networks, and Radial Basis Functions

Assignment 2. Classification and Regression using Linear Networks, Multilayer Perceptron Networks, and Radial Basis Functions ENEE 739Q: STATISTICAL AND NEURAL PATTERN RECOGNITION Spring 2002 Assignment 2 Classification and Regression using Linear Networks, Multilayer Perceptron Networks, and Radial Basis Functions Aravind Sundaresan

More information

Lecture 16 Perceptual Audio Coding

Lecture 16 Perceptual Audio Coding EECS 225D Audio Signal Processing in Humans and Machines Lecture 16 Perceptual Audio Coding 2012-3-14 Professor Nelson Morgan today s lecture by John Lazzaro www.icsi.berkeley.edu/eecs225d/spr12/ Hero

More information

Dr Andrew Abel University of Stirling, Scotland

Dr Andrew Abel University of Stirling, Scotland Dr Andrew Abel University of Stirling, Scotland University of Stirling - Scotland Cognitive Signal Image and Control Processing Research (COSIPRA) Cognitive Computation neurobiology, cognitive psychology

More information

Empirical transfer function determination by. BP 100, Universit de PARIS 6

Empirical transfer function determination by. BP 100, Universit de PARIS 6 Empirical transfer function determination by the use of Multilayer Perceptron F. Badran b, M. Crepon a, C. Mejia a, S. Thiria a and N. Tran a a Laboratoire d'oc anographie Dynamique et de Climatologie

More information

Accurate Image Registration from Local Phase Information

Accurate Image Registration from Local Phase Information Accurate Image Registration from Local Phase Information Himanshu Arora, Anoop M. Namboodiri, and C.V. Jawahar Center for Visual Information Technology, IIIT, Hyderabad, India { himanshu@research., anoop@,

More information

Wevelet Neuron Filter with the Local Statistics. Oriented to the Pre-processor for the Image Signals

Wevelet Neuron Filter with the Local Statistics. Oriented to the Pre-processor for the Image Signals Wevelet Neuron Filter with the Local Statistics Oriented to the Pre-processor for the Image Signals Noriaki Suetake Naoki Yamauchi 3 Takeshi Yamakawa y epartment of Control Engineering and Science Kyushu

More information

AUDIO SIGNAL PROCESSING FOR NEXT- GENERATION MULTIMEDIA COMMUNI CATION SYSTEMS

AUDIO SIGNAL PROCESSING FOR NEXT- GENERATION MULTIMEDIA COMMUNI CATION SYSTEMS AUDIO SIGNAL PROCESSING FOR NEXT- GENERATION MULTIMEDIA COMMUNI CATION SYSTEMS Edited by YITENG (ARDEN) HUANG Bell Laboratories, Lucent Technologies JACOB BENESTY Universite du Quebec, INRS-EMT Kluwer

More information

Journal of Asian Scientific Research FEATURES COMPOSITION FOR PROFICIENT AND REAL TIME RETRIEVAL IN CBIR SYSTEM. Tohid Sedghi

Journal of Asian Scientific Research FEATURES COMPOSITION FOR PROFICIENT AND REAL TIME RETRIEVAL IN CBIR SYSTEM. Tohid Sedghi Journal of Asian Scientific Research, 013, 3(1):68-74 Journal of Asian Scientific Research journal homepage: http://aessweb.com/journal-detail.php?id=5003 FEATURES COMPOSTON FOR PROFCENT AND REAL TME RETREVAL

More information

Motion Estimation. There are three main types (or applications) of motion estimation:

Motion Estimation. There are three main types (or applications) of motion estimation: Members: D91922016 朱威達 R93922010 林聖凱 R93922044 謝俊瑋 Motion Estimation There are three main types (or applications) of motion estimation: Parametric motion (image alignment) The main idea of parametric motion

More information

Ghosts In the Image Aliasing Problems With Incoherent Synthetic Aperture Using A Sparse Array

Ghosts In the Image Aliasing Problems With Incoherent Synthetic Aperture Using A Sparse Array Ghosts In the Image Aliasing Problems With Incoherent Synthetic Aperture Using A Sparse Array M. Hoffmann-Kuhnt tmsmh@nus.edu.sg M. A. Chitre tmsmac@nus.edu.sg J. R. Potter tmsmh@nus.edu.sg Abstract -

More information

Adaptive Waveform Inversion: Theory Mike Warner*, Imperial College London, and Lluís Guasch, Sub Salt Solutions Limited

Adaptive Waveform Inversion: Theory Mike Warner*, Imperial College London, and Lluís Guasch, Sub Salt Solutions Limited Adaptive Waveform Inversion: Theory Mike Warner*, Imperial College London, and Lluís Guasch, Sub Salt Solutions Limited Summary We present a new method for performing full-waveform inversion that appears

More information

Interfacing of CASA and Multistream recognition. Cedex, France. CH-1920, Martigny, Switzerland

Interfacing of CASA and Multistream recognition. Cedex, France. CH-1920, Martigny, Switzerland Interfacing of CASA and Multistream recognition Herv Glotin 2;, Fr d ric Berthommier, Emmanuel Tessier, Herv Bourlard 2 Institut de la Communication Parl e (ICP), 46 Av F lix Viallet, 3803 Grenoble Cedex,

More information

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009 Learning and Inferring Depth from Monocular Images Jiyan Pan April 1, 2009 Traditional ways of inferring depth Binocular disparity Structure from motion Defocus Given a single monocular image, how to infer

More information

Optimum Array Processing

Optimum Array Processing Optimum Array Processing Part IV of Detection, Estimation, and Modulation Theory Harry L. Van Trees WILEY- INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION Preface xix 1 Introduction 1 1.1 Array Processing

More information

SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS

SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS Cognitive Robotics Original: David G. Lowe, 004 Summary: Coen van Leeuwen, s1460919 Abstract: This article presents a method to extract

More information

3.1. Solution for white Gaussian noise

3.1. Solution for white Gaussian noise Low complexity M-hypotheses detection: M vectors case Mohammed Nae and Ahmed H. Tewk Dept. of Electrical Engineering University of Minnesota, Minneapolis, MN 55455 mnae,tewk@ece.umn.edu Abstract Low complexity

More information

From Shapes to Sounds: A perceptual mapping

From Shapes to Sounds: A perceptual mapping From Shapes to Sounds: A perceptual mapping Vikas Chandrakant Raykar vikas@umiacs.umd.edu Abstract In this report we present a perceptually inspired mapping to convert a simple two dimensional image consisting

More information

Digital Image Processing

Digital Image Processing Digital Image Processing Third Edition Rafael C. Gonzalez University of Tennessee Richard E. Woods MedData Interactive PEARSON Prentice Hall Pearson Education International Contents Preface xv Acknowledgments

More information

SELECTION OF A MULTIVARIATE CALIBRATION METHOD

SELECTION OF A MULTIVARIATE CALIBRATION METHOD SELECTION OF A MULTIVARIATE CALIBRATION METHOD 0. Aim of this document Different types of multivariate calibration methods are available. The aim of this document is to help the user select the proper

More information

Array Shape Tracking Using Active Sonar Reverberation

Array Shape Tracking Using Active Sonar Reverberation Lincoln Laboratory ASAP-2003 Worshop Array Shape Tracing Using Active Sonar Reverberation Vijay Varadarajan and Jeffrey Kroli Due University Department of Electrical and Computer Engineering Durham, NC

More information

Akarsh Pokkunuru EECS Department Contractive Auto-Encoders: Explicit Invariance During Feature Extraction

Akarsh Pokkunuru EECS Department Contractive Auto-Encoders: Explicit Invariance During Feature Extraction Akarsh Pokkunuru EECS Department 03-16-2017 Contractive Auto-Encoders: Explicit Invariance During Feature Extraction 1 AGENDA Introduction to Auto-encoders Types of Auto-encoders Analysis of different

More information

Automatic Machinery Fault Detection and Diagnosis Using Fuzzy Logic

Automatic Machinery Fault Detection and Diagnosis Using Fuzzy Logic Automatic Machinery Fault Detection and Diagnosis Using Fuzzy Logic Chris K. Mechefske Department of Mechanical and Materials Engineering The University of Western Ontario London, Ontario, Canada N6A5B9

More information

Using the Deformable Part Model with Autoencoded Feature Descriptors for Object Detection

Using the Deformable Part Model with Autoencoded Feature Descriptors for Object Detection Using the Deformable Part Model with Autoencoded Feature Descriptors for Object Detection Hyunghoon Cho and David Wu December 10, 2010 1 Introduction Given its performance in recent years' PASCAL Visual

More information

BMVC 1996 doi: /c.10.41

BMVC 1996 doi: /c.10.41 On the use of the 1D Boolean model for the description of binary textures M Petrou, M Arrigo and J A Vons Dept. of Electronic and Electrical Engineering, University of Surrey, Guildford GU2 5XH, United

More information

Accelerometer Gesture Recognition

Accelerometer Gesture Recognition Accelerometer Gesture Recognition Michael Xie xie@cs.stanford.edu David Pan napdivad@stanford.edu December 12, 2014 Abstract Our goal is to make gesture-based input for smartphones and smartwatches accurate

More information

Norbert Schuff VA Medical Center and UCSF

Norbert Schuff VA Medical Center and UCSF Norbert Schuff Medical Center and UCSF Norbert.schuff@ucsf.edu Medical Imaging Informatics N.Schuff Course # 170.03 Slide 1/67 Objective Learn the principle segmentation techniques Understand the role

More information