Audio Engineering Society Convention Paper. Presented at the 119th Convention, 2005 October 7-10, New York, New York, USA


This convention paper has been reproduced from the author's advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York, USA. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.

Quality Enhancement of Low Bit Rate MPEG1-Layer 3 Audio Based on Audio Resynthesis

Demetrios Cantzos and Chris Kyriakakis
Integrated Media Systems Center (IMSC), University of Southern California, Los Angeles, CA, USA

ABSTRACT

One of the most popular audio compression formats is indisputably the MPEG1-Layer 3 format, which is based on the idea of low bit rate transparent encoding. As these audio signals begin to migrate from portable players with inexpensive headphones to higher quality home audio systems, it is becoming evident that higher bit rates may be required to maintain transparency. We propose a novel method that enhances low bit rate MP3 encoded audio segments by applying multichannel audio resynthesis methods in a post-processing stage or during decoding. Our algorithm employs the highly efficient Generalized Gaussian mixture model which, combined with cepstral smoothing, leads to very low cepstral reconstruction errors. In addition, residual conversion is applied, which significantly improves the enhancement performance. The method presented can be easily generalized to other audio formats for which sound quality is an issue.

1. INTRODUCTION

The majority of MPEG1-Layer 3 (Mp3) audio encoded at low bit rates does not deliver high quality sound. On the other hand, high bit rate Mp3 segments deliver sufficient sound quality but are too large to transmit or store.
With the emergence of high quality consumer audio systems and the prevalence of Mp3 as the standard audio coding scheme, there is a natural need to enhance low bit rate Mp3 audio data without imposing excessive storage or transmission requirements. In this work, we attempt to improve the quality of Mp3 encoded audio data based on a recently introduced concept termed audio resynthesis ([1]). In audio resynthesis, a reference (source) channel is transmitted. The remaining (target) channels are then recreated from it at the receiving end by deriving a small set of constant parameters. To apply this concept to Mp3 audio enhancement, we replace the source channel with a low bit rate Mp3 music segment and the target channel with the original uncompressed audio segment of the same music piece. Our main goal is to recreate the high quality target segment at the receiving end. To do so, we transmit a small set of constant parameters and use the low quality source segment that is already stored at the receiver. This scheme is implemented in a post-processing stage or during decoding, and thus both source and target are first converted to the same lossless data format (e.g. WAV).

Recent work on audio resynthesis ([1]) has been based on previous spectral transformation algorithms ([2,3,4]). The basic assumption made in these algorithms is that the spectral parameters are Gaussian and can therefore be modeled by a Gaussian mixture. This greatly facilitates Maximum Likelihood (ML) parameter estimation, since the popular Expectation-Maximization (EM) algorithm can be applied. As we show later, the cepstral coefficients of an audio signal are not strictly Gaussian. The Gaussian mixture model, although convenient, is therefore not the best solution. We present a new approach to modeling the cepstral coefficients that employs the Generalized Gaussian mixture model. This model is very flexible and incorporates a large family of distributions, including the Gaussian.

A new technique is also introduced for the cepstral conversion step. The conversion function is linear and the cepstral vectors change abruptly over short time periods, so the reconstruction errors are considerably high. We propose a method in which the cepstral vectors are smoothed and the number of mixture components is increased to facilitate the task of the conversion function.

Finally, a novel technique related to residual processing is implemented. In many cases of low bit rate Mp3 sources, reconstruction in the cepstral domain is not adequate for distortion-free enhanced audio. For this reason we also apply residual conversion. Even though it is not as accurate as cepstral conversion, it significantly enriches the spectral details of the enhanced Mp3 music piece.

2. STATISTICAL CONVERSION

The approach followed is based on previous statistical conversion algorithms related to speech synthesis ([2,3,4]). In our application, the short-term spectral parameters are selected to be the LPC cepstral vectors ([5]). The LPC analysis is carried out in overlapping frames through a sliding window, so each frame is modeled as an AR filter excited by a residual. We extract the LPC cepstral vectors of the source signal and of the target signal (which is unknown at the receiving end).
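The LPC front end described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' code. It makes three assumptions the paper does not state:

- LPC is computed with the autocorrelation method and solved by the Levinson-Durbin recursion.
- Frames use a Hamming analysis window.
- The predictor convention is x[n] ≈ Σ a_k x[n-k].

The cepstrum uses the standard LPC-to-cepstrum recursion, and all function names are our own.

```python
import math


def frames(signal, length, hop):
    """Split a signal into overlapping Hamming-windowed frames (sliding window)."""
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (length - 1)) for i in range(length)]
    return [[signal[start + i] * window[i] for i in range(length)]
            for start in range(0, len(signal) - length + 1, hop)]


def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] (x[n] ~ sum_k a[k] x[n-k]) and the
    final prediction error power."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient of stage i
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


def lpc_residual(frame, a):
    """Excitation obtained by inverse filtering with A(z) = 1 - sum_k a_k z^-k;
    samples before the frame start are treated as zero."""
    return [frame[n] - sum(a[k - 1] * frame[n - k] for k in range(1, len(a) + 1) if n - k >= 0)
            for n in range(len(frame))]


def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c[1..n_ceps] of the all-pole model 1/A(z).
    The zeroth (energy) coefficient is not returned."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]
```

Omitting c[0] mirrors the paper's decision to discard the energy coefficients. The output of lpc_residual is the excitation that the later residual conversion operates on.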
Our goal is to modify the cepstral vectors of the source signal so that they are close, in the least squares sense, to the target cepstral vectors of the same music piece. This is accomplished by deriving a mapping function that converts each source cepstral vector to the target cepstral vector of the same time frame (the two signals are time-aligned). The function is assumed linear and is fully determined by a small set of constant parameters. As shown later, a similar conversion technique can be applied to the residual vectors, modifying the source residual so that it better matches the target residual.

To implement the conversion function, we assume that the source cepstral (and residual) vectors are generated by a probability density function (pdf). Determining this pdf is effectively the system training. The audio segment used during training is called the training set. It is chosen so that it can model a large and diverse number of music pieces. The testing source and testing target signals are the particular signal segments on which we apply the conversion scheme and derive the specific conversion function. In the following subsection we present the probabilistic model associated with the training task.

The Generalized Gaussian Mixture Model

Previous statistical conversion algorithms commonly assume that the spectral vectors are Gaussian, and hence they employ the Gaussian mixture model. The Gaussian mixture model has been treated in numerous other applications, and an algorithm to estimate its parameters (EM) is readily available. However, as we show later, the cepstral vectors of audio data are not strictly Gaussian, so this model is not the best selection. We therefore adopt the Generalized Gaussian mixture, a more flexible model that includes the Gaussian mixture as a special case. Its component pdf, the Generalized Gaussian pdf, adapts to virtually any unimodal distribution.
Its analytical form for a random variable z is:

$$g(z;\mu,\sigma,\alpha) = \frac{\alpha\beta}{2\sigma\Gamma(1/\alpha)}\exp\left[-\left|\frac{\beta(z-\mu)}{\sigma}\right|^{\alpha}\right] \qquad (1)$$

where μ is the mean, σ is the standard deviation (σ² is the variance), α is the shape parameter, Γ(·) is the Gamma function and β is a dependent parameter:

$$\beta = \left[\frac{\Gamma(3/\alpha)}{\Gamma(1/\alpha)}\right]^{1/2} \qquad (2)$$
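The sketch below is a quick numerical check of equations (1) and (2); the function names are illustrative. It evaluates the Generalized Gaussian density with Python's standard library and does three things:

- confirms that α = 2 reproduces the Gaussian;
- confirms that α = 1 reproduces the Laplacian peak value;
- integrates the density on a grid, showing unit mass and variance σ², i.e. that σ in this parametrization is the standard deviation.

```python
import math


def gg_beta(alpha):
    """Dependent parameter beta of eq. (2)."""
    return math.sqrt(math.gamma(3.0 / alpha) / math.gamma(1.0 / alpha))


def gg_pdf(z, mu, sigma, alpha):
    """Generalized Gaussian density g(z; mu, sigma, alpha) of eq. (1)."""
    beta = gg_beta(alpha)
    norm = alpha * beta / (2.0 * sigma * math.gamma(1.0 / alpha))
    return norm * math.exp(-abs(beta * (z - mu) / sigma) ** alpha)


def gaussian_pdf(z, mu, sigma):
    """Ordinary normal density, for comparison with alpha = 2."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def mass_and_variance(mu, sigma, alpha, half_width=20.0, steps=40000):
    """Midpoint-rule estimates of the total probability mass and the variance."""
    h = 2.0 * half_width / steps
    mass = var = 0.0
    for i in range(steps):
        z = mu - half_width + (i + 0.5) * h
        g = gg_pdf(z, mu, sigma, alpha)
        mass += g * h
        var += (z - mu) ** 2 * g * h
    return mass, var
```

The half-width of 20 is adequate for α ≥ 1 with σ = 1. Strongly impulsive shapes (α < 1) have heavier tails and would need a wider grid.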

If α = 2.0 we have the Gaussian pdf, and if α = 1.0 we have the Laplace pdf. When α >> 1 the distribution tends to the uniform pdf, and when α < 1 the distribution becomes impulsive. We consider the training cepstral vectors (and the testing source vectors) to be generated by a mixture whose component pdf is given by equation (1). The mixture formulation of the Generalized Gaussian case is:

$$G(x) = \sum_{k=1}^{K} p(C_k)\prod_{j=1}^{q} g\left(x^{(j)};\mu_k^{(j)},\sigma_k^{(j)},\alpha_k^{(j)}\right) \qquad (3)$$

where C_k denotes cluster (component) k, K is the number of clusters, and p(C_k) denotes the prior probability that the cepstral vector x belongs to cluster k. The cepstral vector is q-dimensional, where q is the cepstral order, and its jth coefficient is denoted by x^(j). The vector coefficients are considered independent, so the joint pdf is the product of the q coefficient pdfs. This diagonal formulation is favorable because it decreases the computational complexity of the implementation.

Mixture Parameters Estimation and Clustering

The inclusion of a third independent parameter, the shape parameter α, adds complexity to the ML (Maximum Likelihood) estimation of the pdf parameters. This is especially apparent in a mixture pdf, which is considerably more difficult to manipulate than the Gaussian mixture. The EM algorithm cannot be applied easily because the Expectation step is very hard to compute. Also, even though the EM algorithm is guaranteed to approach a local maximum, it is uncertain how fast this can be reached.

We therefore depart from conventional mixture estimation methods. We cluster the vectors and treat each cluster separately, which divides the parameter estimation task into K simpler tasks. To perform this decomposition we use fuzzy clustering through the c-means algorithm ([6]) and cluster the training vectors into K groups. The c-means algorithm is known to avoid local minima better than k-means, and it also provides a fuzziness option that regulates the occurrence of outliers. The next step is to perform ML estimation on each cluster.
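Once the clusters are formed, each one is fitted on its own with its mean fixed at the cluster center, using the shape equation (4) and the σ estimate (6) given next. The sketch below is a hypothetical single-coordinate illustration with three choices of our own:

- Bisection serves as the "iterative method".
- The search bracket [0.1, 10] is an assumption.
- Python's standard library has no digamma function, so a short recurrence-plus-asymptotic-series approximation is included.

```python
import math
import random


def digamma(x):
    """Digamma function: upward recurrence to x >= 6, then the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def shape_equation(alpha, devs):
    """Left-hand side of the ML shape equation (4); devs are |x - mu| > 0 for one
    cluster and one coordinate."""
    n = len(devs)
    powered = [d ** alpha for d in devs]
    s0 = sum(powered)
    s1 = sum(p * math.log(d) for p, d in zip(powered, devs))
    return 1.0 + digamma(1.0 / alpha) / alpha - s1 / s0 + math.log(alpha * s0 / n) / alpha


def fit_cluster(samples, mu, lo=0.1, hi=10.0, iters=60):
    """Shape alpha (bisection on eq. (4)) and sigma (eq. (6)) with the mean fixed at mu."""
    devs = [abs(x - mu) for x in samples if x != mu]   # log|x - mu| needs x != mu
    f_lo = shape_equation(lo, devs)
    if f_lo * shape_equation(hi, devs) > 0:
        raise ValueError("shape equation has no sign change on [lo, hi]")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = shape_equation(mid, devs)
        if f_lo * f_mid <= 0:
            hi = mid            # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid
    alpha = 0.5 * (lo + hi)
    beta = math.sqrt(math.gamma(3.0 / alpha) / math.gamma(1.0 / alpha))
    sigma = beta * (alpha * sum(d ** alpha for d in devs) / len(devs)) ** (1.0 / alpha)
    return alpha, sigma
```

As a usage check, fitting a few thousand Gaussian samples should give α close to 2, and Laplacian samples α close to 1, matching the special cases listed above.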
The estimation is now straightforward because the mean of each component is known: it is the cluster center. We also compute p(C_k) as the number of vectors that belong to cluster k divided by the total number of vectors. The ML estimator for the shape parameter α_k^(j) of cluster k and coordinate j is given by ([7]):

$$1 + \frac{\psi\left(1/\alpha_k^{(j)}\right)}{\alpha_k^{(j)}} - \frac{\sum_{x\in C_k}\left|x^{(j)}-\mu_k^{(j)}\right|^{\alpha_k^{(j)}}\log\left|x^{(j)}-\mu_k^{(j)}\right|}{\sum_{x\in C_k}\left|x^{(j)}-\mu_k^{(j)}\right|^{\alpha_k^{(j)}}} + \frac{1}{\alpha_k^{(j)}}\log\left(\frac{\alpha_k^{(j)}}{n_k}\sum_{x\in C_k}\left|x^{(j)}-\mu_k^{(j)}\right|^{\alpha_k^{(j)}}\right) = 0 \qquad (4)$$

where n_k is the number of vectors that belong to cluster k and ψ(·) is the digamma function, given by:

$$\psi(\tau) = -\gamma + \int_0^1 \frac{1-t^{\tau-1}}{1-t}\,dt \qquad (5)$$

with γ ≈ 0.5772 the Euler-Mascheroni constant. Equation (4) is solved by iterative methods. The standard deviation parameter σ_k^(j) of the kth cluster and jth coordinate is then estimated as follows ([7]):

$$\sigma_k^{(j)} = \beta_k^{(j)}\left(\frac{\alpha_k^{(j)}}{n_k}\sum_{x\in C_k}\left|x^{(j)}-\mu_k^{(j)}\right|^{\alpha_k^{(j)}}\right)^{1/\alpha_k^{(j)}} \qquad (6)$$

where β_k^(j) follows from (2) with α = α_k^(j). The zeroth cepstral coefficients (energy coefficients) are discarded because they introduce strong bias into the parameter estimation. Besides, the frame energy information (relative to the other frames) is already contained in the residual.

Conversion Function

The conversion function F(·) acts on the vector sequence [x_1, ..., x_n] and produces a vector sequence close, in the least squares sense, to the target sequence [y_1, ..., y_n]. Since we have selected a diagonal implementation, this function acts on the individual vector components and minimizes the error:

$$E = \sum_{t=1}^{n}\sum_{j=1}^{q}\left[y_t^{(j)} - F\left(x_t^{(j)}\right)\right]^2 \qquad (7)$$

As in [2], this problem becomes possible to solve under the constraint that F is piecewise linear, i.e.

$$F\left(x_t^{(j)}\right) = \sum_{k=1}^{K} P(C_k\mid x_t)\left[v_k^{(j)} + \frac{u_k^{(j)}}{\sigma_k^{(j)}}\left(x_t^{(j)}-\mu_k^{(j)}\right)\right] \qquad (8)$$

for t = 1, ..., n and j = 1, ..., q. The conditional probability P(C_k | x_t) that a given vector belongs to cluster k is given by:

$$P(C_k\mid x_t) = \frac{p(C_k)\prod_{j=1}^{q} g\left(x_t^{(j)};\mu_k^{(j)},\sigma_k^{(j)},\alpha_k^{(j)}\right)}{G(x_t)} \qquad (9)$$

The unknown parameter set [v, u] can be found by minimizing (7). This reduces to solving a typical set of q independent least-squares problems ([2]), which fully determines the linear conversion function F.

Conversion Optimization through Cepstral Smoothing and Data Overfitting

The cepstral conversion function will generally not provide the accuracy needed for audio reproduction. The cepstral vectors vary rapidly from frame to frame, with many spikes. Because of its linear form, the conversion function cannot follow these abrupt changes and fails to produce the desired vectors. We introduce a technique that improves cepstral conversion performance in two ways:

- We smooth the cepstral vectors to reduce the spikes, by increasing the LPC analysis frame slide and length.
- At the same time, we increase the number of mixture groups so that the conversion function has more components available.

The increase in frame slide and length is applied only to the testing source and target signals, not to the training signal. If we applied it to the training vectors too, their number would decrease considerably and ML estimation would fail for a mixture of many components. The number of groups is around three times larger than the number determined by the MDL information-theoretic criterion ([8]), so the training data is overfitted. This overfitting does not affect the conversion stage, since the conversion function filters out any unnecessary clusters. The technique proves extremely favorable, since it achieves accurate reconstruction of the cepstral vectors.

Residual Modeling and Conversion

In many cases, an accurate cepstral reconstruction is not sufficient for acoustically undistorted enhanced Mp3 segments.
This is especially true of a very low bit rate source (e.g. 64Kbps). There, many audible artifacts are present because the source and target testing signals are simply too different. Instruments that are inaudible in the source signal usually appear in the enhanced signal as distortions, since the LPC coefficients alone fail to reproduce them. In such cases the signal differences lie mainly in the residuals, so some residual processing is essential for better enhancement results.

We adopt the assumption that the residual vectors are correlated with their corresponding cepstral vectors ([9]) and thus share similar statistical properties. We can therefore also apply the statistical conversion of the previous sections to the residual vectors. The probabilistic model used here is the same as for cepstral conversion, i.e. it is derived from the training cepstral vectors. However, the residual vectors have a much higher dimensionality than the training cepstral vectors. We therefore divide them into subvectors whose dimensionality equals that of the training cepstral vectors. For instance, with 30 training cepstral coordinates and 840 residual coordinates, we divide each residual vector into subvectors of 30 coordinates and apply statistical conversion to each of the 28 subvector sets separately.

We do not expect a residual reconstruction as accurate as the cepstral reconstruction, because the residuals are too spiky. Furthermore, we have not derived a training set or a probabilistic model specifically for the residual vectors, since their extremely high dimensionality makes this impractical. Such a model would also require a global mixture pdf able to efficiently model any set of testing residual vectors. This is difficult because those vectors are highly diverse and contain the fine details of the signal.
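The subvector bookkeeping in the 840/30 example can be sketched as below. The sketch makes two assumptions:

- Residual frames are plain lists of equal length.
- Each converters[m] stands in for the statistical conversion function fitted to subvector set m. This is a hypothetical interface, not one defined in the paper.

```python
def split_into_subvectors(vector, dim):
    """Cut one residual vector into consecutive subvectors of length dim."""
    if len(vector) % dim:
        raise ValueError("residual length must be a multiple of the subvector size")
    return [vector[i:i + dim] for i in range(0, len(vector), dim)]


def convert_residual_frames(frames, dim, converters):
    """Convert subvector set m with converters[m] and reassemble each frame.

    frames: residual vectors, one per analysis frame, all the same length.
    converters[m]: placeholder callable mapping the list of m-th subvectors
    (one per frame) to a list of converted subvectors of the same shape.
    """
    split = [split_into_subvectors(f, dim) for f in frames]
    n_sub = len(split[0])
    # Subvector set m collects the m-th subvector of every frame
    converted_sets = [converters[m]([s[m] for s in split]) for m in range(n_sub)]
    # Concatenate the converted subvectors of frame t in their original order
    return [[value for m in range(n_sub) for value in converted_sets[m][t]]
            for t in range(len(frames))]
```

With an 840-coordinate residual and dim = 30, this yields the 28 independent subvector sets mentioned above.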
With the mixture pdf derived from the training cepstral vectors, the converted residuals are much closer to the target residuals than the source residuals are. Through this process, a large amount of information is conveyed to the enhanced Mp3 segment. We also observed that a high training cepstral order led to smaller residual reconstruction errors. We therefore select a cepstral order for the training vectors that is higher than the cepstral order of the testing vectors.
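The piecewise-linear conversion function of equations (8) and (9) can be sketched for one coordinate as below. This section applies that function to both the cepstral vectors and the residual subvectors. The sketch assumes the mixture parameters are already trained, builds the least-squares design matrix implied by (8), and solves the normal equations directly. The helper names and the small dense solver are our own, not the implementation of [2].

```python
import math


def gg_pdf(z, mu, sigma, alpha):
    """Generalized Gaussian density of eq. (1)."""
    beta = math.sqrt(math.gamma(3.0 / alpha) / math.gamma(1.0 / alpha))
    norm = alpha * beta / (2.0 * sigma * math.gamma(1.0 / alpha))
    return norm * math.exp(-abs(beta * (z - mu) / sigma) ** alpha)


def posteriors(x, priors, mus, sigmas, alphas):
    """P(C_k | x) of eq. (9) for a q-dimensional vector x (diagonal model)."""
    joint = []
    for k, prior in enumerate(priors):
        lik = prior
        for j, xj in enumerate(x):
            lik *= gg_pdf(xj, mus[k][j], sigmas[k][j], alphas[k][j])
        joint.append(lik)
    total = sum(joint)                       # G(x) of eq. (3)
    return [v / total for v in joint]


def solve(a, b):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (m[r][n] - sum(m[r][c] * sol[c] for c in range(r + 1, n))) / m[r][r]
    return sol


def fit_coordinate(xs, ys, post, mus, sigmas, ridge=0.0):
    """Least-squares v_k, u_k of eq. (8) for one coordinate j.

    xs, ys: source and target values of coordinate j, one per frame t.
    post[t][k]: P(C_k | x_t) from eq. (9); mus, sigmas: coordinate-j cluster parameters.
    ridge: optional diagonal loading (our addition) for ill-conditioned systems.
    """
    K = len(mus)
    rows = [[post[t][k] for k in range(K)] +
            [post[t][k] * (xs[t] - mus[k]) / sigmas[k] for k in range(K)]
            for t in range(len(xs))]
    ata = [[sum(r[i] * r[c] for r in rows) + (ridge if i == c else 0.0) for c in range(2 * K)]
           for i in range(2 * K)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(2 * K)]
    theta = solve(ata, aty)
    return theta[:K], theta[K:]


def convert_coordinate(x, post_t, v, u, mus, sigmas):
    """F(x_t^(j)) of eq. (8) for one frame."""
    return sum(p * (v[k] + u[k] * (x - mus[k]) / sigmas[k]) for k, p in enumerate(post_t))
```

The posteriors of each frame sum to one, so any target of the form y = ax + b is reproduced exactly (v_k = aμ_k + b, u_k = aσ_k). This makes a convenient sanity check.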

3. IMPLEMENTATION

The algorithm described previously was applied to and tested on a randomly selected music piece. The general scenario involves enhancing a 32 sec long, 64Kbps Mp3 segment; this is the testing source signal. The testing target signal is the uncompressed WAV file of the same music piece. The two segments are time-aligned. Since the algorithm is applied in a post-processing stage, the Mp3 source is also converted to WAV format. We took care to reduce the size of the residual conversion parameters as much as possible. As shown later in this section, the conversion function is smaller than the Mp3 source and much smaller than the uncompressed target file. We also provide objective enhancement results that demonstrate the validity of this scheme.

Wavelet-Based Subband Coding

Because an audio signal has a higher sampling frequency and richer content than a speech signal, we follow a subband analysis. The subband separation is performed with wavelets ([10]). The Daubechies filter of order 40 was a good choice, since no audible aliasing effects were observed. Several different wavelet tree structures were tested (e.g. equidistant subbands). The most efficient structure proved to be one that emulates the critical bands of the human hearing system, as in [11]. This choice is further justified because the Mp3 encoded source segment has also passed through a critical filterbank ([12]). As we show later, the high number of subbands has two benefits. It lets us take advantage of inter-band redundancy. It also lets us process the most significant subbands more heavily, i.e. those that are more degraded or carry the audible parts of the signal. The actual wavelet filterbank is shown in Fig. 1. It is applied to both testing source and testing target signals, leading to 17 testing subbands.

Training Model Derivation

A crucial part of the algorithm is to derive a Generalized Gaussian mixture pdf that does not have to adjust to the particular testing music piece.
This probabilistic model should be global, in the sense that it includes the statistical properties of all possible music segments. Both transmitting and receiving ends have access to it (e.g. pre-stored on both sides).

Figure 1: Wavelet tree structure used for subband analysis of the testing source and testing target signals (numbers in brackets indicate the frequency region in kHz of each subband; numbers on leaves indicate the subband index from 1 to 17). Leaf regions recovered from the figure, in kHz: [0, 0.7], [0.7, 1.4], [1.4, 2.1], [2.1, 2.8], [2.8, 3.4], [3.4, 4.1], [4.1, 4.8], [4.8, 5.5], [5.5, 6.2], [6.2, 6.9], [6.9, 8.3], [8.3, 9.6], [9.6, 11], [11, 13.8], [13.8, 16.5], [16.5, 19.3], [19.3, 22]; the root node spans [0, 22].

Several candidate training sets were processed to produce a mixture pdf:

- the multichannel training set of [1];
- a white noise training set;
- a Brownian noise training set;
- a pink noise training set.

Pink noise proved to be the most suitable training set. It produced smaller cepstral reconstruction errors than the other sets, up to 5% less in all subbands.

ML estimation with many mixture components needs diverse data, and the training model should also be small. For both reasons, we divide the training data set into 4 large equidistant subbands instead of the 17 subbands shown in Fig. 1. These cover the frequency range 0-22 kHz (0-5.5 kHz, 5.5-11 kHz, 11-16.5 kHz, 16.5-22 kHz). Each training subband consists of 12,000 cepstral vectors of cepstral order 30. Each of the 17 analysis subbands of the testing source and testing target signals acquires the training model parameters from the one of the 4 larger subbands that it is part of. During cepstral conversion, the cepstral order of the training model is truncated for each testing subband to match the lower cepstral order of that subband's testing source and testing target cepstral vectors. During residual conversion, the large training cepstral dimensionality allows for more efficient division of the testing residual vectors into subvectors, as explained in section 2.6.
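The assignment of testing subbands to training bands, and the truncation of the order-30 training model, can be sketched as follows. The subband edges are read from the Fig. 1 residue, assuming that subband 1 is the lowest-frequency leaf and subband 17 the highest. The per-cluster parameter layout in truncate_model is hypothetical.

```python
# Leaf edges (kHz) from Fig. 1, assuming subband 1 is the lowest band (an assumption)
SUBBAND_EDGES_KHZ = [0.0, 0.7, 1.4, 2.1, 2.8, 3.4, 4.1, 4.8, 5.5,
                     6.2, 6.9, 8.3, 9.6, 11.0, 13.8, 16.5, 19.3, 22.0]
# The 4 equidistant training bands
TRAINING_EDGES_KHZ = [0.0, 5.5, 11.0, 16.5, 22.0]


def training_band(subband):
    """Index (1-4) of the training band that contains testing subband 1..17."""
    lo, hi = SUBBAND_EDGES_KHZ[subband - 1], SUBBAND_EDGES_KHZ[subband]
    for b in range(len(TRAINING_EDGES_KHZ) - 1):
        if TRAINING_EDGES_KHZ[b] <= lo and hi <= TRAINING_EDGES_KHZ[b + 1]:
            return b + 1
    raise ValueError(f"subband {subband} straddles a training band edge")


def truncate_model(cluster_params, test_order):
    """Keep the first test_order coordinates of each cluster's per-coordinate
    parameters (e.g. (mu, sigma, alpha) tuples), reducing order 30 to the test order."""
    return [coords[:test_order] for coords in cluster_params]
```

Every one of the 17 leaves falls entirely inside one training band, so the lookup never hits the error branch for the tree of Fig. 1.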

3.3. Cepstral Conversion Results

The cepstral conversion algorithm described in section 2 is implemented according to the experimental conditions of Table 1.

Table 1: Experimental parameters. The frequency regions of each of the analysis subbands in the left-most table column can be found in Fig. 1. The frame slide and length are different for the training and testing segments, as explained in section 2.4. [Columns: Analysis Subbands; Analysis Frame Slide/Length, Train (ms) and Test (ms); Cepstral Order, Train and Test.]

Figure 2: Histogram of shape parameters for the frequency band 0-5.5 kHz of the pink noise training set.

Fig. 2 shows the distribution of the mixture pdf shape parameters for all groups and vector coordinates of the first (0-5.5 kHz) of the 4 training subbands. The shape parameters are strongly peaked at α = 2.0. Even so, the majority of their values lie away from 2.0, in the subgaussian (α > 2) and supergaussian (α < 2) ranges. This justifies the use of the Generalized Gaussian mixture as a more accurate model. Pink noise is random data rather than actual audio data, but a similar histogram is obtained from the audio data set used in [1].

Figure 3: Fitting of the mixture pdf (120 groups) to the normalized histogram of the first cepstral coefficients of the band 0-5.5 kHz of the pink noise training set.

Fig. 3 demonstrates the validity of the estimation algorithm described in section 2.2. A mixture model of 40 groups would be sufficient according to the MDL criterion. Nevertheless, we increase this number to 120 and overfit the model for all 4 training subbands, as explained in section 2.4. The mixture pdf still fits the histogram very accurately, which we attribute to the high modeling flexibility of the Generalized Gaussian pdf.

We now show why the conversion optimization scheme of section 2.4 is necessary. We test two scenarios in which cepstral smoothing and data overfitting are not applied together, and both lead to increased cepstral reconstruction errors. In case A, resynthesis is applied with cepstral smoothing but no data overfitting (i.e.
we derive a mixture pdf of 40 groups instead of 120). In case B, resynthesis is applied with overfitting (120 groups) but no smoothing (i.e. the training and testing recordings both have frame slide 10 ms and frame length 15 ms). The results are shown in Table 2.

Table 2: Two poor cepstral reconstruction scenarios A, B for subbands 15-17, and the case where cepstral smoothing and data overfitting are applied together. [Columns: Average Quadratic Cepstral Distance between Target-Source (frame slide/length 50 ms/75 ms); Target-Resynthesis, case A (no overfitting); Target-Resynthesis, case B (no smoothing); Target-Resynthesis (smoothing + overfitting). Rows: Band 15, Band 16, Band 17.]

The conversion results for the remaining subbands (1-14) are shown in Table 3. The error reduction due to resynthesis clearly varies across the subbands. However, the average cepstral distance between the testing target and resynthesized segments is of the same order of magnitude for most of the subbands. This means that the cepstral conversion technique has finite accuracy.

Decreasing the duration of the testing segments, and thus the number of cepstral vectors, would increase the accuracy. It would also increase the conversion parameter overhead, since more conversion parameters would have to be transmitted per unit length of testing segment.

Table 3: Average quadratic cepstral conversion results for subbands 1-14. [Columns: Analysis Subband; Cepstral Distance Target-Source; Cepstral Distance Target-Resynthesis.]

Figure 4: Cepstral reconstruction of the first coordinate for subband 17.

Fig. 4 shows an example of cepstral conversion for subband 17. The resynthesized first cepstral coefficients closely follow the corresponding target coefficients. Subbands 1-8 and 12 show no observable errors, since the initial distance between their source and target cepstral coefficients is small. Finally, Tables 2 and 3 show that the cepstral distance between the source and target signals increases greatly for subbands 9-17 (except subband 12). This is directly related to the fact that the 64Kbps Mp3 coding scheme severely degrades the signal content in the frequency region of these subbands while it retains the lower subbands. This is taken into account in the residual conversion implementation presented in the next section.

Residual Conversion Results and Redundancy

We now implement the residual conversion scheme described in section 2.6. We extract the residual vectors according to the 17-subband analysis and apply the same 4-subband training model used for cepstral conversion. The high cepstral order of the model (30) allows for the inclusion of low-valued vector coefficients, which are necessary for modeling the residual valleys. Lower cepstral orders were also tested and led to larger residual reconstruction errors. A high training cepstral order is therefore favorable.
As mentioned, each testing subband's source and target residual vectors acquire the model parameters of the training subband to which it belongs.

Residual Intra-Band Redundancy

The residual conversion scheme described previously requires a large number of conversion parameters. For a full reconstruction of all the residual vectors of a particular subband, the conversion parameters would reach 60% of the size of the target (uncompressed) signal. That is several times larger than the source Mp3 signal. For this reason, we downsample the testing source and testing target residual vectors before conversion. We tested downsampling factors of 2, 4 and 8. A factor of 4 gave the best combination of conversion parameter size and reconstruction accuracy. After conversion, the reconstructed residual is resampled to the original rate by using the previous two samples at each time instant. Under this scheme, the audio quality does not decrease noticeably compared to a full reconstruction, and the residual conversion function becomes four times smaller.
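A sketch of the downsampling step is below. The factor of 4 comes from the text. The reconstruction rule is only one possible reading of "using the previous two samples at each time instant": here, each missing sample is the mean of the two samples preceding it at the original rate. Both the function names and the rule itself should be treated as assumptions.

```python
def downsample(residual, factor=4):
    """Keep every factor-th residual sample (these are the samples that get converted)."""
    return residual[::factor]


def upsample_prev_two(kept, factor, length):
    """Rebuild a residual of the original length.

    Retained samples are placed back at multiples of factor; every other sample is
    the mean of the two samples preceding it (our hypothetical reading of the rule).
    """
    out = []
    for n in range(length):
        if n % factor == 0:
            out.append(kept[n // factor])
        elif n >= 2:
            out.append(0.5 * (out[n - 1] + out[n - 2]))
        else:
            out.append(out[n - 1])      # only one previous sample exists
    return out
```

Only len(residual) / factor samples pass through the conversion function, which is where the four-fold reduction in residual conversion parameters comes from.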

Residual Inter-Band Redundancy

Fig. 4 plots the average quadratic residual distances between source and target residuals for all subbands.

Figure 4: Average quadratic residual errors between source and target for all subbands.

Clearly, not all source subbands are heavily distorted. Subbands 1-8, 12 and 13 show small residual differences between the testing source and testing target segments, so we can apply residual processing to selected subbands only. Applying residual conversion to subbands 9-11 and 14-17 produced audible enhancement without deriving many conversion parameters or performing excessive computations. Processing the remaining subbands did not improve the results enough to justify the large number of resulting conversion parameters.

A further reduction in parameters comes from observing that the 4 highest testing subbands (14-17) show many residual similarities. We reconstruct only one of these residual signals and use it to replace all 4. This greatly reduces the average quadratic residual distances for all 4 subbands. The reduction is also helped by two facts: the content of these subbands is not highly audible, and their source-target residual distances are large. Thus, even a less accurately reconstructed residual is better than the original source residuals. Table 4 shows this: the reconstructed residual is derived for subband 16 only and is used for all 4 subbands.

Table 4: Average quadratic residual conversion results for subbands 14-17 when using the reconstructed residual of subband 16 for subbands 14-17. [Columns: Analysis Subband; Residual Average Quadratic Distance Target-Source; Residual Average Quadratic Distance Target-Reconstructed.]

The residual conversion results for the remaining subbands are shown in Table 5. Each of these subbands has its own reconstructed residual, since the lower subbands are very different from each other.
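The subband selection and residual sharing described above can be sketched as below. The paper does not spell out the average quadratic distance, so we read it as the mean over frames of the squared Euclidean distance. The selection threshold is a hypothetical parameter.

```python
def avg_quadratic_distance(frames_a, frames_b):
    """Mean over frames of the squared Euclidean distance between paired vectors
    (our reading of 'average quadratic distance')."""
    if len(frames_a) != len(frames_b):
        raise ValueError("frame sequences must be time-aligned")
    total = sum(sum((x - y) ** 2 for x, y in zip(a, b)) for a, b in zip(frames_a, frames_b))
    return total / len(frames_a)


def select_subbands(distances, threshold):
    """Subbands whose source-target distance exceeds a (hypothetical) threshold."""
    return sorted(s for s, d in distances.items() if d > threshold)


def share_residual(reconstructed, donor, group):
    """Reuse one reconstructed residual (e.g. subband 16's) for every subband in group."""
    return {s: reconstructed[donor] for s in group}
```

In the paper's setting, the selection step would keep subbands 9-11 and 14-17, and share_residual(reconstructed, 16, range(14, 18)) would express the inter-band reuse.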
Table 5: Average quadratic residual conversion results for subbands 9-11 when using the corresponding reconstructed residuals. [Columns: Analysis Subband; Residual Average Quadratic Distance Target-Source; Residual Average Quadratic Distance Target-Reconstructed.]

The results of Tables 4 and 5 support the validity of the residual conversion scheme. Subbands 9-11 and 16 reduced their original residual errors by more than 50%. Subbands 14, 15 and 17 reduced their original residual errors by around 45%. This reduction could be even larger if each subband had its own reconstructed residual instead of sharing the one derived from subband 16. However, an error reduction of 50% or more in these subbands does not provide any acoustical improvement of the enhanced waveform since, as mentioned, they do not contain the highly audible parts of the signal.

Overall Performance

Several objective similarity measures were tested, and the Mutual Information in the time domain proved to be the most suitable. Fig. 5 compares the selected wavelet structure against wavelet trees of 2, 4 and 8 equidistant subbands. Each case is further split into cepstral reconstruction only and cepstral reconstruction with residual reconstruction. Residual conversion is applied as follows:

- 2 subbands: in both subbands;
- 4 subbands: in the upper 3 subbands;
- 8 subbands: in the upper 6 subbands.
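The paper does not say how the time-domain Mutual Information was estimated. A common, simple choice, assumed here, is a plug-in estimate from a joint histogram of the two time-aligned signals. The bin count is an arbitrary parameter of this sketch.

```python
import math
import random
from collections import Counter


def mutual_information(x, y, bins=32):
    """Plug-in histogram estimate of I(X;Y) in bits for two time-aligned signals."""
    if len(x) != len(y):
        raise ValueError("signals must be time-aligned and of equal length")

    def quantize(signal):
        lo, hi = min(signal), max(signal)
        width = (hi - lo) / bins or 1.0          # guard against a constant signal
        return [min(int((v - lo) / width), bins - 1) for v in signal]

    qx, qy = quantize(x), quantize(y)
    n = len(x)
    joint = Counter(zip(qx, qy))
    px, py = Counter(qx), Counter(qy)
    # sum over occupied cells of p(a,b) * log2(p(a,b) / (p(a) p(b)))
    return sum(c / n * math.log2(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


if __name__ == "__main__":
    rng = random.Random(1)
    target = [rng.random() for _ in range(5000)]
    resynth = [t + 0.05 * rng.random() for t in target]   # stand-in signals
    print(mutual_information(target, resynth, bins=16))
```

The plug-in estimate is biased upward for finite data, roughly in proportion to bins² / n. Comparisons such as those in Fig. 5 should therefore use the same bin count and signal length throughout.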


Figure 5: Mutual Information between testing target and resynthesized signals for various wavelet structures with and without residual conversion.

Fig. 5 shows that audio enhancement is most efficient, in terms of both conversion parameter size and quality improvement, when applying the 17-band wavelet separation with residual reconstruction. Residual processing does not dramatically increase the Mutual Information metric, but the acoustic differences are very sharp. The resynthesized segment without residual conversion contains many periodic and random distortions. In contrast, preliminary subjective tests show that audio enhancement with residual conversion causes no audible distortions. The quality increase of the enhanced segment over the source segment is also easily perceptible.

To further illustrate this, we provide time domain waveforms of selected subbands under the 17-subband analysis, with both residual and cepstral conversion applied. Fig. 6 shows that some subbands are severely degraded: the source waveform is almost nonexistent. The resynthesized signal follows the target signal much more closely. As mentioned before, however, residual differences between the target and resynthesized segments remain (see Tables 4 and 5), so the two signals cannot be identical for these subbands. Subbands 1-8 are not degraded enough (see Table 2 and Fig. 4) to show noticeable differences between the source and target waveforms, and hence are not illustrated.

Figure 6: Time domain resynthesis results for subbands 11 (upper plot) and 17 (lower plot).

Table 6 shows the transmission requirements of our scheme when transmitting the cepstral conversion and residual conversion parameters under the 17-subband separation. No arithmetic coding is applied to compress the conversion parameter set, so the transmission size could possibly be reduced further. Some of the lower subbands can also be left unprocessed (no cepstral conversion), since their source and target cepstral differences are very small.
Table 6: Amount of transmitted conversion parameters compared to the source and target segment sizes. [Columns: Mp3 Source 64Kbps size (bytes); Conversion Parameters size (bytes); Target WAV size (bytes).]

As shown in Table 6, the conversion function is smaller than the Mp3 source signal (77% of the source size) and much smaller than the target segment. If we do not apply cepstral conversion to subbands 1-8, the parameters shrink to 155 KBytes (61% of the source size).
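The size figures can be cross-checked with simple arithmetic. The sketch makes the following assumptions:

- 1 Kbit is 1000 bits.
- The reported parameter size of 155 KBytes means 155,000 bytes. Under that reading, 155,000 / 256,000 ≈ 61%, matching the reported percentage.
- The WAV parameters (44.1 kHz, 16-bit, mono by default) are assumptions; the paper states only the 0-22 kHz bandwidth.

The 128 Kbps ratio corresponds to the comparison made in the conclusions.

```python
def mp3_bytes(duration_s, kbps):
    """Size of a constant-bit-rate stream, assuming 1 Kbit = 1000 bits."""
    return duration_s * kbps * 1000 // 8


def wav_bytes(duration_s, rate_hz=44_100, channels=1, bytes_per_sample=2):
    """Size of the raw PCM samples (header ignored); all defaults are assumptions."""
    return duration_s * rate_hz * channels * bytes_per_sample


source = mp3_bytes(32, 64)                          # 32 s at 64 Kbps
print(source)                                       # 256000
print(round(0.77 * source))                         # parameters at 77% of the source
print(round(100 * 155_000 / source))                # 155 KBytes as a % of the source
print(0.77 * source / mp3_bytes(32, 128))           # same parameters vs a 128 Kbps source
print(wav_bytes(32), wav_bytes(32, channels=2))     # mono and stereo target sizes
```

The ratio against a 128 Kbps source comes out at about 0.385, consistent with the "38% of the source size" stated in the conclusions.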

4. CONCLUSIONS AND FUTURE RESEARCH

We presented a novel technique for audio quality enhancement of low bit rate Mp3 signals. Subjective tests are currently underway. The quality improvement is already particularly audible, because the source segment is Mp3 encoded at a very low bit rate and is therefore severely degraded. We have shown through objective means that the resynthesized signal is closer to the target than the source is. This holds in terms of cepstral and residual distances, and also in the time domain, as the subband waveforms illustrate. The subbands that need residual or cepstral conversion can be selected robustly by processing only those with the highest residual or cepstral errors, respectively.

Further investigation is needed to determine the optimal number of subbands. A high number of subbands clearly improves the enhancement performance and also allows more redundancies to be detected (e.g. source subbands that are not degraded). The residual conversion scheme could possibly be improved further by selecting a higher cepstral order for the training model. Finally, consider applying the resynthesis scheme to a 128Kbps Mp3 source, which is double the size of the currently used source. The relative size of the conversion parameters would then halve to 38% of the source size, or less, since fewer subbands might need residual (or cepstral) conversion. Higher bit rate Mp3 source segments are currently being tested. Naturally, the algorithm performs better on them, since the overall differences between the source and target audio segments are smaller.

5. ACKNOWLEDGEMENTS

Research presented in this paper was funded in part by the Integrated Media Systems Center, a National Science Foundation Engineering Research Center, Cooperative Agreement No. EEC, and in part by the US Army Research, Development, and Engineering Command (RDECOM). Statements and opinions expressed do not necessarily reflect the position or policy of the National Science Foundation or the US Government, and no official endorsement should be inferred.
6. REFERENCES

[1] A. Mouchtaris, S. S. Narayanan and C. Kyriakakis, "Multiresolution spectral conversion for multichannel audio resynthesis," IEEE Proc. Int. Conf. Multimedia and Expo (ICME), vol. 2, Lausanne, Switzerland, August.
[2] Y. Stylianou, O. Cappe and E. Moulines, "Continuous probabilistic transform for voice conversion," IEEE Trans. Speech and Audio Processing, vol. 6, no. 2, March.
[3] A. Kain and M. W. Macon, "Spectral voice conversion for text-to-speech synthesis," IEEE Proc. Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), Seattle, WA, May 1998.
[4] D. Reynolds and R. Rose, "Robust text-independent speaker identification using Gaussian mixture speaker models," IEEE Trans. Speech and Audio Processing, vol. 3, no. 1, pp. 72-83, January.
[5] L. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice Hall, Englewood Cliffs, NJ.
[6] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, NY.
[7] F. Muller, "Distribution shape of two-dimensional DCT coefficients of natural images," Electronics Letters, vol. 29, no. 22, 1993.
[8] J. Rissanen, "Modeling by shortest data description," Automatica, vol. 14.
[9] B. Gillett and S. King, "Transforming voice quality," Eurospeech.
[10] G. Strang and T. Nguyen, Wavelets and Filter Banks, Wellesley-Cambridge.
[11] D. Sinha and A. H. Tewfik, "Low bit rate transparent audio compression using adapted wavelets," IEEE Trans. Signal Processing, vol. 41, December.
[12] P. Noll, MPEG Digital Audio Coding Standards, CRC Press LLC.
