Formalisation of MPEG-1 compressed domain audio features

Size: px
Start display at page:

Download "Formalisation of MPEG-1 compressed domain audio features"

Transcription

1 Techncal Report CSIRO Mathematcal and Informaton Scences Formalsaton of MPEG-1 compressed doman audo features By Slva Pfeffer and Thomas Vncent 18 th December 1 Report Number: 1/196 CSIRO Mathematcal and Informaton Scences Locked Bag 17, North Ryde NSW 167 Australa Contact: Slva.Pfeffer@csro.au

2

3 TABLE OF CONTENTS 1 Introducton... 4 MPEG-1 compressed data MPEG-1 audo encodng Feld nformaton Header-type nformaton Subband values Low-level audo features Pre-processed subband nformaton Cepstral features Energy features Slence statstcs Spectral energy statstcs Bandwdth features Ptch features Segmentaton General segmentaton Scene change Classfcaton Slence determnaton Musc/speech determnaton Sound effects Nose Speaker Recognton and Identfcaton Gender Speech recognton Beat recognton Conclusons REFERENCES MPEG-1 audo features S. Pfeffer, T. Vncent

4 1 Introducton In ths paper we gve an overvew of exstng audo content analyss approaches n the compressed doman and ncorporate them nto a coherent formal structure. We frst examne the knds of nformaton accessble n an MPEG-1 compressed audo stream and descrbe a coherent approach to determne features from these. These features are generc enough to be further processed wth standard audo content analyss approaches. We report on a number of applcatons that have been presented makng use of the compressed doman features. Most of them am at creatng an ndex to the audo stream by segmentng the stream nto temporally coherent regons, whch are often classfed nto a pre-specfed set of classes. We also dscuss recognton and dentfcaton applcatons. MPEG-1 compressed data To understand the knd of features that can be extracted from an MPEG-1 compressed audo stream, we have to understand the meanng of the encoded felds (see [HAS97, ISA93, NUL97, PAN95]). To that end, we frst brefly explan the encodng steps and the resultng feld structures and then explan whch felds contan useful nformaton for content analyss..1 MPEG-1 audo encodng MPEG-1 audo encodng comes n three dfferent flavours called Layers. They ncrease n complexty from Layer 1 to 3 yet all follow the same processng steps: 1. The sampled sound data s broken up nto analyss wndows and transformed nto the frequency doman. A polyphase flterbank calculates 3 frequency band magntudes (called subband values) for each of the three Layers, whch s further refned to 576 subbands for Layer 3 only.. The resultng subband values are manpulated accordng to psychoacoustc models and the desred btrate. The am s to flter out sounds that are masked by other sounds and to arrve at a perceptually lossless compresson. The extent of compresson acheved by ths psychoacoustc flterng s encoder-specfc and not standardsed. 3. The remanng subband magntudes are lnearsed nto a btstream accordng to the btstream format standardsed for the respectve Layer. In ths last step, further compresson can be done (such as Huffman encodng) whch s lossless and explots redundances of the data contaned wthn several successve analyss wndows. The data s then encoded n a so-called audo frames. The resultng fles therefore consst of a sequence of (audo) frames contanng Layerspecfc felds. Fgure 1 dsplays the transformaton that a sequence of 115 PCM samples go through durng encodng. Layers 1 and stop after the frst transform (a polyphase flterbank), whle Layer 3 goes through an addtonal Modfed Dscrete Cosne Transform (MDCT) step. Access of MPEG-1 audo features S. Pfeffer, T. Vncent

5 transformaton coeffcents n Layer 3 can therefore be ether at the flterbank or the MDCT level. I t 115 PCM samples Polyphase Flterbank f 3 subbands (36 ponts each) MDCT MDCT MDCT MDCT MDCT MDCT modfed DCT (36 / 3 x 1 samples) I f Fgure 1: Frequency transformatons of Layer 3 Fgure dsplays the frame formats of all three Layers. The 3 subband values are encoded n groups of 1 (Layer 1 & ) or 18 (Layer 3) subband samples. We call these groups granules. There s only one such granule n a Layer 1 frame, whereas a Layer frame contans three granules and a Layer 3 frame two granules to explot further redundances. A granule n Layer 3 can be vewed as ether consstng of 18 values n each of 3 subbands or of one value n each of 576 subbands dependng on whether one accesses the flterbank coeffcents or the MDCT coeffcents. Layer I & Layer II Frame header Bt allocaton Scalefactor selecton Quantzed values Encoded data Bt allocaton Scalefactor selecton Quantzed values Channel 1 Channel Layer III (MP3) Frame header Sde nfo Encoded data Scalefactors Scalefactors Scalefactors data factors data factors data factors Huffman Scale- Huffman Scale- Huffman Scale- Huffman data Channel 1 Channel Channel 1 Channel Granule 1 Granule Man data Fgure : Frame formats for all three Layers Except for the number of granules, Layer 1 and encodngs are the same. Ther subband values are encoded n the quantsed values feld after havng beng normalsed wth scalefactors. The number of bts requred for the quantsed values s encoded n the bt allocaton feld. Addtonally, the scalefactor selecton feld used by Layer stores how many of the three scalefactors of each granule were dfferent and thus had to be encoded. MPEG-1 audo features S. Pfeffer, T. Vncent

6 Layer 3 s dfferent. As mentoned above, t results n 576 subband values. These are further compressed manly by use of a Huffman compresson scheme after careful groupng of subbands (see Fgure 3) f Pars of bg values,.e. value < 8191 Quadruples of small values,.e. value {, -1, 1} Pars of zero values,.e. value = Regon Regon 1 Regon Huffman encoded Not encoded Nose allocaton & scalefactor calculaton Huffman encoded Fgure 3: Layer 3 encodng of frequency bands. Feld nformaton Wthout gong back to decodng an MPEG audo fle to PCM samples, there are two types of nformaton that can be used as features on whch to base audo content analyss approaches: the nformaton encoded n the header-lke felds (header, bt allocaton, scalefactor selecton, scalefactors, sde nformaton) and the encoded subband values...1 Header-type nformaton Wang and Vlermo [WAN1] have used the wndow type nformaton of Layer 3 to detect beats. Layer 3 uses four dfferent knds of analyss wndows: long, long-to-short, short, and shortto-long. The short wndows are used for short but ntensve sounds for whch the long wndow would ntroduce too much pre-echo. They found that the wndow-swtchng pattern of pop-musc beats for ther specfc encoder at btrates of kbps gves (long, long-to-short, short, short, short-to-long, long) wndow sequences n 99% of the beats. Our own research has also examned the header-type felds [BAR97]. We have used the followng felds of Layers 1 & wth the gven feature nterpretaton: Bt allocaton: stores the dynamc range of a sequence of subband values. Scalefactors: stores nformaton on the maxmum loudness of a sequence of subband values. Scalefactor selecton nformaton (Layer only): stores how the loudness changes on three subsequent groups; a value of ndcates no change, so the loudness s stable, a value of MPEG-1 audo features S. Pfeffer, T. Vncent

7 ndcates a transent change, and the values 1 and 3 ndcate an unstable change... Subband values Bascally all other publshed research on compressed-doman audo analyss has used the subband values as a startng pont for feature calculaton. Thus t s requred to decode the MPEG audo stream enough to access the subband values. For all three Layers, the subband values are not avalable drectly n the lnearsed fle/stream but have to be reconstructed from the encoded felds mplyng some processng cost. The most tme consumng step for decodng an MPEG audo stream s however the resynthess of PCM samples and ths s avoded as the subband values are stll n the compressed doman. In Layers 1 and the subband values may be approxmated by drectly usng the quantsed values n an encoded frame (whch can only be extracted from the fle wth help of the btallocaton nformaton). Ths however gnores the fact that the values are normalsed by the scalefactors n each of the 3 subbands. So, to arrve at the subband values encoded n the fle, one has to use the quantsed values and denormalse them. In Layer 3 there are 576 subband values. To extract the frequency band magntudes from the fle, t s necessary to decode the quantsed samples wth a Huffman decoder. Then, scalefactors have to be readjusted, whch served to colour the quantsaton nose, and quantsaton has to be reversed. After ths, one reaches the alas reduced MDCT coeffcents, whch we call the 576 subband values. To acheve features whch are comparable ndependent of the encoded Layer, t s possble to further decode the 576 MDCT coeffcents to the orgnal 3 Polyphase Flterbank coeffcents, but there s a processng cost assocated and a loss of frequency resoluton. One however gans on temporal resoluton, whch mght be more approprate n certan applcaton areas. Subband values are n the followng denoted by s (n), beng the subband number, I-1, (I=3 n Layers 1 and, and possbly 576 n Layer 3) and n the tme ndex. In the followng, all ndex values wll start wth. The tme ndex n s vewed from a whole fle perspectve. As an example, f we take a Layer encoded audo fle, ts 5 th frame, ts nd granule, and the 5 th subband value (out of the 1) of the 1 th subband, we access the value s 9 (16) (see Fgure 4). Frame Frame 1 Frame Frame 3 Granule Granule1 Granule Frame 4 Granule Wndow t S 9 (16) n t M-1 Fgure 4: Explanaton of subband numberng scheme. MPEG-1 audo features S. Pfeffer, T. Vncent

8 Dependng on the requred resoluton of an analyss, one may choose to calculate features on the subband value resoluton level, on the granule resoluton level, or on the frame resoluton level. A statstcal analyss on a larger tme wndow may thus be calculated on a multplcty of any of these resoluton levels. In our own research we have chosen the subband value or granule resoluton, whle many others prefer the frame resoluton. Most analyss algorthms work on a wndow of samples. Independent of the choce of resoluton, we denote the wndow sze by M and the tme poston wthn a wndow by m, m M-1. t wll denote the wndow number whle gong over a fle, whch s closely related to the tme poston wthn a fle. A subband value at wndow poston m s accessed dependng on the choce of resoluton for analyss. For example, f we select a granule as wndow sze, have consecutve non-overlappng wndows only, and work on a subband value resoluton level, the above subband value s 9 (16) wll be n wndow number t=13 at poston m=4 (M=1 s the mpled wndow sze for the Layer granule here) (see Fgure 4). Synopss of used terms: n Tme ndex n N-1 Subband ndex I-1 s (n) Subband value at tme ndex n for subband I m Tme poston wthn wndow m M-1 M Analyss wndow sze t Analyss wndow number 3 Low-level audo features Gong from the subband values to hgh-level analyss such as segmentaton, classfcaton, recognton and dentfcaton, requres frstly the calculaton of low-level audo features on whch the further analyss wll be based. Most often nformaton on the subband energes s used as a startng pont for the analyss of the features. 3.1 Pre-processed subband nformaton Instead of usng the subband values themselves to access subband energes, some researchers preprocess them to acheve dfferent goals. Tzanetaks et al. [TZA] use the root mean squared (RMS) subband vector on a frame resoluton because ths s a better measure of sgnal energy than the subband values themselves. A generalsed formula for ther approach, ndependent of granule, frame or any larger wndow, s gven by: s ( t) = 1 M M 1 m= s ( Mt + m). Ths enables one to arrve at 3 or 576 subband values by averagng the M subband values n the wndow. MPEG-1 audo features S. Pfeffer, T. Vncent

9 Boccgnone et al. [BOC99] use the subband mean value n a wndow of sze M as feature as a way to go to a frame resoluton. A generalsed formula for ther approach, ndependent of granule, frame or any larger wndow, s gven by: M 1 1 µ ( t) = s ( Mt + m). M m= Nakajma et al. [NAK99] calculate the normalsed subband energy from the subband samples of a frame to absorb sound level dependency on audo source. The followng formula normalses a sngle subband value on the maxmum of all subband values at the same tme ndex n: s ( n) ς ( n) = 1log1 ( ). max( s ( n) : j I 1) j Gong to a lower resoluton for a wndow of sze M, we have also used the subband mean as a bass for calculaton of a normalsed subband energy: µ ( t) ς ( t) = 1log1 ( ). max( µ ( t) : j I 1) j In our own work we have made use of the scalefactors n Layer 1 and for a granule resoluton subband energy measure [BAR97]. The scalefactors are the maxmum value of the sequence of subband values wthn a granule: scf ( t) = max( s ( Mt + m) : m M 1), wth Ι 1. The two values s (n) and ς (n) work at the subband value resoluton, whle s (t), µ I (t),ς (t) and scf (t) work on a granule, frame or any larger wndow resoluton. In the followng subsectons, any of these values may be used n the formulas nterchangeably. The choce depends on the goals of the analyss. As a placeholder we wll use s (t). 3. Cepstral features For speech and speaker recognton approaches t s standard to use a representaton of the audo sgnal n the frequency doman as feature. Cepstral coeffcents have proven to be partcularly successful. They are calculated by performng another frequency transform on the logarthm of spectral coeffcents. Both the 3 subband values of Layer 1 and, and the 576 subband values of Layer 3 are lnearly spaced spectral coeffcents and serve well as a bass for calculaton of cepstral coeffcents. Venugopal et al. [VEN99] calculate lnear frequency cepstral coeffcents from the subband values and use them for speaker dentfcaton. Yapp et al. [YAP97] use the quantsed values of the frst nne subbands for ther speech recognton system. To reach a hgher frequency resoluton they apply the Fast Fourer MPEG-1 audo features S. Pfeffer, T. Vncent

10 Transform (FFT) on subband wndows of sze 5 ms usng a Hammng wndow and paddng the number of samples to a power of. They take the magntude of the FFT values and assemble them over the subbands nto one sngle frequency vector. Ths feature vector s now mel-warped and used to calculate cepstral, delta cepstral, and acceleraton coeffcents as features. 3.3 Energy features Sgnal energy features are closely related to the human loudness percepton. When calculatng energy features n the compressed doman rather than from uncompressed PCM samples, the results are closer approxmatons of perceptual loudness because the subband values have been fltered by the psychoacoustc model and thus the nfluence of non-hearable frequences s reduced. The dsadvantage however s that sgnal energy s dstrbuted over the frequency bands and thus has to be added up requrng hgher computatonal complexty than on uncompressed sgnals. Sgnal energy measures are often used for segmentaton of an audo stream. A sgnal's start and end tmes are then usually determned by thresholdng. Patel et al. [PAT96] calculate sgnal energy for a wndow of sze M from the subband values. Nakajma et al. [NAK99] restrct ther loudness measurement to the energy n the lowest subband as ths s the one where most energy s concentrated and ths restrcton provdes a consderable effcency ncrease. A generalsed formula for sgnal energy s gven by: I 1 M 1 1 E ( t) = s ( Mt + m) h( M 1 m). I M = m= The wndow functon h(m) may be e.g. a Rectangular, Hammng, Hannng, Welch or Bartlett wndow dependng on the requred narrowness or peakness of spectral leakage. Tzanetaks et al. [TZA] prefer to use the RMS of the sgnal energy for loudness approxmaton whch acheves a better separaton for low level values. Another loudness approxmaton s the sgnal magntude, whch s less senstve to nose than sgnal energy. It can be calculated analogously to sgnal energy [PAT96] va: I 1 M 1 1 M ( t) = s ( Mt + m) h( M 1 m). I M = m= In our own work we have used the sum of scalefactors on a granule resoluton for a fast approxmaton of the sgnal magntude on Layer 1 & frames [BAR97, PFE99]. Wang and Vlermo [WAN1] calculate the band energy of several subbands and use a threshold on them to determne a confdence for a pop-musc beat n the granule: E ( t) = J M 1 ( J1, J ) = J1 m= s ( Mt + m) h( M 1 m). MPEG-1 audo features S. Pfeffer, T. Vncent

11 3.4 Slence statstcs Slence statstcs are often used as ndcators for classfcaton of audo segments nto dfferent sgnal classes. Speech segments for example generally contan a lot more slence than musc segments. Therefore, Patel et al. [PAT96] propose pause rate as an ndcator to separate speech from non-speech sgnals: M 1 1 P ( t) = ( E( Mt + m) > Ts ) ( E( Mt + m + 1) > T s ) M m= wth T s as the slence energy threshold. P counts the number of slent segments on a tme nterval of sze M. Smlarly, Tzanetaks et al. [TZA] use a low energy feature to separate speech from musc. On a wndow of about 1 sec (M=4 frames) they calculate the percentage of frames that have less than the average energy E (t) : 1 L ( t) = M M 1 m= ( E( Mt + m) < E( t)). Nakajma et al. [NAK99] call ther slence statstc energy densty. It s also defned on a wndow of 1 sec and bascally calculated as the log value of the varance of L(t). 3.5 Spectral energy statstcs Spectral energy statstcs capture subband energy dstrbuton features, whch are ndcatve for specfc types of sounds. The spectral centrod s the balancng pont of the subband energy dstrbuton [BAR97, TZA]. It s thus calculated as the frst moment of the subband energy dstrbuton: I 1 ( + 1) s ( t) = C ( t) =. I 1 s ( t) = It determnes the frequency area around whch most of the sgnal energy concentrates and s thus closely related to the tme-doman zero crossng rate (ZCR) feature often used n speech recognton systems to determne exact start- and endponts of talkspurts. It s also frequently used as an approxmaton for a perceptual brghtness measure [BAR97]. Nakajma et al. [NAK99] use the squared subband samples n ther spectral centrod calculaton to better spread out the centrod values. The spectral rolloff pont R s determned where 85% of the wndow's energy s acheved: R = s ( t) =.85 I 1 = s ( t). MPEG-1 audo features S. Pfeffer, T. Vncent

12 It s used to dstngush voced speech from unvoced speech and musc [TZA], whch have a hgher rolloff pont because ther power s better dstrbuted over the subband range. Patel et al. [PAT96] propose a feature called band energy rato, whch sets the energy of the low frequences (subbands to J-1) n relaton to the hgh frequences (subbands J to I-1): J 1 M 1 s = m= B ( t) = I M 1 s = J m= ( Mt + m) h( M 1 m). ( Mt + m) h( M 1 m) Ths s ndcatve of the vocedness of a sound. They clam that J= s a good choce because voced sgnal energy concentrates below 1.5kHz whle unvoced sgnal energy s dstrbuted over all subbands. The spectral flux of two successve wndows t and t+1 s calculated as the -norm of the dfference between normalzed subband value vectors at t and t+1 [TZA]: ( t, t + 1) = 1 s ( t) s ( t + 1). = max( s ( t) : j I 1) max( s ( t + 1) : j I 1) I j Whle the frst three statstcs calculated spectral energy dstrbuton features on one wndow, the spectral flux determnes changes of spectral energy dstrbuton of two successve wndows. The subband central moments calculated by Boccgnone et al. [BOC99] on the contrary calculate statstcs wthn subbands over several frames. They capture how much a subband's energy s dspersed from ts mean: M 1 k 1 k D ( t) = ( s ( Mt + m) µ ( t)) wth k=,...,5. M m= j 3.6 Bandwdth features The bandwdth covered by a wndow s calculated from all subbands wth suffcent energy: BW t) = max( : ( I 1) ( s ( t) > T )) mn( : ( I 1) ( s ( t) > T )). ( s s It s stpulated that the bandwdth of speech s usually narrower than that of musc [NAK99, VEN99]. Nakajma et al. [NAK99] propose to determne bandwdth nformaton by countng the number of subbands wth sgnfcant level: SB ( t) = I 1 = ( s ( t) > T s ). If a wndow s content covers a lot of subbands such as musc, SB(t) becomes large. In addton to measurng the subband range that a wndow contans, ths also takes nto account how strong the subbands n between are represented. MPEG-1 audo features S. Pfeffer, T. Vncent

13 3.7 Ptch features Ptch s ndcatve of a speaker and thus an mportant property of a sound. Patel et al. [PAT96] calculate the ptch of a sound sgnal n the compressed doman by usng the autocorrelaton functon of the values of the frst subband on 3% overlappng wndows. Wth an overlap of o samples, the related generc formula can be gven va: M 1 1 A ( t, k) = ( s (( M o) t + m) s (( M o) t + m + k)). M m= They calculate the ptch only on wndows of suffcent energy to reduce processng tme on slences. In addton they perform nonlnear clppng of small subband values to avod confuson of the frst and second formants. They choose the largest autocorrelaton peak value as the ptch f t contans more than 3% of the wndow's energy. Venugopal et al. [VEN99] use the analyss-by-synthess approach of the Multband Exctaton Vocoder for ptch estmaton. In t, speech s synthessed and an unbased error measure s calculated by comparson to the orgnal speech. The ptch perod s the perod used when the error s mnmum. 4 Segmentaton When talkng about segmentaton of an audo stream, temporal segmentaton s usually the subject. The dentfcaton of the sound components that belong to one specfc sound event could be regarded as a spato-temporal segmentaton. Ths s a hard task and beng researched n the feld of computatonal audtory scene analyss (CASA). We are not aware of any approaches toward CASA n the MPEG-1 compressed doman. So, here we concentrate on temporal segmentaton n whch specfc temporal fragments of an audo stream are dentfed for ther homogenous content accordng to some crtera. Exstng segmentaton approaches determne fragment boundares based on e.g. strong changes of a specfc feature or relatve pauses. For such segmentaton approaches, the presented features are often not used drectly nstead ther mean and varances are calculated on larger wndows of about 1-4 sec. Addtonally, logtransforms of the results can be used to reduce the dynamc range and make the clusters n classfcaton more compact [TZA]. 4.1 General segmentaton Tzanetaks et al. [TZA] perform generc audo segmentaton at texture change nstants. They use the features low energy, and mean and varances of spectral centrod, spectral rolloff pont, spectral flux, and RMS energy n a feature vector. They calculate the Mahalanobs dstance between successve feature vectors, and dfferentate t. Peaks are then pcked as segment boundares va an adaptve thresholdng algorthm, whch ncludes a MPEG-1 audo features S. Pfeffer, T. Vncent

14 mnmum duraton condton to avod small regons. They acheve up to 75% consstency wth human segmentaton results. Our own approach to segmentaton uses the sgnal magntude as feature [PFE1] to calculate relatve pauses. The segmentaton algorthm also follows an adaptve thresholdng approach on sec ntervals. Wndows are determned as slence f ther sgnal magntude stays under the threshold. Segmentaton uses a mnmum duraton and a maxmum tolerated nterrupton parameter. A sequence of slence wndows gets clustered nto a pause segment f t covers at least the mnmum duraton and s not nterrupted by non-slence wndows longer than the tolerated nterrupton. We acheve ht rates between 46% and 97% when comparng to human segmentaton results dependng upon the SNR of the materal. 4. Scene change Barrass [BAR97] calculates a runnng average of the spectral centrod called brghtness hstory. Sudden changes n brghtness are used for scene change detecton. Boccgnone et al. [BOC99] calculate vdeo scene changes based on audo and vdeo breaks. Vdeo analyss provdes shot boundares, whch are scene change canddates. Audo analyss valdates the canddates. They calculate the subband mean energy and four subband central moments on the frst 8 subbands and accumulate these nto one feature vector. Then, an Artfcal Neural Network s traned to partton the feature space nto slence, speech, musc, and nose resultng n transton ponts between dfferent sound classes. These are used to valdate the shot boundares. They acheve a ht rate between 6% and 93% for audo breaks n comparson to human results. 5 Classfcaton Although temporal segmentaton s an mportant frst step n determnng the structure of an audo stream, automatc determnaton of more nformaton on the actual content of the fragments s of hgher value. Thus the next step s to classfy the fragment content nto a gven set of sound classes. Generc classes are slence, musc, speech, and nose. Accordng to the requrements of the applcaton, more specfc classfcatons may be requred, such as the determnaton of the type of sound effect or the separaton of speakers. 5.1 Slence determnaton Patel et al. [PAT96] use a standard threshold approach on the average sgnal energy of a vdeo clp to determne whether the clp contans slence. Nakajma et al. [NAK99] use the varance of subband energy to dstngush between slence and non-slence: MPEG-1 audo features S. Pfeffer, T. Vncent

15 M 1 1 σ ( n ) = ( s ( m) µ ( m)). M m= They choose M = 1 sec and use only one subband sample per frame. Ths sub-samplng ncreases calculaton speed enormously. A slent segment s declared where σ (n)<t s wth T s as the slence energy threshold. They acheve a ht rate of 91% wth 13% false hts. The false hts are manly attrbuted to mxed sgnal 1 sec wndows. 5. Musc/speech determnaton Patel et al. [PAT96] classfy audo segments determned by vdeo shot boundary detecton. If the band energy rato les above.8 or the pause rate s below. or there s no ptch found, the segment s classfed as a muscal clp, else as speech clp. Nakajma et al. [NAK99] use the energy densty and the average number of subbands wth sgnfcant level on 1 sec wndows to dstngush between musc and speech. Musc has a hgher energy densty than speech. Musc also usually has sgnfcant subbands up to subband number, whereas speech rarely goes beyond subband 1. They use a multvarate Gaussan dstrbuton to model the classes and acheve a ht rate of 93% for musc (wth 4% false hts) and of 88% for speech (wth 16% false hts). The false hts for speech stem from ntermttent sounds such as drum solo. Tzanetaks et al. [TZA] use the features low energy, and the mean and varances of the spectral centrod, spectral rolloff pont, and spectral flux n a feature vector. They compare a multvarate Gaussan dstrbuton classfer to a K-Nearest Neghbour classfer to dstngush speech from musc. They evaluate them on about hours of audo data and compare results on a frame bass achevng about 8% accuracy for the Gaussan dstrbuton and about 85% accuracy for the K-NN. In comparson to the classfcaton of PCM data, the results only degrade by about %. Barrass [BAR97] determnes musc on Layer fles as a sgnal that exhbts long-term wdeband stablty. Ths stablty s calculated from the scalefactor selecton nformaton. A sgnal s determned as long-term stable f more than 6% of the non-zero subbands of a frame have a repeated scalefactor. Coverage of the subbands must be at least 4 out of 3 subbands. In contrast, speech s determned as a sgnal wth low- or md-range brghtness and stablty. The brghtness s calculated va the spectral centrod of the scalefactors. Low-range brghtness ndcates a male voce, md-range brghtness a female one and hgh-range brghtness agan sgnfes musc. 5.3 Sound effects Nakajma et al. [NAK99] use the average and varance of the spectral centrod on 1 sec wndows to determne applause on ther TV program sound tracks. Applause has a contnuous selfsmlarty and stable centre frequency. They acheve a ht rate of 74% wth 15% false hts, whch MPEG-1 audo features S. Pfeffer, T. Vncent

16 occur manly on mxed sgnal 1 sec wndows. 5.4 Nose Barrass [BAR97] determnes nose on Layer fles as a sgnal that exhbts long-term wde-band transence. Ths transence s calculated from the scalefactor selecton nformaton. A sgnal s determned as long-term transent f more than 3% of the non-zero subbands of a frame have non-repeated scalefactors. Coverage of the subbands must be at least 1 out of 4 subbands. In addton, short, loud and brght sgnals are determned as a clang and also classfed as a nose. 5.5 Speaker To dstngush between sx dfferent speakers, Venugopal et al. [VEN99] use normalsed lnear frequency cepstral coeffcents and estmate Gaussan Mxture Model parameters usng the Expectaton Maxmsaton algorthm. 6 Recognton and Identfcaton On speech segments, recognton and dentfcaton of more specfc sound content s possble such as the gender of a speaker segment, the speaker tself, and the content of hs speech. On musc segments, recognton of beats and dentfcaton of rhythms can be performed. We report on one beat recognton approach based on MPEG-1 compressed doman features. 6.1 Gender Venugopal et al. [VEN99] use the ptch estmaton of the Multband Exctaton Vocoder and declare the speaker as male f the ptch s between 6 and 1 Hz and female between 1 and Hz. They acheve a ht rate of about 8%. As mentoned above, Barrass [BAR97] determnes the gender of a speech frame va the brghtness of the sgnal, whch s calculated from the spectral centrod of the scalefactors n Layer. If the spectral centrod les below the second subband, t s determned as male, and between the thrd and sxth subband as female. 6. Speech recognton Yapp and Zck [YAP97] mplemented a speaker-dependent, small vocabulary speech recognton system that uses compressed doman features. They calculate cepstral, delta cepstral, and acceleraton coeffcents as descrbed n Secton..1. Then they tran Hdden Markov Models MPEG-1 audo features S. Pfeffer, T. Vncent

17 (HMMs) on 17 words. Tranng and recognton were both performed wth contnuously spoken sentences. A word-level accuracy of 99% was obtaned on Layer encoded data at 3 kbt/s. Ther system works on Layer 1 and Layer and nterlayer tranng and recognton s possble. 6.3 Beat recognton Wang and Vlermo [WAN1] presented a compressed-doman beat detector for pop-musc wth the am of replacng beats that were lost durng an Internet transmsson of a pop-song wth prevously stored beat samples of that song. They use the wndow type nformaton of Layer 3 fles and the band energy of four frequency ranges for beat detecton. The four frequency ranges are: the full-band energy and the frequency ntervals -459, , and Hz. The mddle frequency ranges usually gve poor beat nformaton because other nstruments and sngng are more domnant n these ranges. When restrctng the search for beats to the most probable tmes after nter beat ntervals (IBIs), they detect most beats. 7 Conclusons In ths paper we have gven an overvew of the knd of features that have been extracted n the MPEG-1 audo compressed doman. Consderng the amount of MPEG-1 Layer 3 (MP3) fles avalable nowadays, audo analyss on compressed fles s bound to be n great demand soon. Research n ths feld s stll n ts nfancy and there are stll many opportuntes to pursue for fundamental research. Audo analyss results can be used more powerfully when used n conjuncton wth vdeo analyss results to acheve automatc extracton of more abstract concepts. On ts own t can be used for sound-based audo search engnes no more based on textual queres and flenames but on the audo content. ACKNOWLEDGEMENTS We thank Conrad Parker and Stephen Barrass for ther proof-readng and feedback to ths paper. MPEG-1 audo features S. Pfeffer, T. Vncent

18 REFERENCES [BAR97] [BOC99] Stephen Barrass Blby A tool for foragng n MPEG audo, Techncal Report No. 1/98, CSIRO Mathematcal and Informaton Scences, Nov (publshed June 1). G. Boccgnone, M. De Santo, and G. Percannella, "Jont Audo-Vdeo Processng of MPEG Encoded Sequences", Proc. IEEE Intl. Conf. on Multmeda Computng and Systems (ICMCS), Vol., 1999, pp [HAS97] Haskell, Pur, and Netraval, "Dgtal Vdeo: An Introducton to MPEG-, Chapter 4, 1997, Chapman & Hall, New York. [ISA93] ISO, Internatonal Organzaton for Standardzaton, "Internatonal Standard , Informaton Technology - Codng of movng pctures and assocated audo for dgtal storage meda at up to about 1,5 MBt/s - Part 3: Audo", [NAK99] Y. Nakajma, Y. Lu, M. Sugano, A. Yoneyama, H. Yanaghara, and A. Kurematsu, "A Fast Audo Classfcaton From MPEG", Proc. IEEE Intl. Conf. on Acoustcs, Speech, and Sgnal Processng (ICASSP), Vol. IV, 1999, Phoenx, Arzona, USA, pp [NOL97] P. Noll, "MPEG Dgtal Audo Codng", IEEE Sgnal Processng Magazne, Sept. 1997, pp [PAN95] D. Pan, "A Tutoral on MPEG/Audo Compresson", IEEE Multmeda, Vol. No., [PAT96] [PFE1] [PFE99] [TZA] [VEN99] summer 1995, pp N.V. Patel, and I.K. Seth, "Audo characterzaton for vdeo ndexng", Proc. SPIE, Storage and Retreval for Stll Image and Vdeo Databases IV, Vol. 67, 1996, San Jose, CA, USA, pp Slva Pfeffer Pause concepts for audo segmentaton at dfferent semantc levels, Proc. ACM Multmeda 1, Sept 3 Oct 5, Ottawa, Ontaro, Canada, pp , 1. S. Pfeffer, J. Robert-Rbes, and D. Km, "Audo Content Extracton from MPEGencoded sequences", Proc. Ffth Jont Conference on Informaton Scences, pp , 1999, Vol. II, Atlantc Cty, New Jersey. G. Tzanetaks, and P. Cook, "Sound analyss usng MPEG compressed audo", Proc. IEEE Intl. Conf. on Acoustcs, Speech, and Sgnal Processng ICASSP, pp , Vol., Istanbul, Turkey. S. Venugopal, K.R. Ramakrshnan, S.H. Srnvas, and N. Balakrshnan, "Audo scene analyss and scene change detecton n the MPEG compressed doman", IEEE Thrd Workshop on Multmeda Sgnal Processng, MMSP 19999, pp [WAN1] Ye Wang, Mkka Vlermo A compressed doman beat detector usng MP3 audo btstreams, Proc. ACM Multmeda 1, Sept 3 Oct 5, Ottawa, Ontaro, Canada, pp. 194-, 1. [YAP97] L. Yapp, and G. Zck, "Speech recognton on MPEG/audo encoded fles", Proc. IEEE Intl. Conf. on Multmeda Computng and Systems (ICMCS), pp , 1997, Ottawa, Canada. MPEG-1 audo features S. Pfeffer, T. Vncent

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

An Image Fusion Approach Based on Segmentation Region

An Image Fusion Approach Based on Segmentation Region Rong Wang, L-Qun Gao, Shu Yang, Yu-Hua Cha, and Yan-Chun Lu An Image Fuson Approach Based On Segmentaton Regon An Image Fuson Approach Based on Segmentaton Regon Rong Wang, L-Qun Gao, Shu Yang 3, Yu-Hua

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

[33]. As we have seen there are different algorithms for compressing the speech. The

[33]. As we have seen there are different algorithms for compressing the speech. The 49 5. LD-CELP SPEECH CODER 5.1 INTRODUCTION Speech compresson s one of the mportant doman n dgtal communcaton [33]. As we have seen there are dfferent algorthms for compressng the speech. The mportant

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Quantization Noise Power Injection In Subband Audio Coding Using Low Selectivity Filter Banks

Quantization Noise Power Injection In Subband Audio Coding Using Low Selectivity Filter Banks Quantzaton Nose Power Injecton In Subband Audo Codng Usng Low Selectvty Flter Banks D. ARTÍNEZ -UÑOZ, N. RUIZ-REYES, P. VERA-CANDEAS, P.J. RECHE-LÓPEZ, J. CURPIÁN-ALONSO Departamento de Electrónca Unversdad

More information

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe CSCI 104 Sortng Algorthms Mark Redekopp Davd Kempe Algorthm Effcency SORTING 2 Sortng If we have an unordered lst, sequental search becomes our only choce If we wll perform a lot of searches t may be benefcal

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION

A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION 1 THE PUBLISHING HOUSE PROCEEDINGS OF THE ROMANIAN ACADEMY, Seres A, OF THE ROMANIAN ACADEMY Volume 4, Number 2/2003, pp.000-000 A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION Tudor BARBU Insttute

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Why consder unlabeled samples?. Collectng and labelng large set of samples s costly Gettng recorded speech s free, labelng s tme consumng 2. Classfer could be desgned

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

Detection of an Object by using Principal Component Analysis

Detection of an Object by using Principal Component Analysis Detecton of an Object by usng Prncpal Component Analyss 1. G. Nagaven, 2. Dr. T. Sreenvasulu Reddy 1. M.Tech, Department of EEE, SVUCE, Trupath, Inda. 2. Assoc. Professor, Department of ECE, SVUCE, Trupath,

More information

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research

More information

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z.

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z. TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS Muradalyev AZ Azerbajan Scentfc-Research and Desgn-Prospectng Insttute of Energetc AZ1012, Ave HZardab-94 E-mal:aydn_murad@yahoocom Importance of

More information

Image Representation & Visualization Basic Imaging Algorithms Shape Representation and Analysis. outline

Image Representation & Visualization Basic Imaging Algorithms Shape Representation and Analysis. outline mage Vsualzaton mage Vsualzaton mage Representaton & Vsualzaton Basc magng Algorthms Shape Representaton and Analyss outlne mage Representaton & Vsualzaton Basc magng Algorthms Shape Representaton and

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

A Semi-parametric Regression Model to Estimate Variability of NO 2

A Semi-parametric Regression Model to Estimate Variability of NO 2 Envronment and Polluton; Vol. 2, No. 1; 2013 ISSN 1927-0909 E-ISSN 1927-0917 Publshed by Canadan Center of Scence and Educaton A Sem-parametrc Regresson Model to Estmate Varablty of NO 2 Meczysław Szyszkowcz

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide Lobachevsky State Unversty of Nzhn Novgorod Polyhedron Quck Start Gude Nzhn Novgorod 2016 Contents Specfcaton of Polyhedron software... 3 Theoretcal background... 4 1. Interface of Polyhedron... 6 1.1.

More information

Feature Extraction and Test Algorithm for Speaker Verification

Feature Extraction and Test Algorithm for Speaker Verification Feature Extracton and Test Algorthm for Speaker Verfcaton Wu Guo, Renhua Wang and Lrong Da Unversty of Scence and Technology of Chna, Hefe guowu@mal.ustc.edu.cn,{rhw, lrda}@ustc,edu.cn Abstract. In ths

More information

Classification Based Mode Decisions for Video over Networks

Classification Based Mode Decisions for Video over Networks Classfcaton Based Mode Decsons for Vdeo over Networks Deepak S. Turaga and Tsuhan Chen Advanced Multmeda Processng Lab Tranng data for Inter-Intra Decson Inter-Intra Decson Regons pdf 6 5 6 5 Energy 4

More information

Music/Voice Separation using the Similarity Matrix. Zafar Rafii & Bryan Pardo

Music/Voice Separation using the Similarity Matrix. Zafar Rafii & Bryan Pardo Musc/Voce Separaton usng the Smlarty Matrx Zafar Raf & Bryan Pardo Introducton Muscal peces are often characterzed by an underlyng repeatng structure over whch varyng elements are supermposed Propellerheads

More information

Feature Reduction and Selection

Feature Reduction and Selection Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components

More information

Classifying Acoustic Transient Signals Using Artificial Intelligence

Classifying Acoustic Transient Signals Using Artificial Intelligence Classfyng Acoustc Transent Sgnals Usng Artfcal Intellgence Steve Sutton, Unversty of North Carolna At Wlmngton (suttons@charter.net) Greg Huff, Unversty of North Carolna At Wlmngton (jgh7476@uncwl.edu)

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

Efficient Content Representation in MPEG Video Databases

Efficient Content Representation in MPEG Video Databases Effcent Content Representaton n MPEG Vdeo Databases Yanns S. Avrths, Nkolaos D. Doulams, Anastasos D. Doulams and Stefanos D. Kollas Department of Electrcal and Computer Engneerng Natonal Techncal Unversty

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

Enhanced AMBTC for Image Compression using Block Classification and Interpolation

Enhanced AMBTC for Image Compression using Block Classification and Interpolation Internatonal Journal of Computer Applcatons (0975 8887) Volume 5 No.0, August 0 Enhanced AMBTC for Image Compresson usng Block Classfcaton and Interpolaton S. Vmala Dept. of Comp. Scence Mother Teresa

More information

SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE

SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE Dorna Purcaru Faculty of Automaton, Computers and Electroncs Unersty of Craoa 13 Al. I. Cuza Street, Craoa RO-1100 ROMANIA E-mal: dpurcaru@electroncs.uc.ro

More information

Simulation Based Analysis of FAST TCP using OMNET++

Simulation Based Analysis of FAST TCP using OMNET++ Smulaton Based Analyss of FAST TCP usng OMNET++ Umar ul Hassan 04030038@lums.edu.pk Md Term Report CS678 Topcs n Internet Research Sprng, 2006 Introducton Internet traffc s doublng roughly every 3 months

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

Face Recognition University at Buffalo CSE666 Lecture Slides Resources:

Face Recognition University at Buffalo CSE666 Lecture Slides Resources: Face Recognton Unversty at Buffalo CSE666 Lecture Sldes Resources: http://www.face-rec.org/algorthms/ Overvew of face recognton algorthms Correlaton - Pxel based correspondence between two face mages Structural

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

COMPARISON OF ENHANCED SCHEMES FOR AUDIO CLASSIFICATION

COMPARISON OF ENHANCED SCHEMES FOR AUDIO CLASSIFICATION Volume 4, No. 1, December 13 Journal of Global Research n Computer Scence REVIEW ARTICLE Avalable Onlne at www.grcs.nfo COMPARISON OF ENHANCED SCHEMES FOR AUDIO CLASSIFICATION Dr. V. Radha *1 and G.Anuradha

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

Shape-adaptive DCT and Its Application in Region-based Image Coding

Shape-adaptive DCT and Its Application in Region-based Image Coding Internatonal Journal of Sgnal Processng, Image Processng and Pattern Recognton, pp.99-108 http://dx.do.org/10.14257/sp.2014.7.1.10 Shape-adaptve DCT and Its Applcaton n Regon-based Image Codng Yamn Zheng,

More information

Key-Selective Patchwork Method for Audio Watermarking

Key-Selective Patchwork Method for Audio Watermarking Internatonal Journal of Dgtal Content Technology and ts Applcatons Volume 4, Number 4, July 2010 Key-Selectve Patchwork Method for Audo Watermarkng 1 Ch-Man Pun, 2 Jng-Jng Jang 1, Frst and Correspondng

More information

Coding Artifact Reduction Using Edge Map Guided Adaptive and Fuzzy Filter

Coding Artifact Reduction Using Edge Map Guided Adaptive and Fuzzy Filter MEL A MITSUBISHI ELECTIC ESEACH LABOATOY http://www.merl.com Codng Artfact educton Usng Edge Map Guded Adaptve and Fuzzy Flter Hao-Song Kong Yao Ne Anthony Vetro Hufang Sun Kenneth E. Barner T-2004-056

More information

Audio Content Classification Method Research Based on Two-step Strategy

Audio Content Classification Method Research Based on Two-step Strategy (IJACSA) Internatonal Journal of Advanced Computer Scence and Applcatons, Audo Content Classfcaton Method Research Based on Two-step Strategy Sume Lang Department of Computer Scence and Technology Chongqng

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics Introducton G10 NAG Fortran Lbrary Chapter Introducton G10 Smoothng n Statstcs Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Smoothng Methods... 2 2.2 Smoothng Splnes and Regresson

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

Convolutional interleaver for unequal error protection of turbo codes

Convolutional interleaver for unequal error protection of turbo codes Convolutonal nterleaver for unequal error protecton of turbo codes Sna Vaf, Tadeusz Wysock, Ian Burnett Unversty of Wollongong, SW 2522, Australa E-mal:{sv39,wysock,an_burnett}@uow.edu.au Abstract: Ths

More information

Real-time interactive applications

Real-time interactive applications Real-tme nteractve applcatons PC-2-PC phone PC-2-phone Dalpad Net2phone vdeoconference Webcams Now we look at a PC-2-PC Internet phone example n detal Internet phone over best-effort (1) Best effort packet

More information

A Gradient Difference based Technique for Video Text Detection

A Gradient Difference based Technique for Video Text Detection A Gradent Dfference based Technque for Vdeo Text Detecton Palaahnakote Shvakumara, Trung Quy Phan and Chew Lm Tan School of Computng, Natonal Unversty of Sngapore {shva, phanquyt, tancl }@comp.nus.edu.sg

More information

CS 534: Computer Vision Model Fitting

CS 534: Computer Vision Model Fitting CS 534: Computer Vson Model Fttng Sprng 004 Ahmed Elgammal Dept of Computer Scence CS 534 Model Fttng - 1 Outlnes Model fttng s mportant Least-squares fttng Maxmum lkelhood estmaton MAP estmaton Robust

More information

IP Camera Configuration Software Instruction Manual

IP Camera Configuration Software Instruction Manual IP Camera 9483 - Confguraton Software Instructon Manual VBD 612-4 (10.14) Dear Customer, Wth your purchase of ths IP Camera, you have chosen a qualty product manufactured by RADEMACHER. Thank you for the

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

A Gradient Difference based Technique for Video Text Detection

A Gradient Difference based Technique for Video Text Detection 2009 10th Internatonal Conference on Document Analyss and Recognton A Gradent Dfference based Technque for Vdeo Text Detecton Palaahnakote Shvakumara, Trung Quy Phan and Chew Lm Tan School of Computng,

More information

Detecting MP3Stego using Calibrated Side Information Features

Detecting MP3Stego using Calibrated Side Information Features 2628 JOURNAL OF SOFTWARE, VOL. 8, NO. 10, OCTOBER 2013 Detectng P3Stego usng Calbrated Sde Informaton Features Xanmn Yu School of Informaton Scence and Engneerng, Nngbo Unversty Emal: mlhappy1016@163.com

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

Face Detection with Deep Learning

Face Detection with Deep Learning Face Detecton wth Deep Learnng Yu Shen Yus122@ucsd.edu A13227146 Kuan-We Chen kuc010@ucsd.edu A99045121 Yzhou Hao y3hao@ucsd.edu A98017773 Mn Hsuan Wu mhwu@ucsd.edu A92424998 Abstract The project here

More information

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory Background EECS. Operatng System Fundamentals No. Vrtual Memory Prof. Hu Jang Department of Electrcal Engneerng and Computer Scence, York Unversty Memory-management methods normally requres the entre process

More information

A Deflected Grid-based Algorithm for Clustering Analysis

A Deflected Grid-based Algorithm for Clustering Analysis A Deflected Grd-based Algorthm for Clusterng Analyss NANCY P. LIN, CHUNG-I CHANG, HAO-EN CHUEH, HUNG-JEN CHEN, WEI-HUA HAO Department of Computer Scence and Informaton Engneerng Tamkang Unversty 5 Yng-chuan

More information

3D vector computer graphics

3D vector computer graphics 3D vector computer graphcs Paolo Varagnolo: freelance engneer Padova Aprl 2016 Prvate Practce ----------------------------------- 1. Introducton Vector 3D model representaton n computer graphcs requres

More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

Video Proxy System for a Large-scale VOD System (DINA)

Video Proxy System for a Large-scale VOD System (DINA) Vdeo Proxy System for a Large-scale VOD System (DINA) KWUN-CHUNG CHAN #, KWOK-WAI CHEUNG *# #Department of Informaton Engneerng *Centre of Innovaton and Technology The Chnese Unversty of Hong Kong SHATIN,

More information

Synthesizer 1.0. User s Guide. A Varying Coefficient Meta. nalytic Tool. Z. Krizan Employing Microsoft Excel 2007

Synthesizer 1.0. User s Guide. A Varying Coefficient Meta. nalytic Tool. Z. Krizan Employing Microsoft Excel 2007 Syntheszer 1.0 A Varyng Coeffcent Meta Meta-Analytc nalytc Tool Employng Mcrosoft Excel 007.38.17.5 User s Gude Z. Krzan 009 Table of Contents 1. Introducton and Acknowledgments 3. Operatonal Functons

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

Performance Assessment and Fault Diagnosis for Hydraulic Pump Based on WPT and SOM

Performance Assessment and Fault Diagnosis for Hydraulic Pump Based on WPT and SOM Performance Assessment and Fault Dagnoss for Hydraulc Pump Based on WPT and SOM Be Jkun, Lu Chen and Wang Zl PERFORMANCE ASSESSMENT AND FAULT DIAGNOSIS FOR HYDRAULIC PUMP BASED ON WPT AND SOM. Be Jkun,

More information

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated.

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated. Some Advanced SP Tools 1. umulatve Sum ontrol (usum) hart For the data shown n Table 9-1, the x chart can be generated. However, the shft taken place at sample #21 s not apparent. 92 For ths set samples,

More information

Machine Learning. Topic 6: Clustering

Machine Learning. Topic 6: Clustering Machne Learnng Topc 6: lusterng lusterng Groupng data nto (hopefully useful) sets. Thngs on the left Thngs on the rght Applcatons of lusterng Hypothess Generaton lusters mght suggest natural groups. Hypothess

More information

A Statistical Model Selection Strategy Applied to Neural Networks

A Statistical Model Selection Strategy Applied to Neural Networks A Statstcal Model Selecton Strategy Appled to Neural Networks Joaquín Pzarro Elsa Guerrero Pedro L. Galndo joaqun.pzarro@uca.es elsa.guerrero@uca.es pedro.galndo@uca.es Dpto Lenguajes y Sstemas Informátcos

More information

Machine Learning 9. week

Machine Learning 9. week Machne Learnng 9. week Mappng Concept Radal Bass Functons (RBF) RBF Networks 1 Mappng It s probably the best scenaro for the classfcaton of two dataset s to separate them lnearly. As you see n the below

More information

VRT012 User s guide V0.1. Address: Žirmūnų g. 27, Vilnius LT-09105, Phone: (370-5) , Fax: (370-5) ,

VRT012 User s guide V0.1. Address: Žirmūnų g. 27, Vilnius LT-09105, Phone: (370-5) , Fax: (370-5) , VRT012 User s gude V0.1 Thank you for purchasng our product. We hope ths user-frendly devce wll be helpful n realsng your deas and brngng comfort to your lfe. Please take few mnutes to read ths manual

More information

Hierarchical clustering for gene expression data analysis

Hierarchical clustering for gene expression data analysis Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

Object-Based Techniques for Image Retrieval

Object-Based Techniques for Image Retrieval 54 Zhang, Gao, & Luo Chapter VII Object-Based Technques for Image Retreval Y. J. Zhang, Tsnghua Unversty, Chna Y. Y. Gao, Tsnghua Unversty, Chna Y. Luo, Tsnghua Unversty, Chna ABSTRACT To overcome the

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information

Hybrid Non-Blind Color Image Watermarking

Hybrid Non-Blind Color Image Watermarking Hybrd Non-Blnd Color Image Watermarkng Ms C.N.Sujatha 1, Dr. P. Satyanarayana 2 1 Assocate Professor, Dept. of ECE, SNIST, Yamnampet, Ghatkesar Hyderabad-501301, Telangana 2 Professor, Dept. of ECE, AITS,

More information

Brushlet Features for Texture Image Retrieval

Brushlet Features for Texture Image Retrieval DICTA00: Dgtal Image Computng Technques and Applcatons, 1 January 00, Melbourne, Australa 1 Brushlet Features for Texture Image Retreval Chbao Chen and Kap Luk Chan Informaton System Research Lab, School

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

Load-Balanced Anycast Routing

Load-Balanced Anycast Routing Load-Balanced Anycast Routng Chng-Yu Ln, Jung-Hua Lo, and Sy-Yen Kuo Department of Electrcal Engneerng atonal Tawan Unversty, Tape, Tawan sykuo@cc.ee.ntu.edu.tw Abstract For fault-tolerance and load-balance

More information

Design and Implementation of an Energy Efficient Multimedia Playback System

Design and Implementation of an Energy Efficient Multimedia Playback System Desgn and Implementaton of an Energy Effcent Multmeda Playback System Zhjan Lu, John Lach, Mrcea Stan, Kevn Skadron, Departments of Electrcal and Computer Engneerng and Computer Scence, Unversty of Vrgna

More information

OPTIMAL VIDEO SUMMARY GENERATION AND ENCODING. (ICIP Draft v0.2, )

OPTIMAL VIDEO SUMMARY GENERATION AND ENCODING. (ICIP Draft v0.2, ) OPTIMAL VIDEO SUMMARY GENERATION AND ENCODING + Zhu L, * Aggelos atsaggelos and + Bhavan Gandh (ICIP Draft v.2, -2-23) + Multmeda Communcaton Research Lab, Motorola Labs, Schaumburg * Department of Electrcal

More information

EXTENDED BIC CRITERION FOR MODEL SELECTION

EXTENDED BIC CRITERION FOR MODEL SELECTION IDIAP RESEARCH REPORT EXTEDED BIC CRITERIO FOR ODEL SELECTIO Itshak Lapdot Andrew orrs IDIAP-RR-0-4 Dalle olle Insttute for Perceptual Artfcal Intellgence P.O.Box 59 artgny Valas Swtzerland phone +4 7

More information

A Background Subtraction for a Vision-based User Interface *

A Background Subtraction for a Vision-based User Interface * A Background Subtracton for a Vson-based User Interface * Dongpyo Hong and Woontack Woo KJIST U-VR Lab. {dhon wwoo}@kjst.ac.kr Abstract In ths paper, we propose a robust and effcent background subtracton

More information

Related-Mode Attacks on CTR Encryption Mode

Related-Mode Attacks on CTR Encryption Mode Internatonal Journal of Network Securty, Vol.4, No.3, PP.282 287, May 2007 282 Related-Mode Attacks on CTR Encrypton Mode Dayn Wang, Dongda Ln, and Wenlng Wu (Correspondng author: Dayn Wang) Key Laboratory

More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

Performance Analysis of Data Hiding in MPEG-4 AAC Audio *

Performance Analysis of Data Hiding in MPEG-4 AAC Audio * TSINGHUA SCIENCE AND TECHNOLOGY ISSNll1007-0214ll07/21llpp55-61 Volume 14, Number 1, February 2009 Performance Analyss of Data Hdng n MPEG-4 AAC Audo * XU Shuzheng ( ) **, ZHANG Peng ( ), WANG Pengjun

More information

Comparison Study of Textural Descriptors for Training Neural Network Classifiers

Comparison Study of Textural Descriptors for Training Neural Network Classifiers Comparson Study of Textural Descrptors for Tranng Neural Network Classfers G.D. MAGOULAS (1) S.A. KARKANIS (1) D.A. KARRAS () and M.N. VRAHATIS (3) (1) Department of Informatcs Unversty of Athens GR-157.84

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

The Research of Support Vector Machine in Agricultural Data Classification

The Research of Support Vector Machine in Agricultural Data Classification The Research of Support Vector Machne n Agrcultural Data Classfcaton Le Sh, Qguo Duan, Xnmng Ma, Me Weng College of Informaton and Management Scence, HeNan Agrcultural Unversty, Zhengzhou 45000 Chna Zhengzhou

More information

Grading Image Retrieval Based on DCT and DWT Compressed Domains Using Low-Level Features

Grading Image Retrieval Based on DCT and DWT Compressed Domains Using Low-Level Features Journal of Communcatons Vol. 0 No. January 0 Gradng Image Retreval Based on DCT and DWT Compressed Domans Usng Low-Level Features Chengyou Wang Xnyue Zhang Rongyang Shan and Xao Zhou School of echancal

More information

USING GRAPHING SKILLS

USING GRAPHING SKILLS Name: BOLOGY: Date: _ Class: USNG GRAPHNG SKLLS NTRODUCTON: Recorded data can be plotted on a graph. A graph s a pctoral representaton of nformaton recorded n a data table. t s used to show a relatonshp

More information