Parametric Coding of Spatial Audio

Size: px

Start display at page:

Download "Parametric Coding of Spatial Audio"

Louisa Miles
5 years ago
Views:

1 Parametric Coding of Spatial Audio Ph.D. Thesis Christof Faller, September 24, 2004 Thesis advisor: Prof. Martin Vetterli Audiovisual Communications Laboratory, EPFL Lausanne

2 Parametric Coding of Spatial Audio Contents: - Audio Coding and Thesis Motivation - Background - Binaural Cue Coding (BCC) - Variations of BCC - Source Localization in Complex Listening Scenarios - Conclusions

3 Audio Coding Audio coding Transmission x Encoder Decoder x Convert audio signal into a representation suitable for: - Transmission - Storage Minimize bitrate - Optimal for storage - Optimal for transmission Storage

4 Audio Coding Lossless coding x Encoder Decoder x 50% Memory/bitrate reduction (Put music of 10 CDs on 1 CD) x = x Redundancy reduction: -Time to frequency transform -Entropy coding

5 Perceptual audio coding Audio Coding x Encoder Decoder x 90% Memory/bitrate reduction (Put music of 10 CDs on 1 CD) x x x x x x

6 Parametric audio coding Audio Coding x Encoder Decoder x >90% Memory/bitrate reduction Drawback: Quality loss

7 Thesis Motivation Receiver properties and source properties have been extensively considered in audio coding for monaural audio signals. This thesis: Extensively consider receiver and source properties for stereo and multi-channel audio signals.

8 Parametric Coding of Spatial Audio Contents: - Audio Coding and Thesis Motivation - Background - Binaural Cue Coding (BCC) - Variations of BCC - Source Localization in Complex Listening Scenarios - Conclusions

9 Background Spatial audio playback Two-channel stereo (headphone and loudspeaker playback) 5.1 Surround LFE Left Center Right Left Right Left Right Rear left Rear right Spatial audio playback: - perception of an auditory spatial image

10 Background Interaural differences One source, free-field s Source azimuth and ITD/ILD: (Narrowband signal) φ = 90 ILD φ = 0 ITD φ φ = -90 distance difference, head shadowing interaural time and level difference (ITD and ILD)

11 Background Inter-channel differences: Inter-channel time-difference (ICTD) Inter-channel level-difference (ICLD) Inter-channel coherence (ICC) ICTD, ICLD ICC

12 Background Mixing stereo signals: a 1 a 1 d 1 d 1 s 1 a 2 s 1 a 2 d 2 d 2 Σ Σ Σ Σ a 3 a 3 d 3 d 3 s 2 a 4 s 2 a 4 d 4 d 4

13 Background Other auditory spatial image attributes: Auditory event distance Auditory event width Listener envelopment - Power of ear-input signals - Ratio of power of direct to reflected sound - Lateral fraction - Late lateral energy fraction Spatial audio playback: These attributes are controlled by adding reflections to the signal channels.

14 Parametric Coding of Spatial Audio Contents: - Audio Coding and Thesis Motivation - Background - Binaural Cue Coding (BCC) - Variations of BCC - Source Localization in Complex Listening Scenarios - Conclusions

15 Binaural Cue Coding (BCC) Conventional multi-channel transmission Proposed multi-channel transmission MONO SIGNAL ANALYSIS + DOWNMIX SYNTHESIS

16 Binaural Cue Coding (BCC) Estimation of inter-channel cues: ICTD, ICLD, and ICC Frequency ICC ICTD, ICLD Time ICTD/ICLD: 1-3 kb/s (per channel pair) ICC: 1-3 kb/s

17 Binaural Cue Coding (BCC) ICTD/ICLD/ICC synthesis s FB s d 1 d 2 a 1 a 2 h 1 h 2 x 1 x 2 IFB IFB x 1 x 2 d C a C h C x C IFB x C d i : delays a i : scale factors h i : ICTD ICLD h i : filters f f

18 Binaural Cue Coding (BCC) ICTD, ICLD, and ICC and auditory spatial image attributes Auditory event localization: ICTD, ICLD Effect of late reflections: (ambience, listener envelopment, auditory event distance) ICC (de-correlation) ICLD (reverberation decay) Effect of early reflections: (auditory event width, coloration) ICC (de-correlation) ICLD (comb filter)

19 Binaural Cue Coding (BCC) Subjective evaluation: Stereo audio quality GRADING anchor 1 anchor 2 no ICC ICC reference INDIVIDUAL ITEMS A B C D E F G H I J K L M N AVERAGE Excellent Good Fair Poor Bad MUSHRA (ITU-R BS.1534), headphone listening, 7 experienced subjects BCC: "excellent"

20 Surround 5.1 Demo: Binaural Cue Coding (BCC) Original Mono Synthesized ANALYSIS + DOWNMIX SYNTHESIS 15 kb/s

21 Binaural Cue Coding (BCC) Surround 5.1 Demo: 44 kb/s (lowest bitrate ever demonstrated) Memory of 1 CD = 32 CDs worth of music! Not stereo, but surround! Original Synthesized ANALYSIS + DOWNMIX 32 kb/s HE-AAC SYNTHESIS 12 kb/s

22 Parametric Coding of Spatial Audio Contents: - Audio Coding and Thesis Motivation - Background - Binaural Cue Coding (BCC) - Variations of BCC - Source Localization in Complex Listening Scenarios - Conclusions

23 Variations of BCC "MP3 Surround": Stereo backwards compatible coding of 5.1 Stereo-downmix: 2-to-5 Synthesis: Lt FB y 1 Upmixing.. s 1 s 4 d 1 d 4 a 1 a 4 A x 1 x 4 IFB IFB L sl + s 3 a 3 x 3 IFB C Lt = L + 0.7C + sl Rt = R + 0.7C + sr Rt FB y 2.. s 2 s 5 d 2 d 5 a 2 a 5 A x 2 x 5 IFB IFB R sr

24 Variations of BCC Low complexity 5-to-2 BCC: Subjective evaluation: MUSHRA (ITU-R BS.1534) Hidden reference, Anchor, AAC, 5-to-2 BCC + MP3, Dolby Prologic II 256 kb/s 192 kb/s PCM

25 Variations of BCC MP3 Surround Demo: Lt Rt L LFE C R Ls Rs MP3 Surround

26 Variations of BCC Variations of BCC Conventional flexible rendering BCC for flexible rendering ANALYSIS + DOWNMIX MONO SIGNAL

27 Variations of BCC BCC for Flexible Rendering Side information: Time-frequency structure of sum signal FREQUENCY TIME SUBBAND INDEX TIME INDEX k

28 Binaural Cue Coding (BCC) BCC for Flexible Rendering Demo: 4 Instruments playing together. ANALYSIS + DOWNMIX MONO SIGNAL 2.5kb/s

29 Parametric Coding of Spatial Audio Contents: - Audio Coding and Thesis Motivation - Background - Binaural Cue Coding (BCC) - Variations of BCC - Source Localization in Complex Listening Scenarios - Conclusions

30 Source Localization in Complex Listening Scenarios Model for source localization in complex listening scenarios: - concurrently active sources - direct and reflected sound

31 Source Localization in Complex Listening Scenarios IC and concurrent sources s 1 s 2 e 1 e 2 1 IC p 1 /p 2 [db] IC can be used as a "single-source-active indicator"

32 Source Localization in Complex Listening Scenarios IC and reflections s' e 1 s e 2 e 1 e 2 IC 1 n 1 n 2 n IC can be used as a "first-wavefront indicator"

33 Source Localization in Complex Listening Scenarios Model for source localization Psychological aspect of model Cue selection: Use ITD and ILD when IC > c 0 ITD, ILD, IC Binaural processor Inner ear (cochlea) Middle ear [db] [db] f [Hz] External ear f [Hz] h 1, h 2

34 Source Localization in Complex Listening Scenarios Two concurrent male speech sources, free-field 500Hz 1 IC c 0.9 c 0 = 0.95 s 1 s Δ L [db] ILD ITD τ [ms] TIME [s]

35 Source Localization in Complex Listening Scenarios Precedence effect: (Broadband pulses, 5ms lead/lag delay, free-field) 500Hz 1 s 1 s ms IC c c 0 = ms s 1 s 2 Δ L [db] ILD ITD τ [ms] TIME [s]

36 Source Localization in Complex Listening Scenarios Source localization in reverberant environment: (2 male speech sources, 500Hz: RT = 2s, 2kHz: RT = 1.4s) 5 p(δl) Without cue (B) selection ΔL [db] ILD 0 s 1 s 2 5 p(τ) τ [ms] ITD p(δl c>c 0 ) With cue (D) selection 5 ΔL [db] ILD 0 5 p(τ c>c 0 ) τ [ms] ITD

37 Parametric Coding of Spatial Audio Contents: - Audio Coding and Thesis Motivation - Background - Binaural Cue Coding (BCC) - Variations of BCC - Source Localization in Complex Listening Scenarios - Conclusions

38 Parametric Coding of Spatial Audio Conclusions Binaural Cue Coding (BCC): - low bitrate coding of stereo and multi-channel audio signals - low bitrate transmission of independent sources - bridging between different audio formats Proposed source localization model: - speculates about role of IC for "cue selection" - attempts at explaining source localization in complex listening

39 Parametric Coding of Spatial Audio Future work: Multi-channel BCC parameters: Different more efficient and flexible parametrization. BCC applied to binaural recordings: Further investigate. Psychoacoustic experiments with BCC stimuli. Source localization model: - Adaptation of cue selection threshold - More simulations, psychoacoustic experiments - Can model be related to other attributes of auditory spatial image?

40 Parametric Coding of Spatial Audio Acknowledgements: Martin Vetterli, Peter Kroon, and all colleagues I collaborated with. Thanks to all of you for coming to this defense!

41 Audio Coding Hybrid audio coding: Intensity stereo coding (ISC): Frequency Frequency Frequency Azimuth parameters Frequency Spectral bandwidth replication (SBR): Frequency Spectral parameters Frequency

42 Binaural Cue Coding (BCC) Downmix with equalization x 1 x 2 FB FB x 1 x 2! x e s IFB s x C FB x C Numerical example: x 1 x 1 x 2 x 2 x + x 1 +x TIME [ms] Time [ms] X 1 [db] X 1 X 2 X 2 [db] X 1 [db] +X FREQUENCY [khz] Frequency [Hz]

43 Binaural Cue Coding (BCC) Alternative ICTD/ICLD/ICC synthesis s FB s d 1 a 1 b 1! x 1 IFB x 1 LR LR s 1 s 2 d 2 a 2 b 2! x 2 IFB x 2 a i, b i : scale factors LR: late reverberation filter Coherent/incoherent channel power:

44 Binaural Cue Coding (BCC) Subjective evaluation: 5-channel audio quality GRADING INDIVIDUAL ITEMS a b c d e f g h AVERAGE Imperceptible Perceptible but not annoying Slightly annoying Annoying Very annoying Hidden reference test (ITU-R BS.1116), loudspeaker listening, 9 subjects

45 Binaural Cue Coding (BCC) Subjective evaluation: ICLD time resolution stable Auditory spatial image stability image stability grading instable 5 grading speech singing percussions applause ms ms 8512 ms 4256 ms window size Overall overall quality speech singing percussions applause ms ms 8512 ms 4256 ms window size imperceptible perceptible but not annoying slightly annoying annoying very annoying

46 Variations of BCC C-to-E BCC x 1 x 2 x C ENCODER DOWNMIX C-to-E y 1 y E DECODER ICTD, ICLD, ICC SYNTHESIS x 1 x 2 x C ICTD, ICLD, ICC ESTIMATION SIDE INFORMATION - Potentially better quality than C-to-1 BCC - E-channel backwards compatible coding of C-channel audio

47 Variations of BCC 6-to-5 BCC: 5.1 backwards compatible coding of 6.1 Downmix: 5-to-6 Synthesis: y 1 y 2 Upmixing delay delay x 1 x 2 y 3 delay x 3 y 4 FB y 4. + s 4 s 6 a 4 a 6 x 4 x 6 IFB IFB x 4 x 6 Dolby Surround EX loudspeaker positioning y 5 FB y 5. s 5 a 5 x 5 IFB x 5

48 Variations of BCC 7-to-5 BCC: 5.1 backwards compatible coding of 7.1 Downmix: 5-to-7 Synthesis: y 1 y 2 y 3 y 4 FB y 1 Upmixing. s 4 s 6 delay delay delay a 4 d 4 a 6 d 6 A x 4 x 6 IFB IFB x 1 x 2 x 3 x 4 x 6 Lexicon Logic 7 loudspeaker positioning y 5 FB y 2. s 5 s 7 d 5 d 7 a 5 a 7 A x 5 x 7 IFB IFB x 5 x 7

49 Variations of BCC Hybrid BCC Frequency Frequency Frequency f 0 ICTD, ICLD ICC Frequency 100 QUALITY Hybrid BCC Conventional coder BCC BIT RATE Differences to intensity stereo coding: - consider more cues (not only intensities) - use separate filterbank (better performance)

50 Variations of BCC Hybrid BCC 100 anchor 1 anchor 2 f 0 =0kHz f 0 =1kHz f 0 =2kHz f 0 =6kHz f 0 =16kHz AVERAGE 80 GRADING x 1 x 2 FB FB BCC ENCODER 59 kb/s 68 kb/s y 1 IFB y 2 IFB 74 kb/s FB FB 84 kb/s BCC DECODER 101 kb/s IFB IFB x 1 x 2 x C FB IFB y C FB IFB x C SIDE INFORMATION

51 Source Localization in Complex Listening Scenarios Three and five concurrent male speech sources: (-30, 0, 30, -80, 80, free-field) Without cue selection With cue selection ΔL [db] ILD ΔL [db] ILD p(δl) (A) p(τ) τ [ms] ITD p(δl c 12 >c 0 ) (C) 5 0 ΔL [db] ILD ILD ΔL [db] p(δl) (B) p(τ) τ [ms] ITD p(δl c 12 >c 0 ) (D) p(τ c 12 >c 0 ) τ [ms] ITD p(τ c 12 >c 0 ) τ [ms] ITD

52 Source Localization in Complex Listening Scenarios Cue selection: Three phases of the precedence effect, 500Hz critical band WITHOUT Without CUE cue SELECTION selection 200 ms ITD τ [ms] ILD Δ L [db] ITD τ [ms] ICI [ms] With cue selection WITH CUE SELECTION ICI [ms] lead/lag delay ILD Δ L [db] p 0, c p 0 c ICI [ms] lead/lag delay [ms] ICI [ms]

53 Source Localization in Complex Listening Scenarios Amplitude panning, free-field, 500Hz critical band Reference REFERENCE BCC BCC (no without ICC) 1 IC c 0.9 c 0 = c 0 = c 0 = ILD τ [ms] Δ L [db] ITD Standard stereo setup Two male speech sources +-8dB amplitude panning TIME [s] TIME [s] TIME [s]

Efficient Representation of Sound Images: Recent Developments in Parametric Coding of Spatial Audio

Efficient Representation of Sound Images: Recent Developments in Parametric Coding of Spatial Audio Dr. Jürgen Herre 11/07 Page 1 Jürgen Herre für (IIS) Erlangen, Germany Introduction: Sound Images? Humans