TECHNICAL PAPER. Personal audio mix for broadcast programs. Fraunhofer Institute for Integrated Circuits IIS

Similar documents
Technical PapER. between speech and audio coding. Fraunhofer Institute for Integrated Circuits IIS

WHITE PAPER. Director Prof. Dr.-Ing. Albert Heuberger Am Wolfsmantel Erlangen

EXCELLENCE IN AUDIO AND MEDIA ENGINEERING

FAST FACTS. Fraunhofer Institute for Integrated Circuits IIS

WHITE PAPER. Fraunhofer Institute for Integrated Circuits IIS

application Bulletin Fraunhofer Institute for Integrated Circuits IIS

The AAC audio Coding Family For

WHITE PAPER Fraunhofer DCPServer Professional

TECHNICAL PAPER. Fraunhofer Institute for Integrated Circuits IIS

Experiencing MultiChannel Sound in

Introducing the MPEG-H Audio Alliance s New Interactive and Immersive TV Audio System

Introducing the MPEG-H Audio Alliance s New Interactive and Immersive TV Audio System

PRESS RELEASE. Fraunhofer IIS, Qualcomm and Technicolor to Demonstrate the World s First Live Broadcast of MPEG-H Interactive and Immersive TV Audio

IMMERSIVE AUDIO WITH MPEG 3D AUDIO STATUS AND OUTLOOK

Advanced Functionality & Commercial Aspects of DRM. Vice Chairman DRM Technical Committee,

Chapter 5.5 Audio Programming

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO

Efficient Representation of Sound Images: Recent Developments in Parametric Coding of Spatial Audio

Enhanced Audio Features for High- Definition Broadcasts and Discs. Roland Vlaicu Dolby Laboratories, Inc.

The following bit rates are recommended for broadcast contribution employing the most commonly used audio coding schemes:

Networking Applications

MPEG-4 aacplus - Audio coding for today s digital media world

Surround for Digital Radio. NRSC - SSATG May 9, Mike Canevaro Director, Strategic Development SRS Labs, Inc.

Professional Monitoring Receiver for

Starting production in interactive and immersive sound

ISO/IEC Information technology High efficiency coding and media delivery in heterogeneous environments. Part 3: 3D audio

Fraunhofer Institute for Integrated Circuits IIS

Preview only reaffirmed Fifth Avenue, New York, New York, 10176, US

Proceedings of Meetings on Acoustics

FRAUNHOFER INSTITUTE FOR INTEGRATED CIRCUITS IIS DESTINATION FRAUNHOFER IIS

AUDIOVISUAL COMMUNICATION

2.4 Audio Compression

Surround Codec Background, Technology, and

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding

RESOUND ENYA: BRINGING THE BENEFITS OF WIRELESS CONNECTIVITY TO THE BASIC HEARING INSTRUMENT MARKET

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Senior Manager, CE Technology Dolby Australia Pty Ltd

The MPEG-4 General Audio Coder

Audio Systems for DTV

ITNP80: Multimedia! Sound-II!

SAOC and USAC. Spatial Audio Object Coding / Unified Speech and Audio Coding. Lecture Audio Coding WS 2013/14. Dr.-Ing.

ETSI TR V ( )

MP3 - then and now Breakthrough technology applications for the New Digital World

Object-Based Audio and Sound Reproduction

5: Music Compression. Music Coding. Mark Handley

AAC-ELD based Audio Communication on ios A Developer s Guide

AUDIOVISUAL COMMUNICATION

ELL 788 Computational Perception & Cognition July November 2015

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Snr Staff Eng., Team Lead (Applied Research) Dolby Australia Pty Ltd

SOUND ENGINEERING SOLUTIONS NEIM Noise Eliminating In-Line Module. Audio Out. Line. Connections. Input Selector NEIM VDC Input + -

For Mac and iphone. James McCartney Core Audio Engineer. Eric Allamanche Core Audio Engineer

Fraunhofer - Overview

Parametric Coding of Spatial Audio

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO

CINEMA CONNECT. The solution for 100% inclusion in your cinema. Reach out to all audiences, and generate genuine added value.

Squeeze Play: The State of Ady0 Cmprshn. Scott Selfon Senior Development Lead Xbox Advanced Technology Group Microsoft

Audio and video compression

Chapter 14 MPEG Audio Compression

Multimedia Communications. Audio coding

New Results in Low Bit Rate Speech Coding and Bandwidth Extension

Perceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects.

Audio-coding standards

Mpeg 1 layer 3 (mp3) general overview

Optical Storage Technology. MPEG Data Compression

DVB Audio. Leon van de Kerkhof (Philips Consumer Electronics)

MPEG Spatial Audio Coding Multichannel Audio for Broadcasting

bhi bhi SOUND ENGINEERING SOLUTIONS SOUND ENGINEERING SOLUTIONS NEIM 1031 bhi Ltd 22 Woolven Close Burgess Hill West Sussex RH15 9RR England

Audio-coding standards

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal.

Voice Over LTE (VoLTE) Technology. July 23, 2018 Tim Burke

Before starting the troubleshooting, make sure you have installed the latest version of audio driver and Nahimic on your notebook.

SAOC FOR GAMING THE UPCOMING MPEG STANDARD ON PARAMETRIC OBJECT BASED AUDIO CODING

Parametric Coding of High-Quality Audio

Lecture 16 Perceptual Audio Coding

ENHANCED NEXT GENERATION AUDIO FOR LIVE SPORTS BROADCAST

Image and Video Coding I: Fundamentals

Sennheiser Business Solutions. Visitor guidance & audio streaming

Efficiënte audiocompressie gebaseerd op de perceptieve codering van ruimtelijk geluid

Embedding Audio into your RX Application

25*$1,6$7,21,17(51$7,21$/('(1250$/,6$7,21,62,(&-7&6&:* &2',1*2)029,1*3,&785(6$1'$8',2 &DOOIRU3URSRVDOVIRU1HZ7RROVIRU$XGLR&RGLQJ

Audio Processing on ARM Cortex -M4 for Automotive Applications

MP3. Panayiotis Petropoulos

14th European Signal Processing Conference (EUSIPCO 2006), Florence, Italy, September 4-8, 2006, copyright by EURASIP

bhi Ltd User Guide for bhi Dual In-Line Module stereo/dual channel DSP Noise Cancellation Module

Principles of Audio Coding

Best-in-class audio recording

Principles of MPEG audio compression

Audio Compression. Audio Compression. Absolute Threshold. CD quality audio:

Modeling of an MPEG Audio Layer-3 Encoder in Ptolemy

MODIFIED IMDCT-DECODER BASED MP3 MULTICHANNEL AUDIO DECODING SYSTEM Shanmuga Raju.S 1, Karthik.R 2, Sai Pradeep.K.P 3, Varadharajan.

Car Information Systems for ITS

DAB. Digital Audio Broadcasting

MPEG-4 Structured Audio Systems

EN54-16 COMPLIANT HARDWARE BYPASS MICROPHONE PORTS 12 ANALOGUE AUDIO INPUTS 12 GPIO INPUTS & 12 OUTPUTS 12 ANALOGUE AUDIO OUTPUTS

Optimizing A/V Content For Mobile Delivery

Value Electronics Authorized Sony Dealer

Completing the Multimedia Architecture

DIGITAL TELEVISION 1. DIGITAL VIDEO FUNDAMENTALS

Audio over IP devices System description Ed 2.0

Rich Recording Technology Technical overall description

Dolby AC-4: Audio Delivery for Next-Generation Entertainment Services

Transcription:

TECHNICAL PAPER Personal audio mix for broadcast programs Finding the right balance between dialogue and ambient sound within a broadcast program is a major challenge for sound engineers and an increasing cause of audience complaints. Now, with, each viewer can individually decide what he wants to hear: Ambient sound or dialogue; the stadium atmosphere during a sports game or a commentator s up-dates. Especially for hearing-impaired viewers or non-native speakers this new opportunity will make a great difference as the technology brings an end to the well-known inconvenience of obscured dialogue when the background music s volume becomes too overwhelming. from Fraunhofer IIS is the first technology that enables TV viewers and radio listeners to control the audio balance according to their personal preferences, listening environments and hearing abilities. allows broadcasters to meet the demand for improved speech intelligibility as part of their services for hearing impaired audiences. Fraunhofer Institute for Integrated Circuits IIS Management of the institute Prof. Dr.-Ing. Albert Heuberger (executive) Dr.-Ing. Bernhard Grill Am Wolfsmantel 33 91058 Erlangen www.iis.fraunhofer.de Contact Matthias Rose Phone +49 9131 776-6175 matthias.rose@iis.fraunhofer.de Contact USA Fraunhofer USA, Inc. Digital Media Technologies* Phone +1 408 573 9900 codecs@dmt.fraunhofer.org Contact China china@iis.fraunhofer.de Contact Japan Fahim Nawabi Phone: +81 90-4077-7609 fahim.nawabi@iis.fraunhofer.de Contact Korea Youngju Ju Phone: +82 2 948 1291 youngju.ju@iis-extern.fraunhofer.de * Fraunhofer USA Digital Media Technologies, a division of Fraunhofer USA, Inc., promotes and supports the products of Fraunhofer IIS in the U. S. www.iis.fraunhofer.de/audio

1. A good audio mix is not enough Delivering a well-balanced audio mix does not guarantee an enjoyable television or radio program for a broadcast audience. There are several aspects that determine how well listeners understand the dialogue of a broadcast program, explained in the following. 1.1 Listening environment The listening environment and the reproduction equipment have a considerable influence on how listeners perceive an audio mix. For instance, when listening with headphones, noisy background audio can mask important dialogue, and thus a different balance of dialogue versus background would be beneficial. 1.2 Foreign languages Listening to programs in non-native languages typically requires more concentration. A higher dialogue volume in comparison to the background levels would make listening less straining and would help to improve intelligibility. 1 1.3 Hearing abilities Individuals with hearing impairments benefit from an increased signal-to-noise-ratio (SNR) in order to enjoy and understand broadcast programs equally as well as people with normal hearing abilities. Raising the SNR only by one db alone achieves a great increase of the speech intelligibility. 2 Due to different listening situations and hearing abilities, every person in the audience would benefit from their own personal mix to enhance their listening experience. The Enhancement technology is a solution for this, allowing the audience to personally tune the audio balance according to their preferences. 1 Experiments have shown that an approximately 3dB increase in signal-to-noise-ratio (SNR) enhances dialogue intelligibility to that of the listener s native language (Florentine (1). In some cases when the speech material is too complex, 3dB are not enough. According to Warzybok et al., these situations call for an increase of 5-10 db, depending on the audience s language skills (2). 2 For more information about the effect of an increased SNR, please see Brand and Kollmeier (4). For more information about the extension and development of hearing loss, please see Heger and Holube (5) as well as Kochkin (6). www.iis.fraunhofer.de/audio 2 / 7

2. Introduction: The foundation for is MPEG Spatial Audio Object Coding (MPEG SAOC), a technology developed to manipulate any number of audio objects during rendering. Single instruments, dialogue or a singer s voice are just some examples of possible audio objects. The main principle is that the objects are all mixed into one downmix audio signal for transmission. The downmix signal can either be a mono, stereo or multichannel signal. Additional object-related side information is transmitted along with the downmix signal. This information enables the object manipulation in the receiver based on user interaction. For, only a subset of the MPEG SAOC functionality is needed. No more than two objects (dialogue and background) are used, reducing the decoder complexity substantially. Further, only needs loudness level reduction or amplification for interactivity during rendering. 2.1 Basic working principle Figure 1 gives an overview of the end-to-end signal flow from encoder input to decoder output. After analyzing the objects, the encoder generates a stream of parametric side information. It is possible to either mix the dialogue and background input signals within the encoder or use an external mix together with a clean dialogue or background signal. Then, any audio codec can encode the mixed signal in this example, it is MPEG-4 AAC/HE-AAC. The side information stream is integrated into the encoded audio bit stream. As an alternative, the parameters could also be transported in a separate stream. Default Mix * AAC Encoder AAC Bitstream Parameters AAC Decoder Default mix with legacy devices Background Figure 1: end-to-end signal flow. Default Mix * User controlled mix with enabled devices Dialouge Enhancement Encoder Enhancement Decoder/ Renderer Background * Producer controlled default mix; Mono, Stereo or 5.1 User Interaction + www.iis.fraunhofer.de/audio 3 / 7

2.2 Integrating into the signal flow There are two ways of integrating the encoder into the signal flow. The most common method is shown in Figure 1, where the encoder is supplied with the externally created default mix and the dialogue signal. Alternatively, the background signal could replace the dialogue signal. As a second option, the two audio sources dialogue and background could be fed into the encoder, where they are mixed into one signal that subsequently enters the AAC audio encoder. 2.3 The different modes of offers two encoding modes for the side information: The basic parametric mode already described above: The side information stream is generated by the encoder and then embedded into the encoded audio bit stream. The enhanced residual mode that additionally embeds residual information for the dialogue object into the side information. The parametric mode is typically useful for dialogue level modifications of up to 6 db. The residual mode allows for more attenuation or enhancement beyond 6 db at high quality, but this method requires additional bit rate for the residual information. As a result of discussions with several broadcasters, an upper limit of 12 db for the alterations was found to be a good compromise between bit rate and flexibility. The value of 12 db is also used in the experiments described below. can be used for any number of transmission channels, such as mono, stereo or multichannel. In the case of multichannel audio, two typical modes became apparent from discussions with broadcasters. The dialogue is either present in the center channel only or it is part of the front three channels: left, center and right. 2.4. Technical advantages is backwards compatible with existing transmission equipment and receivers. Existing devices that are not capable of decoding the parametric side information will ignore it and play back the default mix. Only one single audio track with the additional side information needs to be transmitted. This saves channel capacities compared to transmitting several pre-mixed versions or additional dialogue and background audio tracks. www.iis.fraunhofer.de/audio 4 / 7

3. Testing the technology s feasibility The integration of into the production environment has been tested in a number of experiments, starting with a feasibility test in 2011 during the Wimbledon lawn tennis tournament. UK listeners of the BBC Radio 5 Live Internet stream were given the opportunity to download an audio playback client including as the main feature. This allowed them to control the volume of the audio component Commentary separately from the component Court Atmosphere with a maximum of 12 db attenuation or amplification. The subsequent feedback was consistently positive and the possibilities offered by the technology addressed the listeners needs. A general statement was that would be especially useful for sports and drama programs. Only a minority of 7% of the listeners chose to listen to the default mix, the remaining listeners chose to change the volume levels with two main groups emerging. The first one slightly turned down the commentary s volume to enjoy more of the court atmosphere. The second group significantly raised the volume of the commentary to improve intelligibility. 3 To validate the early results from the Wimbledon experiment, a speech intelligibility experiment was conducted to verify the benefits for a hearing-impaired audience. The results of this experiment 4 indicate that the technology can be used as intended. An enhancement of the dialogue by 6 db already showed a substantial improvement in intelligibility. An enhancement of 12 db improves the speech intelligibility for hearing-impaired listeners to values comparable to normal-hearing listeners. 4. Standardization has been standardized within DVB as Advanced Clean Audio Services of the audio-video-coding toolbox. 3 Further details on the survey results can be found in Fuchs et al. (7) and Meltzer (8). 4 Details on the test results can be found in Fuchs et al. (9). www.iis.fraunhofer.de/audio 5 / 7

References 1. Florentine, M. 1985., Speech perception in noise by fluent, non native listeners. J. Acoust. Soc. Am., Volume 77, Issue S1, pp. S106-S106. 2. Warzybok, A. et al., 2010. Influence of the linguistic complexity in relation to speech material on non-native speech perception in noise. DAGA 2010, Berlin. 3. Jansen, S., Luts, H., Wagener, K. C., Kollmeier, B., Del Rio, M., Dauman, R., James, C., et al., 2012. Comparison of three types of French speech-in-noise tests: a multi-center study. International Journal of Audiology, 51(3), 164 73. doi:10.3109/14992027.2011.633568 4. Brand, T. and Kollmeier, B., 2002. Efficient adaptive procedures for threshold and concurrent slope estimates for psychophysics and speech intelligibility tests. The Journal of the Acoustical Society of America, 111(6), 2801. doi:10.1121/1.1479152 5. Heger, D., and Holube, I., 2010. Wie viele Menschen sind schwerhörig? Zeitschrift für Audiologie, 49(2), pp. 61 70. 6. Kochkin, S., 2005. MarkeTrak VII: Hearing loss population tops 31 million people. Hearing Review, 12(7). http://www.betterhearing.org/pdfs/marketrak7_kochkin_july05.pdf 7. Fuchs, H., Tuff, S. and Bustad C., 2012. - technology and experiments. EBU Technology Review, June 2012. http://tech.ebu.ch/webdav/site/tech/shared/ techreview/trev_2012-q2_-enhancement_fuchs.pdf 8. Meltzer, S., 2011. - A new approach. Nordig Sound Symposium, Sept. 2011. http://www.nrk.no/soundsymp/2011_meltzer.html 9. Fuchs, H., Oetting, D., 2013. Advanced Clean Audio Solution:. IBC Conference, Sept. 2013. www.iis.fraunhofer.de/audio 6 / 7

INFORMATION IN THIS DOCUMENT IS PROVIDED AS IS AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. INFORMATION IN THIS DOCUMENT IS OWNED AND COPYRIGHTED BY THE FRAUNHOFER- GESELLSCHAFT AND MAY BE CHANGED AND/OR UPDATED AT ANY TIME WITHOUT FURTHER NOTICE. PERMISSION IS HEREBY NOT GRANTED FOR RESALE OR COMMERCIAL USE OF THIS SERVICE, IN WHOLE OR IN PART, NOR BY ITSELF OR INCORPORATED IN ANOTHER PRODUCT. Copyright March 2017 Fraunhofer-Gesellschaft ABOUT FRAUNHOFER IIS The division of Fraunhofer IIS has been an authority in its field for more than 25 years, starting with the creation of mp3 and co-development of AAC formats. Today, there are more than 10 billion licensed products worldwide with Fraunhofer s media technologies, and over one billion new products added every year. Besides the global successes mp3 and AAC, the Fraunhofer technologies that improve consumers audio experiences include Cingo (spatial VR audio), Symphoria (automotive 3D audio), xhe-aac (adaptive streaming and digital radio), the 3GPP EVS VoLTE codec (crystal clear telephone calls), and the interactive and immersive MPEG-H TV Audio System. With the test plan for the Digital Cinema Initiative and the recognized software suite easydcp, Fraunhofer IIS significantly pushed the digitization of cinema. The most recent technological achievement for moving pictures is Realception, a tool for light-field data processing. Fraunhofer IIS, based in Erlangen, Germany, is one of 69 divisions of Fraunhofer-Gesellschaft, Europe s largest application-oriented research organization. For more information, contact amm-info@iis.fraunhofer.de, or visit www.iis.fraunhofer.de/amm. www.iis.fraunhofer.de/audio 7 /7