R&D White Paper WHP 078. Audio Watermarking the state of the art and results of EBU tests. Research & Development BRITISH BROADCASTING CORPORATION

Similar documents
R&D White Paper WHP 101. EBU tests of commercial audio watermarking systems. Research & Development BRITISH BROADCASTING CORPORATION.

SPREAD SPECTRUM AUDIO WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL

The following bit rates are recommended for broadcast contribution employing the most commonly used audio coding schemes:

Application Note PEAQ Audio Objective Testing in ClearView

R&D White Paper WHP 070. A distributed live subtitling system. Research & Development BRITISH BROADCASTING CORPORATION. September M.

Multimedia Communications. Audio coding

Simple Watermark for Stereo Audio Signals with Modulated High-Frequency Band Delay

DRA AUDIO CODING STANDARD

A NEW DCT-BASED WATERMARKING METHOD FOR COPYRIGHT PROTECTION OF DIGITAL AUDIO

Proceedings of Meetings on Acoustics

MPEG-4 aacplus - Audio coding for today s digital media world

Principles of Audio Coding

Audio Compression. Audio Compression. Absolute Threshold. CD quality audio:

25*$1,6$7,21,17(51$7,21$/('(1250$/,6$7,21,62,(&-7&6&:* &2',1*2)029,1*3,&785(6$1'$8',2 &DOOIRU3URSRVDOVIRU1HZ7RROVIRU$XGLR&RGLQJ

R&D White Paper WHP 087. A quantitive comparison of TV-Anytime and DVB-SI. Research & Development BRITISH BROADCASTING CORPORATION.

Audio issues in MIR evaluation

Principles of MPEG audio compression

Audio-coding standards

Transporting audio-video. over the Internet

AN AUDIO WATERMARKING SCHEME ROBUST TO MPEG AUDIO COMPRESSION

Audio-coding standards

Mpeg 1 layer 3 (mp3) general overview

Audio Coding and MP3

ITNP80: Multimedia! Sound-II!

Audio Watermarking Based on PCM Technique

Data Hiding in Video

Fundamentals of Perceptual Audio Encoding. Craig Lewiston HST.723 Lab II 3/23/06

Digital Image Processing

Chapter 5.5 Audio Programming

Sign-up Sheet posted outside of my office HFH 1121

RECOMMENDATION ITU-R BS Procedure for the performance test of automated query-by-humming systems

DVB Audio. Leon van de Kerkhof (Philips Consumer Electronics)

R&D White Paper WHP 018. The DVB MHP Internet Access profile. Research & Development BRITISH BROADCASTING CORPORATION. January J.C.

Image and Video Watermarking

MPEG-1. Overview of MPEG-1 1 Standard. Introduction to perceptual and entropy codings

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO

Information and Communications Security: Encryption and Information Hiding

19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007

Rich Recording Technology Technical overall description

Audio Watermarking using Colour Image Based on EMD and DCT

Perceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects.

5: Music Compression. Music Coding. Mark Handley

Chapter 14 MPEG Audio Compression

Audio Watermarking using Empirical Mode Decomposition

_APP B_549_10/31/06. Appendix B. Producing for Multimedia and the Web

Subjective and Objective Assessment of Perceived Audio Quality of Current Digital Audio Broadcasting Systems and Web-Casting Applications

2.4 Audio Compression

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal.

Efficient Signal Adaptive Perceptual Audio Coding

Appendix 4. Audio coding algorithms

Voice Over LTE (VoLTE) Technology. July 23, 2018 Tim Burke

Workshop W14 - Audio Gets Smart: Semantic Audio Analysis & Metadata Standards

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Senior Manager, CE Technology Dolby Australia Pty Ltd

PCM30U-ROK 2 048/256 kbps Broadcast Codec Brief Overview

Robust Audio Watermarking Based on Secure Spread Spectrum and Auditory Perception Model

DAB. Digital Audio Broadcasting

Lecture 16 Perceptual Audio Coding

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Snr Staff Eng., Team Lead (Applied Research) Dolby Australia Pty Ltd

Optical Storage Technology. MPEG Data Compression

Secret Communication through Audio for Defense Application

Parametric Coding of Spatial Audio

A Robust Wavelet-Based Watermarking Algorithm Using Edge Detection

Copyright 2017 by Kevin de Wit

A Detailed look of Audio Steganography Techniques using LSB and Genetic Algorithm Approach

CHAPTER 10: SOUND AND VIDEO EDITING

Figure 1. Generic Encoder. Window. Spectral Analysis. Psychoacoustic Model. Quantize. Pack Data into Frames. Additional Coding.

Unit Title: Capture Pictures and Sound for Non-Linear Editing

EBU Tech Doc Three parts of Tech Doc 3344

MP3. Panayiotis Petropoulos

Compression; Error detection & correction

Voice Quality Assessment for Mobile to SIP Call over Live 3G Network

ITEC310 Computer Networks II

ETSI TS V2.1.3 ( )

The MPEG-4 General Audio Coder

Compressed Audio Demystified by Hendrik Gideonse and Connor Smith. All Rights Reserved.

Clemens H. Cap Universität Rostock clemens.cap (at) uni-rostock (dot) de STEGANOGRAPHY. BaSoTI 2012, Tartu

CHAPTER-4 WATERMARKING OF GRAY IMAGES

OTP-Steg. One-Time Pad Image Steganography Using OTP-Steg V.1.0 Software October 2015 Dr. Michael J. Pelosi

Net-Log Network Audio Logger

SD Recorder/Multi Source Player CDR-01

/ / _ / _ / _ / / / / /_/ _/_/ _/_/ _/_/ _\ / All-American-Advanced-Audio-Codec

New Results in Low Bit Rate Speech Coding and Bandwidth Extension

Capturing and Editing Digital Audio *

A PSYCHOACOUSTIC MODEL WITH PARTIAL SPECTRAL FLATNESS MEASURE FOR TONALITY ESTIMATION

MUSIC A Darker Phonetic Audio Coder

Digital Media. Daniel Fuller ITEC 2110

Audio Systems for DTV

08 Sound. Multimedia Systems. Nature of Sound, Store Audio, Sound Editing, MIDI

SD / USB / CD / Multi Source Player Instant Playback from SD Memory High Quality DAC - Burr Brown PCM-1796

Performance Analysis of Discrete Wavelet Transform based Audio Watermarking on Indian Classical Songs

1. Before adjusting sound quality

SPEECH WATERMARKING USING DISCRETE WAVELET TRANSFORM, DISCRETE COSINE TRANSFORM AND SINGULAR VALUE DECOMPOSITION

MPEG-7 Audio: Tools for Semantic Audio Description and Processing

Perspectives on Multimedia Quality Prediction Methodologies for Advanced Mobile and IP-based Telephony

STEGANOGRAPHY Question Definition. Historical Cases of Steganography. Can we send hidden information?

ABSTRACT. that it avoids the tolls charged by ordinary telephone service

Audiovisual QoS for communication over IP networks

S.K.R Engineering College, Chennai, India. 1 2

ITU-T J.288. Encapsulation of type length value (TLV) packet for cable transmission systems

DSP. Presented to the IEEE Central Texas Consultants Network by Sergio Liberman

Transcription:

R&D White Paper WHP 078 January 2004 Audio Watermarking the state of the art and results of EBU tests A.J. Mason Research & Development BRITISH BROADCASTING CORPORATION

BBC Research & Development White Paper WHP 078 Audio Watermarking the state of the art and results of EBU tests A.J. Mason Abstract This paper discusses three different aspects of audio watermarking: techniques, applications, and results of recent evaluations. Although audio watermarking is often thought of as a new technology, many of the current techniques have been known for decades. Several of these are described. The potential applications of audio watermarking are numerous, including the obvious copy protection and content identification functions. Some examples of applications and the implications for location of watermark embedders and detectors, are presented. The balance between payload capacity and robustness for different applications is assessed. The EBU has recently completed tests of the robustness and subjective quality of state-of-the-art audio watermarking systems. The test methods are described, including the attacks to which the watermarks were subjected, and the subjective test method (a combination of BS.1116 and a forced choice). The results of the tests are presented. This paper was presented at the 21 st Nordic Sound Symposium, Bolkesjø, Norway. October 2 nd 5 th, 2003. BBC 2004. All rights reserved.

White Papers are distributed freely on request. Authorisation of the Chief Scientist is required for publication. BBC 2004. All rights reserved. Except as provided below, no part of this document may be reproduced in any material form (including photocopying or storing it in any medium by electronic means) without the prior written permission of BBC Research & Development except in accordance with the provisions of the (UK) Copyright, Designs and Patents Act 1988. The BBC grants permission to individuals and organisations to make copies of the entire document (including this copyright notice) for their own internal use. No copies of this document may be published, distributed or made available to third parties whether by paper, electronic or other means without the BBC's prior written permission. Where necessary, third parties should be directed to the relevant page on BBC's website at http://www.bbc.co.uk/rd/pubs/whp for a copy of this document.

BBC Research & Development White Paper WHP 078 Audio Watermarking - the State of the Art and Results of EBU Tests Andrew Mason Abstract This paper discusses three different aspects of audio watermarking: techniques, applications, and results of recent evaluations. Although audio watermarking is often thought of as a new technology, many of the current techniques have been known for decades. Several of these are described. The potential applications of audio watermarking are numerous, including the obvious copy protection and content identification functions. Some examples of applications and the implications for location of watermark embedders and detectors, are presented. The balance between payload capacity and robustness for different applications is assessed. The EBU has recently completed tests of the robustness and subjective quality of state-of-theart audio watermarking systems. The test methods are described, including the attacks to which the watermarks were subjected, and the subjective test method (a combination of BS.1116 and a forced choice). The results of the tests are presented. What is watermarking Watermarking has been used for many years for the authentication of banknotes, but in the context of audio (and video) signals, a watermark is an imperceptible modification to the signal that can be detected on request, and that can convey hidden data. Watermarking should not be confused with on-screen logos, or with content analysis and recognition (also known as fingerprinting ), or with scrambling. Watermarks can be robust or fragile, can convey a lot of hidden data, or a little. The properties of a watermark depend on the application for which it is intended. In general though, watermarking provides a secret, indelible mark, embedded in the signal itself, that can survive conversions from one medium to another. The idea of hidden data channels in audio signals has been around for some years. BBC Research Reports from over 30 years ago [1,2] describe investigations into a wide variety of techniques. This work was constrained somewhat by the technology available at the time. More recently, the availability of digital signal processing techniques, has made the work much easier. Many papers have now been published, widely cited ones being by Bender [3], Cox [4], Boney[5]. Some learned journals have dedicated whole issues to the subject of data hiding and security [6,7]. 1

Why use watermarking The advent of digital audio media (CD, MD, DVD, MP3) and the ability easily to transfer content without loss of quality have caused a resurgence of interest. However, there are several applications of watermarking beyond copy protection. They can be broadly classified into two types: identification and authentication. Identification associates the signal with some descriptive information. The watermark could carry the information, or it could carry an identification number pointing to information in a database elsewhere. Authentication establishes whether the signal is what it claims to be or not. The interest for broadcasters is almost entirely focused on applications of identification[8]. Figure 1, below, shows several different points in the broadcast chain where watermarks might be embedded for different applications. Acquisition 1 mixer emission 1 VTR Production 2 Broadcast Presentation or Playout contribution 3 distribution 5 4 watermark embedder Archive Figure 1 : Watermark embedding points in broadcast applications At point 1, each camera or microphone could embed an identifier unique to itself, conveying time, place, equipment serial number, and so on; At point 2, each finished programme could have a programme identifier added to it; At point 3, each broadcast network could be identified, including time and date of transmission; At point 4, as each programme enters the archive its catalogue number could be added; At point 5, as requests for programmes from the archive are fulfilled, a watermark to identify the recipient could be added. 2

There are corresponding different points where watermark detection might be done, again, depending on the application. These are shown in Figure 2, below. Acquisition 1 mixer emission contribution VTR Production 1 2 Broadcast Presentation or Playout 3 4 distribution 5 set top box 6 TV watermark detector Archive Figure 2 : Watermark detection points in broadcast applications At point 1, the input to production, the identity of material coming in that will be used in a programme can be established. Because of the short length of some clips, and the large amount of data that might be required, the payload capacity requirement is large. However, the robustness requirement is slight; At point 2, the output of production the ability to identify automatically programme material, and the rights associated with its use, can be very useful. The payload capacity requirement is again high, but robustness still slight, provided high quality is maintained; At point 3, in presentations, being able to identify signal sources is very valuable in order to ensure that the right programme feed is routed to the right network. Payload capacity is less and the robustness requirement should still be slight, unless poor quality connections are used; At point 4, as programmes are sent to the network, automatic logging of content could be used to ensure that rights usage is accurately recorded. The payload capacity is high, but robustness depends again on the programme production and contribution processes; At point 5, information can automatically be linked from production to archive by detection of watermarks that point to the production metadata. Payload capacity depends on whether there is a need only to identify complete programmes, or to identify short segments within programmes; 3

At point 6, the whole world, monitoring can be performed to detect whether programme material is being broadcast as it should be. There is a need to know that content that should be broadcast has been broadcast, and there is a need to know that content that should not be broadcast has not been broadcast. Audience monitoring can also be done at this point to find out how many people watched a particular broadcast. The payload capacity can be high or low, depending on the length of segment of interested. Robustness generally needs to be high. An information channel within the audio or video content can be used to convey all kinds of data to enable a wide variety of consumer applications. Web URLs, telephone numbers, automated event triggers, are examples things that could be sent, invisibly or inaudibly, through an existing audio or video circuit. However, as mentioned earlier, one of the highest profile applications at the moment is that of copy protection (or copy prevention). A watermark can be used to carry information about whether a particular signal is able to be recorded, or re-distributed, or not. The payload capacity requirement is relatively low, but the robustness requirement is extremely difficult, if not impossible, to achieve. All the above examples are of identification. Authentication also has a place, for example to ensure that web site content has not been tampered with (on the original site, or on other sites where it is being used with, or without, permission). Authentication is beyond the scope of this paper. Some techniques for audio watermarking Space here does not permit a detailed description of all the techniques available at present, so a brief overview of some of the more common ones is all that is included. See the references for more information and more references. Least-significant bit modification In a 24-bit, or even a 16-bit, linear digital audio signal, modifications to the least significant bit are almost always inaudible, if done with a little consideration. If used as a data channel this can provide around 90bit/s - enough for another (compressed) audio signal. An example of this technique was standardised for conveying MPEG coder control data in the ATLANTIC project[9,10]. Although the data capacity is high and the audibility low, the hidden data can be destroyed by even small alterations to the signal. 4

Signalling in defined frequency bands Data can be embedded by adding low-level tones to the signal. See Figure 3. Here advantage can be taken of masking provided by the host signal. The figure shows two tones hidden in the mid-part of the audio spectrum, masked by the presence of another signal. In another application, the tones could be carried high-up in the spectrum, above 20kHz for example. Here a wider bandwidth might be available, the tones are likely to be inaudible, but might be removed by low-pass filtering. power threshold during silence tonal signal watermark tones masked by signal Figure 3 : Addition of low-level tones masking threshold frequency Spread spectrum modulation Rather than adding tones to the signal, data can be carried by spreading it across a wide spectral band. Here the data might still be recoverable even if some parts of the spectrum are removed. The added noise can be generated by multiplying a pesudo-random number sequence by each of the bits in the stream of data to be embedded. To minimise audibility the sequences can be filtered according to a psycho-acoustic model. This is shown in Figure 4. power threshold during silence tonal signal pseudo-random noise pattern masked by signal masking threshold frequency Figure 4 : Addition of spread spectrum signal 5

Echo hiding Another way of conveying data in a signal is by adding low-level echoes of the signal to itself. The time-delays and relative amplitudes need to be chosen carefully to avoid undue audibility. Data can be conveyed by adding echoes with different time intervals. See Figure 5. Although investigated by Hill [1] it has been the advances in digital signal processing hardware that have rendered this a practical technique: detection involves calculating the auto-correlation of the cepstrum of the signal. amplitude t 1 t 2 Figure 5 : Addition of low-level echoes time There are, of course, other techniques. Whether any are useful for a particular application depends on the fundamental characteristics of perceptibility, robustness, and data capacity. EBU testing of watermarking systems The EBU has been actively involved in watermarking, for video and audio, for several years. A technical project group, N/WTM, was formed to evaluate the application of watermarking signals transmitted over the Eurovision network for purposes of copyright protection. Thorough tests were conducted on video watermarking systems in 2000 [11]. Audio watermarking tests followed and have recently been completed. The process was lengthy, starting in June 2001, finishing in July 2003. In summary, a list of technical requirements was made; calls for systems were issued; systems were received and tested. Several members of the EBU Project Group B/AIM (Audio In Multimedia) were involved in the test process, in particular IRT, Polish TV, and France Télécom, as well as the BBC. EBU audio watermarking system requirements Perceptibility The audio watermarks were required to be inaudible in studio listening conditions. This would be assessed by subjective tests according to ITU-R BS.1116 [12]. However, because it was hoped that the standard of systems submitted would be high, such that any impairments to the audio signal would not be revealed by BS.1116 tests, an addition was made to the test method used. Subjects were required to indicate which of the two signals to be graded (the hidden reference and the watermarked signal) they thought was watermarked, even if they would have indicated that the effect was imperceptible. This is called a forced choice. The normal BS.1116 selection panel process was used to find the set of items to be used in the test. These are listed in Table 1. 6

Table 1: Signals used in test sessions, with watermark at normal level Test item number Item type (and source) 4 xylophone (EBU SQAM, track 41) 8 flute (EBU SQAM, track 13) 9 glockenspiel (EBU SQAM, track 35) 13 triangle (EBU SQAM, track 32) 16 violin (EBU SQAM, track 8) 27 harpsichord (EBU SQAM, track 40) 31 English male speech (EBU SQAM, track 49) 34 German male speech (EBU SQAM, track 54) 36 wind ensemble (EBU SQAM, track 36) 39 Tennis (BBC) A small number of other items was used, with the watermark at boosted levels. Boosted level watermarks were created in order to train listeners, and to provide something in the test that would be detectable by ear. Only one of the systems submitted provided the means to do this, as requested in the call. Unfortunately this system was not included in the listening tests. The other systems did not provide a means, so difference signals were calculated and added at a higher level to the original signal. Robustness The requirements for robustness were expressed as a list of processes to which an audio signal might be subject during programme production, distribution, or emission, after which it would still be desirable to be able to detect the watermark. Those used in the tests were as follows: MPEG-1 Layer II, 128kbit/s joint stereo; MPEG-1 Layer II, 32kbit/s mono; Added white noise, -30dB relative to maximum level of test sequence; pitch-corrected time stretch, 105%; MPEG Layer III, 96 kbit/s, stereo; MPEG Layer III, 64 kbit/s, stereo; Linear time stretch, 110%; Dolby AC-3, 128 kbit/s; MPEG AAC, 32 kbit/s stereo; voice-over, attenuation of -18.16dB; Dynamic compression; Minidisc; Conversion to analogue; Various combined processes, consistent with current broadcast chains, including BetaSP, NICAM 728, Dolby E, and so on. 7

To measure the robustness a series of pseudo-random payloads was embedded using each watermark system into a 20 minute selection of audio. The watermarked audio from each system was then segmented, and unrelated audio, with a fixed payload, inserted between the segments. Software to generate Cool Edit Pro session files was written to automate this task. The segmented sequences were then played into the watermark detectors. The robustness to a particular process is expressed as the number of payloads recovered as a percentage of the number of segments. The unsegmented sequences were also played into the detectors. This makes the test less severe in some cases. The results are expressed in the same way as for the segmented signals, that is, as the number of payloads recovered as a percentage of the same number of segments. A payload being recovered that was not embedded is called a false positive detection. The probability of these should be very small, so as not to cause too many false accusations of misuse. The EBU requirement was set at 10-8. Payload capacity The purpose of applying a watermark in the EBU Eurovision distribution model is that of identification for purposes of copyright protection. The payload carried by the watermark should identify audio material uniquely, so that its use can be monitored. The shortest segment of audio that was thought to be useful, and so worth protecting, was 5s. The amount of data required to identify each segment of material was initially set at 64 bits, but this was later reduced to 48 bits. Two different interpretations were made of the payload capacity requirement by system proponents. This meant that two series of robustness tests had to be run, one with a segment length of 5s, and another with a segment length of 10s. Description of systems submitted for testing Company XX supplied a PC running Linux fitted with a soundcard with S/PDIF interfaces. A simple script was run to allow the selection of embedder or one of two detector implementations (referred to as System A and System B ). Payloads were read from the serial port for embedding, and written to a file on disc by the detectors. Company YY supplied two PCs running Windows 2000 each fitted with a soundcard with AES/EBU interfaces. Programs were provided for embedding and for detecting. Payloads were read from the serial port for embedding, and written to the serial port and a log file by the detector. Company ZZ supplied an embedder and detector, each in a 2U 19 rackmount box. The only controls on the boxes were the on-off switches. The boosted-level watermark setting was controlled by sending a special data sequence into the RS-232 port used for the payloads. Audio input and output was via AES/EBU interfaces. In preliminary tests this system was found to have some problems with false positive detections and perceptibility on some signals. As a result it was not subjected to the full set of tests. 8

Results of subjective tests The results of the subjective tests are in three categories: ITU-R BS.1116 diff-grades and 95% confidence intervals; A Wilcoxon rank sum test on the BS.1116 diff-grades for each subjective test item; Recognition rate from forced choice as a function of test item and of listener. As was noted earlier, robustness tests were conducted in two phases because of different interpretations of the 5s watermark minimum segment length requirement. The formal subjective tests were conducted on the systems supplied for the first phase of robustness tests. Assurances were given by the system proponents that the alternative configurations supplied for the second phase would not differ in perceptibility. Informal, but expert, listening confirmed this. Figures 6, 7, and 8 show results for the XX system. Figures 9, 10, and 11 show results for the YY system. 9

XX subjective test results statistical mean XX XX_4 1,00 0,50 0,00-0,50-1,00-1,50-2,00-2,50-3,00-3,50-4,00 XX _8 XX _9 XX XXs_ 13 95% confidence interval XX _16 XX _27 XX XX_31 XX XX_34 XX _36 XX XX _39 6dB_ XX _10 XX 6dB_ XX_14 XX 6dB_ XX _22 XX 6dB_ XX_29 XX Figure 6 : Mean grade and 95% confidence interval for XX system The Wilcoxon rank sum test on the diff-grades did not find any significant difference from the zero distribution that would be expected from un-watermarked signals. 10

100% 90% 80% 70% 60% 50% XX 40% 30% 20% 10% 0% exam ple 4 example 8 exam ple 9 exam ple 13 exam ple 16 exam ple 27 exam ple 31 exam ple 34 exam ple 36 exam ple 39 Figure 7 : Recognition rate of items for XX system 100% 90% 80% 70% 60% 50% XX 40% 30% 20% 10% 0% subject 1 subject 3 subject 5 subject 7 subject 9 subject 11 subject 13 subject 15 subject 17 subject 19 subject 21 subject 23 subject 25 Figure 8 : Recognition rate of subjects for XX system The overall recognition rate for the XX system was 56.0% 11

YY subjective test results statistical mean Y YY 2_4 1,00 0,50 0,00-0,50-1,00-1,50-2,00-2,50-3,00-3,50-4,00 YYY 2_8 YYY 2_9 95% confidence interval YYY 2_13 YYY 2_16 YYY 2_27 YYY 2_31 Y YY 2_34 YYY 2_36 Y YY 2_39 6dB_ YY YY 2_10 6dB_ YY 2_14 YY Figure 9 : Mean grade and 95% confidence interval for the YY system The Wilcoxon rank sum test on the diff-grades showed a difference from the zero distribution that would be expected from un-watermarked signals for only one test item, number 34 (German male speech). 12

100% 90% 80% 70% 60% 50% YY 40% 30% 20% 10% 0% exam ple 4 exam ple 8 exam ple 9 exam ple 13 exam ple 16 exam ple 27 exam ple 31 exam ple 34 exam ple 36 exam ple 39 Figure 10 : Recognition rate of items for YY system 100% 90% 80% 70% 60% 50% YY 40% 30% 20% 10% 0% subject 1 subject 3 subject 5 subject 7 subject 9 subject 11 subject 13 subject 15 subject 17 subject 19 subject 21 subject 23 subject 25 Figure 11 : Recognition rate of subjects of YY system The overall recognition rate for the YY system was 46.8%. 13

Results of robustness tests Robustness tests were run in two phases. In the first phase the XX system had an effective watermark minimum segment length of 10s and the YY system one of 5s. In the second phase the XX system was modified to have a watermark minimum segment length of 5s and the YY system one of 10s. Table 2 shows the results of the robustness tests using a 10s watermark minimum segment length (XX first phase original and YY second phase bis ). Table 3 shows the results of the robustness tests using a 5s watermark minimum segment length (XX second phase bis, and YY first phase original ). Because of restrictions on the time available to conduct the second phase of tests not all processes were tried in the second phase: entries in the tables are left blank in this case and the rows shaded grey. Some tests caused erroneous behaviour by the detectors. In these cases it was not possible to calculate a valid result: these are shown as - in the tables. 14

Table 2: Percentage of marks recovered with 10s watermark segments: XX system from first phase of tests, YY system bis from second phase of tests Continuous replay Segmented replay Attack XX A XX B YY bis XX A 10s XX B 10s YY bis 10s No attack 95.2 94.8 97.6-91.2 96.0 minidisc (BBC) 93.5 94.4 96.0 84.6 91.2 94.4 minidisc (TVP) 93.5 94.0-88.8 Dolby AC-3, 128kbit/s, stereo 92.7 93.7-88.0 MPEG Layer II, 128kbit/s, joint stereo 94.4 93.1 96.8-89.6 91.9 MP3, 96kbit/s, stereo 91.5 93.5 62.1 58.8 89.6 57.3 MP3, 64 kbit/s, stereo 73.8 75.0 6.5-63.8 8.1 AAC 32kbit/s, stereo 7.7 0.8 0.0-0.0 0.0 MPEG Layer II, 32 kbit/s, mono 0 0.4-0.0 linear time stretch, (10%) 0 84.6 95.2 0 0.0 43.5 pitch-corrected time-stretch, 5% 0.4 0.8 0.0 1.6 0.0 0.0 voice-over, +15dB 0.8 0.0-0.8 added white noise, -30dB 0.8 0.0 0.8 0.8 dynamic range compression 95.2 95.2 96.0 60.4 92.0 94.4 analogue conversion 92.3 91.5 91.2 88.8 combined audio processing 94.8 94.8-92.8 Broadcast chain 1, FM 87.5 86.3 57.3 83.0 80.6 58.9 Broadcast chain 2, NICAM 91.5 90.3 78.2 90.4 Broadcast chain 3, NICAM + MPEG 90.7 90.7 91.1 61.2 69.4 71.8 Broadcast chain 4, Dolby E+AC3 93.1 89.5-86.2 first watermark detection after application of second watermark 95.2 94.4 93.5-88.8 96.8 second mark detection 94.0 95.6 96.8-95.2 95.2 Colour code: 100%-80% 79%-60% 59%-40% 39%-20% 19%-0% 15

Table 3: Percentage of marks recovered with 5s watermark segments: XX system bis from second phase of tests, YY system from first phase of tests Continuous replay Segmented replay Attack XX A bis XX B bis YY XX A bis 5s XX B bis 5s YY 5s No attack 98.0 98.0 73.4 - a 90.7 89.1 minidisc (BBC) 95.6 97.6 7.7-85.9 14.5 minidisc (TVP) 0.0 0.0 Dolby AC-3, 128kbit/s, stereo 0.0 0.0 MPEG Layer II, 128kbit/s, joint stereo 97.2 96.4 0.0-86.3 0.0 MP3, 96kbit/s, stereo 90.3 91.9 0.0-68.1 0.0 MP3, 64 kbit/s, stereo 56.5 42.7 0.0-15.3 0.0 AAC 32kbit/s, stereo 4.8 0.0 0.0 0.0 0.0 0.0 MPEG Layer II, 32 kbit/s, mono 0.0 0.0 linear time stretch, (10%) 0.0 97.2 0.4 0.0 0.0 2.6 pitch-corrected time-stretch, 5% 4.0 1.6 18.1 2.8 2.0 1.2 voice-over, +15dB 0.0 0.0 added white noise, -30dB 0.0 0.0 dynamic range compression 98.0 98.0 75.0-92.7 60.1 analogue conversion 0.0 0.0 combined audio processing 0.0 0.0 Broadcast chain 1, FM 79.4 63.7 0.0 32.7 52.0 0.0 Broadcast chain 2, NICAM 0.0 0.0 Broadcast chain 3, NICAM + MPEG 96.0 96.0 0.0 64.1 82.3 0.0 Broadcast chain 4, Dolby E+AC3 7.3 19.4 first watermark detection after application of second watermark 98.0 98.0 39.9-90.3 63.3 second mark detection 98.0 98.0 41.1 - b 93.5 39.5 a. When the interstitial audio was digital 0, 60.5% correct was obtained b. When the interstitial audio was digital 0, 79.8% correct was obtained colour code: 100%-80% 79%-60% 59%-40% 39%-20% 19%-0% 16

Conclusions A call for audio watermarking systems resulted in a number of systems being submitted for test. The technical requirements listed in the call included the robustness, audibility, and payload capacity against which the systems would be evaluated. The robustness for a particular payload capacity was measured by a series of objective tests. Audibility was measured by subjective evaluation. Subjective tests Formal subjective listening tests were conducted, according to BS.1116 plus a forced choice, on the XX 10s watermark minimum segment length and YY 5s watermark minimum segment length systems. Analysis of difference grades The difference grades and 95% confidence intervals indicate that for the XX system none of the items could be said to be different from the original, while for the YY system one of the items (item 34, German male speech, EBU SQAM disc), the 95% confidence interval does not quite cross the zero diff-grade axis and so does show a difference. The Wilcoxon rank sum test on the difference grades confirms this result: according to this test none of the items with XX watermark showed a significant difference from the original, item 34 watermarked by the YY system showed a significant difference from the original. None of the mean difference grades for any of the items for any of the systems were lower than -0.25 and none of the 95% confidence intervals extended below a difference grade of -0.5. Analysis of forced choice The analysis of recognition rate merits more study. If a subject scores a high percentage of correct identifications there is a certain probability that they are able to hear the watermark rather than be guessing. However, what one should do with the cases where someone scores a very low percentage of correct identifications is not clear. Some subjects did score high recognition rates for some systems: it remains to be determined whether that was by chance or not. Given different probabilities of audibility there are different probability distributions of recognition rate. The recognition rate required to differentiate between zero and close-to-zero probabilities of audibility has not yet been determined. Updated systems audibility Informal listening to the systems updated to have a different watermark minimum segment length did not reveal any change to the audibility of the signals compared with the originally submitted systems. Objective tests The detectability of watermarks in response to a wide range of types of attack has been measured for systems with watermark minimum segment lengths of 5s and 10s. None of the systems proposed showed significant robustness to all of the attacks, which, it must be recognised, range from those with negligible effect on audio quality to those which produced material with questionable usability. Because of the different performance of the different 17

systems in response to the same attack it is not possible to say that one system is always better than another. 5s watermark minimum segment length The tests performed with a 5s watermark minimum segment showed clear differences between the systems. For unsegmented detection both XX systems showed good robustness to mild attacks, but poor robustness to some others, for example, AAC 32 kbit/s, while the YY system showed poor robustness to most attacks. With segmented detection the XX system B detector showed quite good robustness although the detection rate fell quite rapidly for more severe attacks, for example from 68.1% to 15.3% with MPEG Layer III at 96kbit/s and at 64kbit/s respectively. XX system A was found to be unreliable with segmented signals. The YY system showed poor robustness as it did for unsegmented detection. 10s watermark minimum segment length The tests performed with a 10s watermark minimum segment again showed some significant differences between the systems. For example, for segmented linear time-stretched material, XX had a detection rate of 0.0% while YY had a detection rate of 43.5%. On the other hand, for segmented MPEG Layer III at 64kbit/s, XX system B had a detection rate of 63.8% and YY had 8.1%. The XX system A detector was again found to be unreliable with segmented signals. The difference between the robustness of the YY and XX systems was small for unsegmented signals with a watermark minimum segment length of 10s and mild attacks. The YY system was less robust to more severe attacks. Overall robustness The XX system B 10s and 5s watermark minimum segment length systems and the YY 10s watermark minimum segment length system showed reasonable robustness to attacks that did not significantly affect the quality of the audio signal. In general it was attacks that resulted in poor audio quality caused by very low bit rate coding that produced low detection rates. The YY 5s watermark minimum segment length system was not very robust to most attacks. The XX system A detector was found not to be reliable. The complete test report is available to EBU members[13]. 18

References 1. P.C.J. Hill, Simultaneous subliminal signalling in conventional sound circuits: a feasibility study, BBC Research Department Report No. 1971/1, 1971. [ Available from http://www.bbc.co.uk/rd/pubs/reports/index.html ] 2. J.R. Chew, Simultaneous signalling in digital sound circuits: a feasibility study, BBC Research Department Report No. 1971/26, 1971. [ Available from http://www.bbc.co.uk/rd/pubs/reports/index.html ] 3. W. Bender, D. Gruhl, N. Morimoto, A. Lu, Techniques for data hiding, IBM Systems Journal, vol. 35, Nos. 3&4, 1996. 4. I. Cox, J. Kilian, F. Leighton, T. Shamoon, Secure spread spectrum watermarking for multimedia, IEEE Transactions on Image Processing, vol. 6, No. 12, December 1997. 5. L. Boney, A. Tewfik, K. Hamdy, Digital watermarks for audio signals, 1996 IEEE Int. Conf. on Multimedia Computing and Systems June 17-23, Hiroshima, Japan, 6. Multimedia and security, IEEE multimedia, Vol. 8, No. 4, October-December 2001 7. Special issue on signal processing for data hiding in digital media and secure content delivery, IEEE Transactions on Signal Processing, Vol. 51, No. 4, April 2003 8. A. Mason, R. Salmon, O. Werner, J. Devlin, User requirements for watermarking in broadcast applications, International Broadcasting Convention, Amsterdam, 2001. [ Also available from http://www.bbc.co.uk/rd/pubs/papers/index.html ] 9. N.H.C. Gilchrist, ATLANTIC audio: preserving technical quality during low bit rate coding and decoding, AES 104th Convention, Amsterdam, 1998, AES pre-print No. 4694. 10. AES41-2000 AES standard for digital audio - Recoding data set for audio bit-rate reduction, Audio Engineering Society, 2000 11. L. Cheveau, E. Goray, R. Salmon, Watermarking - summary results of EBU tests, EBU Technical Review, No. 286, March 2001 [ http://www.ebu.ch/trev_286-cheveau.pdf ] 12. Methods for the subjective assessment of small impairment in audio systems including multichannel sound systems, Recommendation ITU-R BS.1116-1, International telecommunications Union 13. Report on EBU tests of audio watermarking systems, European Broadcasting Union internal publication, BPN 058, 2003. 19