Exposing MP3 Audio Forgeries Using Frame Offsets
RUI YANG, ZHENHUA QU, and JIWU HUANG, Sun Yat-sen University

Audio recordings should be authenticated before they are used as evidence. Although audio watermarking and signatures are widely applied for authentication, both techniques require access to the original audio before it is published. Passive authentication is therefore necessary for digital audio, especially for the most popular audio format: MP3. In this article, we propose a passive approach to detect forgeries of MP3 audio. During MP3 encoding, the audio samples are divided into frames, so each frame has its own frame offset after encoding. Forgeries break the framing grids, which makes the frame offset a good indicator for locating forgeries; the offset can be retrieved by identifying the quantization characteristic. In this way, the doctored positions can be located automatically. Experimental results demonstrate that the proposed approach is effective in detecting common forgeries such as deletion, insertion, substitution, and splicing. Even when the bit rate is as low as 32 kbps, the detection rate is above 99%.

Categories and Subject Descriptors: H.4.0 [Information Systems Applications]: General; K.6.5 [Management of Computing and Information Systems]: Security and Protection

General Terms: Security, Algorithms, Verification

Additional Key Words and Phrases: MP3 audio forgery, forgery detection, audio authentication

ACM Reference Format: Yang, R., Qu, Z., and Huang, J. 2012. Exposing MP3 audio forgeries using frame offsets. ACM Trans. Multimedia Comput. Commun. Appl. 8, S2, Article 35 (September 2012), 20 pages. DOI = /

1. INTRODUCTION
With the development of digital voice recorders and cell phones, speech and conversations can now easily be recorded as evidence. However, hearing is no longer believing, since these recordings can easily be tampered with using pervasive audio editing software.
An audio recording may contain important words or sentences synthesized from other audio, so authentication technologies need to be developed for digital audio. Existing audio authentication technologies can be divided into two groups: active authentication (including digital watermarking and digital signatures) and passive authentication. Active authentication requires access to the original audio before it is distributed, for example, to embed a watermark or generate a signature, while passive audio authentication checks the integrity of an audio recording by analyzing its inherent properties. In most authentication cases, the audio does not actually contain any digital watermark or signature, so it is necessary to examine the integrity of digital audio passively.

A portion of this article was presented at the 10th ACM Multimedia and Security Workshop. The work was supported in part by the 973 Program (211CB3224) in China and NSFC (U11351, ). J. Huang is also a visiting researcher of the State Key Laboratory of Information Security, Beijing 119, China.
Authors' addresses: R. Yang, Z. Qu, and J. Huang (corresponding author), Sun Yat-sen University, Guangzhou 516, China; isshjw@mail.sysu.edu.cn.
ACM Transactions on Multimedia Computing, Communications and Applications, Vol. 8, No. S2, Article 35, Publication date: September 2012.

So far, there has been little work on passive authentication for digital audio. Based on the assumption that a natural signal has weak higher-order statistical correlations in the frequency domain and that forgery in speech introduces unnatural correlations, Farid [1999] used bispectral analysis to detect digital forgeries in speech signals. It was shown that the zero phase of the bispectrum decreased considerably for forged speech. However, the method is only suitable for uncompressed audio. Grigoras [2005] pointed out that digital equipment captures not only the intended speech but also the 50/60 Hz Electric Network Frequency (ENF) when recording. The ENF criterion can be used to check the integrity of digital audio recordings and to verify the exact time when a recording was made. This is done by comparing the ENF of a recording with a reference frequency database from the electric company or the laboratory. The method depends heavily on the accuracy of the extracted ENF, while the ENF is a very weak signal compared with the recorded audio. Dittmann et al. [Kraetzer et al. 2007] proposed a method to determine the authenticity of the speaker's environment. They showed that features extracted from the background of an audio stream can provide an informative basis for determining the location of its origin and the microphone used. However, a large number of audio recordings are required for training.

The MP3 format is used in most applications and is now the most popular format among digital voice recorders. The top 20 best-selling digital voice recorders on amazon.com all support the MP3 format, and some of them support only MP3.
For most cell phones, the default recording format is MP3. Digital voice recorders and cell phones are the recording devices people use most frequently in daily life. It would be fairly easy to remove complete sections of a recording or to splice two sentences from different recordings, and small changes in the audio stream can change the meaning of a whole sentence. Exposing forgeries in MP3 files can help authenticate everyday recordings presented as evidence in criminal and civil court cases, such as undercover surveillance recordings made by the police, recordings presented by feuding parties in a divorce, recorded telephone conversations in domestic violence cases, and recordings from corporations seeking to prove employee wrongdoing or industrial espionage. Forgery detection solutions are also needed by manufacturers of audio recording equipment.

There are as yet no reported passive authentication methods focusing on MP3 audio. An existing related work is the classification of MP3 encoders proposed by Boehm and Westfeld [2004]. The work outlines a method to discriminate 2 different MP3 encoders with 1 features. Experimental results show that these features classify MP3 encoders accurately and can improve the performance of MP3 steganalysis. The application of the method to passive authentication is not discussed in that paper. Theoretically, the method could handle audio tampered by splicing recordings from different recorders, but tampering within a single recording is beyond its scope. As MP3 audio becomes popular, it is necessary to develop passive approaches to check the integrity of MP3 audio.

Passive authentication of JPEG images and MPEG video has attracted many researchers. Several approaches have been proposed, such as the quantization-table-based method [Lukas and Fridrich 2003], the periodic-artifacts-based method [Popescu and Farid 2004], the Benford's-law-based method [Fu et al. 2007], and the shifted double JPEG detection method [Qu et al. 2008]. One question arises directly: can these methods be applied to passive authentication of MP3? Unfortunately, a direct extension of the existing JPEG methods to MP3 audio does not work, because MP3 compression differs from JPEG compression in many ways. For example, an MP3 encoder divides the time-domain samples into frames with 50% overlap, while JPEG compression uses nonoverlapping blocks. This makes blocking artifacts undetectable in MP3 compression. In addition, the calculation and quantization in MP3 compression are performed in floating-point representation, so the quantization-table-based method for JPEG, which performs well with integer values, is useless for MP3 compression.

Fig. 1. Block diagram of MP3: (a) encoder; (b) decoder.

In this article, we propose a forgery detection method for digital audio in MP3 format. Note that forgeries of MP3 files are always performed as follows: first decoding, then tampering, and finally re-encoding. Based on the observation that forgeries break the original frame segmentation, we use frame offsets to locate forgeries automatically. The original frame offsets are retrieved via a quantization characteristic. Extensive experiments show that the proposed method can detect most common forgeries, such as deletion, insertion, substitution, and splicing. The method is also robust to some common postprocessing operations such as filtering and adding noise.

The article is organized as follows. In Section 2, we give a brief analysis of MP3 coding and show that the quantized spectral characteristic appears only with the identical frame offset. In Section 3, we develop a method to detect frame offsets. Based on this method, Section 4 shows how changes in frame offsets locate forgeries effectively. Experimental results are presented in Section 5. Finally, we conclude with a discussion and future work in Section 6.

2. ANALYSIS OF MP3 COMPRESSION CHARACTERISTICS
In this section, we first give a brief overview of MP3 coding and then explain two important concepts of this article: frame offset and quantization characteristics. In Section 2.1 we explain only the principles that are relevant to our detection method, especially spectral decomposition and quantization. The detailed architecture and specification of MP3 coding can be found in ISO [1992].
In Section 2.2, the definition of frame offset is illustrated with an example. In Section 2.3, the quantization characteristics are analyzed.

2.1 MP3 Coding
Figure 1(a) shows the block diagram of a typical MP3 encoder [Painter and Spanias 2000]. The input PCM signal is first separated into 32 subbands by the analysis filterbank, and the Modified Discrete Cosine Transform (MDCT) further divides each of these 32 subbands into 18 subbands (long windows) or 6 subbands (short windows). A total of 576 or 192 spectral lines are thus generated, respectively.
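The counts in this subsection fit together in a simple way. The short Python sketch below multiplies out the subband and MDCT line counts and relates them to the 1152-sample framing grid and its 576-sample hop used later in this article. The helper names `spectral_lines` and `hop_seconds` are hypothetical illustrations, not functions of any codec library.

```python
# Constants from Sections 2.1 and 2.2; helper names are hypothetical.
SUBBANDS = 32            # outputs of the polyphase analysis filterbank
LONG_LINES = 18          # MDCT lines per subband, long window
SHORT_LINES = 6          # MDCT lines per subband, short window
FRAME_LENGTH = 1152      # samples covered by one framing grid
HOP = FRAME_LENGTH // 2  # 50% overlap, so a new grid starts every 576 samples


def spectral_lines(window: str) -> int:
    """Spectral lines produced by one MDCT block for the given window type."""
    per_subband = {"long": LONG_LINES, "short": SHORT_LINES}
    if window not in per_subband:
        raise ValueError(f"unknown window type: {window!r}")
    return SUBBANDS * per_subband[window]


def hop_seconds(sample_rate_hz: float) -> float:
    """Duration of one 576-sample hop at the given sampling rate."""
    return HOP / sample_rate_hz


print(spectral_lines("long"), spectral_lines("short"))  # 576 192
print(f"{hop_seconds(44100):.4f} s")                     # 0.0131 s
```

The 576 long-window lines equal the 576-sample hop, so the transform is critically sampled; it is the hop, not the line count, that makes frame offsets repeat with period 576 in Section 3.1.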
Fig. 2. Framing grids and frame offsets. The top panel shows three consecutive framing grids for the first encoding, and the bottom panel shows the corresponding framing grids for the second encoding. The frame offsets of the three framing grids are identical.

The psychoacoustic model analyzes the audio content and estimates the masking thresholds. Its output consists of the just-noticeable noise level for each subband and information about the window type for the MDCT. According to the masking thresholds estimated by the psychoacoustic model, the spectral values are quantized by a power-law quantizer. The quantization stage uses an iterative algorithm to control both the bit rate and the distortion level, so that the perceived distortion is as small as possible within the limits of the desired bit rate. Finally, the quantized spectral values are encoded with Huffman code tables to form a bitstream.

The block diagram of the MP3 decoder is shown in Figure 1(b). First, Huffman decoding is performed on the MP3 bitstream, and the decoder restores the quantized MDCT coefficients and their associated side information, such as the window type assigned to each frame. After inverse quantization, the coefficients are transformed back to the subband domain by the inverse MDCT. Finally, the PCM waveform is reconstructed by the synthesis filterbank.

2.2 Frame Offset
In this article, the frame offset [Yang et al. 2008] is defined as the number of samples by which the framing grid is shifted between the first and the second encoding. As noted, forgeries of MP3 files are always performed by first decoding, then tampering, and finally re-encoding. Hence the frame offset becomes nonzero when forgeries are conducted on MP3 files, and it is always zero when there is no forgery. Figure 2 illustrates how frame offsets arise. The framing grids used for the first encoding of the original signal are shown at the top of Figure 2.
Each framing grid contains 1152 samples with 50% overlap. After decoding, some extra zero samples are added at the beginning of the signal by the decoder. During the second encoding, new framing grids are generated. If forgeries occur, the frame offsets of some frames change.

Fig. 3. Unquantized and quantized spectral coefficients: (a) unquantized and (b) quantized spectrum in real-value form; (c) unquantized and (d) quantized spectrum in logarithmic representation (magnitude in dB vs. frequency index). Panel (c) shows no troughs, while panel (d) shows many. The major difference between the unquantized and quantized spectra is the number of zero coefficients, which appear as troughs.

2.3 Quantization Characteristics
During encoding, many spectral coefficients are usually quantized to zero. This happens because some spectral components are completely masked by other components, and because the spectral coefficients are inherently distributed with many values close to zero. The increase in zero spectral coefficients is a quantization characteristic of MP3 coding. This characteristic was first described by Herre and Schug [2000] and Herre et al. [2002], who used it to optimize cascaded audio coding. In the following, we analyze this characteristic.

The difference between unquantized spectral coefficients and their quantized counterparts is not easily visible in real-value form, as illustrated in Figures 3(a) and (b). However, the two can be discriminated by looking at the spectral coefficients in a logarithmic representation. As shown in Figures 3(c) and (d), the quantized spectrum contains many zero values, which appear as troughs, while this phenomenon cannot be found in the unquantized spectrum.
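To make the trough effect concrete, the following sketch quantizes a synthetic spectrum and counts exact zeros before and after quantization. It is an illustration under stated assumptions, not LAME's quantizer: the Laplacian spectrum, the step size, and the helper names are hypothetical, and only the 3/4 power law of the MP3 quantizer is kept (no scalefactors, no rate/distortion loop).

```python
import numpy as np


# Assumptions: a Laplacian toy spectrum stands in for real MDCT lines, and the
# quantizer keeps only MP3's 3/4 power law (no scalefactors, no rate loop).
def power_law_quantize(x, step):
    """Signed integer indices: round((|x| / step) ** 0.75)."""
    return np.sign(x) * np.round((np.abs(x) / step) ** 0.75)


def dequantize(ix, step):
    """Inverse power law: |x| is approximately step * |ix| ** (4 / 3)."""
    return np.sign(ix) * step * np.abs(ix) ** (4.0 / 3.0)


def to_db(x, floor=1e-12):
    """Magnitude in dB; exact zeros are clamped to the floor and appear as troughs."""
    return 20.0 * np.log10(np.maximum(np.abs(x), floor))


rng = np.random.default_rng(0)
spectrum = rng.laplace(scale=1.0, size=576)  # one long-window granule of lines
decoded = dequantize(power_law_quantize(spectrum, step=2.0), step=2.0)

zeros_before = int(np.count_nonzero(spectrum == 0.0))
zeros_after = int(np.count_nonzero(decoded == 0.0))
print(zeros_before, zeros_after)  # 0 before; a large share of the 576 lines after
print(to_db(decoded).min())       # about -240 dB: the floor reached by exact zeros
```

In the dB view every zeroed line drops to the floor, which is the "many troughs" pattern of Figure 3(d); the unquantized spectrum has no such lines.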
These troughs in the spectral representation are visible only if the framing grids are the same as those of the first encoding; that is, the troughs appear only when the frame offset identical to that of the first encoding is applied. This is illustrated by Figure 4, which shows the MDCT coefficients of a decoded signal shifted one sample to the left (offset = -1), not shifted (offset = 0), and shifted one sample to the right (offset = +1) relative to the encoder framing grid. As we can see, the troughs disappear even when the decoded signal is shifted by only one sample.

Fig. 4. Spectral coefficients with frame offsets of -1, 0, and +1 samples (magnitude in dB vs. frequency index): (a) offset = -1; (b) offset = 0; (c) offset = +1. The quantization characteristics appear only if the correct frame offset (0) is applied.

3. METHOD OF RETRIEVING FRAME OFFSETS
The key to detecting frame offsets is identifying the quantization characteristics. In this section, we develop a method of retrieving frame offsets based on the observations in the previous section.

3.1 Number of Active Coefficients
Figure 4 shows that a significant difference between the spectral coefficients without offset (Figure 4(b)) and with offset (Figures 4(a) and (c)) is the number of active (nonzero) spectral coefficients. For convenience, we denote the number of active coefficients as NAC in this article. In Figure 4, the NACs for offsets -1 and +1 (shifted offsets) are far larger than the NAC for offset 0 (matching offset), which is only 197. For a robust and automatic identification of the characteristic spectrum, the NAC as a function of the frame offset can be used as a feature. This criterion yields reliable results, as shown in Figure 5: the beginning of each frame is clearly detectable by an obvious decrease in the NAC, and a period of 576 can be observed. Why is the period 576? Note that 576 = 1152 × 50%, where 1152 is the length of a frame and 50% is the overlap specified by the MP3 standard. A frame with offset 576 corresponds exactly to the next frame.
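The matching-offset effect of Figure 4 and the NAC criterion can be reproduced on a toy codec. The sketch below is an assumption-laden stand-in rather than MP3: it uses a plain sine-windowed MDCT with a hop of N = 64 samples instead of the hybrid filterbank with 576 lines, a uniform quantizer instead of the power-law quantizer, and counts a line as active when its magnitude is within a factor of 10^10 of the largest one (a reading of the logarithmic normalization in Section 4 assumed here). It encodes and decodes a synthetic signal once, then re-analyzes one frame at every offset and records the NAC.

```python
import numpy as np

# Toy model of the matching-offset effect. Assumptions (not the MP3 codec):
# a sine-windowed MDCT with hop N replaces the hybrid filterbank, a uniform
# quantizer replaces the power-law quantizer, and N = 64 instead of 576.
N = 64
n = np.arange(2 * N)
k = np.arange(N)
WINDOW = np.sin(np.pi * (n + 0.5) / (2 * N))  # w[n]^2 + w[n + N]^2 = 1
BASIS = np.sqrt(2.0 / N) * np.cos(
    np.pi / N * (n[None, :] + (N + 1) / 2) * (k[:, None] + 0.5)
)  # BASIS[k, n], orthonormal rows


def analyze(x, start):
    """MDCT of the 2N-sample frame of x that begins at sample `start`."""
    return BASIS @ (WINDOW * x[start:start + 2 * N])


def encode_decode(x, step):
    """First encoding and decoding: frame with hop N, quantize, overlap-add."""
    y = np.zeros_like(x)
    for start in range(0, len(x) - 2 * N + 1, N):
        q = step * np.round(analyze(x, start) / step)  # small lines become exactly 0
        y[start:start + 2 * N] += WINDOW * (BASIS.T @ q)
    return y


def nac(coeffs):
    """Active coefficients: lines within a factor 1e10 of the largest magnitude."""
    mag = np.abs(coeffs)
    return int(np.count_nonzero(mag * 1e10 > mag.max()))


rng = np.random.default_rng(1)
t = np.arange(40 * N)
signal = (np.sin(2 * np.pi * 0.031 * t) + 0.5 * np.sin(2 * np.pi * 0.12 * t)
          + 0.05 * rng.standard_normal(t.size))
decoded = encode_decode(signal, step=0.5)

frame_start = 10 * N  # an interior frame of the first encoding's grid
curve = [nac(analyze(decoded, frame_start + j)) for j in range(N)]
print(curve[0], min(curve[1:]))  # far fewer active lines at the matching offset
print(int(np.argmin(curve)))     # 0
```

Extending j to N reaches the next grid position and yields a second minimum, the toy analogue of the period of 576 described above.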
Fig. 5. NACs for different frame offsets (y-axis: number of active coefficients; x-axis: frame offset). The NAC reaches its minimums when the frame offset is a multiple of 576.

3.2 Theoretical Analysis
Now let us examine why the quantization characteristics appear only if the matching offset is applied. The reason lies in an inherent property of the MDCT. The MDCT performed in MP3 coding is as follows [Wang and Velermo 2003]:

$$X^{(p)}[k] = \sqrt{\frac{2}{N}} \sum_{n=0}^{2N-1} x^{(p)}[n]\, h[n] \cos\left(\frac{\pi}{N}\left(n + \frac{N+1}{2}\right)\left(k + \frac{1}{2}\right)\right), \quad 0 \le k \le N-1. \qquad (1)$$

By applying the inverse MDCT to the frame, we obtain 2N time-aliased samples:

$$\hat{x}^{(p)}[n] = \sqrt{\frac{2}{N}} \sum_{k=0}^{N-1} X^{(p)}[k] \cos\left(\frac{\pi}{N}\left(n + \frac{N+1}{2}\right)\left(k + \frac{1}{2}\right)\right), \quad 0 \le n \le 2N-1. \qquad (2)$$

To cancel the aliasing and recover the original samples, we use the overlap-add (OLA) procedure. The inverse MDCT is also applied to the previous and the next frames; each resulting aliased segment is multiplied by its corresponding window function, and the overlapping time segments are added together. This recovers the original samples:

$$x^{(p)}[n] = \begin{cases} \hat{x}^{(p-1)}[n+N]\, h[N-n-1] + \hat{x}^{(p)}[n]\, h[n], & 0 \le n \le N-1, \\ \hat{x}^{(p)}[n]\, h[2N-n-1] + \hat{x}^{(p+1)}[n-N]\, h[n-N], & N \le n \le 2N-1. \end{cases} \qquad (3)$$

Denote

$$\tilde{x}^{(p)}[n] = x^{(p)}[n]\, h[n], \quad 0 \le n \le 2N-1. \qquad (4)$$

If a signal exhibits the local symmetry

$$\begin{cases} \tilde{x}^{(p)}[n] = \tilde{x}^{(p)}[N-n-1], & 0 \le n \le N-1, \\ \tilde{x}^{(p)}[n] = -\tilde{x}^{(p)}[3N-n-1], & N \le n \le 2N-1, \end{cases} \qquad (5)$$

its MDCT coefficients become zero, that is, $X^{(p)}[k] = 0$ for $k = 0, \ldots, N-1$. Wang et al. [2000] proved that $\tilde{x}^{(p)}[n]$ satisfies Eq. (5) if $X^{(p)}[k] = 0$. This inherent property of the MDCT explains why the NAC decreases significantly only when the identical frame offset is applied. After MP3 encoding, many spectral coefficients are masked or quantized to zero. When decoding, these zero coefficients are restored to the time domain, and $\tilde{x}^{(p)}[n]$ satisfies Eq. (5). When the MDCT is performed on the decoded data with the same frame offset as in the first encoding, we therefore obtain many $X^{(p)}[k]$ equal to zero. With a different frame offset, the local symmetry in Eq. (5) is broken, and the corresponding coefficients $X^{(p)}[k]$ are no longer zero.

Table I. Mean Value and Standard Deviation of NACs at Different Bit Rates (columns: bit rate; shifted NACs, mean and std; matching NAC, mean and std; rows: 32, 64, 96, and 128 kbps)

3.3 Experiments on Retrieving Frame Offsets
To illustrate the preceding analysis, we randomly select 30 different audio frames and encode them with LAME v3.97 [LAM 2012] at bit rates of 32 kbps, 64 kbps, 96 kbps, and 128 kbps. For each bit rate, we apply offsets from -575 to 575 to these frames and calculate the NAC for every offset, which yields 1151 NACs per frame. The 1150 NACs corresponding to wrong offsets are called shifted NACs, and the NAC corresponding to the correct offset is called the matching NAC; the two are plotted separately. As shown in Figure 6, for each bit rate there are 30 boxes representing the distributions of shifted NACs. In Figure 6(a), the minimum shifted NAC is larger than 15 for each frame, while the matching NAC is below 8. For all frames, the matching NAC is clearly distinguishable from the shifted NACs. The cases of 64 kbps, 96 kbps, and 128 kbps are illustrated in Figures 6(b), (c), and (d), respectively. Although the frames are encoded at different bit rates, the matching NAC is always smaller than the shifted NACs, which means that we can regard the minimum NAC as the matching NAC. Figure 6 also shows that the gap between the shifted NACs and the matching NAC narrows as the bit rate increases.
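The MDCT relations of Section 3.2 can be checked numerically. The sketch below implements Eqs. (1) to (3) with a sine window h[n] (an assumption, since the text does not name the window; it is symmetric and satisfies h[n]^2 + h[n+N]^2 = 1), verifies that overlap-add of three consecutive frames reconstructs the middle one, and then builds a windowed frame with the local symmetry of Eq. (5) to confirm that all of its MDCT coefficients vanish. N = 16 is chosen only to keep the example small.

```python
import numpy as np

# Numerical check of Eqs. (1)-(5). Assumption: h[n] is a sine window, which is
# symmetric and satisfies h[n]^2 + h[n + N]^2 = 1; the text does not name the window.
N = 16
n = np.arange(2 * N)
k = np.arange(N)
h = np.sin(np.pi * (n + 0.5) / (2 * N))
COS = np.cos(np.pi / N * (n[None, :] + (N + 1) / 2) * (k[:, None] + 0.5))  # COS[k, n]
SCALE = np.sqrt(2.0 / N)


def mdct_windowed(x_tilde):
    """Eq. (1) written in terms of the windowed frame x~[n] = x[n] h[n] (Eq. (4))."""
    return SCALE * COS @ x_tilde


def imdct(X):
    """Eq. (2): 2N time-aliased samples from N coefficients."""
    return SCALE * COS.T @ X


def overlap_add(prev_out, cur_out, next_out):
    """Eq. (3): window the aliased outputs of frames p-1, p, p+1 and add the overlaps."""
    m = np.arange(N)
    first = prev_out[m + N] * h[N - m - 1] + cur_out[m] * h[m]
    second = cur_out[m + N] * h[N - m - 1] + next_out[m] * h[m]  # h[2N-(m+N)-1]
    return np.concatenate([first, second])


rng = np.random.default_rng(0)

# Aliasing cancellation: three overlapping frames (hop N) rebuild the middle frame.
x = rng.standard_normal(4 * N)
frames = [x[i * N:i * N + 2 * N] for i in range(3)]
outs = [imdct(mdct_windowed(f * h)) for f in frames]
recon = overlap_add(*outs)
print(np.allclose(recon, frames[1]))  # True

# Eq. (5): even symmetry in the first half and odd symmetry in the second half
# make every MDCT coefficient vanish.
a = rng.standard_normal(N // 2)
b = rng.standard_normal(N // 2)
x_tilde = np.concatenate([a, a[::-1], b, -b[::-1]])
X = mdct_windowed(x_tilde)
print(np.max(np.abs(X)) < 1e-12)  # True
```

Breaking the symmetry, for example by rolling `x_tilde` by one sample, gives generic nonzero coefficients, which mirrors the effect of a wrong frame offset.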
The narrowing at higher bit rates occurs because signal distortion and information loss are smaller, so the MDCT coefficients contain fewer zeros. Since this investigation is based on only 30 frames, the conclusion may not be general enough. We therefore collect statistics on 128 frames, including 64 frames of speech and 64 frames of music, and compute the 1150 shifted NACs and the matching NAC for each frame. Table I lists the means and standard deviations of the NACs over these 128 frames. The mean values of the shifted NACs and of the matching NAC are far apart, and the standard deviations are all small compared with the means. However, as noted before, the difference between the shifted NACs and the matching NAC becomes small at high bit rates such as 128 kbps.

Fig. 6. Distribution of NACs for frame offsets from -575 to 575 on 30 different audio frames encoded with LAME v3.97 (mono); panels (a), (b), (c), and (d) correspond to 32 kbps, 64 kbps, 96 kbps, and 128 kbps, respectively (y-axis: NAC; x-axis: audio frame). Each box shows the distribution of the 1150 NACs with wrong offsets, and the isolated point is the NAC with the correct offset.

4. LOCATING FORGERIES VIA CHECKING FRAME OFFSETS
Because audio samples are divided into frames for encoding, the frame offset can serve as evidence of tampering. When a forgery occurs, all frames after the forged point are affected, and their detected offsets change. Figure 7 shows an example of cropping. The original sentence "I am not guilty" is recorded at a sampling rate of 44.1 kHz and saved in MP3 format by a digital recorder, as shown in Figure 7(a). We manipulate this recording with CoolEdit v2.1 and remove the key word "not", so the meaning of the sentence is reversed: "I am guilty", as shown in Figure 7(b). The detected offsets of all frames in the original audio and in the doctored audio are shown in Figures 7(c) and 7(d), respectively. All frames in the original audio have the same offset, 0. For the doctored audio, however, the detected offsets take two different values: 0 for frames 1 to 118, and 384 for the remaining frames. We conclude that there is a forgery at frame 119.

Fig. 7. Example of locating one cropping. The sentence "I am not guilty" is cropped to "I am guilty", shown in (a) and (b). (c) is the detection result of the original audio: the detected offsets of all frames are 0, meaning there is no forgery. (d) is the detection result of the doctored audio: the detected offset changes at frame 119, meaning there is a forgery. The horizontal axis represents samples in (a) and (b) but frames in (c) and (d); 160,000 samples correspond to 277 frames.

From this example, the general procedure for locating forgeries is: (i) detect the offsets of all frames; (ii) check the differences between frame offsets. How can the offsets of all frames be retrieved effectively? Given an audio signal of L samples, denoted by the vector x, we denote its j-sample-shifted version (obtained by prepending j zero samples to x) by $x^{(j)}$, $0 \le j < 576$:

$$x^{(0)} = x, \qquad x^{(j+1)} = [0, x^{(j)}], \quad j = 0, \ldots, 574.$$

For each offset j, we split $x^{(j)}$ into frames of 1152 samples with 50% overlap, which yields $N = \lfloor L/576 \rfloor - 1$ frames in total:

$$[\hat{x}_0^{(j)}, \ldots, \hat{x}_{N-1}^{(j)}] = F x^{(j)},$$

where F represents frame segmentation together with the window function, and $\hat{x}_k^{(j)}$ is the k-th frame of $x^{(j)}$. We apply the filterbank and the MDCT to each frame to obtain its spectrum (576 MDCT coefficients):

$$s_k^{(j)} = T \hat{x}_k^{(j)},$$

where T represents both the filterbank and the MDCT, and $s_k^{(j)}$ is the spectrum of the k-th frame of $x^{(j)}$. We convert $s_k^{(j)}$ into the logarithmic representation $M_k^{(j)}$, which projects all values into the range [0, 1]:

$$M_k^{(j)} = \frac{1}{10} \log_{10}\left(\max\left(\frac{|s_k^{(j)}|}{\max |s_k^{(j)}|} \cdot 10^{10},\ 1\right)\right).$$

We then count the number of active values in $M_k^{(j)}$:

$$c_k^{(j)} = C M_k^{(j)},$$

where C represents the counting operation. For frame k, the detected offset is

$$\mathrm{offset}_k = \begin{cases} \arg\min_j c_k^{(j)}, & \text{if } \mathrm{mean}(c_k^{(j)}) - \min_j c_k^{(j)} \ge \theta, \\ -1, & \text{if } \mathrm{mean}(c_k^{(j)}) - \min_j c_k^{(j)} < \theta, \end{cases}$$

where $\mathrm{mean}(c_k^{(j)}) = \frac{1}{576} \sum_{j=0}^{575} c_k^{(j)}$ and θ is a threshold that decides whether the frame offset is detectable. In some cases the frame offset does not exist or is obscured; all $c_k^{(j)}$ are then close to one another, although a minimum $\min_j c_k^{(j)}$ always exists. The threshold θ identifies these cases: the frame offset is accepted as detectable only when $\mathrm{mean}(c_k^{(j)}) - \min_j c_k^{(j)}$ is large enough, and is otherwise marked as undetectable.

Each frame is expected to have offset 0 when there is no forgery, since no samples are shifted. After a forgery, however, some frames are detected with nonzero offsets. To locate the forgeries, we simply differentiate the offsets: if $\mathrm{offset}_k \ne \mathrm{offset}_{k-1}$, a forgery occurs at frame k.

5. EXPERIMENTAL RESULTS
5.1 Illustration of Locating Forgeries
Section 4 showed that the proposed method locates a single deletion correctly. The frame offset method is effective not only for one deletion but also for multiple deletions. Here we demonstrate an example in which the sentence consists only of numbers, as often appears in witness statements.
As shown in Figure 8, three numbers are cropped from the original sentence. The detected offsets of all frames in the doctored audio are shown in Figure 8(c): the frame offsets change at the 7th, 18th, and 47th frames, which means that forgeries occur at these locations. As Figure 8 shows, whenever manipulations of MP3 audio destroy the frame segmentation of the previous encoding, the frame offset method is able to locate the forgeries. After an insertion, the doctored audio is separated into three segments, which generally have different frame offsets. Figure 9 shows an example of insertion detection, in which the method locates the forgeries precisely. Because two spliced parts often come from different sources, they usually have different frame offsets, so the method is also effective for detecting splicing. The case of substitution is illustrated in Figure 10.
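The two steps of Section 4, per-frame offset detection and differentiation of the offsets, are short enough to sketch directly, together with a helper that predicts how edits like those in Figures 8 to 10 shift each untouched segment. The code assumes the NAC matrix C[k, j] = c_k^(j) is already available (computing it needs an MP3 filterbank and MDCT, which is out of scope), feeds it a synthetic matrix, and uses a threshold θ = 50 chosen only for the example. The helper `segment_offsets` is hypothetical: its sign convention (offset = number of prepended zeros that realigns the original grid, as in the x^(j) construction) and its neglect of decoder delay and of the inserted material's own grid are assumptions. All lengths in the usage example are hypothetical.

```python
import numpy as np

HOP = 576           # framing grids repeat every 576 samples
UNDETECTABLE = -1   # special value for frames whose offset cannot be detected


def segment_offsets(edits):
    """Expected offset of the original material after each edit, applied left to right.

    Assumption: the offset is the number of leading zeros that realigns the original
    grid (the x^(j) construction of Section 4); decoder delay is ignored, and the
    offset of inserted material, which has its own grid, is not modeled.
    """
    shift, offsets = 0, [0]
    for kind, length in edits:
        if kind not in ("delete", "insert"):
            raise ValueError(f"unknown edit: {kind!r}")
        shift += length if kind == "insert" else -length
        offsets.append((-shift) % HOP)
    return offsets


def detect_offsets(C, theta):
    """Section 4 rule: argmin_j C[k, j] if mean - min >= theta, else -1."""
    gaps = C.mean(axis=1) - C.min(axis=1)
    return np.where(gaps >= theta, C.argmin(axis=1), UNDETECTABLE)


def locate_forgeries(offsets):
    """Frames k with offset_k != offset_(k-1)."""
    return [k for k in range(1, len(offsets)) if offsets[k] != offsets[k - 1]]


# Usage with hypothetical numbers: three deletions, loosely following Figure 8.
edits = [("delete", 20000), ("delete", 15000), ("delete", 30000)]
seg_offsets = segment_offsets(edits)
print(seg_offsets)                         # [0, 416, 440, 488]

seg_frames = [7, 11, 29, 10]               # hypothetical segment lengths in frames
truth = np.repeat(seg_offsets, seg_frames)
rng = np.random.default_rng(0)
C = 400.0 + rng.integers(0, 20, size=(truth.size, HOP))  # synthetic shifted NACs
C[np.arange(truth.size), truth] = 150.0                  # synthetic matching NACs
offsets = detect_offsets(C, theta=50.0)
print(locate_forgeries(offsets))           # [7, 18, 47]
```

A deletion whose length is a multiple of 576 leaves the offset unchanged (`segment_offsets([("delete", 3 * HOP)])` returns `[0, 0]`), which is the boundary-splicing attack examined in Section 5.3. Under the literal differentiation rule, a frame marked -1 also produces change points on both sides of it.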
12 35:12 R. Yang et al. 1 (a) waveform of original audio x 1 5 (b) waveform of doctored audio one two three four five six seven eight nine one three five six seven nine x (c) detect result of doctored audio detected offset different frames Fig. 8. Example of locating multiple deletions. Three numbers are cropped away from a series of numbers, shown as (a) and (b). (c) is the detection result of the doctored audio. Frame offsets change at the 7th, 18th, and 47th frames, which means there are forgeries at these frames. 5.2 Extensive Experiments Our experiments also include extensive tests of different types of audio clips. Our tested audio includes 64 speech clips (each 3 s long) and 64 music clips (each 3 s long). These original audio clips are in WAV format, 22.5 Hz, 16 bit, mono. We use LAME 3.97 to encode the audio clips into MP3 with bit rates of 32 bps, 64 bps, and 96 bps, respectively. Then each clip consists of 1142 frames. For each clip, we randomly select 1 frames and each frame performs 2 sample deletion and 2 sample insertion, respectively. So for each bit rate, we test our approach on 128 doctored frames with deletion and another 128 frames with insertion. We apply our method to these audio clips. We use the false positive error to measure the undoctored frames incorrectly identified as doctored, while the false negative error represents the doctored ones that are not detected. We denote the false positive error rate and false negative error rate as f p and f n, respectively. The accurate detection rate AR is calculated as follows. ( AR = 1 f ) p + f n 1% (6) 2 The test results for speech and music are shown in Table II and Table III, respectively. As we see, whether we are locating deletion or insertion in these audio frames, all accuracy rates are above 99%. We notice that the detection results of low bit rates are a little better than those of high bit ACM Transactions on Multimedia Computing, Communications and Applications, Vol. 8, No. 
S2, Article 35, Publication date: September 212.
13 Exposing MP3 Audio Forgeries Using Frame Offsets 35:13 detected offset.5 (a) Original waveform 1 I don t thin so Insertion x 1 4 (b) Original waveform 2.5 I agree with it x 1 4 (c) Forgery waveform.5 I don t agree with it x 1 4 (d) Detect result of doctored audio different frames Fig. 9. Example of locating insertion. A ey word don t is inserted into a sentence, shown as (a) and (b). (c) is the detection result of the doctored audio. Frame offsets change at the 48th and 1th frames, which means there are forgeries at these frames. rates. This is due to MP3s with lower bit rates having stronger compression traces which means that the frame offset can be detected more accurately. The f p s of speech are higher than those of music, while the opposite is the case for f n s. This may be due to the presence of fewer silent samples in the music clips, and frame offset detection of silent portions introduces errors more easily. It is noted that the detection rate cannot achieve 1%. For some special cases our method will fail to locate forgeries. When the frame contains lots of zero samples, for example, one half, the correct offset cannot be detected via NAC, as shown in Figure 11. The actual offset of the frame is 2. However, the detected offset is 575. While applying different offsets, the number of zero samples varies rapidly, which leads to unstable NAC. 5.3 Sensitivity and Robustness In this subsection, we discuss the sensitivity and robustness of the proposed method against a variety of attac schemes Splicing at the Boundary. If the adversary is smart enough to splice or crop exactly multiple of 576 samples to achieve the exact boundary of one frame, will the detection method still wor? After generating the desired audio, the adversary only needs to adjust some (1 575) samples to match ACM Transactions on Multimedia Computing, Communications and Applications, Vol. 8, No. S2, Article 35, Publication date: September 212.
the frame boundary. Because these samples last less than 575/44100 ≈ 0.013 s at a 44.1 kHz sampling rate, this adjustment would not affect the meaning of the desired audio.

[Figure 10 panels: (a) original waveform, "I like it", with the substituted word marked; (b) original waveform, "I hate doing that"; (c) forgery waveform, "I like doing that"; (d) detection result of the doctored audio (detected offset vs. frame).]
Fig. 10. Example of locating substitution. The key word "hate" in (b) is replaced by "like" from (a). (c) is the doctored audio and (d) is its detection result. Frame offsets change at the 48th and 90th frames, which means there are forgeries at these frames.

Table II. Detection Results for Speech
Forgery type   Bit rate   fp      fn      AR
deletion       32 kbps    0.50%   0.03%   99.73%
deletion       64 kbps    0.90%   0.14%   99.48%
deletion       96 kbps    1.12%   0.34%   99.27%
insertion      32 kbps    0.51%   0.03%   99.73%
insertion      64 kbps    0.85%   0.20%   99.47%
insertion      96 kbps    1.01%   0.37%   99.31%

Table III. Detection Results for Music
Forgery type   Bit rate   fp      fn      AR
deletion       32 kbps    0.20%   0.27%   99.76%
deletion       64 kbps    0.27%   0.47%   99.63%
deletion       96 kbps    0.32%   0.61%   99.53%
insertion      32 kbps    0.16%   0.20%   99.82%
insertion      64 kbps    0.23%   0.42%   99.67%
insertion      96 kbps    0.28%   0.45%   99.63%

Thanks to the 50% overlap framing used in MP3 encoding, we can still find traces of this forgery. We give a demonstration in Figure 12. Suppose that a forgery occurs exactly at a frame boundary, where exactly
576 samples are cropped. The spectrum of the new frame that straddles the cut does not show the quantization characteristic for any offset, but the frames immediately before and after it still have many troughs at the original offset.

[Figure 11 panels: (a) waveform of an undetectable frame (amplitude vs. sample index); (b) NAC result (NAC vs. frame offset).]
Fig. 11. An example of a failure case. (a) shows the waveform of a frame whose offset is undetectable; (b) shows the NACs for different frame offsets.

Additive Noise. Additive noise may be added to tampered speech to cover forgeries, which presents a challenge for forgery detection. To investigate the robustness of the proposed scheme against additive noise, a short speech clip consisting of 45 frames is tested. White Gaussian noise of 3dB is added to the samples of the 20th frame, as shown in Figure 13(a). Since both the 19th and 21st frames overlap the 20th frame by 50%, they are each half doctored at the same time. We then investigate the effect of the additive noise on NAC. Offsets from 0 to 575 are applied to every frame, and the corresponding NACs are recorded and plotted vertically, as shown in Figure 13(b). All the plots show a significantly small value except those of the 18th, 19th, 20th, 21st, and 22nd frames, which means that the frame offsets of all frames except these five can be detected via NAC. Since there is no such remarkable decrease among the NACs of the 18th to 22nd frames, the offsets of these five frames are undetectable and are marked with the special value -1, as mentioned in Section 3. The detection result of the tampered speech is shown in Figure 13(c). It shows that the proposed method can resist locally added noise, which means that forgeries covered by noise can be located.
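The offset search used in these experiments (apply every candidate offset, compute the spectrum, keep the offset with the smallest NAC, and mark the frame -1 when there is no remarkable drop) can be illustrated on a deliberately small stand-in for an encoder. This Python sketch is not the authors' implementation. It assumes a plain sine-window MDCT with frame length M = 16 instead of MP3's hybrid filterbank with 576-sample frames, wraps the signal circularly to avoid edge frames, uses a uniform quantizer with a hypothetical step of 3.0, takes NAC to be the number of coefficients whose magnitude exceeds a small tolerance (an assumed definition), and declares a frame undetectable when its best NAC is not at least 25% below the median of the other offsets (the `min_drop` rule is our own choice).

```python
import math
import random


def mdct_basis(M):
    """Orthonormal sine-window MDCT (MLT) basis: basis[k][n], k < M, n < 2M."""
    scale = math.sqrt(2.0 / M)
    return [
        [
            scale
            * math.sin(math.pi * (n + 0.5) / (2 * M))
            * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
            for n in range(2 * M)
        ]
        for k in range(M)
    ]


def analyze(x, start, basis):
    """MDCT of the 2M-sample block starting at `start` (indices wrap around)."""
    L, M = len(x), len(basis)
    block = [x[(start + n) % L] for n in range(2 * M)]
    return [sum(b * v for b, v in zip(row, block)) for row in basis]


def toy_codec(x, basis, step):
    """Quantize each block on the grid 0, M, 2M, ... and overlap-add the inverse."""
    L, M = len(x), len(basis)
    y = [0.0] * L
    for j in range(L // M):
        q = [step * round(c / step) for c in analyze(x, j * M, basis)]
        for n in range(2 * M):
            y[(j * M + n) % L] += sum(q[k] * basis[k][n] for k in range(M))
    return y


def nac(coeffs, tol=1e-6):
    """Assumed NAC definition: count of coefficients with magnitude above tol."""
    return sum(1 for c in coeffs if abs(c) > tol)


def frame_offset(y, frame, basis, min_drop=0.25):
    """Offset in [0, M) that minimizes NAC for one frame, or -1 if no clear dip."""
    M = len(basis)
    scores = [nac(analyze(y, frame * M + d, basis)) for d in range(M)]
    best = min(range(M), key=lambda d: scores[d])
    others = sorted(scores[:best] + scores[best + 1:])
    if scores[best] > (1 - min_drop) * others[len(others) // 2]:
        return -1  # no remarkable NAC decrease: mark the frame undetectable
    return best


if __name__ == "__main__":
    M, F = 16, 24
    rng = random.Random(1)
    basis = mdct_basis(M)
    decoded = toy_codec([rng.gauss(0.0, 1.0) for _ in range(M * F)], basis, 3.0)
    edited = decoded[5:] + decoded[:5]  # cyclically delete 5 samples
    print([frame_offset(edited, j, basis) for j in range(4)])  # [11, 11, 11, 11]
```

Because this toy transform is orthonormal, re-analysing the decoded signal on the encoder's grid returns the quantized coefficients exactly, so most of them are zero; on any other grid almost none are. Cyclically deleting 5 samples moves the matching offset to (-5) mod 16 = 11, which is what the detector reports.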
However, if noise is added globally after the forgery, all frame offsets become undetectable and are marked as -1. In this case, the proposed method cannot locate the forgeries, but it still
indicates that the audio is abnormal and must have been post-processed. In this case, the audio is suspect and should be rejected as evidence.

[Figure 12 panels: (a) waveforms of the original audio and the doctored audio (amplitude vs. sample); (b) spectra of three consecutive frames of the doctored audio (magnitude in dB vs. frequency index).]
Fig. 12. The case of splicing at the boundary. (a) Waveform of audio from which 576 samples are cropped starting at the 1153rd sample. (b) Spectra of three consecutive frames of the doctored audio. All frames show the quantization characteristic except the middle one.

[Figure 13 panels: (a) audio with additive noise (amplitude vs. sample, noisy segment marked); (b) NAC result of each frame (NAC vs. frame); (c) detection result of each frame (detected offset vs. frame).]
Fig. 13. The effect of additive noise on NAC. (a) Waveform of audio with partially added noise. (b) NAC results of all frames. (c) Detection result of frame offsets.

Filtering. Another common way to cover forgeries is to filter the tampered signal. Here we test a median filter, a mean filter, and a low-pass filter, using the same speech clip as in the preceding section. Since the effects of the different filters on NAC are similar, and space is limited, only the result of the median filter is illustrated.
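Whether the evidence is a splice, a locally noisy or filtered region, or global post-processing, the verdict is read off the per-frame offset sequence. The helper below is a hypothetical sketch of our own, not code from the paper. It assumes offsets are listed per frame with 0-based indices and that -1 marks an undetectable frame; it reports a change point wherever a detectable offset differs from the last detectable one, collects runs of undetectable frames, and flags the clip for rejection when no frame is detectable at all.

```python
from typing import Dict, List, Optional, Tuple

UNDETECTABLE = -1  # assumed marker for frames whose offset cannot be detected


def summarize_offsets(offsets: List[int]) -> Dict[str, object]:
    """Read forgery evidence off a per-frame offset sequence (0-based frames).

    changes: frames whose detectable offset differs from the previous detectable one
    undetectable_runs: inclusive (first, last) spans of UNDETECTABLE frames
    reject: True when no frame is detectable (e.g. global noise or filtering)
    """
    changes: List[int] = []
    runs: List[Tuple[int, int]] = []
    prev: Optional[int] = None       # last detectable offset seen
    run_start: Optional[int] = None  # start of the current undetectable run
    for i, off in enumerate(offsets):
        if off == UNDETECTABLE:
            if run_start is None:
                run_start = i
            continue
        if run_start is not None:
            runs.append((run_start, i - 1))
            run_start = None
        if prev is not None and off != prev:
            changes.append(i)
        prev = off
    if run_start is not None:
        runs.append((run_start, len(offsets) - 1))
    reject = bool(offsets) and prev is None
    return {"changes": changes, "undetectable_runs": runs, "reject": reject}


if __name__ == "__main__":
    # Local noise on a 45-frame clip: five undetectable frames, no offset change.
    print(summarize_offsets([0] * 18 + [-1] * 5 + [0] * 22))
    # {'changes': [], 'undetectable_runs': [(18, 22)], 'reject': False}
```

A sequence such as 48 frames at one offset, then 52 at another, then a third offset yields change points at frames 48 and 100, the pattern shown for insertion; a sequence that is -1 everywhere is flagged for rejection, as in the global-noise case.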
[Figure 14 panels: (a) audio with filtering (amplitude vs. sample, filtered segment marked); (b) NAC result of each frame (NAC vs. frame); (c) detection result of each frame (detected offset vs. frame).]
Fig. 14. The effect of median filtering on NAC. (a) Waveform of partially filtered audio. (b) NAC results of all frames. (c) Detection result of frame offsets.

First, the 20th frame of the audio signal is filtered by a median filter of length 7, as shown in Figure 14(a). Since both the 19th and 21st frames overlap the 20th frame by 50%, they are each half filtered at the same time. The NACs of all frames are then investigated, and the proposed detection method is applied to the whole speech clip. As shown in Figure 14(b), similar to the case of additive noise, the NAC plots of the 18th, 19th, 20th, 21st, and 22nd frames show no significant decrease, while those of the other frames show an obviously small value. The detection result in Figure 14(c) shows that the offsets of these five frames are undetectable, while the other frames have a clear offset of 0. This means that the proposed method can indicate the filtered portion of an audio signal if the signal is only partially filtered. However, as in the case of additive noise, if the audio signal is filtered globally, the proposed method cannot locate forgeries automatically, but it still indicates that the signal has been manipulated.

6. DISCUSSIONS AND CONCLUSIONS

6.1 Extension to Other Formats

Although we only investigate audio in MP3 format, the idea of locating forgeries via the frame offset is applicable to audio in other compressed formats, such as AAC, WMA, and OGG Vorbis. Since audio in these formats is generated frame by frame, the offset of each frame can be obtained. To confirm this, we test an audio signal encoded with AAC.
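The overlap argument used in the noise, filtering, and boundary-splicing experiments depends only on the hop and window lengths, which is one reason it carries over to other frame-based formats. This sketch assumes that window i covers samples i*hop through i*hop + window - 1 (0-based), with MP3's 576-sample hop and 1152-sample window as defaults; for AAC long blocks one would pass hop=1024 and window=2048. It ignores MP3's polyphase filterbank, so it does not try to explain why the 18th and 22nd frames are also affected above. The function names are hypothetical.

```python
def frames_touching(first, last, hop=576, window=1152):
    """Windows sharing at least one sample with the inclusive range [first, last]."""
    lo = max(0, (first - window) // hop + 1)  # smallest i with i*hop + window - 1 >= first
    hi = last // hop  # largest i with i*hop <= last
    return list(range(lo, hi + 1))


def frames_straddling(cut, hop=576, window=1152):
    """Windows containing samples on both sides of a cut made just before sample `cut`."""
    left = set(frames_touching(cut - 1, cut - 1, hop, window))
    right = set(frames_touching(cut, cut, hop, window))
    return sorted(left & right)


if __name__ == "__main__":
    # Editing the samples of window 20 also half-covers windows 19 and 21.
    print(frames_touching(20 * 576, 20 * 576 + 1151))  # [19, 20, 21]
    # A 576-sample crop starting at index 1152 (the 1153rd sample, as in Fig. 12):
    # only the middle window straddles the join.
    print(frames_straddling(1152))  # [1]
    # The same reasoning for AAC-like long blocks.
    print(frames_straddling(2048, hop=1024, window=2048))  # [1]
```

For a cut exactly on the grid, the neighbouring windows lie entirely on one side of the join and keep their original alignment, while the single straddling window mixes both sides; this matches the middle frame losing its quantization characteristic in Figure 12.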
Note that each AAC frame is 1024 samples long, and its spectrum also consists of MDCT coefficients. The tool we use to encode and decode audio signals is FAAC [FAA]. The test clip consists of 40 frames of audio sampled at 44.1 kHz. The FAAC encoding parameters are 96 kbps, mono. First, we investigate whether the AAC audio has the quantization characteristic. Offsets -1, 0, and +1 are applied to the 9th frame, and for each offset 1024 MDCT coefficients are obtained. We then plot these coefficients on a logarithmic scale, as shown in Figures 15(a), (b), and (c). It is obvious that
only Figure 15(b) shows the quantization characteristic. Furthermore, we apply offsets 1 to 25 to the frame and obtain the corresponding NAC results, as shown in Figure 15(d). A period of 1024 can be observed. Within the length of the frame there is only one matching offset, and its NAC is clearly distinguishable from the other 1023 NACs.

[Figure 15 panels: (a), (b), (c) spectra of the 9th frame (magnitude in dB vs. frequency index) with offsets -1, 0, and +1; (d) NAC result of audio encoded with AAC (NAC vs. frame offset).]
Fig. 15. Quantization characteristic of AAC. Subfigures (a), (b), and (c) correspond to the spectra of the 9th frame with offsets -1, 0, and +1, respectively. As in the case of MP3, the quantization characteristic appears only with the matching offset (0). Subfigure (d) shows the NAC result of the 9th frame with offsets 1 to 25.

Next, we check AAC audio for forgeries. The 40-frame audio has 40960 samples in total. We delete the samples from index 1 to 15 and then apply the proposed method to the doctored AAC audio. Each frame generates 1024 NACs, and the matching offset is taken to be the one that minimizes NAC. The detection result is shown in Figure 16, which shows that the proposed method can detect forgeries in AAC audio.

Our method can also be extended to other frame-based encoders, since the spectrum obtained with the matching offset approximates the first-encoding spectrum more closely than the spectra obtained with other, shifted offsets. What must be kept in mind is that the procedure for extracting the spectrum differs between encoders, since they use different frame lengths and windows.

6.2 Conclusions

In this article, we propose a method to expose MPEG audio forgeries using frame offsets. The main contributions of this work are as follows. First, to the best of our knowledge, this is the first work on detecting forgeries in MP3 audio, and it extends the research topics of forgery detection.
Second, this work illustrates that MDCT coefficients reflect forgery traces very well for MPEG audio. Through theoretical analysis and extensive experiments, we show that NAC is a reliable feature for retrieving
frame offsets. Based on the fact that most common forgeries change the frame offsets of audio, the proposed method can locate these forgeries effectively. Extensive experimental results show that the proposed method performs very well on both speech and music: all accuracy rates are above 99%. Another advantage of the proposed method is its computational simplicity, since we only need to examine the MDCT coefficients of the audio.

However, if audio is transcoded between different compressed formats, the frame offset is difficult to obtain, and the proposed method fails in this case. Note also that at a high bit rate such as 128 kbps the NAC method is not well suited to retrieving frame offsets, since few coefficients are zero at high bit rates. In future work, we will therefore focus on obtaining the frame offset after transcoding and at high bit rates.

[Figure 16 panels: (a) original audio; (b) doctored audio; (c) detection result (detected offset vs. frame index).]
Fig. 16. Forgery detection result of AAC audio.

ACKNOWLEDGMENTS

The authors would like to thank the anonymous reviewers for their constructive comments. Their suggestions will be very helpful for our future work.

REFERENCES

BOEHM, R. AND WESTFELD, A. 2004. Statistical characterisation of MP3 encoders for steganalysis. In Proceedings of the 6th ACM Multimedia and Security Workshop. ACM.
FAAC. Freeware advanced audio coder.
FARID, H. Detecting digital forgeries using bispectral analysis. MIT AI Memo AIM-1657, MIT.
FU, D., SHI, Y., AND SU, W. 2007. A generalized Benford's law for JPEG coefficients and its applications in image forensics. In Proceedings of SPIE Conference on Security, Steganography, and Watermarking of Multimedia Contents.
GRIGORAS, C. 2005. Digital audio recording analysis: The electric network frequency (ENF) criterion. Int. J. Speech Lang. Law 2, 1,
HERRE, J. AND SCHUG, M. 2000.
Analysis of decompressed audio: The inverse decoder. In Proceedings of the 109th AES Convention.
HERRE, J., SCHUG, M., AND GEIGER, R. 2002. Analysing decompressed audio with the inverse decoder: Towards an operative algorithm. In Proceedings of the 112th AES Convention.
ISO. ISO/IEC international standard IS Information technology: Coding of moving pictures and associated audio for digital storage media up to about 1.5 Mbit/s. detail.htm?csnumber=
KRAETZER, C., OERMANN, A., DITTMANN, J., AND LANG, A. 2007. Digital audio forensics: A first practical evaluation on microphone and environment classification. In Proceedings of the 9th ACM Multimedia and Security Workshop.
LAME. MP3 encoder.
LUKAS, J. AND FRIDRICH, J. 2003. Estimation of primary quantization matrix in double compressed JPEG images. In Proceedings of the Digital Forensic Research Workshop.
PAINTER, T. AND SPANIAS, A. 2000. Perceptual coding of digital audio. Proc. IEEE 88, 4,
POPESCU, A. AND FARID, H. 2004. Statistical tools for digital forensics. In Proceedings of the 6th International Workshop on Information Hiding.
QU, Z., LUO, W., AND HUANG, J. 2008. A convolutive mixing model for shift double JPEG compression with application to passive image authentication. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing.
WANG, Y. AND VILERMO, M. 2003. Modified discrete cosine transform: Its implications for audio coding and error concealment. AES J. 51, 1,
WANG, Y., YAROSLAVSKY, L., VILERMO, M., AND VAANANEN, M. 2000. Some peculiar properties of the MDCT. In Proceedings of the 16th IFIP World Computer Congress.
YANG, R., QU, Z., AND HUANG, J. 2008. Detecting digital audio forgeries by checking frame offsets. In Proceedings of the 10th ACM Multimedia and Security Workshop. ACM.

Received November 2010; revised July 2011; accepted August 2011
More informationDWT-SVD based Multiple Watermarking Techniques
International Journal of Engineering Science Invention (IJESI) ISSN (Online): 2319 6734, ISSN (Print): 2319 6726 www.ijesi.org PP. 01-05 DWT-SVD based Multiple Watermarking Techniques C. Ananth 1, Dr.M.Karthikeyan
More informationCopy Move Forgery using Hu s Invariant Moments and Log-Polar Transformations
Copy Move Forgery using Hu s Invariant Moments and Log-Polar Transformations Tejas K, Swathi C, Rajesh Kumar M, Senior member, IEEE School of Electronics Engineering Vellore Institute of Technology Vellore,
More informationComparative Analysis of 2-Level and 4-Level DWT for Watermarking and Tampering Detection
International Journal of Latest Engineering and Management Research (IJLEMR) ISSN: 2455-4847 Volume 1 Issue 4 ǁ May 2016 ǁ PP.01-07 Comparative Analysis of 2-Level and 4-Level for Watermarking and Tampering
More informationEfficient MPEG-2 to H.264/AVC Intra Transcoding in Transform-domain
MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Efficient MPEG- to H.64/AVC Transcoding in Transform-domain Yeping Su, Jun Xin, Anthony Vetro, Huifang Sun TR005-039 May 005 Abstract In this
More informationCompression Part 2 Lossy Image Compression (JPEG) Norm Zeck
Compression Part 2 Lossy Image Compression (JPEG) General Compression Design Elements 2 Application Application Model Encoder Model Decoder Compression Decompression Models observe that the sensors (image
More informationScalable Perceptual and Lossless Audio Coding based on MPEG-4 AAC
Scalable Perceptual and Lossless Audio Coding based on MPEG-4 AAC Ralf Geiger 1, Gerald Schuller 1, Jürgen Herre 2, Ralph Sperschneider 2, Thomas Sporer 1 1 Fraunhofer IIS AEMT, Ilmenau, Germany 2 Fraunhofer
More informationUniversity of Mustansiriyah, Baghdad, Iraq
Volume 5, Issue 9, September 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Audio Compression
More informationBoth LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal.
Perceptual coding Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal. Perceptual encoders, however, have been designed for the compression of general
More informationAdaptive Quantization for Video Compression in Frequency Domain
Adaptive Quantization for Video Compression in Frequency Domain *Aree A. Mohammed and **Alan A. Abdulla * Computer Science Department ** Mathematic Department University of Sulaimani P.O.Box: 334 Sulaimani
More informationA NEW DCT-BASED WATERMARKING METHOD FOR COPYRIGHT PROTECTION OF DIGITAL AUDIO
International journal of computer science & information Technology (IJCSIT) Vol., No.5, October A NEW DCT-BASED WATERMARKING METHOD FOR COPYRIGHT PROTECTION OF DIGITAL AUDIO Pranab Kumar Dhar *, Mohammad
More informationLecture 16 Perceptual Audio Coding
EECS 225D Audio Signal Processing in Humans and Machines Lecture 16 Perceptual Audio Coding 2012-3-14 Professor Nelson Morgan today s lecture by John Lazzaro www.icsi.berkeley.edu/eecs225d/spr12/ Hero
More informationFundamentals of Perceptual Audio Encoding. Craig Lewiston HST.723 Lab II 3/23/06
Fundamentals of Perceptual Audio Encoding Craig Lewiston HST.723 Lab II 3/23/06 Goals of Lab Introduction to fundamental principles of digital audio & perceptual audio encoding Learn the basics of psychoacoustic
More informationAudio Coding and MP3
Audio Coding and MP3 contributions by: Torbjørn Ekman What is Sound? Sound waves: 20Hz - 20kHz Speed: 331.3 m/s (air) Wavelength: 165 cm - 1.65 cm 1 Analogue audio frequencies: 20Hz - 20kHz mono: x(t)
More informationCompression of Stereo Images using a Huffman-Zip Scheme
Compression of Stereo Images using a Huffman-Zip Scheme John Hamann, Vickey Yeh Department of Electrical Engineering, Stanford University Stanford, CA 94304 jhamann@stanford.edu, vickey@stanford.edu Abstract
More informationResearch Article Improvements in Geometry-Based Secret Image Sharing Approach with Steganography
Hindawi Publishing Corporation Mathematical Problems in Engineering Volume 2009, Article ID 187874, 11 pages doi:10.1155/2009/187874 Research Article Improvements in Geometry-Based Secret Image Sharing
More informationAN IMPROVISED LOSSLESS DATA-HIDING MECHANISM FOR IMAGE AUTHENTICATION BASED HISTOGRAM MODIFICATION
AN IMPROVISED LOSSLESS DATA-HIDING MECHANISM FOR IMAGE AUTHENTICATION BASED HISTOGRAM MODIFICATION Shaik Shaheena 1, B. L. Sirisha 2 VR Siddhartha Engineering College, Vijayawada, Krishna, Andhra Pradesh(520007),
More informationDRA AUDIO CODING STANDARD
Applied Mechanics and Materials Online: 2013-06-27 ISSN: 1662-7482, Vol. 330, pp 981-984 doi:10.4028/www.scientific.net/amm.330.981 2013 Trans Tech Publications, Switzerland DRA AUDIO CODING STANDARD Wenhua
More informationRobust Steganography Using Texture Synthesis
Robust Steganography Using Texture Synthesis Zhenxing Qian 1, Hang Zhou 2, Weiming Zhang 2, Xinpeng Zhang 1 1. School of Communication and Information Engineering, Shanghai University, Shanghai, 200444,
More informationConfusion/Diffusion Capabilities of Some Robust Hash Functions
Confusion/Diffusion Capabilities of Some Robust Hash Functions Baris Coskun Department of Electrical and Computer Engineering Polytechnic University Brooklyn, NY 24 Email: baris@isis.poly.edu Nasir Memon
More informationImage Classification for JPEG Compression
Image Classification for Compression Jevgenij Tichonov Vilnius University, Institute of Mathematics and Informatics Akademijos str. 4 LT-08663, Vilnius jevgenij.tichonov@gmail.com Olga Kurasova Vilnius
More informationImage Tampering Detection Using Methods Based on JPEG Compression Artifacts: A Real-Life Experiment
Image Tampering Detection Using Methods Based on JPEG Compression Artifacts: A Real-Life Experiment ABSTRACT Babak Mahdian Institute of Information Theory and Automation of the ASCR Pod Vodarenskou vezi
More informationCOMPARISONS OF DCT-BASED AND DWT-BASED WATERMARKING TECHNIQUES
COMPARISONS OF DCT-BASED AND DWT-BASED WATERMARKING TECHNIQUES H. I. Saleh 1, M. E. Elhadedy 2, M. A. Ashour 1, M. A. Aboelsaud 3 1 Radiation Engineering Dept., NCRRT, AEA, Egypt. 2 Reactor Dept., NRC,
More informationHigh Capacity Reversible Watermarking Scheme for 2D Vector Maps
Scheme for 2D Vector Maps 1 Information Management Department, China National Petroleum Corporation, Beijing, 100007, China E-mail: jxw@petrochina.com.cn Mei Feng Research Institute of Petroleum Exploration
More informationSo, what is data compression, and why do we need it?
In the last decade we have been witnessing a revolution in the way we communicate 2 The major contributors in this revolution are: Internet; The explosive development of mobile communications; and The
More informationWorkshop W14 - Audio Gets Smart: Semantic Audio Analysis & Metadata Standards
Workshop W14 - Audio Gets Smart: Semantic Audio Analysis & Metadata Standards Jürgen Herre for Integrated Circuits (FhG-IIS) Erlangen, Germany Jürgen Herre, hrr@iis.fhg.de Page 1 Overview Extracting meaning
More informationFILE CONVERSION AFTERMATH: ANALYSIS OF AUDIO FILE STRUCTURE FORMAT
FILE CONVERSION AFTERMATH: ANALYSIS OF AUDIO FILE STRUCTURE FORMAT Abstract JENNIFER L. SANTOS 1 JASMIN D. NIGUIDULA Technological innovation has brought a massive leap in data processing. As information
More informationTowards a Telltale Watermarking Technique for Tamper-Proofing
Towards a Telltale Watermarking Technique for Tamper-Proofing Deepa Kundur and Dimitrios Hatzinakos 10 King s College Road Department of Electrical and Computer Engineering University of Toronto Toronto,
More informationSource Coding Basics and Speech Coding. Yao Wang Polytechnic University, Brooklyn, NY11201
Source Coding Basics and Speech Coding Yao Wang Polytechnic University, Brooklyn, NY1121 http://eeweb.poly.edu/~yao Outline Why do we need to compress speech signals Basic components in a source coding
More informationData Hiding in Binary Text Documents 1. Q. Mei, E. K. Wong, and N. Memon
Data Hiding in Binary Text Documents 1 Q. Mei, E. K. Wong, and N. Memon Department of Computer and Information Science Polytechnic University 5 Metrotech Center, Brooklyn, NY 11201 ABSTRACT With the proliferation
More informationMPEG-4 Structured Audio Systems
MPEG-4 Structured Audio Systems Mihir Anandpara The University of Texas at Austin anandpar@ece.utexas.edu 1 Abstract The MPEG-4 standard has been proposed to provide high quality audio and video content
More informationRobust biometric image watermarking for fingerprint and face template protection
Robust biometric image watermarking for fingerprint and face template protection Mayank Vatsa 1, Richa Singh 1, Afzel Noore 1a),MaxM.Houck 2, and Keith Morris 2 1 West Virginia University, Morgantown,
More informationExcerpt from "Art of Problem Solving Volume 1: the Basics" 2014 AoPS Inc.
Chapter 5 Using the Integers In spite of their being a rather restricted class of numbers, the integers have a lot of interesting properties and uses. Math which involves the properties of integers is
More information