STEGANOGRAPHY is the technique that conceals secret. Spec-ResNet: A General Audio Steganalysis scheme based on Deep Residual Network of Spectrogram

Size: px
Start display at page:

Download "STEGANOGRAPHY is the technique that conceals secret. Spec-ResNet: A General Audio Steganalysis scheme based on Deep Residual Network of Spectrogram"

Transcription

1 1 Spec-ResNet: A General Audio Steganalysis scheme based on Deep Residual Network of Spectrogram Yanzhen Ren, Member, IEEE, Dengkai Liu, Qiaochu Xiong, Jianming Fu, and Lina Wang arxiv: v1 [cs.mm] 21 Jan 219 Abstract The widespread application of audio and video communication technology make the compressed audio data flowing over the Internet, and make it become an important carrier for covert communication. There are many steganographic schemes emerged in the mainstream audio compression data, such as AAC and MP3, followed by many steganalysis schemes. However, these steganalysis schemes are only effective in the specific embedded domain. In this paper, a general steganalysis scheme Spec- ResNet (Deep Residual Network of Spectrogram) is proposed to detect the steganography schemes of different embedding domain for AAC and MP3. The basic idea is that the steganographic modification of different embedding domain will all introduce the change of the decoded audio signal. In this paper, the spectrogram, which is the visual representation of the spectrum of frequencies of audio signal, is adopted as the input of the feature network to extract the universal features introduced by steganography schemes; Deep Neural Network Spec-ResNet is welldesigned to represent the steganalysis feature; and the features extracted from different spectrogram windows are combined to fully capture the steganalysis features. The experiment results show that the proposed scheme has good detection accuracy and generality. The proposed scheme has better detection accuracy for three different AAC steganographic schemes and MP3Stego than the state-of-arts steganalysis schemes which are based on traditional hand-crafted or CNN-based feature. To the best of our knowledge, the audio steganalysis scheme based on the spectrogram and deep residual network is first proposed in this paper. The method proposed in this paper can be extended to the audio steganalysis of other codec or audio forensics. Index Terms Audio Steganalysis, Spectrogram, Deep residual network, Feature representation. I. INTRODUCTION STEGANOGRAPHY is the technique that conceals secret information in digital carriers, such as text, image, audio, and video, to build covert communication. In many cases, these steganography schemes will be used by some malicious software or malicious organizations. Therefore, steganalysis technology has been studied extensively, which is the countermeasure technique of steganography to detect whether there is any secret information hidden in the data which looks normal. With the widespread application of various audio and video communication technologies, compressed audio is stored or spread in Internet ubiquitously. Today, the most widely used This work is supported by the Natural Science Foundation of China(NSFC) under the grant NO , NO. U , NO. U153624, and China Scholarship Council. Yanzhen Ren, Dengkai Liu, Qiaochu Xiong, Jianming Fu and Lina Wang are with Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University, Wuhan 4372, China. ( renyz@whu.edu.cn; dengkailiu@whu.edu.cn; emmaxiong@whu.edu.cn; jmfu@whu.edu.cn; lnawang@163.com; ) audio compression standards are AAC (Advanced Audio Coding) and MP3 (MPEG-1 Audio Layer III). AAC plays a dominant role in various internet communication applications, and can achieve better sound quality than MP3 at similar bit-rate, so AAC is gradually replacing MP3 now. AAC is specified both as Part of the MPEG-2 and MPEG-4 standard [1], so it is used almost everywhere for the mainstream audio applications. Today, AAC has been the default audio coding format for many mainstream HDTV Standards, hardwares, mobilephones, and Internet communication applications [2], such as DVB(Digital Video Broadcasting), itunes and ipod, iphone, Play Station, YouTube, Twitter, Facebook, Youku, Tudou, Baidu Music, QQ Music, Himalayan, etc. The ubiquitous storage and communication of audio compression data provides a huge covert communication carrier channel. Many steganography schemes based on AAC and MP3 have been proposed [3] [15]. There are three main embedding domains in the compression parameters of AAC: Modified Discrete Cosine Transform coefficients (MDCT) [3], scale factor [4], [5], and Huffman coding [6] [8]. The basic method of these steganography schemes is modifying the compression parameters to hide the secret information. Wang et al. [3] proposed to embed secret information by modifying the small value of quantized MDCT coefficients to achieve good imperceptibility. In [6], the least significant bit (LSB) of the escape sequences in AAC Huffman coding are modified with matrix encoding [16] to embed secret information. In [7], the sign bit of the corresponding MDCT coefficient is modified to implement steganography, which utilizes the fact that the sign bits of the MDCT coefficients in the signed codebook will be coded separately. In [8], Zhu et al. proposed to embed secret information by modifying the sections of Huffman coding of AAC. In [9], Luo et al. proposed an adaptive audio steganography scheme in the time domain based on the advanced audio coding (AAC) and the Syndrome- Trellis coding (STC). Except for [9], the above steganography schemes are all modifying the compression parameters of AAC. Although the embedding domains are different for each scheme, but eventually the MDCT coefficients of each AAC frame are modified in different level. For MP3, there are many steganographic schemes emerged [1] [12]. Some MP3 steganographic tools have been published early, such as MP3Stego [13], UnderMP3Cover [14], and MP3Stegz [15]. Among them, MP3Stego [13] is the most tested MP3 steganography schemes of state-of-arts, which embeds the secret information by modifying the quantization step size in the encoding procedure. With the development of the steganography schemes of

2 2 AAC and MP3, many steganalysis schemes are followed [17] [25]. To detect the steganography schemes of AAC Huffman coding domain, Ren et al. [17] proposed to extract the Markov transition probability of neighboring scale factor bands codebook (SFB) as the steganalysis feature to model the correlation of the neighboring SFBs codebook. Calibration technology is used to improve the detection accuracy. To detect the steganography schemes of AAC MDCT coefficient domain, Ren et al. [19] proposed to extract the Markov transition probabilities and joint probability densities of first-order and second-order differential residuals of inter-frame and intra-frame MDCT coefficients as steganalysis features. The performance of the schemes [17], [19] are good for specific embedding domain but will be decreased for the other embedding domain. To detect MP3Stego [13], which is almost the most tested MP3 steganography scheme, there are many steganalysis schemes. Yan et al. [18] proposed to extract the difference between the quantization steps of MP3 neighboring frames as the steganalysis features. Yan et al. [2] proposed to extract the statistical inconsistencies in the statistical distribution of the number of bits in the audio bit pool before and after recompression as steganalysis scheme. Yu et al. [21] used the recompression calibration technique and proposed a steganalysis scheme based on the statistical distribution feature of Big value. Wang et al. [25] used Markov feature to capture the correlation of the quantized MDCT coefficients, which will be destroyed by MP3stego embedding procedure. However, the steganalysis features of all schemes above are extracted from the specific embedding domain, so they have good performance for the steganographic schemes of the identical embedding domain. With the excellent effect of Deep Neural Network (DNN) in various research fields, DNN had been used in many steganalysis schemes [26] [38]. Among them, there are two audio steganalysis schemes based on DNN. In [32], a CNN (convolutional neural networks) is designed to detect the audio steganography schemes of the time domain. This work shows the effect of the CNN to improve the performance of audio steganalysis scheme. However the scheme is good to detect the steganography scheme of audio time domain, and is ineffective for the steganography scheme of AAC and MP3. In [36], to detect the MP3 steganography schemes in Huffman coding domain, a steganalysis scheme based on CNN is proposed. There are three characters for this scheme: the quantified MDCT matrix is used as the input of the CNN network; a high pass filter is used to enhance the difference introduced by steganography; and many network optimization strategy are adopted. The experimental results show that the CNN-applied scheme performs better than the hand-crafted steganalysis features. However, the same problem existed, the performance of the scheme will be reduced to detect the other steganography schemes of different embedding domain. According to the current research results in steganalysis research area, it can be concluded that the deep neural network technology can really improve the detection accuracy of steganalysis [32], [35] [37]. However, the key point is how to express the steganalysis data more rich and accurate, and to design the deep neural network more effective and suitable for the steganalysis work. The purpose of this paper is aiming at these points to improve the generality of steganalysis scheme for AAC and MP3. Although there are many embedding schemes in the audio compression parameter of AAC and MP3, such as MDCT coefficients [3], scale factor [4], [5], Huffman coding [6] [8], all these schemes will lead to the change of the MDCT values in different ways, and eventually will change the correlation of the time-frequency of the audio signal. Since that, if the steganalysis features are extracted from the time-frequency characteristics of the audio signal, the steganalysis scheme will be more generative and can be used to detect the steganographic schemes which cause the change in the time-frequency characteristics, and almost all the steganographic schemes for the compression audio of AAC and MP3 [1] [12] belong to this category. Based on this idea, a steganalysis scheme Spec-ResNet (Deep Neural Network of Spectrogram) is proposed in this paper. There are three main contributions for the proposed scheme: To extract the universal features introduced by the steganography schemes based on perceptual audio coding, the spectrogram, which is the visual representation of the spectrum of frequencies of audio signal, is adopted as the input of the feature network. To extract the classification features of steganography schemes more effectively, the deep residual network is used as the basic framework of the proposed schemes Spec-ResNet. Since that the deep residual networks uses the deeper neural networks without the problem of gradient disappearance. It is suitable to extract the steganalysis features which are based on the weak signal. To capture the steganalysis features more completely, the features extracted from multiple spectrograms of different window size are combined. Since that the time frequency relationship of the spectrogram are different under different window size. A simple SVM classifier is used to verify the performance improvement of feature combination. The paper is organized as follows. The related background knowledge is introduced in section II. In section III, the proposed steganalysis scheme Spec-ResNet is described in detail. The experiments results are followed in section IV. Finally, the conclusion is drawn and the future work is discussed in V. II. PRELIMINARY A. The principle of audio codec Both AAC and MP3 are under the framework of perceptual audio coding, so their encoding principles are almost similar, including MDCT transform, psychoacoustic model, quantization and entropy coding. The improvement of AAC over MP3 include more sampling rates and channels supported, more efficient coding, additional modules added to increase compression efficiency, etc. The framework of perceptual audio coding and three main embedding domain is shown in Fig.1. During the process of perceptual audio coding, the psychoacoustic model is used firstly to remove the part of the sound signal that the human hearing cannot perceive to reduce the data rate. Then, the input pulse signal is subjected

3 3 audio signal Psychoacoustic model Scale factor Quantization coding Analysis filter bank MDCT coefficient Huffman coding Entropy coding Coding stream Fig. 1: The procedure of perceptual audio coding and three main embedding domain. to frame division or block division, and an analysis filter bank is used to perform time-frequency conversion. Each frequency domain sub-band is calculated according to a psychoacoustic model to obtain corresponding masking threshold, maximum permissible distortion. Three kinds of cyclic quantization and entropy coding will be performed on the transformed MDCT coefficient, and finally the audio code stream of entropy coding is output. B. The steganographic schemes of AAC and MP3 Due to lossy quantization in AAC and MP3 coding, the slight modification of the quantized integer MDCT coefficients will not decrease the audio quality obviously. Therefore, the embedding of the secret message can be implemented by finetuning the value of the compression parameter. There are three main embedding domain: MDCT coefficients, scale factor and Huffman coding parameters, which are shown in Fig.1. The basic idea of these steganography schemes is modifying the compression parameters slightly to hide the secret information. There are many AAC and MP3 steganography schemes, four steganography schemes from different embedding domains will be introduced in this section. 1) LSB-EE: In [6], the least significant bit (LSB) of the escape sequences in AAC Huffman coding are modified with matrix encoding to embed secret information. In AAC codec, escape sequence is a special codebook used to encode MDCT coefficients over 16. The modification of the LSB of escape sequence will bring little negative effect to audio quality. In AAC compressed data, MDCT coefficients whose value is over 16 accounts for 5%-6% of total coefficients, and they are mainly in low frequency portion, so the change of this scheme to the AAC audio signal is almost in the area of the low frequency signal. 2) MIN: In [3], Wang et al. proposes a method for implementing steganography in the small value region of MDCT coefficients. Since that the energy of the audio is mainly concentrated in the low and middle frequency portion, the MDCT coefficients of which are quantized with a smaller quantization step size, and the large quantization step size is used for the high frequency portion to reduce the distortion as much as possible. For the high frequency portion, the quantization range is larger, and it needs less coding bits. The amplitude of the quantized coefficient is basically {-1,, 1}, which is called the small value region of the MDCT coefficient. In the AAC quantized MDCT coefficient encoding stage, the coefficients of small value region are generally encoded by codebook 1 and codebook 2, and the 4 quantized MDCT coefficients are grouped into one index value index. The calculation formula is as follows in (1). 3 index = 3 i q [i] + 4, q [i] { 1,, 1} (1) i= The codeword is searched among the codebook according to index. In order to further reduce the distortion of AAC audio, the steganographic algorithm use the method of modifying the last MDCT coefficient q[3] and calculate a new index to replace the original one. In AAC compressed data, the small MDCT coefficients are mainly in the low and middle frequency portion, so the change of this scheme to the AAC audio signal is almost in the area of low and middle frequency signal. 3) SIGN: In [7], Zhu et al. proposed to embed message in the Huffman encoding process of AAC coding stream. The MDCT coefficients are encoded by a plurality of codebooks. According to the number of bits after encoding, the encoder selects an optimal codebook to encode 2 or 4 MDCT coefficients as a Huffman code. If the codebooks selected has sign bits, the sign bits of non-zero MDCT coefficients will be attached to codeword. The coding stream structure corresponding to the codebook with sign bits is shown in Table I and Table II. In codebook 1 and 2 or codebook 5 and 6, Huffman code represents 4 or 2 MDCT coefficients, while sign w, sign x, sign y, sign z is the sign bit of MDCT coefficients w, x, y, z respectively. sign = represents MDCT coefficient is negative, and sign = 1 represents MDCT coefficient is positive. TABLE I: Coding stream structure of codebook 1 and 2. Huffman code sign w sign x sign y sign z TABLE II: Coding stream structure of codebook 5 and 6. Huffman code sign x sign y With a appropriate threshold T, SIGN modified the sign bit of MDCT coefficients ranging from T to T accordingly to implement audio steganography. This subtle modification will not change any parameters in AAC encoding procedure except sign bit of specific MDCT coefficient, therefore it brings little distortion to audio quality. 4) MP3Stego: In [13], MP3Stego is a public tool to hide information in MP3 files during the MP3 compression process. There are two nested loops to quantify the MDCT coefficient in MP3 encoder, inner loop and outer loop. The outer loop is to verify the distortion of the audio signal to guarantee the audio quality. The inner loop is to select the suitable scalar factor to qualify the MDCT coefficients with the available bits number. The hiding operation of MP3Stego is in inner loop, The part2-3-length variable which records the number of the main-data bits is controlled by the selecting of the scalar factor

4 4 Amplitude Frequency(kHz) Waveform Time(s) Spectrogram Time(s) Fig. 2: Waveform and spectrogram of an audio segment. to make the LSB of part2-3-length variable equal to the secret bit. All these four steganographic schemes modified the compression parameters to hide information. Although the embedding domain is different, the final impact of all these schemes are the change of the MDCT coefficients of the encoded audio signals in different frequency range. Since that the above four schemes covered almost all frequency range of audio signals, so they are chosen as the target steganography schemes to verify the generalization ability of the proposed steganalysis scheme. C. Spectrogram Spectrogram is a basic visual representation of the spectrum of frequencies of audio signals, it can show the energy amplitude information of different frequency bands changing over time, it contains abundant time-frequency information of audio signal. Therefore spectrogram should be a good research object for general audio steganalysis to capture more relative features introduced by different audio steganography schemes. Fig. 2 is the spectrogram of an audio segment together with its waveform. In the spectrogram, horizontal axis represents the time domain and the vertical axis represents the frequency domain of the audio signal, the third dimension which is represented by the intensity of color of each point in the image indicating the amplitude of the frequency at the time. To plot the spectrogram of the audio signal, The signal will be divided into frames of special window size. The formant of spectrogram can effectively express the acoustic parameters, while its local characteristics can represent the additive noise very well, so it is feasible to capture the traces left by embedding operation through spectrogram. III. THE ARCHITECTURE OF SPEC-RESNET Based on the analysis above, an universal audio steganalysis scheme Spec-ResNet is proposed in this paper. spectrogram is adopted as the steganalysis object, deep residual network is used to extract the steganalysis features, and several different window spectrograms are combined together to extract more rich features. The main architecture of the Spec-ResNet scheme is shown in Fig. 3, which consists of three main components: Spectrogram Preprocess Module( SPM ), Deep Residual Steganalysis Network( S-ResNet ) and Classification Module( CM ). SPM generated the spectrogram of the specific window size of the input audio signal, and use filters to preprocess the spectrogram. S-ResNet is the steganalysis feature network framework based on the architecture of deep residual network to extract the steganalysis features of filtered spectrogram. CM is the traditional classification process, which fuse the features from multiple spectrograms under different window size. The detail of each module will be described below. A. Spectrogram Preprocess Module In this Module, the trained or the tested audio signals are represented as a spectrogram under specific windows size. Audio signals are highly correlated in both time and frequency domains, so the spectrogram is a perfect representation to analysis the correlation changes of time and frequency domain which are introduced by the steganography operation. The spectrogram of the audio signal contains three kind of signal: the audio content signal, the noise of real audio signal and the noise introduced by embedding operations. Just like the steganalysis task of image, the noise introduced by steganography is weak, so the noise of the audio signal should be amplified to improve the signal noise rate (SNR) of the steganography noise. In [39], Fridrich et. al. proposed an image steganalysis scheme based on the spatial domain rich model feature (SRM), in which multiple filters are introduced to capture the noise distribution characters from tested image in different directions and dimensions. Inspired by this work, the correlation of the intra-frame and inter-frame in spectrograms of 1 cover audio samples are analysed by linear regression, and multiple filters are designed to preprocess the spectrogram. The audio signals are 2s duration with 32 khz, AAC file, and decoded as wav file. The spectrogram dimension is n m, while n is half of the window size due to the symmetry in Fourier transform, m is the count of total frame in one piece of audio with 5% overlap(e.g., if window size N = 124, the dimension of spectrogram is ). Multiple linear regression model is showed in (2), y is the dependent variable and x t, t = 1, 2,..., s is the independent variable, β is bias. For 3 3 filter, the value of s is 8, and the correlation between the matrix center point and 8 signal points around is presented in Fig. 4. y = β + β 1 x β t x t β s x s (2) In Fig.4, both of the correlation between the neighboring points in horizonal and vertical direction are relatively strong, especially in vertical direction. In spectrogram, the vertical direction correlation is the relevance of the neighboring frequency band in the same frame, while horizonal direction described the relation between the successive frames in the same frequency band. Based on the analysis above, a set of filters from different direction were chosen as the first convolutional layer of deep residual network to extract the

5 5 Filters with fixed parameters Deep residual network(s-resnet) Spectrogram Original audio signal... Classification module(cm) Filters with fixed parameters Deep residual network(s-resnet) Spectrogram Fig. 3: The architecture of proposed steganalysis scheme Spec-ResNet Fig. 4: Result of the linear regression. The depth of the color or the absolute value of logistic coefficients represent the strength of the correlation. energy difference introduced by steganographic noise between cover and stego samples. In this work, 4 fixed-parameters filters are chosen to preprocess the spectrogram. Filters size is 3 3, and blanks are filled with zero, which is shown in Fig. 5. In the future work, the filter parameters can be trained by the network to improve the performance of the proposed scheme. B. Deep Residual Steganalysis Network The spectrogram of audio signal is filtered by the four filters. Then it is processed by the initial convolutional layer with 1 output channels to extract the potential correlation in the feature matrix. Deep residual steganalysis network( S-ResNet ) in this paper consists of 3 layers and there is a shortcut for each two convolution layers. After that, a global average pooling layer is applied. The whole architecture of S-ResNet Fig. 5: Filters based on the correlation of intra-frame and inter-frame. is shown in Fig. 6 and the details about the S-ResNet are described as follows. 1) Convolution layers: The convolution layers in this paper are 3 3 kernel, no matter the parameter of the filters is fixed or not. Filters introduced in SPM is applied to weak the influence of the audio content in spectrogram. The rest convolution layers which include the initial convolution layer after SPM, are used to capture the potential correlation inherent in the feature matrix. According to the numbers of the channels, the convolution layers in residual network can be classified into three types or groups:,, and, which is shown in Fig. 7. Number of channels in,, and are 1, 2, and 4 respectively, and the stride is 1 in both of the horizonal and vertical direction if no special declaration. Moreover, the number of residual units in each group is 5, which means 1 convolutional layers. +1.

6 6 Audio Spectrogram Preprocess Module 3 3 Init conv, 1 BN ReLu, stride=2 in both directions 3 3 average pooling, stride=2 in both directions 3 3 average pooling 3 3 Conv, 1 BN ReLu 3 3 Conv, 1 + Global average pooling Classification Module Fig. 6: The structure of deep residual network S-ResNet.. x x BN ReLU Weight layer ReLU F(x) Weight layer ReLU x Conv, 3 3 channels, stride=1 both in horizonal and vertical direction if not specified explicitly. Fig. 7: Convolution layers with different architecture, in which 3 3 channels(channels = 1, 2, or 4) represents filter length filter width number of channels respectively. is with 1 channels, is with 2 channels, and is with 4 channels.. Weight layer H(x) ReLU Weight layer F(x)+x ReLU (a) Convolutional unit (b) Residual unit. Fig. 8: The architecture of Convolutional unit and Residual unit. 2) Residual unit: In the common sense, for the Convolutional Neural Networks(CNN), more convolutional layers means that the features extracted from different dimensions are richer and will achieve better classification ability. However, once the convolutional neural network reaches a certain depth, the increasing number of neural network layers will lead to the degradation of the performance due to the possible vanishing gradient problem. At this time, the training accuracy and test accuracy will be declining, and it will be hard to train an effective neural network, and the learning ability of the Deep Neural Network will be limited. Compared with classical CNN, residual network [4] is capable of integrating more convolutional layers without declining network performance. The model learning can be simplified to approximate an mapping function learning, which is described in Fig. 8. H (x) = x is the hypothesis model of cascaded convolutional layers. the work of CNN is trying to find a mapping function H (x) directly. However residual network is different due to the identity function x, it

7 7 try to find a residual function F (x) = H (x) x instead. In this way, residual network is transformed to approximate ideal hypothesis by learning the residual F (x) =, which is easier than approximating H (x) directly. Obviously, convolutional output is more sensitive to the change of input than before with identity function x introduced. In Fig.6, the arc across each two convolution layers represents the shortcut. The real arc indicates x has the same size as F (x) and the two tensors can be added together directly, while the dashed arc between and or and indicates that they have different size and x needs downsampling before the add operation. 3) Batch normalization layer: Batch normalization is proposed in [41] to accelerate deep network training. The distribution of the inputs before each hidden layer varies with each training iterations. Batch normalization performs normalization for each training iterations instead of setting lower learning rates at first or restricting parameter initialization, which needs more time to train the model. It is applied before each new convolution layer in Fig.6. 4) Activation function: Besides batch normalization, activation function is also applied after each batch normalization in S-ResNet. Nonlinear links between convolution layers have better expression ability than linear model, therefore ReLU (Rectified Linear Unit) is adopted as activation function in this paper. 5) Pooling layers: In order to reduce computational complexity, max pooling layer or average pooling layer are used in deep network to reduce feature dimension. 3 3 Average pooling layer is applied to S-ResNet between and or Between and with stride = 2 in both horizontal and vertical directions. There exists no pooling layer in shortcut across,, or to keep the dimension of x and F (x) consistent. A global average pooling is applied right before classification module (CM) to get 4-dimensional classification feature. C. Classification module In the existing CNN-based [32], [36] or RNN-based [42] steganalysis scheme, classification module of network is always a fully-connected layer with a 2-way softmax, which is shown in Fig.9 to get the probability of 2 class labels. With a given threshold ranging from to 1(default to.5), the tested samples will be judged as cover or stego. is low, which does not reflect the texture characteristics of the sound well. The frequency resolution of the narrow-band spectrogram is too small to properly identify the position of the formant. In the spectrogram feature map under different window sizes, the audio signals characteristics are different, so it is very important to select the window size suitable for steganalysis. In this paper, spectrogram with different window size is considered together. For a specific audio sample, its spectrogram has different dimension with different N, however the classification feature s dimension after global average pooling is constant. In our experiment, three spectrogram with different windows are chosen to increase the diversity of features. three 4-dimensional features are cascaded to a 12- dimensional features as the final steganalysis feature. Fig. 1 showed the classification module in this paper, and the value of window size N is 124, 512, and 256 for AAC. LibSVM [43] is introduced to train the ultimate classification model based on the 12-dimensional feature originating from audio samples. In the stage of detection, the 12-dimensional feature extracted from the test audio samples are used to judge whether a piece of audio is cover or stego audio. Spectrogram with N=124 (512*128) Spectrogram with N=512 (256*256) Spectrogram with N=256 (128*512) Trained model-1 Trained model-2 Trained model-3 4-dimension feature 4-dimension feature 4-dimension feature Fig. 1: The procedure of combining three 4-dimension feature into 12-dimension classification feature.. S-ResNet 4-dimension feature Fully connected layer Softmax Classification result Fig. 9: Original CNN classification module. In short-time Fourier transform, different spectrograms with different scales will be generated when different window size used. When window size N is relatively small, broadband spectrogram will be generated, the time-bandwidth is narrow and the corresponding frequency-bandwidth is broad. the narrow-band spectrogram is generated vice versa. The broad-band spectrogram can reflect the time-varying characteristics of the spectrum well, but the frequency resolution. IV. EXPERIMENTS To evaluate the performance of the proposed scheme Spec- ResNet, three main experiments have been carried out. The purpose of the first experiment is to evaluate the detection accuracy of the proposed scheme for the three AAC steganography schemes of different embedding domain. Spectrogram with three different windows are used to evaluate the influence to the detecting performance. The second experiment is implemented to compare Spec-ResNet with the work of the existing steganalysis schemes. The third experiment is done to detect the MP3Stego, which is under MP3 codec and embedding domain is scalar factor.

8 8 A. Experiment setup 1) Data set: To the best of our knowledge, there is still no public audio data set for AAC or MP3 steganalysis, so the audio data set is constructed by ourselves. 186 pieces of music are downloaded from Internet, including different musical styles such as jazz, rock, country, pop, different language(mainly Chinese and English), and different gender etc. The music samples have been cut into clips to construct the AAC audio data set and MP3 audio data set. All the audio data sets of the experiments described below together with the source code of the proposed scheme will be public on GitHub 1. Specifically, this project is based on Tensorflow CUDA CuDNN Python In addition, all parameters are trained with Adam optimizer: mini-batch size is 32, which means 16 pairs of cover and stego audio samples; the learning rate decay is.9 and the weight decay is a) AAC audio data set: The 1, audio samples with 16 khz, 16 bit and 2s duration are used to construct 4 different audio sample sets, including one cover sample set (CDB AAC), and three stego sample sets (SDB AAC LSB, SDB AAC MIN, SDB AAC SIGN). The main construction of the four data sets are as follows. CDB AAC: The audio samples of WAV file are encoded by the open-source audio encoder FAAC [44] as 32 kbps, then decoded to WAV by FAAD2. The decoded audio samples are of 32 khz with 2 channels, and the total number of cover samples in CDB AAC is 1,. SDB AAC LSB: The 1, audio samples(.m4a format) are generated by LSB EE steganographic scheme [6], and the secret message is embedded with a relative embedding rate (EBR) of 1%, 2%, 3%, 5%, and 1% respectively. The total number of stego samples(.wav format) in SDB AAC LSB is 1, 5 = 5, after the compressed audio bitstream are decoded by FAAD2. SDB AAC MIN: The 1, audio samples(.m4a format) are generated by MIN steganographic scheme [3], and the secret message is embedded with a EBR of 1%, 2%, 3%, 5%, and 1% respectively. The total number of stego samples(.wav format) in SDB AAC MIN is 1, 5 = 5, after the compressed audio samples are decoded by FAAD2. SDB AAC SIGN: The 1, audio samples(.m4a format) are generated by SIGN steganographic scheme [7], and the secret information is embedded with a EBR of 1%, 2%, 3%, 5%, and 1% respectively. The total number of stego samples(.wav format) in SDB AAC SIGN is 1, 5 = 5, after the compressed audio samples are decoded by FAAD2. b) MP3 audio data sets: The 9, audio samples with 16 bit, 44.1 khz, and 5s segment are used to construct 2 different audio data sets, including a cover samples set(cdb MP3) and a stego samples set(sdb MP3). The main contents of the two data sets are as follows. CDB MP3: The audio samples of WAV file are encoded as 128 kbps and decoded by the public MP3 codec. The decoded 1 ResNet audio samples(.wav format) are 44.1 khz, and the total number of cover samples in CDB MP3 is 9,. SDB MP3: The audio samples of WAV file are encoded by MP3Stego [13]. Since that the maximum embedding capacity of MP3Stego is roughly 2 bits for each frame and 5s audio samples into which 3 Bytes message are embeded has 191 frames, so the EBR is nearly 6%. Besides, the total number of stego samples in SDB MP3 is 9,. 2) Metrics: To evaluate the detection accuracy of the proposed scheme and contrast schemes, TPR (True Positive Rate), TNR (True Negative Rate) and ACC (Classification Accuracy) are adopted as metrics. TPR means the proportion that stego samples are detected as stego, TNR means the proportion that cover samples are detected as cover. ACC is the detection accuracy for all cover and stego samples. When the number of cover samples and stego samples are equal, ACC is the average of TPR and TNR. B. Experiments and Results 1) Experiment I: This experiment is carried out to evaluate the detection accuracy of the proposed scheme. Three AAC steganography schemes with different embedding domain are detected. Spectrograms with three different window size N win = 256, 512, or 1, 24 are extracted from 6, audio samples in CDB AAC, SDB AAC LSB, SDB AAC MIN, and SDB AAC SIGN respectively. In the training procedure, the spectrograms extracted from 3, audio samples in CDB AAC are used as the cover input signal, and the spectrograms of 3, audio samples with.3 EBR in SDB AAC LSB, SDB AAC MIN and SDB AAC SIGN are selected as the stego input signal. For a specific window size, train set contains 6, feature matrices described above (for N = 256, the dimension of spectrogram feature matrix is , for N = 512, the dimension of spectrogram feature matrix is , and for N = 124, the dimension of the spectrogram feature matrix is ). However, the architecture of test set is different. For cover samples, spectrogram of the rest 3, audio samples in CDB AAC are selected, and spectrogram of the corresponding 3, audio samples generated by different schemes and different EBR (2 samples in each algorithm of a specific EBR) are selected as the stego samples. Three models are trained based on spectrogram with three different window size respectively. The results of experiment I are shown in TABLE III. N = 256, 512, or 124 represent the spectrogram window size. EA is the embedding schemes, and ACC shows the average detecting accuracy of the trained model towards cover or stego samples generated by a specific steganographic algorithms with multiple EBR, while AVERAGE represents the average detecting accuracy for the cover or stego samples of a specific EBR of multiple schemes. The results show that the performance of the classification model trained by spectrogram with different window size are different. The average accuracy for cover samples of different model are all above 83.89%. When N = 512, the accuracy for cover samples can achieve 93.47%, and the average detecting accuracy for the stego samples with 1% EBR is above 82.6%. When N = 124, the accuracy

9 9 reaches 89.5%. It shows that the proposed scheme Spec- ResNet can be used to detect multiple steganographic schemes of different embedding domain and have good performance. 2) Experiment II: This experiment is carried out to compare the detection accuracy of the proposed scheme with the existing steganalysis schemes. Although feature matrix of spectrogram with different window size has different shape, the length of feature through global average pooling layer is constant, explicitly 4. For spectrogram of a specific window size, a classification model is trained based on it and will be saved thereafter. Then three 4-dimensional feature originating from three trained classification model are merged into a 12-dimensional feature. SVM is used to train the ultimate classification model, which considers spectrogram feature with different scale. In particular, the architecture of train set and test set is just the same as that of Experiment I. The detecting performance of Spec-ResNet is compared with Ren et al. s scheme [19], Luo et al. s scheme [32], and Zhao et al. s scheme [36]. In Ren et al. s scheme [19], a classical rich model feature are introduced to fully analyze the Markov transition probabilities and joint probability densities of first-order differentials and second-order differential residuals of inter-frame and intra-frame MDCT coefficients. In Luo et al. s scheme [32], the classifier is trained with seven convolutional neural network based on original audio signal. Differently, a CNN is adopted to process quantified MDCT coefficients matrix in [36], while this CNN is replaced with S-ResNet above in our experiments. Audio samples with an duration of 2s are adopted for performance testing in this experiment, and 8, audio samples including 4, cover samples and 4, stego samples are used for training, The left 12, audio samples are used for testing. The results of experiment II is shown in TABLE IV. Method represents the steganalysis schemes and S len is the sample length. EA, AVERAGE and ACC is the same as that in TABLE III. TABLE IV shows that the detecting performance of steganalysis scheme Spec-ResNet is better than that of the three trained model in TABLE III, which shows that the feature combination of spectrogram with different windows can improve the performance of Spec-ResNet. Compared with the existing steganalysis scheme, the average detecting accuracy of Spec-ResNet is higher than that of the existing schemes, including traditional hand-crafted feature [19] and CNN based scheme [32], [36]. In [19], the length of the audio samples is 2 seconds, but the performance of Spec-Resnet still outperformed it under the length of 2s audio samples. Fig. 11 is the graphical representation of TABLE IV. 3) Experiment III: To evaluate the detection performance of the Spec-ResNet for MP3, MP3Stego is chosen as the detecting object. The training procedure is the same as Experiment I. The audio duration is 5s and the three different window sizes are N = 512, 124, or 248 respectively (for N = 512, 124, or 248, the dimension of feature matrix is , , ). Since that the samples in MP3 audio data set is 44.1 khz. Train set consists of spectrogram of 15 audio samples in CDB MP3 and 15 audio samples in SDB MP3, and test set is the same as train set in architecture. There is no intersection between train set and test set. The performance of Spec-ResNet is compared with the existing steganalysis scheme. Wang et al. s method [25] designed a Markov feature captured the correlation of the quantized MDCT coefficients (refer to as QMDCTs), which will be destroyed by MP3stego s embedding behaviour. Luo et al. s scheme [32] are compared to detect MP3stego with audio duration of 5 seconds, in which 8 audio samples are chosen to train CNN model, and 2 audio samples are used to test the trained model. The detecting performance of proposed scheme Spec- ResNet towards MP3Stego was compared with the existing schemes proposed in [25], [32]. The results are presented in TABLE V. Mixture represents the mixture classification model which is trained on the three spectrograms with different window size and S len & EBR is sample length & relative embedding rate. Spec-ResNet achieves better performance compared with the trained model based on spectrogram with only a specific window size. The detecting accuracy (ACC) of the proposed scheme which is in bold is higher than the existing steganalysis scheme based on traditional hand-crafted feature [25] or deep neural network [32]. That means the proposed steganalysis scheme Spec-ResNet can also be applied to other audio coding standards, such as MP3, and can achieve good detection accuracy. V. CONCLUSION In this paper, a general steganalysis scheme Spec-ResNet based on the spectrogram and deep residual network is proposed, it can be used to detect the steganography schemes of different embedding domain of AAC and MP3. Three conclusions can be obtained from this work: 1). Spectrogram, represents the spectral relationship in time series of the audio signal, is good to be as the analysis signal for audio steganalysis scheme to obtain more universal feature. It can be extended to the other audio analysis area, such as audio forensic, to improve the classification performance; 2) Deep residual network, which solved the problem of gradient disappearance, is suitable to be used to extract the steganalysis features which are based on weak signal changes; 3) Fusion of training features from different scales will improve the classification accuracy. The experiment results show that the proposed scheme has good detection accuracy and generality. It achieves better detection accuracy for three main AAC embedding domain: MDCT, scale factor and Huffman coding and MP3Stego than the existing steganalysis scheme. To the best of our knowledge, the audio steganalysis scheme based on the spectrogram and deep residual network is first proposed in this paper. The method of this paper can be applied to the other kind of audio classification works. REFERENCES [1] I , Iec information technology-coding of audio-visual objectspart 3: audio, 24. [2] (218) Advanced Audio Coding. [Online]. Available: wikipedia.org/wiki/advanced Audio Coding [3] Y.-J. Wang, L. Guo, and C.-P. Wang, Steganography method for advanced audio coding, Journal of Chinese Computer Systems, vol. 32, no. 7, pp , 211.

10 1 TABLE III: The detecting performance of spectrogram with different window size. Spectrogram EA TNR TPR Cover 1% 2% 3% 5% 1% ACC LSB-EE [6] N =124 MIN [3] SIGN [7] AVERAGE LSB-EE [6] N =512 MIN [3] SIGN [7] AVERAGE LSB-EE [6] N =256 MIN [3] SIGN [7] AVERAGE TABLE IV: The performance of Spec-ResNet compared with the existing steganalysis schemes. Method S len EA TNR TPR Cover 1% 2% 3% 5% 1% ACC Spec-ResNet Ren et al. [19] Zhao et al. [36] Luo et al. [32] 2s LSB-EE [6] s MIN [3] s SIGN [7] AVERAGE s LSB-EE [6] s MIN [3] s SIGN [7] AVERAGE s LSB-EE [6] s MIN [3] s SIGN [7] AVERAGE s LSB-EE [6] s MIN [3] s SIGN [7] AVERAGE

11 11 1 LSB-EE 1 MIN 1 SIGN detection accuracy proposed, 2s.2 Ren et.al, 2s.1 Zhao et.al, 2s Luo et.al, 2s relative embedding rate(%) (a) detecting accuracy for LSB-EE [6] detection accuracy proposed, 2s.2 Ren et.al, 2s.1 Zhao et.al, 2s Luo et.al, 2s relative embedding rate(%) (b) detecting accuracy for MIN [3] detection accuracy proposed, 2s.2 Ren et.al, 2s.1 Zhao et.al, 2s Luo et.al, 2s relative embedding rate(%) (c) detecting accuracy for SIGN [7] Fig. 11: The performance of Spec-ResNet compared with existing steganalysis scheme. TABLE V: The performance to detect MP3Stego [13]. Algorithm S len & EBR TNR TPR ACC Spec-ResNet:N =512 5s, 6% Spec-ResNet:N =124 5s, 6% Spec-ResNet:N =248 5s, 6% Spec-ResNet:Mixture 5s, 6% Luo et al. [32] 5s, 6% Wang et al. [25] 1s, 1%.925 [4] S. Xu, P. Zhang, P. Wang, and H. Yang, Performance analysis of data hiding in mpeg-4 aac audio, Tsinghua Science and Technology, vol. 14, no. 1, pp , 29. [5] Y. Wei, L. Guo, and Y. Wang, Controlling bitrate steganography on aac audio, in Image and Signal Processing (CISP), 21 3rd International Congress on, vol. 9. IEEE, 21, pp [6] Y. Wang, L. Guo, Y. Wei, and C. Wang, A steganography method for aac audio based on escape sequences, in 21 International Conference on Multimedia Information Networking and Security. IEEE, 21, pp [7] J. Zhu, R. Wang, and D. Yan, The sign bits of huffman codeword-based steganography for aac audio, in Multimedia Technology (ICMT), 21 International Conference on. IEEE, 21, pp [8] J. Zhu, R.-D. Wang, J. Li, and D.-Q. Yan, A huffman coding sectionbased steganography for aac audio, Information Technology Journal, vol. 1, no. 1, pp , 211. [9] W. Luo, Z. Yue, and H. Li, Adaptive audio steganography based on advanced audio coding and syndrome-trellis coding, 217. [1] M. Bazyar and R. Sudirman, A recent review of mp3 based steganography methods, International Journal of Security and Its Applications, vol. 8, no. 6, pp , 214. [11] K. Yang, X. Yi, X. Zhao, and L. Zhou, Adaptive mp3 steganography using equal length entropy codes substitution, in International Workshop on Digital Watermarking. Springer, 217, pp [12] R. Zhang, J. Liu, and F. Zhu, A steganography algorithm based on mp3 linbits bit of huffman codeword, in International Conference on Intelligent Information Hiding and Multimedia Signal Processing. Springer, 217, pp [13] F. Petitcolas, mp3stego, cl. cam. ac. uk/ fapp2/steganography/mp3stego/index. html, [14] C. Platt, Undermp3cover, 24. [15] Z. Achmad, Mp3stegz, 28. [16] J. Fridrich and D. Soukal, Matrix embedding for large payloads, IEEE Transactions on Information Forensics & Security, vol. 1, no. 3, pp , 26. [17] Y. Ren, Q. Xiong, and L. Wang, Steganalysis of aac using calibrated markov model of adjacent codebook, in Acoustics, Speech and Signal Processing (ICASSP), 216 IEEE International Conference on. IEEE, 216, pp [18] D. Yan, R. Wang, X. Yu, and J. Zhu, Steganalysis for mp3stego using differential statistics of quantization step, Digital Signal Processing, vol. 23, no. 4, pp , 213. [19] Y. Ren, Q. Xiong, and L. Wang, A steganalysis scheme for aac audio based on mdct difference between intra and inter frame, in International Workshop on Digital Watermarking. Springer, 217, pp [2] D. Yan and R. Wang, Detection of mp3stego exploiting recompression calibration-based feature, Multimedia tools and applications, vol. 72, no. 1, pp , 214. [21] X. Yu, R. Wang, and D. Yan, Detecting mp3stego using calibrated side information features, Journal of Software, vol. 8, no. 1, 213. [22] M. Qiao, A. H. Sung, and Q. Liu, Mp3 audio steganalysis, Information sciences, vol. 231, pp , 213. [23] R. Kuriakose and P. Premalatha, A novel method for mp3 steganalysis, in Intelligent Computing, Communication and Devices. Springer, 215, pp [24] C. Jin, R. Wang, D. Yan, P. Ma, and K. Yang, A novel detection scheme for mp3stego with low payload, in IEEE China Summit & International Conference on Signal and Information Processing, 214, pp [25] C. Jin, R. Wang, and D. Yan, Steganalysis of mp3stego with low embedding-rate using markov feature, Multimedia Tools and Applications, vol. 76, no. 5, pp , 217. [26] V. Holub and J. Fridrich, Random projections of residuals for digital image steganalysis, IEEE Transactions on Information Forensics & Security, vol. 8, no. 12, pp , 213. [27] S. Tan and B. Li, Stacked convolutional auto-encoders for steganalysis of digital images, in Signal and Information Processing Association Summit and Conference, 214, pp [28] Y. Qian, J. Dong, W. Wang, and T. Tan, Deep learning for steganalysis via convolutional neural networks, in Media Watermarking, Security, and Forensics 215, vol International Society for Optics and Photonics, 215, p. 949J. [29] G. Xu, H. Z. Wu, and Y. Q. Shi, Structural design of convolutional neural networks for steganalysis, IEEE Signal Processing Letters, vol. 23, no. 5, pp , 216. [3] Y. Qian, J. Dong, W. Wang, and T. Tan, Learning and transferring representations for image steganalysis using convolutional neural network, in Image Processing (ICIP), 216 IEEE International Conference on. IEEE, 216, pp [31] G. Xu, H. Z. Wu, and Y. Q.. Shi, Ensemble of cnns for steganalysis: An empirical study, in ACM Workshop on Information Hiding and Multimedia Security, 216, pp [32] B. Chen, W. Luo, and H. Li, Audio steganalysis with convolutional neural network, in Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security. ACM, 217, pp [33] V. Sedighi and J. Fridrich, Histogram layer, moving convolutional neural networks towards feature-based steganalysis, Electronic Imaging, vol. 217, no. 7, pp. 5 55, 217.

Fig. 1: The perceptual audio coding procedure and three main embedding domains.

Fig. 1: The perceptual audio coding procedure and three main embedding domains. 1 Spec-ResNet: A General Audio Steganalysis Scheme based on a Deep Residual Network for Spectrograms Yanzhen Ren, Member, IEEE, Dengkai Liu, Qiaochu Xiong, Jianming Fu, and Lina Wang arxiv:191.6838v2 [cs.mm]

More information

Audio-coding standards

Audio-coding standards Audio-coding standards The goal is to provide CD-quality audio over telecommunications networks. Almost all CD audio coders are based on the so-called psychoacoustic model of the human auditory system.

More information

CHAPTER 5 AUDIO WATERMARKING SCHEME INHERENTLY ROBUST TO MP3 COMPRESSION

CHAPTER 5 AUDIO WATERMARKING SCHEME INHERENTLY ROBUST TO MP3 COMPRESSION CHAPTER 5 AUDIO WATERMARKING SCHEME INHERENTLY ROBUST TO MP3 COMPRESSION In chapter 4, SVD based watermarking schemes are proposed which met the requirement of imperceptibility, having high payload and

More information

5: Music Compression. Music Coding. Mark Handley

5: Music Compression. Music Coding. Mark Handley 5: Music Compression Mark Handley Music Coding LPC-based codecs model the sound source to achieve good compression. Works well for voice. Terrible for music. What if you can t model the source? Model the

More information

Audio-coding standards

Audio-coding standards Audio-coding standards The goal is to provide CD-quality audio over telecommunications networks. Almost all CD audio coders are based on the so-called psychoacoustic model of the human auditory system.

More information

Principles of Audio Coding

Principles of Audio Coding Principles of Audio Coding Topics today Introduction VOCODERS Psychoacoustics Equal-Loudness Curve Frequency Masking Temporal Masking (CSIT 410) 2 Introduction Speech compression algorithm focuses on exploiting

More information

Adaptive Spatial Steganography Based on the Correlation of Wavelet Coefficients for Digital Images in Spatial Domain Ningbo Li, Pan Feng, Liu Jia

Adaptive Spatial Steganography Based on the Correlation of Wavelet Coefficients for Digital Images in Spatial Domain Ningbo Li, Pan Feng, Liu Jia 216 International Conference on Information Engineering and Communications Technology (IECT 216) ISBN: 978-1-69-37- Adaptive Spatial Steganography Based on the Correlation of Wavelet Coefficients for Digital

More information

Compressed Audio Demystified by Hendrik Gideonse and Connor Smith. All Rights Reserved.

Compressed Audio Demystified by Hendrik Gideonse and Connor Smith. All Rights Reserved. Compressed Audio Demystified Why Music Producers Need to Care About Compressed Audio Files Download Sales Up CD Sales Down High-Definition hasn t caught on yet Consumers don t seem to care about high fidelity

More information

A Detailed look of Audio Steganography Techniques using LSB and Genetic Algorithm Approach

A Detailed look of Audio Steganography Techniques using LSB and Genetic Algorithm Approach www.ijcsi.org 402 A Detailed look of Audio Steganography Techniques using LSB and Genetic Algorithm Approach Gunjan Nehru 1, Puja Dhar 2 1 Department of Information Technology, IEC-Group of Institutions

More information

A Study on Different JPEG Steganograhic Schemes

A Study on Different JPEG Steganograhic Schemes A Study on Different JPEG Steganograhic Schemes Alphy Ros Mathew, Sreekumar K Department of Computer Science, College of Engineering,Ponjar, Cochin University of Science And Technology Kottayam,Kerala,India

More information

Quantitative and Binary Steganalysis in JPEG: A Comparative Study

Quantitative and Binary Steganalysis in JPEG: A Comparative Study Quantitative and Binary Steganalysis in JPEG: A Comparative Study Ahmad ZAKARIA LIRMM, Univ Montpellier, CNRS ahmad.zakaria@lirmm.fr Marc CHAUMONT LIRMM, Univ Nîmes, CNRS marc.chaumont@lirmm.fr Gérard

More information

Multimedia Communications. Audio coding

Multimedia Communications. Audio coding Multimedia Communications Audio coding Introduction Lossy compression schemes can be based on source model (e.g., speech compression) or user model (audio coding) Unlike speech, audio signals can be generated

More information

Robust Steganography Using Texture Synthesis

Robust Steganography Using Texture Synthesis Robust Steganography Using Texture Synthesis Zhenxing Qian 1, Hang Zhou 2, Weiming Zhang 2, Xinpeng Zhang 1 1. School of Communication and Information Engineering, Shanghai University, Shanghai, 200444,

More information

Data Hiding on Text Using Big-5 Code

Data Hiding on Text Using Big-5 Code Data Hiding on Text Using Big-5 Code Jun-Chou Chuang 1 and Yu-Chen Hu 2 1 Department of Computer Science and Communication Engineering Providence University 200 Chung-Chi Rd., Shalu, Taichung 43301, Republic

More information

Channel Locality Block: A Variant of Squeeze-and-Excitation

Channel Locality Block: A Variant of Squeeze-and-Excitation Channel Locality Block: A Variant of Squeeze-and-Excitation 1 st Huayu Li Northern Arizona University Flagstaff, United State Northern Arizona University hl459@nau.edu arxiv:1901.01493v1 [cs.lg] 6 Jan

More information

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding Perceptual Coding Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding Part II wrap up 6.082 Fall 2006 Perceptual Coding, Slide 1 Lossless vs.

More information

The Steganography In Inactive Frames Of Voip

The Steganography In Inactive Frames Of Voip The Steganography In Inactive Frames Of Voip This paper describes a novel high-capacity steganography algorithm for embedding data in the inactive frames of low bit rate audio streams encoded by G.723.1

More information

Appendix 4. Audio coding algorithms

Appendix 4. Audio coding algorithms Appendix 4. Audio coding algorithms 1 Introduction The main application of audio compression systems is to obtain compact digital representations of high-quality (CD-quality) wideband audio signals. Typically

More information

Concealing Information in Images using Progressive Recovery

Concealing Information in Images using Progressive Recovery Concealing Information in Images using Progressive Recovery Pooja R 1, Neha S Prasad 2, Nithya S Jois 3, Sahithya KS 4, Bhagyashri R H 5 1,2,3,4 UG Student, Department Of Computer Science and Engineering,

More information

ELL 788 Computational Perception & Cognition July November 2015

ELL 788 Computational Perception & Cognition July November 2015 ELL 788 Computational Perception & Cognition July November 2015 Module 11 Audio Engineering: Perceptual coding Coding and decoding Signal (analog) Encoder Code (Digital) Code (Digital) Decoder Signal (analog)

More information

Research Article A Novel Steganalytic Algorithm based on III Level DWT with Energy as Feature

Research Article A Novel Steganalytic Algorithm based on III Level DWT with Energy as Feature Research Journal of Applied Sciences, Engineering and Technology 7(19): 4100-4105, 2014 DOI:10.19026/rjaset.7.773 ISSN: 2040-7459; e-issn: 2040-7467 2014 Maxwell Scientific Publication Corp. Submitted:

More information

SPREAD SPECTRUM AUDIO WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL

SPREAD SPECTRUM AUDIO WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL SPREAD SPECTRUM WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL 1 Yüksel Tokur 2 Ergun Erçelebi e-mail: tokur@gantep.edu.tr e-mail: ercelebi@gantep.edu.tr 1 Gaziantep University, MYO, 27310, Gaziantep,

More information

Chapter 14 MPEG Audio Compression

Chapter 14 MPEG Audio Compression Chapter 14 MPEG Audio Compression 14.1 Psychoacoustics 14.2 MPEG Audio 14.3 Other Commercial Audio Codecs 14.4 The Future: MPEG-7 and MPEG-21 14.5 Further Exploration 1 Li & Drew c Prentice Hall 2003 14.1

More information

Increasing Robustness of LSB Audio Steganography Using a Novel Embedding Method

Increasing Robustness of LSB Audio Steganography Using a Novel Embedding Method ShriRam College of Engineering & Management 1 Increasing Robustness of LSB Audio Steganography Using a Novel Embedding Method Rinki Baghel Pragya Sharma Abstract In this paper, we present a novel high

More information

Data Hiding in Video

Data Hiding in Video Data Hiding in Video J. J. Chae and B. S. Manjunath Department of Electrical and Computer Engineering University of California, Santa Barbara, CA 9316-956 Email: chaejj, manj@iplab.ece.ucsb.edu Abstract

More information

The Analysis and Detection of Double JPEG2000 Compression Based on Statistical Characterization of DWT Coefficients

The Analysis and Detection of Double JPEG2000 Compression Based on Statistical Characterization of DWT Coefficients Available online at www.sciencedirect.com Energy Procedia 17 (2012 ) 623 629 2012 International Conference on Future Electrical Power and Energy Systems The Analysis and Detection of Double JPEG2000 Compression

More information

An Improved DCT Based Color Image Watermarking Scheme Xiangguang Xiong1, a

An Improved DCT Based Color Image Watermarking Scheme Xiangguang Xiong1, a International Symposium on Mechanical Engineering and Material Science (ISMEMS 2016) An Improved DCT Based Color Image Watermarking Scheme Xiangguang Xiong1, a 1 School of Big Data and Computer Science,

More information

FUSION MODEL BASED ON CONVOLUTIONAL NEURAL NETWORKS WITH TWO FEATURES FOR ACOUSTIC SCENE CLASSIFICATION

FUSION MODEL BASED ON CONVOLUTIONAL NEURAL NETWORKS WITH TWO FEATURES FOR ACOUSTIC SCENE CLASSIFICATION Please contact the conference organizers at dcasechallenge@gmail.com if you require an accessible file, as the files provided by ConfTool Pro to reviewers are filtered to remove author information, and

More information

2.4 Audio Compression

2.4 Audio Compression 2.4 Audio Compression 2.4.1 Pulse Code Modulation Audio signals are analog waves. The acoustic perception is determined by the frequency (pitch) and the amplitude (loudness). For storage, processing and

More information

A Flexible Scheme of Self Recovery for Digital Image Protection

A Flexible Scheme of Self Recovery for Digital Image Protection www.ijcsi.org 460 A Flexible Scheme of Self Recoery for Digital Image Protection Zhenxing Qian, Lili Zhao 2 School of Communication and Information Engineering, Shanghai Uniersity, Shanghai 200072, China

More information

A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features and its Performance Analysis

A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features and its Performance Analysis International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 11, Issue 09 (September 2015), PP.27-31 A Blind Steganalysis on JPEG Gray Level

More information

DRA AUDIO CODING STANDARD

DRA AUDIO CODING STANDARD Applied Mechanics and Materials Online: 2013-06-27 ISSN: 1662-7482, Vol. 330, pp 981-984 doi:10.4028/www.scientific.net/amm.330.981 2013 Trans Tech Publications, Switzerland DRA AUDIO CODING STANDARD Wenhua

More information

A NEW DCT-BASED WATERMARKING METHOD FOR COPYRIGHT PROTECTION OF DIGITAL AUDIO

A NEW DCT-BASED WATERMARKING METHOD FOR COPYRIGHT PROTECTION OF DIGITAL AUDIO International journal of computer science & information Technology (IJCSIT) Vol., No.5, October A NEW DCT-BASED WATERMARKING METHOD FOR COPYRIGHT PROTECTION OF DIGITAL AUDIO Pranab Kumar Dhar *, Mohammad

More information

Digital Image Steganography Techniques: Case Study. Karnataka, India.

Digital Image Steganography Techniques: Case Study. Karnataka, India. ISSN: 2320 8791 (Impact Factor: 1.479) Digital Image Steganography Techniques: Case Study Santosh Kumar.S 1, Archana.M 2 1 Department of Electronicsand Communication Engineering, Sri Venkateshwara College

More information

Adaptive Pixel Pair Matching Technique for Data Embedding

Adaptive Pixel Pair Matching Technique for Data Embedding Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 1, January 2014,

More information

Mpeg 1 layer 3 (mp3) general overview

Mpeg 1 layer 3 (mp3) general overview Mpeg 1 layer 3 (mp3) general overview 1 Digital Audio! CD Audio:! 16 bit encoding! 2 Channels (Stereo)! 44.1 khz sampling rate 2 * 44.1 khz * 16 bits = 1.41 Mb/s + Overhead (synchronization, error correction,

More information

A New Approach to Compressed Image Steganography Using Wavelet Transform

A New Approach to Compressed Image Steganography Using Wavelet Transform IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 5, Ver. III (Sep. Oct. 2015), PP 53-59 www.iosrjournals.org A New Approach to Compressed Image Steganography

More information

MPEG-4 General Audio Coding

MPEG-4 General Audio Coding MPEG-4 General Audio Coding Jürgen Herre Fraunhofer Institute for Integrated Circuits (IIS) Dr. Jürgen Herre, hrr@iis.fhg.de 1 General Audio Coding Solid state players, Internet audio, terrestrial and

More information

Defining Cost Functions for Adaptive Steganography at the Microscale

Defining Cost Functions for Adaptive Steganography at the Microscale Defining Cost Functions for Adaptive Steganography at the Microscale Kejiang Chen, Weiming Zhang, Hang Zhou, Nenghai Yu CAS Key Laboratory of Electromagnetic Space Information University of Science and

More information

Alpha-trimmed Image Estimation for JPEG Steganography Detection

Alpha-trimmed Image Estimation for JPEG Steganography Detection Proceedings of the 2009 IEEE International Conference on Systems, Man, and Cybernetics San Antonio, TX, USA - October 2009 Alpha-trimmed Image Estimation for JPEG Steganography Detection Mei-Ching Chen,

More information

Video Inter-frame Forgery Identification Based on Optical Flow Consistency

Video Inter-frame Forgery Identification Based on Optical Flow Consistency Sensors & Transducers 24 by IFSA Publishing, S. L. http://www.sensorsportal.com Video Inter-frame Forgery Identification Based on Optical Flow Consistency Qi Wang, Zhaohong Li, Zhenzhen Zhang, Qinglong

More information

VARIABLE RATE STEGANOGRAPHY IN DIGITAL IMAGES USING TWO, THREE AND FOUR NEIGHBOR PIXELS

VARIABLE RATE STEGANOGRAPHY IN DIGITAL IMAGES USING TWO, THREE AND FOUR NEIGHBOR PIXELS VARIABLE RATE STEGANOGRAPHY IN DIGITAL IMAGES USING TWO, THREE AND FOUR NEIGHBOR PIXELS Anita Pradhan Department of CSE, Sri Sivani College of Engineering, Srikakulam, Andhra Pradesh, India anita.pradhan15@gmail.com

More information

Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur

Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur Module 9 AUDIO CODING Lesson 29 Transform and Filter banks Instructional Objectives At the end of this lesson, the students should be able to: 1. Define the three layers of MPEG-1 audio coding. 2. Define

More information

AUDIOVISUAL COMMUNICATION

AUDIOVISUAL COMMUNICATION AUDIOVISUAL COMMUNICATION Laboratory Session: Audio Processing and Coding The objective of this lab session is to get the students familiar with audio processing and coding, notably psychoacoustic analysis

More information

Implementation of Random Byte Hiding algorithm in Video Steganography

Implementation of Random Byte Hiding algorithm in Video Steganography Implementation of Random Byte Hiding algorithm in Video Steganography S.Aswath 1, K.Akshara 2, P.Pavithra 2, D.S.Abinaya 2 Asssisant Professor 1, Student 2 (IV Year) Department of Electronics and Communication

More information

Lecture 16 Perceptual Audio Coding

Lecture 16 Perceptual Audio Coding EECS 225D Audio Signal Processing in Humans and Machines Lecture 16 Perceptual Audio Coding 2012-3-14 Professor Nelson Morgan today s lecture by John Lazzaro www.icsi.berkeley.edu/eecs225d/spr12/ Hero

More information

New Results in Low Bit Rate Speech Coding and Bandwidth Extension

New Results in Low Bit Rate Speech Coding and Bandwidth Extension Audio Engineering Society Convention Paper Presented at the 121st Convention 2006 October 5 8 San Francisco, CA, USA This convention paper has been reproduced from the author's advance manuscript, without

More information

CHAPTER 4 REVERSIBLE IMAGE WATERMARKING USING BIT PLANE CODING AND LIFTING WAVELET TRANSFORM

CHAPTER 4 REVERSIBLE IMAGE WATERMARKING USING BIT PLANE CODING AND LIFTING WAVELET TRANSFORM 74 CHAPTER 4 REVERSIBLE IMAGE WATERMARKING USING BIT PLANE CODING AND LIFTING WAVELET TRANSFORM Many data embedding methods use procedures that in which the original image is distorted by quite a small

More information

EE482: Digital Signal Processing Applications

EE482: Digital Signal Processing Applications Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 13 Audio Signal Processing 14/04/01 http://www.ee.unlv.edu/~b1morris/ee482/

More information

Perceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects.

Perceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects. Perceptual coding Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal. Perceptual encoders, however, have been designed for the compression of general

More information

Scalable Coding of Image Collections with Embedded Descriptors

Scalable Coding of Image Collections with Embedded Descriptors Scalable Coding of Image Collections with Embedded Descriptors N. Adami, A. Boschetti, R. Leonardi, P. Migliorati Department of Electronic for Automation, University of Brescia Via Branze, 38, Brescia,

More information

Image and Video Watermarking

Image and Video Watermarking Telecommunications Seminar WS 1998 Data Hiding, Digital Watermarking and Secure Communications Image and Video Watermarking Herbert Buchner University of Erlangen-Nuremberg 16.12.1998 Outline 1. Introduction:

More information

Inception and Residual Networks. Hantao Zhang. Deep Learning with Python.

Inception and Residual Networks. Hantao Zhang. Deep Learning with Python. Inception and Residual Networks Hantao Zhang Deep Learning with Python https://en.wikipedia.org/wiki/residual_neural_network Deep Neural Network Progress from Large Scale Visual Recognition Challenge (ILSVRC)

More information

A BTC-COMPRESSED DOMAIN INFORMATION HIDING METHOD BASED ON HISTOGRAM MODIFICATION AND VISUAL CRYPTOGRAPHY. Hang-Yu Fan and Zhe-Ming Lu

A BTC-COMPRESSED DOMAIN INFORMATION HIDING METHOD BASED ON HISTOGRAM MODIFICATION AND VISUAL CRYPTOGRAPHY. Hang-Yu Fan and Zhe-Ming Lu International Journal of Innovative Computing, Information and Control ICIC International c 2016 ISSN 1349-4198 Volume 12, Number 2, April 2016 pp. 395 405 A BTC-COMPRESSED DOMAIN INFORMATION HIDING METHOD

More information

Reversible Image Data Hiding with Local Adaptive Contrast Enhancement

Reversible Image Data Hiding with Local Adaptive Contrast Enhancement Reversible Image Data Hiding with Local Adaptive Contrast Enhancement Ruiqi Jiang, Weiming Zhang, Jiajia Xu, Nenghai Yu and Xiaocheng Hu Abstract Recently, a novel reversible data hiding scheme is proposed

More information

Random Image Embedded in Videos using LSB Insertion Algorithm

Random Image Embedded in Videos using LSB Insertion Algorithm Random Image Embedded in Videos using LSB Insertion Algorithm K.Parvathi Divya 1, K.Mahesh 2 Research Scholar 1, * Associate Professor 2 Department of Computer Science and Engg, Alagappa university, Karaikudi.

More information

Peak-Shaped-Based Steganographic Technique for MP3 Audio

Peak-Shaped-Based Steganographic Technique for MP3 Audio Journal of Information Security, 3, 4, -8 doi:.436/jis.3.43 Published Online January 3 (http://www.scirp.org/journal/jis) Peak-Shaped-Based Steganographic Technique for MP3 Audio Raffaele Pinardi, Fabio

More information

An Information Hiding Algorithm for HEVC Based on Angle Differences of Intra Prediction Mode

An Information Hiding Algorithm for HEVC Based on Angle Differences of Intra Prediction Mode An Information Hiding Algorithm for HEVC Based on Angle Differences of Intra Prediction Mode Jia-Ji Wang1, Rang-Ding Wang1*, Da-Wen Xu1, Wei Li1 CKC Software Lab, Ningbo University, Ningbo, Zhejiang 3152,

More information

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal.

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal. Perceptual coding Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal. Perceptual encoders, however, have been designed for the compression of general

More information

Authentication and Secret Message Transmission Technique Using Discrete Fourier Transformation

Authentication and Secret Message Transmission Technique Using Discrete Fourier Transformation , 2009, 5, 363-370 doi:10.4236/ijcns.2009.25040 Published Online August 2009 (http://www.scirp.org/journal/ijcns/). Authentication and Secret Message Transmission Technique Using Discrete Fourier Transformation

More information

arxiv: v1 [cs.mm] 2 Mar 2017

arxiv: v1 [cs.mm] 2 Mar 2017 This is a preprint of the paper: arxiv:1703.00817v1 [cs.mm] 2 Mar 2017 Daniel Lerch-Hostalot, David Megías, LSB matching steganalysis based on patterns of pixel differences and random embedding, Computers

More information

(JBE Vol. 23, No. 6, November 2018) Detection of Frame Deletion Using Convolutional Neural Network. Abstract

(JBE Vol. 23, No. 6, November 2018) Detection of Frame Deletion Using Convolutional Neural Network. Abstract (JBE Vol. 23, No. 6, November 2018) (Regular Paper) 23 6, 2018 11 (JBE Vol. 23, No. 6, November 2018) https://doi.org/10.5909/jbe.2018.23.6.886 ISSN 2287-9137 (Online) ISSN 1226-7953 (Print) CNN a), a),

More information

Optimizing Pixel Predictors for Steganalysis

Optimizing Pixel Predictors for Steganalysis Optimizing Pixel Predictors for Steganalysis Vojtěch Holub and Jessica Fridrich Dept. of Electrical and Computer Engineering SUNY Binghamton, New York IS&T / SPIE 2012, San Francisco, CA Steganography

More information

VoIP Forgery Detection

VoIP Forgery Detection VoIP Forgery Detection Satish Tummala, Yanxin Liu and Qingzhong Liu Department of Computer Science Sam Houston State University Huntsville, TX, USA Emails: sct137@shsu.edu; yanxin@shsu.edu; liu@shsu.edu

More information

EXPOSING THE DOUBLE COMPRESSION IN MP3 AUDIO BY FREQUENCY VIBRATION. Tianzhuo Wang, Xiangwei Kong, Yanqing Guo, Bo Wang

EXPOSING THE DOUBLE COMPRESSION IN MP3 AUDIO BY FREQUENCY VIBRATION. Tianzhuo Wang, Xiangwei Kong, Yanqing Guo, Bo Wang EXPOSIG THE DOUBLE COMPRESSIO I MP3 AUDIO BY FREQUECY VIBRATIO Tianzhuo Wang, Xiangwei Kong, Yanqing Guo, Bo Wang School of Information and Communication Engineering Dalian University of Technology, Dalian,

More information

Novel Lossy Compression Algorithms with Stacked Autoencoders

Novel Lossy Compression Algorithms with Stacked Autoencoders Novel Lossy Compression Algorithms with Stacked Autoencoders Anand Atreya and Daniel O Shea {aatreya, djoshea}@stanford.edu 11 December 2009 1. Introduction 1.1. Lossy compression Lossy compression is

More information

Audio Compression. Audio Compression. Absolute Threshold. CD quality audio:

Audio Compression. Audio Compression. Absolute Threshold. CD quality audio: Audio Compression Audio Compression CD quality audio: Sampling rate = 44 KHz, Quantization = 16 bits/sample Bit-rate = ~700 Kb/s (1.41 Mb/s if 2 channel stereo) Telephone-quality speech Sampling rate =

More information

Parametric Coding of High-Quality Audio

Parametric Coding of High-Quality Audio Parametric Coding of High-Quality Audio Prof. Dr. Gerald Schuller Fraunhofer IDMT & Ilmenau Technical University Ilmenau, Germany 1 Waveform vs Parametric Waveform Filter-bank approach Mainly exploits

More information

The MPEG-4 General Audio Coder

The MPEG-4 General Audio Coder The MPEG-4 General Audio Coder Bernhard Grill Fraunhofer Institute for Integrated Circuits (IIS) grl 6/98 page 1 Outline MPEG-2 Advanced Audio Coding (AAC) MPEG-4 Extensions: Perceptual Noise Substitution

More information

Hybrid Stegnography using ImagesVaried PVD+ LSB Detection Program

Hybrid Stegnography using ImagesVaried PVD+ LSB Detection Program www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 4 Issue 5 May 2015, Page No. 12086-12090 Hybrid Stegnography using ImagesVaried PVD+ LSB Detection Program Shruti

More information

A reversible data hiding based on adaptive prediction technique and histogram shifting

A reversible data hiding based on adaptive prediction technique and histogram shifting A reversible data hiding based on adaptive prediction technique and histogram shifting Rui Liu, Rongrong Ni, Yao Zhao Institute of Information Science Beijing Jiaotong University E-mail: rrni@bjtu.edu.cn

More information

Scalable Perceptual and Lossless Audio Coding based on MPEG-4 AAC

Scalable Perceptual and Lossless Audio Coding based on MPEG-4 AAC Scalable Perceptual and Lossless Audio Coding based on MPEG-4 AAC Ralf Geiger 1, Gerald Schuller 1, Jürgen Herre 2, Ralph Sperschneider 2, Thomas Sporer 1 1 Fraunhofer IIS AEMT, Ilmenau, Germany 2 Fraunhofer

More information

Simple Watermark for Stereo Audio Signals with Modulated High-Frequency Band Delay

Simple Watermark for Stereo Audio Signals with Modulated High-Frequency Band Delay ACOUSTICAL LETTER Simple Watermark for Stereo Audio Signals with Modulated High-Frequency Band Delay Kazuhiro Kondo and Kiyoshi Nakagawa Graduate School of Science and Engineering, Yamagata University,

More information

High Capacity Reversible Watermarking Scheme for 2D Vector Maps

High Capacity Reversible Watermarking Scheme for 2D Vector Maps Scheme for 2D Vector Maps 1 Information Management Department, China National Petroleum Corporation, Beijing, 100007, China E-mail: jxw@petrochina.com.cn Mei Feng Research Institute of Petroleum Exploration

More information

MPEG-4 aacplus - Audio coding for today s digital media world

MPEG-4 aacplus - Audio coding for today s digital media world MPEG-4 aacplus - Audio coding for today s digital media world Whitepaper by: Gerald Moser, Coding Technologies November 2005-1 - 1. Introduction Delivering high quality digital broadcast content to consumers

More information

QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose

QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose Department of Electrical and Computer Engineering University of California,

More information

H.264/AVC Video Watermarking Algorithm Against Recoding

H.264/AVC Video Watermarking Algorithm Against Recoding Sensors & Transducers 2014 by IFSA Publishing, S. L. http://www.sensorsportal.com H.264/AVC Video Watermarking Algorithm Against Recoding Rangding Wang, Qian Li, Lujian Hu, Dawen Xu College of Information

More information

Efficient Path Finding Method Based Evaluation Function in Large Scene Online Games and Its Application

Efficient Path Finding Method Based Evaluation Function in Large Scene Online Games and Its Application Journal of Information Hiding and Multimedia Signal Processing c 2017 ISSN 2073-4212 Ubiquitous International Volume 8, Number 3, May 2017 Efficient Path Finding Method Based Evaluation Function in Large

More information

Convolution Neural Networks for Chinese Handwriting Recognition

Convolution Neural Networks for Chinese Handwriting Recognition Convolution Neural Networks for Chinese Handwriting Recognition Xu Chen Stanford University 450 Serra Mall, Stanford, CA 94305 xchen91@stanford.edu Abstract Convolutional neural networks have been proven

More information

Information Cloaking Technique with Tree Based Similarity

Information Cloaking Technique with Tree Based Similarity Information Cloaking Technique with Tree Based Similarity C.Bharathipriya [1], K.Lakshminarayanan [2] 1 Final Year, Computer Science and Engineering, Mailam Engineering College, 2 Assistant Professor,

More information

AUDIOVISUAL COMMUNICATION

AUDIOVISUAL COMMUNICATION AUDIOVISUAL COMMUNICATION Laboratory Session: Audio Processing and Coding The objective of this lab session is to get the students familiar with audio processing and coding, notably psychoacoustic analysis

More information

Proceedings of Meetings on Acoustics

Proceedings of Meetings on Acoustics Proceedings of Meetings on Acoustics Volume 19, 213 http://acousticalsociety.org/ ICA 213 Montreal Montreal, Canada 2-7 June 213 Engineering Acoustics Session 2pEAb: Controlling Sound Quality 2pEAb1. Subjective

More information

Residual Networks And Attention Models. cs273b Recitation 11/11/2016. Anna Shcherbina

Residual Networks And Attention Models. cs273b Recitation 11/11/2016. Anna Shcherbina Residual Networks And Attention Models cs273b Recitation 11/11/2016 Anna Shcherbina Introduction to ResNets Introduced in 2015 by Microsoft Research Deep Residual Learning for Image Recognition (He, Zhang,

More information

METRIC LEARNING BASED DATA AUGMENTATION FOR ENVIRONMENTAL SOUND CLASSIFICATION

METRIC LEARNING BASED DATA AUGMENTATION FOR ENVIRONMENTAL SOUND CLASSIFICATION METRIC LEARNING BASED DATA AUGMENTATION FOR ENVIRONMENTAL SOUND CLASSIFICATION Rui Lu 1, Zhiyao Duan 2, Changshui Zhang 1 1 Department of Automation, Tsinghua University 2 Department of Electrical and

More information

DAB. Digital Audio Broadcasting

DAB. Digital Audio Broadcasting DAB Digital Audio Broadcasting DAB history DAB has been under development since 1981 at the Institut für Rundfunktechnik (IRT). In 1985 the first DAB demonstrations were held at the WARC-ORB in Geneva

More information

CHAPTER 3. Digital Carriers of Steganography

CHAPTER 3. Digital Carriers of Steganography CHAPTER 3 Digital Carriers of Steganography 3.1 Introduction It has been observed that all digital file formats can be used as digital carrier for steganography, but the formats those are with a high degree

More information

Inception Network Overview. David White CS793

Inception Network Overview. David White CS793 Inception Network Overview David White CS793 So, Leonardo DiCaprio dreams about dreaming... https://m.media-amazon.com/images/m/mv5bmjaxmzy3njcxnf5bml5banbnxkftztcwnti5otm0mw@@._v1_sy1000_cr0,0,675,1 000_AL_.jpg

More information

Deep residual learning for image steganalysis

Deep residual learning for image steganalysis DOI 10.1007/s11042-017-4440-4 Deep residual learning for image steganalysis Songtao Wu 1 Shenghua Zhong 1 Yan Liu 2 Received: 13 December 2016 / Revised: 13 January 2017 / Accepted: 23 January 2017 Springer

More information

Image Error Concealment Based on Watermarking

Image Error Concealment Based on Watermarking Image Error Concealment Based on Watermarking Shinfeng D. Lin, Shih-Chieh Shie and Jie-Wei Chen Department of Computer Science and Information Engineering,National Dong Hwa Universuty, Hualien, Taiwan,

More information

KINGS COLLEGE OF ENGINEERING DEPARTMENT OF INFORMATION TECHNOLOGY ACADEMIC YEAR / ODD SEMESTER QUESTION BANK

KINGS COLLEGE OF ENGINEERING DEPARTMENT OF INFORMATION TECHNOLOGY ACADEMIC YEAR / ODD SEMESTER QUESTION BANK KINGS COLLEGE OF ENGINEERING DEPARTMENT OF INFORMATION TECHNOLOGY ACADEMIC YEAR 2011-2012 / ODD SEMESTER QUESTION BANK SUB.CODE / NAME YEAR / SEM : IT1301 INFORMATION CODING TECHNIQUES : III / V UNIT -

More information

STEGANALYSIS OF STEGOSTORAGE SYSTEM

STEGANALYSIS OF STEGOSTORAGE SYSTEM t m Mathematical Publications DOI: 10.1515/tmmp-2015-0049 Tatra Mt. Math. Publ. 64 (2015), 205 215 STEGANALYSIS OF STEGOSTORAGE SYSTEM Michala Gulášová Matúš Jókay ABSTRACT. The aim of this contribution

More information

LETTER Improvement of JPEG Compression Efficiency Using Information Hiding and Image Restoration

LETTER Improvement of JPEG Compression Efficiency Using Information Hiding and Image Restoration IEICE TRANS. INF. & SYST., VOL.E96 D, NO.5 MAY 2013 1233 LETTER Improvement of JPEG Compression Efficiency Using Information Hiding and Image Restoration Kazumi YAMAWAKI, Member, Fumiya NAKANO, Student

More information

MPEG-1. Overview of MPEG-1 1 Standard. Introduction to perceptual and entropy codings

MPEG-1. Overview of MPEG-1 1 Standard. Introduction to perceptual and entropy codings MPEG-1 Overview of MPEG-1 1 Standard Introduction to perceptual and entropy codings Contents History Psychoacoustics and perceptual coding Entropy coding MPEG-1 Layer I/II Layer III (MP3) Comparison and

More information

Modern Steganalysis Can Detect YASS

Modern Steganalysis Can Detect YASS Jan Kodovský, Tomáš Pevný, Jessica Fridrich January 18, 2010 / SPIE 1 / 13 YASS Curriculum Vitae Birth Location: University of California, Santa Barbara Birth Date: More than 2 years ago [Solanki-2007],

More information

Secret Communication through Audio for Defense Application

Secret Communication through Audio for Defense Application Secret Communication through Audio for Defense Application A.Nageshwar Rao Maduguri Sudhir R.Venkatesh Abstract: A steganographic method of embedding textual information in an audio file is presented in

More information

Face Recognition Using Vector Quantization Histogram and Support Vector Machine Classifier Rong-sheng LI, Fei-fei LEE *, Yan YAN and Qiu CHEN

Face Recognition Using Vector Quantization Histogram and Support Vector Machine Classifier Rong-sheng LI, Fei-fei LEE *, Yan YAN and Qiu CHEN 2016 International Conference on Artificial Intelligence: Techniques and Applications (AITA 2016) ISBN: 978-1-60595-389-2 Face Recognition Using Vector Quantization Histogram and Support Vector Machine

More information

A Review of Approaches for Steganography

A Review of Approaches for Steganography International Journal of Computer Science and Engineering Open Access Review Paper Volume-2, Issue-5 E-ISSN: 2347-2693 A Review of Approaches for Steganography Komal Arora 1* and Geetanjali Gandhi 2 1*,2

More information

Machine Learning 13. week

Machine Learning 13. week Machine Learning 13. week Deep Learning Convolutional Neural Network Recurrent Neural Network 1 Why Deep Learning is so Popular? 1. Increase in the amount of data Thanks to the Internet, huge amount of

More information

Block Mean Value Based Image Perceptual Hashing for Content Identification

Block Mean Value Based Image Perceptual Hashing for Content Identification Block Mean Value Based Image Perceptual Hashing for Content Identification Abstract. Image perceptual hashing has been proposed to identify or authenticate image contents in a robust way against distortions

More information

Text Hiding In Multimedia By Huffman Encoding Algorithm Using Steganography

Text Hiding In Multimedia By Huffman Encoding Algorithm Using Steganography Text Hiding In Multimedia By Huffman Encoding Algorithm Using Steganography Madhavi V.Kale 1, Prof. Swati A.Patil 2 PG Student, Dept. Of CSE., G.H.Raisoni Institute Of Engineering And Management,Jalgaon

More information