Available online at www.sciencedirect.com
Procedia Engineering 30 (2012) 703-710
International Conference on Communication Technology and System Design 2011

LSB Based Audio Steganography Based On Text Compression

M. Baritha Begum a,*, Y. Venkataramani b
a Saranathan College of Engineering, Trichy 620012, India
b Saranathan College of Engineering, Trichy 620012, India

Abstract

A compression algorithm reduces the redundancy of a data representation and thereby decreases the storage capacity required. Data compression plays a vital role in reducing communication cost by making efficient use of the available bandwidth. Compressed data transmitted over the internet is, however, vulnerable to a multitude of attacks. This paper proposes a new dictionary-based text compression technique for ASCII texts, with the aim of obtaining good performance on documents of various sizes. The bits of the dictionary-compressed text are hidden in the least significant bits (LSBs) of audio samples, and the signal-to-noise ratio (SNR) is calculated. The audio steganography experiment is carried out for several compression algorithms alongside dictionary-based compression. Audio steganography based on dictionary compression achieves a better SNR value.

© 2011 Published by Elsevier Ltd. Selection and/or peer-review under responsibility of ICCTSD 2011. Open access under CC BY-NC-ND license.

Keywords: Data compression; Dictionary Based Encoding (DBE); Lossless; Audio steganography; Least significant bit (LSB).

1. Introduction

Compression combines two components: an encoding algorithm and a decoding algorithm. The encoding algorithm produces a compressed representation of the message. The decoding algorithm reconstructs the original message, or some approximation of it, from the compressed representation. Compression algorithms are classified into two categories, lossless and lossy. Lossless algorithms reconstruct the original message exactly from the compressed message.
Lossless compression is used for text; lossy compression is used for images and sound. [1, 3] Text transformation is one approach to increasing the performance of text compression. The input text can be changed into a highly redundant text by substituting pre-defined, highly redundant codes for words or phrases. This highly redundant text increases the performance of the text compression algorithm. The existing methods (arithmetic coding, Huffman coding, the LZ algorithm, PPMC and RLE) cannot give better compression ratios. [4, 5] A better compression ratio is achieved by using dictionary-based compression.

* Baritha Begum. Tel.: +919443677672; E-mail address: baritha_m@yahoo.com.
1877-7058 © 2011 Published by Elsevier Ltd. doi:10.1016/j.proeng.2012.01.917

Steganography, from the Greek, means covered or secret writing, and is a long-practiced form of hiding information. Although steganography is related to cryptography, the two are not the same. The intent of steganography is to hide the existence of the message, while cryptography scrambles a message so that it cannot be understood. More precisely, the goal of steganography is to hide messages inside other harmless messages in a way that does not allow an adversary even to detect that a second, secret message is present. Steganography includes a vast array of techniques for hiding messages in a variety of media. Among these methods are invisible inks, microdots, digital signatures, covert channels and spread-spectrum communications. Today, thanks to modern technology, steganography is used on text, images, sound, signals, and more.

The cover is an audio file, image, video or similar medium that is used to hide the original message. The cover signal used in a steganographic system is called the host signal. Information hidden in the cover data is called embedded data. [6, 7] It is not necessary to encrypt the hidden message, but this depends on the security of the system and on how much of its design is known. The advantage of steganography is that it can be used to transmit messages secretly without the fact of the transmission being discovered. By contrast, the use of encryption often identifies the sender or receiver as somebody with something to hide. [8]

1.1. Background of Text Compression and Steganography

Lossless compression researchers have developed highly sophisticated approaches, such as Huffman encoding, arithmetic encoding, the Lempel-Ziv (LZ) family, Dynamic Markov Compression (DMC), Prediction by Partial Matching (PPM), run-length encoding (RLE) and Burrows-Wheeler Transform (BWT) based algorithms. [5, 12, 14] However, none of these methods has been able to reach the theoretical best-case compression ratio consistently. One approach to attaining better compression ratios is to develop new compression algorithms; Dictionary Based Encoding (DBE) is such an approach. To increase the secrecy of the text message compressed by dictionary-based compression, it is hidden in an audio file. If the text message is hidden as it stands using a steganographic system, it may be detected by attackers. To avoid this, the input message is first converted into a highly redundant code and then hidden. This method helps maintain secrecy.

2. Dictionary making algorithm

1. Calgary corpus files are taken as the test text files.
2. Words are collected from all the text files; this gives 618,108 words in total.
3. Uppercase letters in these words are converted to lowercase.
4. To form the dictionary, the number of occurrences of each word is counted and the words are listed in descending order of frequency.
5. 8900 words are listed in the final dictionary.
6. Each of the first 169 words is assigned a single ASCII character as its code.
7. Each word from 170 to 4300 is assigned a double (two-character) code, formed by combining an uppercase letter with one of the 169 single ASCII characters.
8. For the remaining words, codes are formed by combining each uppercase letter with the preceding two-character combinations.

Hiding the compressed text in audio enhances security. Compared with other text compression algorithms, the dictionary-based audio steganography system gives a better SNR value.
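The dictionary-making steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `build_dictionary` and `split_codes`, the word pattern `[a-z]+`, and the particular choice of 169 code characters (`SINGLE`) are hypothetical, and the three-character layout (two uppercase letters followed by one code character) is only one possible reading of step 8.

```python
from collections import Counter
import re
import string

UPPER = string.ascii_uppercase
# Assumed code alphabet: 169 characters that are not uppercase ASCII letters.
SINGLE = [chr(c) for c in range(33, 256)
          if chr(c) not in UPPER and c != 127][:169]


def build_dictionary(texts, max_words=8900, n_single=169, n_double_end=4300):
    """Return {word: code} for the most frequent lowercase words."""
    counts = Counter()
    for text in texts:
        # Step 3: fold to lowercase before counting.
        counts.update(re.findall(r"[a-z]+", text.lower()))
    # Steps 4 and 5: rank by descending frequency, keep the top entries.
    ranked = [word for word, _ in counts.most_common(max_words)]
    codes = {}
    for rank, word in enumerate(ranked):
        if rank < n_single:
            # Step 6: one character per word.
            codes[word] = SINGLE[rank]
        elif rank < n_double_end:
            # Step 7: uppercase letter + one code character.
            i = rank - n_single
            codes[word] = UPPER[i // len(SINGLE)] + SINGLE[i % len(SINGLE)]
        else:
            # Step 8 (assumed layout): two uppercase letters + one code character.
            pair, ch = divmod(rank - n_double_end, len(SINGLE))
            codes[word] = UPPER[pair // 26] + UPPER[pair % 26] + SINGLE[ch]
    return codes


def split_codes(stream):
    """Split a concatenated code string back into individual codes."""
    out, i = [], 0
    while i < len(stream):
        if stream[i] not in UPPER:
            n = 1          # single-character code
        elif stream[i + 1] not in UPPER:
            n = 2          # uppercase letter + one character
        else:
            n = 3          # two uppercase letters + one character
        out.append(stream[i:i + n])
        i += n
    return out
```

Because no single-character code is an uppercase letter, the code stream can be split without separators: the first character of each code determines its length. Handling of punctuation, spacing and capitalisation is not described in the source and is left out of the sketch.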
2.1. LSB Insertion method

1. The audio file is converted into data samples.
2. The first 40 bytes are allocated to the header part.
3. The compressed text message is converted to binary.
4. The length of the text message is also converted to binary.
5. An identifier is selected to mark the hidden text message.
6. The identifier helps in the recovery of the text.
7. If there is no identifier in the audio file, the audio file contains no hidden text message.
8. The identifier's binary value is 10101010.
9. The identifier is hidden in 8 data samples.
10. The next 10 data samples carry the length of the text message.
11. The next 10 data samples carry the width of the text message.
12. The compressed text message is hidden in the LSBs of the remaining data samples.

2.2. Data Extraction Process

1. The text is recovered by reversing the steps used to hide it.
2. The received audio file is checked for the presence of the identifier.
3. Without the identifier, there can be no hidden text in the data samples.
4. Both the length and the width of the text message are read from the LSBs of the data samples.
5. The LSBs of the data samples are read until the full length of the message has been received.
6. The message bits taken from the LSBs are then converted into text.

Decoding Algorithm

Decoding is easier than encoding. An uppercase letter followed by a single ASCII character is identified as one code. If an uppercase letter is followed by two ASCII characters, the second ASCII character is identified as a separate code. Each extracted code is compared with the dictionary table and the corresponding words are collected in the output file. After processing, this output file looks the same as the initial document, since the compression and decompression are lossless.

3. Performance Analysis

We carried out experiments on the transformation algorithms mentioned in Section 2 using the standard Calgary Corpus [15] text file collection.
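For reference, the LSB insertion and extraction steps of Sections 2.1 and 2.2 can be illustrated with the following Python sketch. It assumes that the audio samples have already been read into a list of integers with the 40-byte header set aside, that the length field counts payload bytes, and that bits are written most significant first. The source does not define the width field, so it is stored here as an opaque 10-bit value. The names `embed` and `extract` are hypothetical.

```python
IDENTIFIER = 0b10101010  # 8-bit marker from step 8 of Section 2.1
ID_BITS, LEN_BITS, WIDTH_BITS = 8, 10, 10


def _to_bits(value, n):
    """Return value as a list of n bits, most significant first."""
    return [(value >> (n - 1 - i)) & 1 for i in range(n)]


def _from_bits(bits):
    """Rebuild an integer from a list of bits, most significant first."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def embed(samples, payload, width=0):
    """Hide payload bytes in the LSBs of a copy of the integer samples."""
    if len(payload) >= 1 << LEN_BITS:
        raise ValueError("payload too long for a 10-bit length field")
    bits = (_to_bits(IDENTIFIER, ID_BITS)
            + _to_bits(len(payload), LEN_BITS)
            + _to_bits(width, WIDTH_BITS))   # width is an opaque field here
    for byte in payload:
        bits += _to_bits(byte, 8)
    if len(bits) > len(samples):
        raise ValueError("cover audio has too few samples")
    stego = list(samples)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit     # overwrite only the LSB
    return stego


def extract(samples):
    """Return the hidden payload, or None if the identifier is absent."""
    lsb = [s & 1 for s in samples]
    header = ID_BITS + LEN_BITS + WIDTH_BITS
    if len(lsb) < header or _from_bits(lsb[:ID_BITS]) != IDENTIFIER:
        return None
    length = _from_bits(lsb[ID_BITS:ID_BITS + LEN_BITS])
    data = lsb[header:header + 8 * length]   # width field is skipped
    return bytes(_from_bits(data[i:i + 8]) for i in range(0, len(data), 8))
```

Each sample changes by at most one quantisation step, which is why LSB embedding keeps the distortion of the cover audio small. A 10-bit length field limits the payload to 1023 bytes in this sketch; the way longer messages are handled is not stated in the source.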
Table 1. List of files used in experiments

File name   Size (bytes)   Description
bib         111,261        Bibliography
geo         102,400        Geological seismic data
obj1         21,504        VAX object program
paper1       53,161        Technical paper
paper2       82,199        Technical paper
paper3       46,526        Technical paper
paper4       13,286        Technical paper
paper5       11,954        Technical paper
paper6       38,105        Technical paper
progc        39,611        Source code in C
progl        71,646        Source code in Lisp
progp        49,379        Source code in Pascal

The performance measures, compression ratio and bits per character (BPC), are compared for four cases, i.e., simple arithmetic coding, Huffman with BWT, LZSS with BWT and Dictionary Based Encoding (DBE). The results are shown graphically and indicate that DBE outperforms all the other techniques in both compression ratio and BPC.

\[ \text{Compression ratio} = \frac{\text{Output file size}}{\text{Input file size}} \]

\[ \text{Bits per character (BPC)} = \frac{\text{Output file size}}{\text{Input file size}} \times 8 \]
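The two measures above translate directly into code. The short Python sketch below is given only as an illustration; the function names are hypothetical and file sizes are assumed to be in bytes.

```python
def compression_ratio(output_size, input_size):
    """Output file size divided by input file size (smaller is better)."""
    if input_size <= 0:
        raise ValueError("input size must be positive")
    return output_size / input_size


def bits_per_character(output_size, input_size):
    """Average number of output bits spent per input character (byte)."""
    return compression_ratio(output_size, input_size) * 8
```

For example, a hypothetical 1000-byte input compressed to 278 bytes has a compression ratio of 0.278 and a BPC of 2.224. Since BPC is simply eight times the compression ratio, the two measures always rank the algorithms in the same order.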
Table 2. BPC comparison of arithmetic coding, Huffman with BWT, LZSS with BWT and Dictionary Based Encoding (DBE) for Calgary corpus files

File name   Arithmetic coding   Huffman + BWT   LZSS + BWT   Dictionary-based compression
bib         5.232               3.656           5.016        2.224
geo         5.656               5.8             6.304        4.56
obj1        5.968               4.768           5.288        1.856
paper1      4.984               3.616           4.976        2.256
paper2      4.624               3.68            5.136        2.256
paper3      4.712               3.856           5.336        2.2
paper4      4.824               4.064           5.376        2.212
paper5      5.064               4.056           5.256        2.48
paper6      5.008               3.632           4.952        2.408
progc       5.24                3.504           4.728        2.288
progl       4.76                2.68            3.648        1.896
progp       4.896               2.76            3.688        1.392

Fig. 1. BPC comparison of arithmetic coding, Huffman with BWT, LZSS with BWT and Dictionary Based Encoding (DBE) for Calgary corpus files. [Bar chart "Comparison of Bits Per Character"; x-axis: file name (bib to progp); y-axis: BPC, 0 to 7.]

Fig. 2. Compression ratio comparison of arithmetic coding, Huffman with BWT, LZSS with BWT and Dictionary Based Encoding (DBE) for Calgary corpus files. [Bar chart "Comparison of compression ratio"; x-axis: file name (bib to progp); y-axis: compression ratio, 0 to 0.9.]
Table 3. Comparison of compression ratio

File name   Arithmetic coding   Huffman + BWT   LZSS + BWT   Dictionary-based compression
bib         0.654               0.457           0.627        0.278
geo         0.707               0.725           0.788        0.57
obj1        0.746               0.596           0.661        0.232
paper1      0.623               0.452           0.622        0.282
paper2      0.578               0.46            0.642        0.282
paper3      0.589               0.482           0.667        0.275
paper4      0.603               0.508           0.672        0.2765
paper5      0.633               0.507           0.657        0.31
paper6      0.626               0.454           0.619        0.301
progc       0.655               0.438           0.591        0.286
progl       0.595               0.335           0.456        0.237
progp       0.612               0.345           0.461        0.174

As an example, a section of text from Calgary corpus paper1 looks like this in the original text:

Its performance is optimal without the need for blocking of input data. Its performance is optimal without the need for blocking of input data. Its performance is optimal without the need for blocking of input data. It encourages a clear separation between the model for representing data and the encoding of information with respect to that model. It accommodates adaptive models easily. It is computationally efficient.

Number of characters required = 420

Running this text through the dictionary-based encoder yields the following text:

C/(KÊBÕ!B:,BLÍ"B@C/(KÊBÕ!B:,BLÍ"B@C/(KÊBÕ!B:,BLÍ"B@AVk%CåOÂC-!»,K{A.&!Có" <K~$2DEËA BÅ(DEÊADw

Number of characters required = 94

Hiding the compressed text in audio enhances security. Compared with other text compression algorithms, the dictionary-based audio steganography system gives a better signal-to-noise ratio (SNR) value.

\[ \mathrm{SNR} = 10 \log_{10} \left\{ \frac{\sum_n x^2(n)}{\sum_n \left[ x^2(n) - y^2(n) \right]} \right\} \]

where x(n) represents a sample of the input audio sequence and y(n) stands for a sample of the audio with modified LSB.
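An SNR computation of this kind can be sketched in Python as shown below. Note the assumption: the sketch uses the conventional noise-energy denominator, the sum of [x(n) - y(n)]^2, which differs from the x^2(n) - y^2(n) form printed above and may not be the exact expression the authors evaluated. The function name `snr_db` is hypothetical.

```python
import math


def snr_db(cover, stego):
    """SNR in dB between cover samples x(n) and stego samples y(n).

    Assumption: noise energy is sum((x - y)^2), the conventional form.
    """
    if len(cover) != len(stego):
        raise ValueError("cover and stego must have the same length")
    signal = sum(x * x for x in cover)
    noise = sum((x - y) ** 2 for x, y in zip(cover, stego))
    if noise == 0:
        return math.inf  # identical signals: no embedding noise
    return 10 * math.log10(signal / noise)
```

With this definition, fewer modified samples means less noise energy and a higher SNR, which is consistent with the argument that a shorter compressed payload disturbs the cover audio less.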
Table 4. Comparison of SNR

Cover file name       Arithmetic coding   Huffman coding   Dictionary-based compression
News2                 58.3957             59.9772          64.19
Notify                59.5341             74.4633          75.1404
Tada                  58.5864             60.2545          64.9636
Windows XP windows    62.2428             63.1862          68.5777

Fig. 3. Comparison of SNR. [Bar chart; x-axis: audio file name (News2, Notify, Tada, Windows XP windows); y-axis: SNR, 0 to 80; series: arithmetic coding, Huffman coding, dictionary-based compression.]

4. CONCLUSION

This paper proposes a method of text transformation using dictionary-based encoding combined with audio steganography. In a channel, the reduction in transmission time is directly proportional to the amount of compression. If words in the input text are replaced by variable-length codes that are shorter than the average word length, the size of the input text is reduced; this is what dictionary-based compression achieves. The proposed compression algorithm achieves a good compression ratio and reduces the bits per character. The audio steganography experiment is carried out for several compression algorithms alongside dictionary-based compression. Audio steganography based on dictionary-based text compression achieves a better SNR value.

5. REFERENCES

[1] G. Held and T. R. Marshall, Data Compression, John Wiley, New York, 1991.
[2] Jirapond Tadrat and Veera Boonjing, 2008, "An Experiment Study on Transformation for Compression using Stop Lists and Frequent Words", IEEE Transactions on Information Technology.
[3] David Salomon, Data Compression: The Complete Reference.
[4] A. Carus and A. Mesut, 2010, "Fast text compression using multiple dictionaries", Information Technology Journal 9(5), 1013-1021.
[5] M. Burrows and D. J. Wheeler, "A Block-sorting Lossless Data Compression Algorithm", SRC Research Report 124, Digital Systems Research Center.
[6] Mohammed Pooyan and Ahmed Delforouzi, 2007, "LSB based steganography method based on lifting wavelet transform", IEEE International Symposium on Signal Processing and Information Technology.
[7] R. Sridevi, A. Damodaram and S. V. L. Narasimham, 2009, "Efficient method of audio steganography by modified LSB algorithm and strong encryption key with enhanced security".
[8] F. A. P. Petitcolas, R. J. Anderson and M. G. Kuhn, "Information Hiding: A Survey", Proc. IEEE, vol. 87, no. 7, 1999, pp. 1062-1078.
[9] J. L. Bentley, D. D. Sleator, R. E. Tarjan and V. K. Wei, "A Locally Adaptive Data Compression Scheme", Proc. 22nd Allerton Conf. on Communication, Control, and Computing, pp. 233-242, Monticello, IL, October 1984, University of Illinois.
[10] J. L. Bentley, D. D. Sleator, R. E. Tarjan and V. K. Wei, "A Locally Adaptive Data Compression Scheme", Commun. Ass. Comp. Mach., 29, pp. 233-242, April 1986.
[11] R. G. Gallager, "Variations on a Theme by Huffman", IEEE Trans. Information Theory, IT-24(6), pp. 668-674, Nov. 1978.
[12] D. A. Huffman, "A Method for the Construction of Minimum Redundancy Codes", Proc. IRE, 40(9), pp. 1098-1101, 1952.
[13] Nelson C. Francisco, Nuno M. M. Rodrigues, Eduardo A. B. da Silva, Murilo Bresciani de Carvalho and Sergio M. M. de Faria, October 2010, "Scanned Compound Document Encoding Using Multiscale Recurrent Patterns", IEEE Transactions on Image Processing, vol. 19, no. 10.
[14] Umesh S. Bhadade and A. I. Trivedi, January 2011, "Lossless Text Compression using Dictionaries", International Journal of Computer Applications (0975-8887), Volume 13, No. 8.
[15] corpus.canterbury.ac.nz/