2015 International Symposium on Technology Management and Emerging Technologies (ISTMET), August 25-27, 2015, Langkawi, Kedah, Malaysia

On using Emoticons and Lingoes for Hiding Data in SMS

Vahab Iranmanesh*1, Ho Jing Wei1, Sean Lee Dao-Ming1, Olasimbo Ayodeji Arigbabu
1 Department of Computing and Information Systems, Faculty of Science and Technology, Sunway University, Bandar Sunway, Selangor, Malaysia
*vahab.iranmanesh@gmail.com, {12074795,12078614}@imail.sunway.eud.my, oa.arigbabu@gmail.com

Abstract. People around the world use SMS every day to communicate. This makes SMS a suitable means of transferring secret messages between individuals in a less conspicuous way. In this paper, we investigate the suitability of SMS for hiding secret messages. The hidden data are represented as lingoes and emoticons, which users frequently type in SMS and chat. A pre-shared list, common to the sender and receiver, is used to embed and later extract the secret message, so that the SMS keeps a natural form. The proposed method was implemented and tested on an Android-based mobile device and showed some desirable outcomes.

Keywords - Data hiding; SMS steganography; emoticons; lingoes

I. INTRODUCTION

Ever since the invention of the first mobile phone in 1973, mobile phones have become a universal means of communication among individuals and one of the most frequently used technologies in daily life. Today, a large proportion of people around the globe own at least one mobile phone. Over the years, mobile phones have evolved from an accessory into a necessity in most people's lives. Probably the easiest and most widely used communication method on mobile phones is the short message service (SMS). By definition, SMS is the transfer and exchange of short text messages of up to 160 characters [1] between two mobile phones.
SMS also offers fast message delivery at low cost [2]. However, one of the main issues with mobile phones is the security and privacy of information, since the communication is wireless rather than wired. Mobile phone spying is a well-known crime, and intercepting or recording the messages exchanged between two mobile phones is one of its most common methods [3]. Phone cloning is another example: it allows an attacker to intercept incoming messages and to send outgoing messages as though the cloned phone were the original [4]. Because mobile phones are susceptible to these kinds of intrusions, solutions such as cryptography and steganography are needed to ensure that information is not trivially compromised.

Steganography, the practice of hiding data within other data, has gained prominence in recent years. Its main goal is to hide data within a cover medium such that only the sender and receiver can understand the hidden information. The cover medium is the medium in which the data are embedded, and it can be text, an image, or an audio file [5]. One of the main benefits of steganography over cryptography is that an encrypted message makes it obvious to observers that a secret is being exchanged. In steganography, by contrast, the secret message is hidden in the cover medium so that it looks inconspicuous, and only the receiver holds the key needed to extract it [6]. Text is, however, the most difficult cover medium, because a text file contains less redundant information than other media [7]. Any change to a text alters its structure and can therefore reveal the presence of the secret message.

II. RELATED WORKS

Compared with other steganographic approaches, studies on data hiding in SMS are quite limited. The first investigation of SMS steganography was reported by Shirali-Shahreza [8]. That work used an image as the cover medium and SMS as the carrier to transfer the hidden message to the recipient.
In this method, a black-and-white (B&W) image carries the hidden message through changes to its pixel intensities. However, low capacity (27 bytes) is the main drawback of this technique, because it uses only a black-and-white image rather than a full-colour one.

978-1-4799-1723-5/15/$31.00 ©2015 IEEE

The study in [9] exploited the abbreviated or full forms of words, such as "u" for "you" and "univ" for "university", to hide the secret message in the SMS. For example, to hide the bits 01, the abbreviated form of a word (u) embeds the value 0, whereas the full form (you) embeds the value 1. The main limitations of this method are its very low payload capacity and the ease with which the hidden data can be extracted. In [10], the fixed word-abbreviation list was made dynamic by introducing a computationally lightweight exclusive-or (XOR) encryption. The method can swap the abbreviated and full-form words in the static table according to the stego key, as an additional
security layer that makes the data difficult for an attacker to retrieve. In [11], an image is used to hide a software activation code, based on the steganography technique proposed in [8], and the image is then sent to the recipient by SMS. Apart from its use for sending activation codes, this method showed no significant improvement over the authors' previous method. The use of the Sudoku game is another interesting technique, proposed by Shirali-Shahreza [12]. Here, the recipient receives an unsolved Sudoku puzzle and must solve it to retrieve the hidden data. The arrangement of the numbers 1 to 9 in the puzzle is exploited to embed the secret message. Two key values, giving the row and column of the embedding location, are sent to the receiver along with the SMS so that the hidden data can be extracted. Only two cells of the puzzle serve as embedding locations, with 9 bits per cell, so the method can embed only very short messages of about 18 bits. As the same authors suggest, larger puzzles, such as 25×25 instead of 9×9 cells, would allow longer secret messages to be embedded. In [13], the Least Significant Bit (LSB) method is used to hide data in a colour image for bank customers. Since human eyes cannot distinguish slight changes in the blue colour, two bits of each pixel's blue value are used to encode the data. Bank customers must use a particular mobile banking application. This application receives an SMS and then downloads from the bank's website the image file that contains the hidden data. The decoder then scans the downloaded image to extract the hidden data using the LSB method. For additional security, a password check is performed before the application reads the image file. However, the main limitation of this technique is that LSB embedding operates in the spatial domain, where the hidden bits can easily be altered by processing such as filtering.
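The LSB idea in [13] can be illustrated with a minimal Java sketch. It is not the code of [13]. For illustration only, it assumes three things: pixels are packed ARGB integers (the format returned by Android's Bitmap.getPixels); the two least significant bits of the blue channel carry data; and each byte is therefore spread over four consecutive pixels, high bits first. The class and method names are hypothetical, and the exact bit layout used in [13] may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch: 2-bit LSB embedding in the blue channel of ARGB pixels. */
public class BlueLsbSketch {

    // Hides each byte of data in the two lowest blue bits of four consecutive pixels
    // (assumed layout, high bit pair first).
    static int[] embed(int[] pixels, byte[] data) {
        if (pixels.length < data.length * 4) {
            throw new IllegalArgumentException("cover image too small for the data");
        }
        int[] out = Arrays.copyOf(pixels, pixels.length);
        for (int i = 0; i < data.length; i++) {
            int b = data[i] & 0xFF;                      // treat the byte as unsigned
            for (int k = 0; k < 4; k++) {
                int pair = (b >> (6 - 2 * k)) & 0b11;    // bit pairs 7-6, 5-4, 3-2, 1-0
                int p = 4 * i + k;
                out[p] = (out[p] & ~0b11) | pair;        // blue is the lowest byte of an ARGB int
            }
        }
        return out;
    }

    // Reads back `length` bytes from the two lowest blue bits of the pixels.
    static byte[] extract(int[] pixels, int length) {
        byte[] data = new byte[length];
        for (int i = 0; i < length; i++) {
            int b = 0;
            for (int k = 0; k < 4; k++) {
                b = (b << 2) | (pixels[4 * i + k] & 0b11);
            }
            data[i] = (byte) b;
        }
        return data;
    }

    public static void main(String[] args) {
        int[] cover = new int[16];
        Arrays.fill(cover, 0xFF336699);                  // opaque pixels with blue = 0x99
        byte[] secret = "Hi".getBytes(StandardCharsets.US_ASCII);
        int[] stego = embed(cover, secret);
        System.out.println(new String(extract(stego, secret.length), StandardCharsets.US_ASCII)); // prints Hi
    }
}
```

Only the two lowest blue bits change, so each pixel's blue value moves by at most 3 out of 255. That is why the change is hard to see, and it is also why the hidden bits are fragile under filtering.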
On the other hand, a high payload capacity is an advantage of the technique in [13], since many pixels of the colour image are available for hiding data. Bhaya [14] proposed using proportional fonts to hide sensitive data in a way that human eyes cannot easily recognize. The hidden data are converted to a string of 1s and 0s, and each bit is encoded in the proportional font of one character. Low payload capacity is the limitation of this technique, since each character embeds only one bit of hidden data. Frequently used emoticons have recently been introduced as a cover medium [15]. To achieve this, a pre-shared list maps emoticons to alphanumeric characters. For example, :-) stands for the character A, :-( for B, and so on. High capacity is one of the advantages of this technique, because each emoticon replaces a whole character of hidden data. Emoticons are used to hide data not only in SMS but also in chat. An example of such an approach is presented in [16], where emoticons are grouped into classes based on the feeling each one expresses, such as crying and happy classes. Within each class, the emoticons are ordered, and their positions are used to embed and extract the secret message. For instance, to embed the bit string 1110, which equals 14 in decimal, the sender selects a category that has an emoticon at position 14 and sends emoticon 14 of that category to the recipient. To extract the hidden data, the recipient uses the same emoticon categories, finds the position of the received emoticon (14), and recovers the binary value 1110. However, a larger value such as 65 (01000001) needs more than one emoticon (two emoticons), because each category contains a limited number of emoticons.
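The position-based scheme of [16] can be sketched in Java as follows. For simplicity, the sketch assumes a single hypothetical category of 16 ordered emoticons. Every emoticon then carries exactly four bits, so a byte such as 65 (01000001) needs two emoticons, at positions 4 and 1. The emoticon list and all names are invented for illustration. [16] itself uses several feeling-based categories of its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of position-based emoticon embedding with one 16-entry category. */
public class EmoticonIndexSketch {

    // Hypothetical ordered category; an emoticon's position (0-15) is its 4-bit value.
    static final String[] CATEGORY = {
        ":)", ":-)", ":D", ":-D", "=)", "=D", ":]", "=]",
        "^_^", "^^", ":P", ":-P", ";)", ";-)", "XD", "xD"
    };

    // Splits each byte into two 4-bit groups and emits the emoticon at each position.
    static List<String> embed(byte[] data) {
        List<String> out = new ArrayList<>();
        for (byte value : data) {
            int b = value & 0xFF;
            out.add(CATEGORY[b >> 4]);     // high 4 bits
            out.add(CATEGORY[b & 0x0F]);   // low 4 bits
        }
        return out;
    }

    // Finds each emoticon's position in the category and rebuilds the bytes.
    static byte[] extract(List<String> emoticons) {
        List<String> positions = Arrays.asList(CATEGORY);
        byte[] data = new byte[emoticons.size() / 2];
        for (int i = 0; i < data.length; i++) {
            int hi = positions.indexOf(emoticons.get(2 * i));
            int lo = positions.indexOf(emoticons.get(2 * i + 1));
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("emoticon not in category");
            }
            data[i] = (byte) ((hi << 4) | lo);
        }
        return data;
    }

    public static void main(String[] args) {
        List<String> stego = embed(new byte[] {65});   // 0100 0001 -> positions 4 and 1
        System.out.println(stego);                     // prints [=), :-)]
        System.out.println(extract(stego)[0]);         // prints 65
    }
}
```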
Many techniques have been proposed, but to the best of our knowledge none has used lingoes to hide secret messages, even though lingoes are a widely accepted and used way of expressing feelings in SMS and chat. The aim of this research is to study how lingoes, together with other expressions of feeling such as emoticons, can be used to embed and extract hidden data naturally within an SMS exchanged between two parties. The results indicate that the method can embed the secret message within the SMS naturally, without attracting the attention of an eavesdropper. The rest of the paper is organized as follows: Section III describes the proposed method, and Section IV presents the discussion and conclusion.

III. PROPOSED METHOD

Fig. 1 illustrates the proposed method for SMS steganography, which uses an architecture similar to those described in [15,16]. In this approach, emoticons and lingoes (abbreviations of words) that are frequently typed in chat and SMS serve as the cover medium to deliver the data in a hidden manner. The recipient decodes the data according to the meanings of the emoticons and lingoes in the pre-shared list. Each character is encoded by one emoticon or lingo.

Fig. 1. Proposed method

An emoticon is a textual icon that depicts the user's feeling, as shown in Fig. 2. Emoticons are widely used, especially in SMS and chat, where the number of characters is limited. Abbreviations are likewise used as lingo to convey meaning with fewer characters, such as "gr8" for "great".
Fig. 2. Several SMS emoticons

A. Embedding

The first step of the proposed technique is to define a pre-shared list that contains lingoes and emoticons together with their alphanumeric meanings, as shown in Table I. The sender and the recipient share this list to encode and decode the secret message in the SMS. The list is built into an Android application that is used for both embedding and extraction.

Next, the sender embeds the secret message in several steps. Fig. 3 shows the main window of the application, which was developed in Java and contains the embedding and extraction functions. Since the aim is to conceal data, the user first selects the embedding function. Then, as shown in Fig. 4, the application asks for the secret message to be transmitted to the recipient (for example, "attack"). Using Table I, each character of the secret message is converted into its corresponding lingo or emoticon, so "attack" becomes ";) tbh tbh ;) -.- plox", as shown in Fig. 5. This conversion table therefore serves as a pre-shared list (key) for embedding and extracting the secret message, and it is exchanged through a channel other than SMS. Each derived lingo and emoticon can be placed anywhere within the cover text: at the beginning, middle or end of a sentence, or even of a word. The placement depends on the meaning of the sentence or word that precedes or follows it. The user then types a cover text around the generated lingoes and emoticons, as shown in Fig. 5.

The emoticons and lingoes that hide the data represent human feelings. We therefore emphasize that the cover text should be chosen to match the feelings of the generated lingoes and emoticons. This is important to make the stego text appear as normal as possible to an eavesdropper. For example, the first generated symbol, ;), is an emoticon meaning a wink. The sender should therefore avoid typing a sentence that expresses a contradictory feeling, such as anger. Likewise, the remaining sentences of the cover text should be consistent with the corresponding generated lingoes and emoticons. In the rare case that the secret message is longer than a normal SMS (160 characters) and cannot fit within one message, the rest of the secret message can be transferred in a second SMS, and so on. Finally, the stego text is sent to the recipient by SMS.

Fig. 3. Main window of SMS steganography application

TABLE I. PRE-SHARED LIST

Char  Lingo/emoticon  Char  Lingo/emoticon  Char  Lingo/emoticon
a     ;)              m     ToT             y     wp
b     ;(              n     ==              z     iirc
c     -.-             o     ._.             1     gg
d     =)              p     xd              2     :]
e     =(              q     OTL             3     :[
f     :)              r     afk             4     ikr
g     :(              s     brb             5     rofl
h     =/              t     tbh             6     >.<
i     =\              u     tldr            7     asap
j     QAQ             v     lmao            8     -_-
k     plox            w     wew             9     =]
l     =.=             x     ttyl            0     =[
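The conversion of Table I can be sketched in Java, the language of the authors' Android application, together with the inverse lookup that the recipient performs (Section III-B). This is not the authors' code. The names are hypothetical, and the input is limited to the 36 characters of Table I. The generated symbols are assumed to be separated by single spaces. Extraction assumes that each symbol appears as a separate whitespace-delimited token in the stego SMS.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch of embedding and extraction with the pre-shared list of Table I. */
public class PreSharedList {

    // Table I as whitespace-separated "character symbol" pairs.
    static final String TABLE_I =
        "a ;) b ;( c -.- d =) e =( f :) g :( h =/ i =\\ j QAQ k plox l =.= "
      + "m ToT n == o ._. p xd q OTL r afk s brb t tbh u tldr v lmao w wew "
      + "x ttyl y wp z iirc 1 gg 2 :] 3 :[ 4 ikr 5 rofl 6 >.< 7 asap 8 -_- 9 =] 0 =[";

    static final Map<Character, String> TO_SYMBOL = new LinkedHashMap<>();
    static final Map<String, Character> TO_CHAR = new HashMap<>();
    static {
        String[] t = TABLE_I.split("\\s+");
        for (int i = 0; i < t.length; i += 2) {
            TO_SYMBOL.put(t[i].charAt(0), t[i + 1]);
            TO_CHAR.put(t[i + 1], t[i].charAt(0));
        }
    }

    // Embedding: one symbol per secret character, separated by single spaces so that
    // the sender can type cover words around them.
    static String encode(String secret) {
        StringJoiner out = new StringJoiner(" ");
        for (char c : secret.toLowerCase(Locale.ROOT).toCharArray()) {
            String symbol = TO_SYMBOL.get(c);
            if (symbol == null) {
                throw new IllegalArgumentException("'" + c + "' is not in the pre-shared list");
            }
            out.add(symbol);
        }
        return out.toString();
    }

    // Extraction: keep only whitespace-delimited tokens that appear in Table I;
    // every other token is treated as a cover word.
    static String extract(String sms) {
        StringBuilder secret = new StringBuilder();
        for (String token : sms.trim().split("\\s+")) {
            Character c = TO_CHAR.get(token);
            if (c != null) {
                secret.append(c);
            }
        }
        return secret.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("attack"));  // prints ;) tbh tbh ;) -.- plox
        // Hypothetical stego SMS; the cover words are invented for illustration.
        String sms = "Hey ;) tbh I miss you, tbh ;) so bored -.- come over plox";
        System.out.println(extract(sms));      // prints attack
    }
}
```

For "attack", the six symbols occupy 17 characters (22 with the separating spaces) of the 160 available in one SMS before any cover text is added. This shows the capacity cost of multi-character lingoes such as "plox". The simple token matching also has two gaps. An ordinary word that happens to coincide with a lingo, such as "gg", would be decoded as hidden data, and a symbol glued to punctuation (for example "tbh,") would be missed. The sketch does not handle either case.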
Fig. 4. Entering secret message window
Fig. 5. Embedding window
Fig. 6. Extracting window
Fig. 7. Extracted secret message

B. Extracting

On the recipient side, the user runs the same mobile application, which uses the same pre-shared list to extract the secret message from the stego SMS. The ordinary words of the stego SMS carry no hidden data, so the application scans the entire SMS and picks out only the emoticons and lingoes. These are then mapped back to the characters of the secret message using Table I, as shown in Figs. 6 and 7. For example, the first symbol apart from the words is ";)", which is converted to the character "a". The remaining emoticons and lingoes of the received SMS are converted in the same way using Table I to recover the complete secret message ("attack").

IV. DISCUSSION AND CONCLUSION

This paper proposed a method for hiding data in SMS using lingoes and emoticons. Most of the population now uses the mobile phone to communicate, so the described method can be used anywhere and at any time without being conspicuous to an eavesdropper. Hiding data in SMS can be performed using different approaches. Several researchers have used images to hide secret messages [8,11,13]. The main advantage of these methods is their potentially high capacity, since pixel intensities are used to embed the secret message. However, any modification of the image during transmission, such as filtering, can corrupt the hidden data. Word abbreviations and full forms have also been used to represent the secret message in SMS [9,10]. Each word embeds only one bit, so many abbreviated or full-form words are required to transmit the hidden data. This could draw an attacker's attention to the existence of a secret message in the SMS.
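The one-bit-per-word scheme of [9] can be sketched in Java to make this capacity argument concrete. Only the pairs "u"/"you" and "univ"/"university" come from [9]. The other pairs, the cycling through pairs, and all names are assumptions made for illustration. Hiding a single 8-bit character already requires eight carrier words, whereas the proposed method needs one emoticon or lingo per character.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of hiding bits by choosing abbreviated (0) or full (1) word forms. */
public class AbbreviationBitsSketch {

    // Abbreviation/full-form pairs; only the first two are taken from [9].
    static final String[][] PAIRS = {
        {"u", "you"}, {"univ", "university"}, {"pls", "please"}, {"thx", "thanks"}
    };

    // Bit 0 -> abbreviated form, bit 1 -> full form, cycling through the pairs as carriers.
    static List<String> embed(String bits) {
        List<String> words = new ArrayList<>();
        for (int i = 0; i < bits.length(); i++) {
            String[] pair = PAIRS[i % PAIRS.length];
            words.add(bits.charAt(i) == '1' ? pair[1] : pair[0]);
        }
        return words;
    }

    // Recovers one bit from every carrier word; other words would simply be skipped.
    static String extract(List<String> words) {
        StringBuilder bits = new StringBuilder();
        for (String w : words) {
            for (String[] pair : PAIRS) {
                if (w.equals(pair[0])) {
                    bits.append('0');
                } else if (w.equals(pair[1])) {
                    bits.append('1');
                }
            }
        }
        return bits.toString();
    }

    // 8-bit binary string of an ASCII character, e.g. 'a' -> "01100001".
    static String toBits(char c) {
        return String.format("%8s", Integer.toBinaryString(c)).replace(' ', '0');
    }

    public static void main(String[] args) {
        List<String> carriers = embed(toBits('a'));
        System.out.println(carriers.size());   // prints 8
        System.out.println(carriers);          // prints [u, university, please, thx, u, univ, pls, thanks]
        System.out.println(extract(carriers)); // prints 01100001
    }
}
```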
Taking advantage of SMS as a two-way channel between sender and receiver, [12] uses a Sudoku puzzle to embed the secret message in the locations of puzzle cells. In another approach, human feelings represented as emoticons in SMS and chat are used to transfer the secret message [15,16]. However, expressing the users' feelings continuously with emoticons alone would create a pattern that could reveal the existence of a hidden message. The principal advantage of this approach is that it delivers more data, since each character of the secret message is represented by a single emoticon in the SMS. In this study, lingoes are added to emoticons as a further means of expressing feelings. This makes the cover medium more realistic than one that consists only of words and emoticons. In other words, the stego medium (SMS) is a combination of words, emoticons and lingoes that closely resembles a natural SMS, as shown in Fig. 6. Lingoes offer more ways to express feelings in an SMS, so the user has more flexibility in choosing sentences that match the emoticons and lingoes carrying the secret message. Although the proposed method is quite simple, it worked effectively in the experiments we conducted. Nevertheless, the work has limitations. First, some lingoes, such as "plox", consist of more than one character and therefore occupy several characters of the cover medium. This reduces the amount of hidden data that can be embedded within the 160-character limit of an SMS. Second, an attacker could disclose or guess the fixed pre-shared list (key) and then extract the secret message. In future work, the method could be extended to dual steganography, in which the secret message is encrypted before it is embedded, for better security. In addition, the pre-shared list that maps the alphanumeric characters could be shuffled from time to time. Each shuffle would generate a new private key, shared between the sender and receiver for embedding and extracting hidden data. This makes it more difficult for an eavesdropper to recover the secret message.
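The key-shuffling idea above could be realized, for instance, by having both parties derive the same permutation of the Table I symbols from a shared seed. The following Java sketch illustrates only that assumption, and its names are hypothetical. java.util.Random is not a cryptographically secure generator, so a practical scheme would need a proper key-derivation step, which is not shown. The same seed yields the same table on both devices, provided they use the same Random and Collections.shuffle implementations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch: re-keying Table I by shuffling its symbols with a shared seed. */
public class ShuffledList {

    // Characters and symbols of Table I, in table order.
    static final String CHARS = "abcdefghijklmnopqrstuvwxyz1234567890";
    static final List<String> SYMBOLS = Arrays.asList(
        ";)", ";(", "-.-", "=)", "=(", ":)", ":(", "=/", "=\\", "QAQ", "plox", "=.=",
        "ToT", "==", "._.", "xd", "OTL", "afk", "brb", "tbh", "tldr", "lmao", "wew", "ttyl",
        "wp", "iirc", "gg", ":]", ":[", "ikr", "rofl", ">.<", "asap", "-_-", "=]", "=[");

    // Sender and receiver call this with the same shared seed and obtain the same new table.
    // NOTE: java.util.Random is not cryptographically secure; it only illustrates the idea.
    static Map<Character, String> deriveTable(long sharedSeed) {
        List<String> shuffled = new ArrayList<>(SYMBOLS);
        Collections.shuffle(shuffled, new Random(sharedSeed));
        Map<Character, String> table = new LinkedHashMap<>();
        for (int i = 0; i < CHARS.length(); i++) {
            table.put(CHARS.charAt(i), shuffled.get(i));
        }
        return table;
    }

    public static void main(String[] args) {
        Map<Character, String> sender = deriveTable(20150825L);
        Map<Character, String> receiver = deriveTable(20150825L);
        System.out.println(sender.equals(receiver)); // prints true
    }
}
```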
REFERENCES

[1] Digital cellular telecommunications system, Global System for Mobile communications (GSM), ETSI TS 100 901, GSM 03.40 version 7.4.0, 1998.
[2] Technical realization of the Short Message Service (SMS), ETSI, 2000.
[3] M. Kashif, "Secure SMS communication using encryption gateway and digital signature", International Conference on Computational Science and Engineering (CSE), pp. 1430-1434, 2014.
[4] J. Singh, R. Ruhl, and D. Lindskog, "Secure GSM OTA SIM cloning attack and cloning resistance in EAP-SIM and USIM", International Conference on Social Computing (SocialCom), pp. 1005-1010, 2013.
[5] S. Bhattacharyaa, I. Banerjee, and G. Sanyal, "A survey of steganography and steganalysis technique in image, text, audio and video as cover carrier", Journal of global research in computer science, 2(4), 2011.
[6] I. V. S. Manjo, "Cryptography and steganography", International journal of computer applications, 1(12), 2010.
[7] M. Agarwal, "Text steganography approaches: a comparison", International journal of network security and its application, 5(1), 2013.
[8] M. Shirali-Shahreza, "Stealth steganography in SMS", Proceedings of the third IEEE and IFIP International conference on wireless and optical communications networks (WOCN), April 2006.
[9] M. Shirali-Shahreza and M. H. Shirali-Shahreza, "Text steganography in SMS", International conference on convergence information technology, pp. 2260-2265, 2007.
[10] K. F. Rafat, "Enhanced text steganography in SMS", International conference on computer, control and communication, pp. 1-6, 2009.
[11] M. H. Shirali-Shahreza and M. Shirali-Shahreza, "Sending mobile software activation code by SMS using steganography", Third international conference on intelligent information hiding and multimedia signal processing, 1, pp. 554-557, 2007.
[12] M. H. Shirali-Shahreza and M. Shirali-Shahreza, "Steganography in SMS by Sudoku puzzle", International conference on computer systems and applications, pp. 844-847, 2008.
[13] M. Shirali-Shahreza, "Improving mobile banking security using steganography", Fourth international conference on information technology, pp. 885-887, 2007.
[14] W. S. Bhaya, "Text hiding in mobile phone simple message service using fonts", Journal of computer science, 7(11), pp. 1626-1628, 2011.
[15] W. S. Bhaya, "A new approach to SMS text steganography using emoticons", International journal of computer applications, 2014.
[16] W. Zhi-Hui, C. Chin-Chen, D. K. The, and L. Ming-Chu, "Emoticon-based text steganography in chat", Asia-Pacific conference on computational intelligence and industrial applications, 2, pp. 457-460, 2009.