BYTE FREQUENCY ANALYSIS DESCRIPTOR WITH SPATIAL INFORMATION FOR FILE FRAGMENT CLASSIFICATION


HongShen Xie (1), Azizi Abdullah (2), Rossilawati Sulaiman (3)
School of Computer Science, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, MALAYSIA
(1) xie2876@126.com, (2) azizi@ftsm.ukm.my, (3) rossilawati@gmail.com

Abstract

Digital forensics is generally about recovering and investigating data from digital devices such as PCs and mobile phones. Examining information and extracting evidence from digital devices is not an easy task. In data recovery, for example, the success of recovering digital information is highly dependent on how effectively a method is able to understand the content of a document. The better the system understands the content of documents, the more effective it will be in recovering the desired documents. One of the challenging issues in recovering documents is to determine the type of file fragments from an incomplete document structure. One possible solution to the problem is based on statistical analysis, such as byte frequency analysis, for feature description. Byte frequency analysis computes a global descriptor and provides a statistical distribution of file fragments. However, one possible problem with this solution is that it creates a global histogram input vector for a machine learning classifier, such as a support vector machine, that is insensitive to small changes in the file fragment content. Besides, it does not include any spatial information and is liable to false positives, especially for large datasets. Therefore, byte frequency analysis with a circular representation is proposed, where a set of file fragments is divided into several blocks using a fixed partitioning scheme. Then, for each block, the lower-level byte frequency analysis descriptor is used to represent the partition.
After that, all features are combined to create one large input vector for a machine learning classifier. We performed experiments on 10 different file categories at three different resolutions, i.e., level 0, level 1 and level 2, and on combinations of these resolutions. The results show that the proposed method slightly outperforms the single byte frequency analysis distribution.

Keywords: digital forensics, byte frequency analysis, support vector machine, spatial information, circular scheme

1. Introduction

One of the important tasks in digital forensics is file type classification. File fragment type classification is significant in data carving, where it contributes to data recovery before a recovery technique is applied. Approaches to classifying the different categories of file fragments are termed extension-based, magic-bytes-based and content-based methods (Amirani, Toorani, & Mihandoost, 2013). The content-based method is the most challenging because it applies when the metadata has been lost or corrupted. However, the fragment type can still be predicted because different types have different statistical feature distributions.

In previous work, statistical methods have been widely applied in file fragment classification, such as byte frequency analysis (McDaniel & Heydari, 2003), file entropy (Calhoun & Coles, 2008) and the standard deviation (Axelsson, 2010) of the byte frequencies. Some researchers obtained promising results. Recently, supervised (Li, Ong, Suganthan, & Thing, 2010) and unsupervised (Axelsson, 2010) machine learning techniques, which boost file fragment classification, have been widely used. These solutions often select file fragment features such as BFA, entropy and compression distance. After the features are selected, classification becomes a common machine learning problem. Research findings showed that applying machine learning performs well (Fitzgerald, Mathews, Morris, & Zhulyn, 2012; Beebe, Maddox, Liu, & Sun, 2013). However, few techniques use spatial information to describe file fragments.

In this paper, we apply spatial information to enrich file fragment features. A content-based type classification method that deploys spatial information and a supervised support vector machine for automatic feature extraction is proposed. The spatial information, combined with the SVM technique, applies statistical analysis of the byte frequency of the file fragment in such a way that the accuracy of the technique does not rely on potential metadata information but rather on the values of the data itself. The extracted features are then applied in a classifier for file fragment type classification. Tables 4, 5 and 6 show that the proposed method gives promising results for both binary and textual file fragments when processing fragments of random size and using a multi-dimensional support vector machine classifier.

Organized by WorldConferences.net

2. Related Works

Previous works applying both non-machine-learning and machine learning techniques to the problem of file fragment classification appear in the literature. McDaniel and Heydari (2003) proposed three algorithms, Byte Frequency Analysis (BFA), Byte Frequency Cross-correlation (BFC) and File Header/Trailer (FHT), in order to construct characteristic fingerprints that identify different file types. The BFA algorithm calculates the frequency distribution of each file type by counting the number of occurrences of each byte value.
The BFC algorithm captures the relationship between the byte value frequencies to strengthen the file type identification. The correlation strength is determined by the average difference in their frequencies. The FHT algorithm focuses on calculating the byte distribution of the file header and file footer. The experiment explored 30 kinds of files, with 4 sample files for each kind. The classification accuracies were approximately 27.5%, 45.83% and 95.83% for the BFA, BFC and FHT algorithms, respectively. However, this approach fails to achieve high accuracy unless it relies on the header information contained in the file metadata. Hence, it is not applicable to the many situations in which the header information is not available.

Besides that, Karresand and Shahmehri (2006b) proposed a file type identification method named Oscar. They built centroids from the mean and standard deviation of the byte frequency distribution of different file types. A weighted quadratic distance metric was used to measure the distance between the centroids and the test data fragments, so as to identify JPEG fragments. In addition, the detection capability of Oscar was enhanced by taking into consideration that byte 0xFF is only allowed in combination with a few other bytes (i.e., 0x00 and 0xD0..0xD7) within the data part of JPEG files. Using a test data set of KB blocks, the classification accuracy was 97.9%. The authors extended the Oscar method (Karresand & Shahmehri, 2006a) by incorporating byte ordering information through calculating the rate of change of the data bytes. Using 72.2 MB of test data, the classification accuracy for JPEG fragments was 99% with no false positives. However, for Windows executables, the false positive rate increased tremendously and exceeded the detection rate. The detection rate for zip files was between 46% and 84%, with false positive rates in the range of 11% to 37%.
Calhoun and Coles (2008) proposed applying the Fisher linear discriminant to a set of different statistics based on four file types (namely JPG, BMP, GIF and PDF). Specifically, the experiment compared JPG vs. PDF, JPG vs. GIF, and PDF vs. BMP. The test set contained 50 fragments for each file type. After data preprocessing, the combination of the ASCII-Entropy-Low-High-Modesfreq-Sdfreq statistical features achieved the highest average accuracy, 88.3%. This result was obtained with only four file fragment types taken into consideration and with a one-vs-one model, and there was no attempt at multi-type classification. Testing in such a way gives less chance of misclassification, and the results should be interpreted together with the false positives or true negatives. Finally, the authors noted that a modification to their methodology would be required to avoid the situation where the method fails and all fragments are classified as one type, yet the reported accuracy is still high.

Veenman (2007) extracted three features from file content, including byte frequency and complexity. After these features were retrieved, the author applied linear discriminant analysis to classify file fragments. In his experiments, type-x and type-all map to binary and multi-class classification. The author used a large private data set consisting of between 3000 and fragments per file type, for 11 file types. This method achieved an average classification accuracy of 45%. However, the method performed poorly on compressed files, with only 18% accuracy and 80% false positives. The prediction accuracy decreased as the number of clusters increased. A possible reason is that compressed files are not sensitive to the increase in cluster content.

Li et al. (2010) applied a support vector machine (SVM) to high entropy file fragment classification based on byte frequency analysis. In this experiment, the training data covered 5 file types (DLL, MP3, EXE, JPEG and PDF). For each file type, 800 files were used for training and 80 files for testing. Each file was split into 4096-byte fragments, and the first and last fragments were discarded to ensure that header and footer metadata and any possible file padding were excluded. They achieved an average classification accuracy of 81.5%. The classification results are acceptable. However, because only five file types were used, the results are not comprehensive. The dataset should be extended to more file fragment types that also have high entropy values. A further drawback of this experiment is that the data set is private, which makes it difficult for other researchers to extend the experiment.

Fitzgerald et al. (2012) applied a machine learning algorithm (SVM) with techniques from natural language processing, using byte frequency bigram counts, entropy of bigram counts, Hamming weight and compressed length as features.
The data set for this experiment was drawn from a public dataset (GovDocs 2009). They selected up to 4000 fragments for the data set and apportioned these approximately in the ratio 9:1 as training and test sets, respectively, for the SVM. The SVM predictions showed an average prediction accuracy of 48.3%, and the prediction accuracy increased as the number of file fragments increased. The result is promising. However, the paper did not mention which feature is the most powerful in classifying file fragments.

Sportiello and Zanero (2011) proposed a method that combines multiple features. These features include Mean Byte Value, Entropy (E) and Complexity (C), as well as some specialized classifiers such as the distribution of ASCII character codes, which characterizes text-based file types, and Rate of Change, which, as seen above, is a good classifier for JPEG files. The corpus consisted of nine file types downloaded from the internet, and these were decomposed into a total of blocks of 52 bytes for each file type. The experiment showed that E-C-byte frequency is usually a powerful feature vector for achieving high accuracy. Rate of Complexity and Complexity-byte frequency are also good feature vectors for high classification accuracy. The true prediction rate ranged from 71.1% to 98.1%, and the false prediction rate ranged from 3.6% to 32.1%. The contribution of the experiment is that it clearly explores which combinations of features are useful for file block classification. However, no multi-class classification was attempted. For each file type, a separate SVM model was created to classify a fragment type against each of the other types individually. Moreover, there is no confusion matrix of the results and no mention of false negative results; the table of results is arranged by fragment type, feature and feature parameters (C and γ). It is therefore difficult to compare the results with previous research.
Gopal, Yang, Salomatin, and Carbonell (2011) considered multiple file corruption scenarios, where Type3 corresponds to file fragment classification. They evaluated the performance of several statistical classification methods (support vector machine with n-gram features, and k-nearest-neighbors) and several commercial off-the-shelf solutions (including Libmagic, Trid, Outsidein and Droid) in classifying files under these corruption scenarios. They made use of public data (GovDocs 2009), which consisted of over files of 316 file types; whether compressed or encrypted files were included is not mentioned in the experiment. The results show that file fragments are more difficult to classify than complete files. The performance on file fragments was reported for the SVM only, which achieved 40% accuracy according to the Macro-F1 measure. However, the true file types were taken to be those reported by Libmagic using the Linux file command, so metadata such as the file header or footer was not taken into consideration, which may have resulted in bias in the experimental results.

3. Proposed Method

The proposed scheme includes two phases: the training phase and the testing phase. During the training phase, the support vector machine is trained, and the system is initialized using sample data with known types. The output is a fileprint for each of the different file types. In the testing phase, the trained support vector machine and the fileprints are used for fragment type testing. We assume that the model automatically partitions each input file into several blocks of 4096 bytes. After this splitting, we remove the first file fragment, because it includes header metadata, which normally affects classification accuracy. We also delete the last file fragment, because it includes footer metadata. When a real-world file fragment is inserted into this system, it is quickly converted to 4 KB file fragments. After data preprocessing and BFA extraction, the remaining data fragments are divided into several blocks using the 2^i algorithm. The circular scheme maps to level 0, level 1 and level 2 when i equals 0, 1 and 2. In this study we focus on multi-class classification using a one-versus-one model. Figure 2 (a), (b) and (c) show the circular scheme constructed from 50 file fragments at levels 0, 1 and 2.

3.1 Byte Frequency Analysis (BFA)

In file fragment classification, the BFA histogram is one of the most important file fragment features (Veenman, 2007). BFA is an algorithm that calculates the byte frequency distribution over eight-bit (unigram count) or sixteen-bit (bigram count) numbers representing the numeric values in a file.
However, a drawback of the current BFA algorithm with unigram or bigram counts is that the relationship connecting different fragmentation blocks is missing, which causes the BFA algorithm to fail to capture enough fragment information. This is observed especially for high entropy file fragments, or when different file fragments have similar byte frequency distributions. Therefore, in this experiment we apply spatial information to enrich the feature vector. First, we count single bytes (unigram count) to extract the byte frequency distribution, which corresponds to eight-bit numbers representing numeric values from 0 to 255 inclusive. By counting the number of occurrences of each byte value in a file, a frequency distribution can be retrieved. Different file types have consistent patterns in their frequency distributions (McDaniel & Heydari, 2003). Furthermore, a connection is constructed by extracting neighborhood fragment information. Table 1 gives a brief definition of the BFA algorithm.
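As a complement to the pseudocode of Table 1, the unigram counting described above can be sketched in Python. This is a minimal illustration rather than the authors' implementation; the function names `byte_histogram` and `relative_frequencies` are hypothetical, and dividing the counts by the fragment length is an assumption that the text does not state.

```python
def byte_histogram(fragment: bytes) -> list:
    """Count occurrences of each 8-bit value (unigram count) in a fragment."""
    histogram = [0] * 256            # one bin per byte value 0..255
    for value in fragment:           # iterating over bytes yields ints in 0..255
        histogram[value] += 1
    return histogram


def relative_frequencies(fragment: bytes) -> list:
    """Turn the counts into relative frequencies.

    Assumption: normalizing by the fragment length is one common choice;
    the paper does not say whether raw counts or frequencies are used.
    """
    counts = byte_histogram(fragment)
    total = len(fragment)
    if total == 0:
        return [0.0] * 256
    return [count / total for count in counts]
```

For a 4096-byte fragment the histogram always has 256 bins whose counts sum to 4096, regardless of the file type the fragment came from.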

Table 1: BFA Algorithm Definition

BFA algorithm: Calculation of a fragment histogram.
  Create an array histogram with 2^n elements (n = 8 bits)
  For all byte values i do
    histogram[i] = 0
  End for
  For all data bytes x do
    Increment histogram[f(x)] by 1
  End for

3.2 BFA with Circular Scheme

Figure 1: A 28 KB file fragment built from seven file fragments at level 1 (figure label: mirroring of the fragment at its border).

Figure 2: A circular scheme constructed from eight blocks at level 1.

In the proposed method, a file fragment is represented by partitioning it. Partitioning is a process that captures the spatial information of file fragments. However, the problem with partitioning is how to obtain an even distribution when calculating the neighborhood information of file fragments. Therefore, we introduce the circular scheme to obtain a holistic view of the neighborhood information. The idea is to borrow information from the partition on the opposite side of the file fragment. The algorithm that implements the circular scheme is presented in Table 2.

Table 2: Definition of the circular scheme under the mod algorithm

ALGORITHM: Circular scheme under mod algorithm
  Let M be the number of fragments, M = n*P + r, where P is the number of partitions
  (P = 1, 2, 4 when level i = 0, 1, 2) and r is the number of undivided fragments, r < n.
  If M is an odd number then
    M = M+1 (i = 1); M = M+1, M = M+2 or M = M+3 (i = 2)
  Else if M is an even number then
    M = M (i = 1); M = M or M = M+2 (i = 2)
  End if

3.3 BFA with Spatial Level Circular Scheme

We introduce the spatial level circular scheme to enrich the spatial byte frequency features by looking at several resolutions of partitions in the circular scheme representation, as shown in Figure 3 (a) to (c). Each file fragment is divided into a sequence of increasingly finer spatial partitions by repeatedly doubling the number of BFA distributions in the circular scheme.

Figure 3: Spatial level circular scheme representation, constructed at levels 0, 1 and 2. Different levels give different numbers of BFA distributions. (a) Level 0 uses a single BFA histogram distribution, (b) Level 1 uses two BFA histogram distributions, and (c) Level 2 uses four BFA histogram distributions.

Before spatial information is applied to extract features, the data must be pre-processed. First, we use the 2^i algorithm to partition the file fragment in order to construct a multi-resolution description of the file fragment. The number of partitions P is equal to 2^i:

    P = 2^i  (i = 0, 1, 2)    (1)
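The padding rule of Table 2 together with Equation (1) can be sketched as follows. This is only one reading of the circular scheme: it assumes that the fragment count M is raised to the next multiple of P by borrowing fragments from the opposite end of the sequence (wrap-around). The function name `circular_partition` is hypothetical.

```python
def circular_partition(fragments: list, level: int) -> list:
    """Split a sequence of fragments into P = 2**level equally sized blocks.

    Assumption: when the number of fragments M is not a multiple of P,
    the sequence is extended by wrapping around to its start, so the last
    block borrows fragments from the opposite side.
    """
    if not fragments:
        raise ValueError("at least one fragment is required")
    partitions = 2 ** level                  # Equation (1): P = 2^i
    count = len(fragments)                   # M in Table 2
    padding = (-count) % partitions          # fragments needed to reach a multiple of P
    padded = list(fragments) + [fragments[j % count] for j in range(padding)]
    size = len(padded) // partitions
    return [padded[b * size:(b + 1) * size] for b in range(partitions)]
```

With seven fragments at level 1, as in Figure 1, one fragment is borrowed so that two blocks of four fragments result; with an even count at level 1 no padding is needed.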

In Equation (1), i denotes the level, from level 0 to level 2. The spatial information layout approach uses the fixed partitioning scheme (2^i) to construct multiple spatial resolution levels in the file fragment. The histogram of each partition is used to capture spatial information of the fragment. In this case, a BFA vector is computed for each grid cell at each resolution level. The final BFA descriptor of a fragment is a concatenation of all BFA vectors. In forming the multi-resolution BFA, the grid at level l has 2^l cells along the single dimension. Consequently, level 0 is represented by a k-vector corresponding to the k bins of the histogram, level 1 by a 2k-vector, etc., and the combined BFA descriptor of a file fragment is a vector with dimensionality:

    k \sum_{l=0}^{L} 2^l = k (2^{L+1} - 1)    (2)

For example, for levels up to L = 1 and k = 256 bins, it is a 768-vector. In this study we limit the number of levels to L = 2 to prevent overfitting.

4. Experimental Setup and Results

4.1 SVM Classifiers

The Support Vector Machine (SVM) is a machine learning algorithm that is very useful in solving classification problems (Hsu, Chang, & Lin, 2003). In this study, we applied the Radial Basis Function (RBF) kernel in developing the fileprint. Furthermore, the one-vs-one approach is used to train and classify file fragments. Initially, all attributes in training and testing were normalized to the interval [-1, +1] using this equation:

    x' = 2 (x - min) / (max - min) - 1    (3)

Normalization is the process of scaling data into a small interval, here the range [-1, 1]. This process is key to better classification performance. Data normalization is used to avoid numerical difficulties during the calculation and to make sure that the largest values do not dominate the smaller ones. For a C-SVM, a parameter C is introduced. This parameter is intended to handle misclassification, and thus to lessen the training error rate (penalty) while maximizing the margin between two classes.
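The multi-level concatenation of Section 3.3 and the scaling of Equation (3) can be illustrated with a short sketch. It works directly on a byte string instead of a list of 4096-byte fragments to stay self-contained, and it assumes that the input length is divisible by the number of partitions (otherwise trailing bytes are ignored at that level). The names `multilevel_descriptor` and `scale_attribute` are hypothetical.

```python
def histogram(block: bytes) -> list:
    """256-bin unigram byte count of one block."""
    counts = [0] * 256
    for value in block:
        counts[value] += 1
    return counts


def multilevel_descriptor(data: bytes, max_level: int) -> list:
    """Concatenate the histograms of levels 0..max_level.

    Level l contributes 2**l histograms, so the result has
    256 * (2**(max_level + 1) - 1) entries.
    """
    descriptor = []
    for level in range(max_level + 1):
        partitions = 2 ** level
        size = len(data) // partitions       # assumption: evenly divisible length
        for p in range(partitions):
            descriptor.extend(histogram(data[p * size:(p + 1) * size]))
    return descriptor


def scale_attribute(values: list) -> list:
    """Scale one attribute to [-1, +1] as in Equation (3)."""
    low, high = min(values), max(values)
    if high == low:
        # Assumption: a constant attribute is mapped to 0; the paper does not cover this case.
        return [0.0 for _ in values]
    return [2.0 * (v - low) / (high - low) - 1.0 for v in values]
```

In this sketch `scale_attribute` is meant to be applied to one attribute (one descriptor position) across all training vectors, with the same minimum and maximum then reused for the test vectors.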
Misclassification can occur because some positive samples may be biased towards the negative class, while some negative samples may be biased towards the positive class. To optimize the classification performance, the kernel parameters are determined using the libsvm grid-search algorithm. Different C and γ values can be tried to obtain the best accuracy. We tried the values {2^-5, 2^-3, ..., 2^15} and {2^-15, 2^-13, ..., 2^3} for C and γ, respectively. We select the values that give the best accuracy on the training set; the training files are used to create the classifier and to obtain the optimal learning parameters C and γ.

4.2 Dataset

For the experimental evaluation, we downloaded two categories of datasets. The first category is binary files: 400 JPEG images, 400 MP3 music files, 400 PDF documents, 400 dynamic link library files (DLLs) and 400 Microsoft Windows executables (EXEs). The second category is textual files: 400 comma-separated values files (CSV), 400 Extensible Markup Language files (XML), 400 Hypertext Markup Language files (HTML), 400 log files (LOG) and 400 text files (TXT). Note that all of the files were randomly downloaded from a public dataset. The specification of the data set used is shown in Table 3.

Table 3: Specification of the dataset used in the experiments

File type    File size range    Number of files
DLL          18KB-13287KB       400
EXE          18KB-11287KB       400
MP3          18KB-14547KB       400
JPEG         18KB-3247KB        400
PDF          18KB-16297KB       400
CSV          18KB-4287KB        400
HTML         18KB-3288KB        400
LOG          18KB-2657KB        400
TXT          18KB-3998KB        400
XML          18KB-2979KB        400

4.3 Result

In this study, we focus on three types of experiment. The first is a binary file fragment experiment. The second is a textual file fragment experiment. The last is a combination of binary and textual file fragments. In each experiment, we randomly downloaded 400 files for each type, using 200 files for training and 200 files for testing. Each experiment was repeated five times.

For each file, we first used split software to divide the file into a set of 4096-byte partitions. This size is widely used as the cluster size in typical file systems, which is why we chose 4096 bytes. We then removed the first and the last fragment from the dataset, because the file header and footer, which include important file information such as the file type, file creation time and file storage location, would disturb the classification results.

The results are presented in three parts. The first is the average classification accuracy for binary fragments, the second is for textual fragments, and the last is for the combination of binary and textual fragments. The average classification accuracy for binary file fragments is (DLL + EXE + JPG + MP3 + PDF) / 5 (note: Spatial 1 is level 0 + level 1; Spatial 2 is level 0 + level 1 + level 2). The average accuracy for textual file fragments is (LOG + XML + HTML + CSV + TXT) / 5, while the average classification accuracy for the combination of binary and textual file fragments is (DLL + EXE + JPG + MP3 + PDF + CSV + HTML + LOG + TXT + XML) / 10. All results were obtained by repeating the experiment five times.
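The fragment preparation described above (splitting each file into 4096-byte partitions and discarding the first and last one) can be sketched as follows. The authors used split software for this step; the function below is only an illustrative equivalent, and the name `split_into_fragments` is hypothetical.

```python
FRAGMENT_SIZE = 4096  # cluster size used in the paper


def split_into_fragments(data: bytes, size: int = FRAGMENT_SIZE) -> list:
    """Split file content into fixed-size fragments and drop the first and last.

    The first fragment is assumed to hold header information and the last
    one footer information or padding, so both are excluded.
    """
    fragments = [data[i:i + size] for i in range(0, len(data), size)]
    return fragments[1:-1]
```

A file that spans fewer than three partitions yields no usable fragments under this rule.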
Table 4: Average classification accuracy and results for individual file types
Columns: Level 0, Level 1, Level 2, Spatial 1, Spatial 2
Rows (fragment types): DLL, EXE, JPEG, MP3, PDF, Average accuracy

Table 5: Average classification accuracy and results for individual file types
Columns: Level 0, Level 1, Level 2, Spatial 1, Spatial 2
Rows (fragment types): CSV, HTML, LOG, TXT, XML, Average accuracy

Table 6: Average classification accuracy and results for individual file types
Columns: Level 0, Level 1, Level 2, Spatial 1, Spatial 2
Rows (fragment types): DLL, EXE, JPEG, MP3, PDF, CSV, HTML, LOG, TXT, XML, Average accuracy

5. Discussions and Conclusion

Looking at the true positive rates of classification, it is clear that when more spatial information is added (i.e., when the dimensionality of the feature vector increases), no definite trend across levels can be identified. Our hypothesis was that adding more spatial information to the classifier would allow for better classification, because a richer spatial data distribution should allow the classifier to better represent the characteristic features of the fragment type. In the best-case scenario, in which the spatial information constructs the best neighbourhood connections, we would expect the true positive rate to increase level by level. Only a small number of results actually show such behaviour, namely the DLL, JPEG and MP3 file fragments in the binary file fragment experiment (Table 4). In the textual file fragment experiment, the CSV and TXT file fragments show significant results (Table 5). Finally, in the combined binary and textual file fragment experiment, DLL, JPEG, MP3, CSV, TXT and XML show significant results (Table 6).

In stark contrast to our expectations, the results also show cases where classification actually deteriorated as the dimensionality of the vector increased. This can be seen to some extent in the classification results for the EXE and PDF file fragment types in the binary experiment, for the HTML, LOG and XML file fragment types in the textual experiment, and for the EXE, PDF, HTML and LOG file fragment types in the combined binary and textual experiment.
The goal of this research was to investigate whether techniques from spatial information layout could be applied successfully to file fragment classification. We found that this is indeed the case. However, the prediction accuracy was not as high as we expected, especially in the textual file fragment experiment. A possible reason is that unigram counting based on 8 bits is already rich enough. In addition, the dataset should be selected more comprehensively.

In this paper, we studied the problem of file type classification of digital forensic evidence in the absence of header, footer and file system information. Although some research techniques have obtained promising results, there remains a gap in using spatial information to build connections between fragments in order to enrich discrimination (Ahmed, Lhee, Shin, & Hong, 2009). Recently, there have been attempts to solve the problem with machine learning techniques such as the support vector machine and k-nearest neighbor. Despite the improved performance over previous methods, the classification model becomes complex and inefficient. We proposed to utilize support vector machines, which are very powerful supervised learning algorithms that have been intensively applied in content-based classification. We employed a simple feature vector (byte frequency distribution) in combination with spatial information implemented in a circular scheme, trained the SVM with a large amount of data, and performed parameter optimization to achieve high accuracy. The results show that spatial information gives a slight improvement in average accuracy. A possible reason is that the unigram count based on 8-bit byte frequency is already sufficiently rich. One possible future direction in this regard is to consider bigram or trigram counts in combination with 16-bit byte frequency for classification (Fitzgerald et al., 2012).

References

Amirani, Mehdi Chehel, Toorani, Mohsen, & Mihandoost, Sara. (2013). Feature-based Type Identification of File Fragments. Security and Communication Networks, 6(1).
Axelsson, Stefan. (2010). The Normalised Compression Distance as a file fragment classifier. Digital Investigation, 7, S24-S31.
Beebe, N, Maddox, L, Liu, Lishu, & Sun, Minghe. (2013). Sceadan: Using Concatenated N-Gram Vectors for Improved File and Data Type Classification.
Calhoun, William C, & Coles, Drue. (2008). Predicting the types of file fragments. Digital Investigation, 5, S14-S20.
Fitzgerald, Simran, Mathews, George, Morris, Colin, & Zhulyn, Oles. (2012). Using NLP techniques for file fragment classification. Digital Investigation, 9, S44-S49.
Gopal, Siddharth, Yang, Yiming, Salomatin, Konstantin, & Carbonell, Jaime. (2011). Statistical learning for file-type identification. Paper presented at the Machine Learning and Applications and Workshops (ICMLA), International Conference on.
Hsu, Chih-Wei, Chang, Chih-Chung, & Lin, Chih-Jen. (2003). A practical guide to support vector classification.
Karresand, Martin, & Shahmehri, Nahid. (2006a). File type identification of data fragments by their binary structure. Paper presented at the Information Assurance Workshop, 2006 IEEE.
Karresand, Martin, & Shahmehri, Nahid. (2006b). Oscar: file type identification of binary data in disk clusters and RAM pages. In Security and Privacy in Dynamic Environments. Springer.
Li, Qiming, Ong, A, Suganthan, P, & Thing, V. (2010). A novel support vector machine approach to high entropy data fragment classification. Paper presented at the Proceedings of the South African Information Security Multi-Conference (SAISMC 2010).
McDaniel, Mason, & Heydari, Mohammad Hossain. (2003). Content based file type detection algorithms. Paper presented at the System Sciences, Proceedings of the 36th Annual Hawaii International Conference on.

Sportiello, Luigi, & Zanero, Stefano. (2011). File Block Classification by Support Vector Machine. Paper presented at the Availability, Reliability and Security (ARES), 2011 Sixth International Conference on.
Veenman, Cor J. (2007). Statistical disk cluster classification for file carving. Paper presented at the Information Assurance and Security, IAS Third International Symposium on.
Ahmed, Irfan, Lhee, Kyung-suk, Shin, Hyunjung, & Hong, ManPyo. (2009). On improving the accuracy and performance of content-based file type identification. Paper presented at the Information Security and Privacy.
