Enhancement of Lempel-Ziv Algorithm to Estimate Randomness in a Dataset

K. Koneru, C. Varol
October 19-21, 2016, San Francisco, USA

Abstract—Experts and researchers often refer to the rate of error, or accuracy, of a database. The percentage of accuracy is determined by the number of correct cells over the total number of cells in a dataset. Beyond that, the detection and processing of errors depend on the randomness of their distribution in the data. If the errors are systematic (for example, present in a particular record or column), then they can be fixed readily with minimal changes. As a result, classifying errors by their distribution would help to address many managerial questions. The Enhanced Lempel-Ziv algorithm is regarded as one of the effective ways to differentiate random errors from systematic errors in a dataset. This paper explains how the Lempel-Ziv algorithm is used to differentiate random errors from systematic ones and proposes an improvement. The experiments show that the Enhanced Lempel-Ziv algorithm successfully differentiates the random errors from the systematic errors for a minimum data size of 5,000 and a minimum error rate of 10%.

Index Terms—Data accuracy, Enhanced Lempel-Ziv, Prioritization, Random errors, Systematic errors

I. INTRODUCTION

From the early age of software, the data owned by an organization has been one of its crucial assets. In order to improve the quality of information, the data quality first needs to be measured so that the value of the available information can be evaluated. Redman [1] mentioned that the science of data quality has not yet advanced to the point where there are standard measures for data quality issues. Considering the quality of data at the database level, the rate of error at the attribute level plays a vital role. The error rate is defined as the number of erroneous cells over the total number of attribute cells available in the dataset. Lee et al. defined the accuracy rating as 1 − (number of undesirable outcomes / total outcomes) [2]. These definitions apply to individual cells, which are data attributes of specific records.

Organizations are attentive to reliability, correctness, and error-free data, but the errors in the data may not be confined to a particular area. Prioritization of databases plays a critical role when their existing quality is to be improved. The number of errors per dataset, or the current quality, may influence the priority given to fixing the problems. Hence, finding and fixing errors relies on a measure of the randomness of the errors in the data.

Manuscript received March 7, 2016. This work was supported in part by Sam Houston State University. K. Koneru, Student in Master of Sciences, Department of Computer Science, Sam Houston State University, Huntsville, TX, USA (keerthi@shsu.edu). C. Varol, Associate Professor, Department of Computer Science, Sam Houston State University, Huntsville, TX, USA (cvarol@shsu.edu).

Distinguishing between a dataset with random errors and a dataset with systematic errors helps in better assessment of database quality. In this research, the developed method obtains a more appropriate complexity metric based on the Lempel-Ziv algorithm to state the type of error definitively. The outcomes are computed by considering a sample dataset in which cells with errors are coded as 1s and cells without errors as 0s, so that we can estimate and determine whether the errors are random or not using the Enhanced Lempel-Ziv (LZ) complexity measure. The proposed method also identifies the dataset with the highest percentage of errors. Hence, it is useful for addressing decision-making questions such as prioritizing which databases should be considered first for fixing the issues.
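Before turning to related work, the cell-level error-rate and accuracy-rating definitions from the introduction can be made concrete with a short sketch (illustrative only; the helper names are not from the paper):

    # Illustrative sketch, not the authors' code: cell-level error rate and
    # accuracy rating for a table of per-cell error flags.
    def error_rate(error_flags):
        """error_flags: list of rows, each a list of booleans (True = cell in error)."""
        total = sum(len(row) for row in error_flags)
        erroneous = sum(1 for row in error_flags for flag in row if flag)
        return erroneous / total

    def accuracy_rating(error_flags):
        # Accuracy rating in the sense used above: 1 - (erroneous cells / total cells).
        return 1.0 - error_rate(error_flags)

    flags = [[False, True, False],
             [False, False, False],
             [True, False, False]]
    print(error_rate(flags))       # 2/9, about 0.22
    print(accuracy_rating(flags))  # about 0.78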
The rest of the paper is organized as follows. Related work in the areas of data quality and studies of randomness in datasets is detailed in Section 2. The approach, the Enhanced Lempel-Ziv algorithm, is explained in Section 3. Section 4 presents the test cases and the results obtained from the study, and the paper closes with the conclusion and future work section.

II. RELATED WORK

The definition of randomness has its roots in information theory, the branch of mathematics concerned with the storage and transmission of data [3]. For the same percentage of errors in a dataset, the distribution of those errors significantly affects how the dataset has to be managed. The difference in the complexity measure can readily be observed, and it indicates how the errors are distributed.

Fig 1. Distribution of errors [3]: (a) errors in one column; (b) errors in one row; (c) errors randomly distributed throughout the table

Fisher et al. stated that databases might account for the same percentage of errors yet have the errors randomly distributed among many columns and rows, causing both analysis and improvement to be significantly more complicated. Figure 1 depicts datasets with a 5% error rate, following Redman's definition of the error rate as the number of cells with errors divided by the total number of cells [3].
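The three layouts of Fig. 1 can be reproduced with a small synthetic-table sketch such as the one below. It is purely illustrative (the paper does not describe how its tables were built); it marks a fixed share of cells as erroneous, filling whole columns, whole rows, or random positions:

    import random

    # Illustrative sketch: boolean error tables (True = erroneous cell) with the
    # same overall error rate but different distributions, in the spirit of Fig. 1.
    def error_table(rows, cols, rate, pattern, seed=0):
        rng = random.Random(seed)
        table = [[False] * cols for _ in range(rows)]
        n_errors = round(rows * cols * rate)
        if pattern == "column":    # systematic: fill whole columns, left to right
            cells = [(i, j) for j in range(cols) for i in range(rows)]
        elif pattern == "row":     # systematic: fill whole rows, top to bottom
            cells = [(i, j) for i in range(rows) for j in range(cols)]
        else:                      # "random": scatter errors uniformly over the table
            cells = [(i, j) for i in range(rows) for j in range(cols)]
            rng.shuffle(cells)
        for i, j in cells[:n_errors]:
            table[i][j] = True
        return table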

Sometimes the errors may be confined to a single record, or different errors may exist within a single field; such errors are referred to as systematic errors, as shown in Fig. 1(a) and Fig. 1(b), and they can be handled by a single change. On the other hand, different errors may exist in different records and be distributed throughout the dataset. Though the percentage of errors is the same as in the systematic case, the dataset is considered to be out of control. These are known as random errors, as depicted in Fig. 1(c). More effort is required to fix this kind of error, and they need to be taken care of based on priority. All three cases could report the same error percentage but represent different degrees of quality.

Different algorithms such as Kolmogorov-Chaitin complexity and Lempel-Ziv complexity, and probabilistic entropy-based measures such as Shannon entropy, approximate entropy, and sample entropy [4], are used to measure the degree of randomness in a database. The Lempel-Ziv measure of complexity is one of the most popular methods used to obtain the degree of randomness when the data is treated as a binary sequence [5].

The Kolmogorov-Chaitin (KC) complexity measure captures the disorder in any sequence. Whenever a sequence is random, it is considered to be incompressible. The length of the minimal description of the sequence conveys its complexity, which leads to a measure of the randomness of the sequence. The Kolmogorov-Chaitin complexity of a sequence of binary digits is defined as the length of the shortest program that can output the original sequence of bits. It can also be stated that a sequence is completely random if its KC complexity is approximately equal to the size of the sequence of bits [2]. Formally,

    H_KC(M) = |P_min|,

where H_KC is the KC complexity measure, M is the original sequence, and P_min is the minimal program that outputs M [2]. This complexity measure not only organizes a hierarchy of degrees of randomness but also describes the properties of randomness more precisely than statistical information; it is also used to measure the information content of a sequence. However, the major problem in the calculation of KC complexity is that there is no general algorithm for finding such a program, which is highly dependent on the data available in the dataset. In such a case, it is hard to estimate the time complexity as n tends to infinity.

Shannon entropy, defined as the weighted average of the self-information within a message, or the amount of randomness that exists in a random event, is another method used to find the random errors in a given dataset. Shannon entropy depends on the probability distribution of the sequence. Let X be a random variable over the binary sequence S with probability mass function p(x); then the Shannon entropy H(X) is given as

    H(X) = − Σ_{x ∈ S} p(x) log p(x).

The value of H(X) varies from 0 to log(|S|), where 0 corresponds to no uncertainty and log(|S|) is reached when all elements have equal probabilities [6]. As the length of the sequence increases, this estimator underestimates higher entropies while overestimating lower entropies.
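For a quick sense of this entropy baseline, the empirical Shannon entropy of a binary error sequence can be computed from symbol frequencies. The sketch below is illustrative only; estimating p(x) from observed frequencies is an assumption made here, not a detail given in the paper:

    from collections import Counter
    from math import log2

    # Illustrative sketch: empirical Shannon entropy H(X) = -sum p(x) * log2 p(x).
    # For a binary alphabet the value ranges from 0 to 1 bit.
    def shannon_entropy(sequence):
        counts = Counter(sequence)
        n = len(sequence)
        return -sum((c / n) * log2(c / n) for c in counts.values())

    print(shannon_entropy("0001000100"))  # about 0.72 (mostly zeros)
    print(shannon_entropy("0101010101"))  # 1.0 (equal probabilities)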
On the other hand, the Lempel-Ziv algorithm evaluates the complexity of a system objectively and quantitatively and overcomes the limitations of purely statistical calculations of complexity. The randomness parameter analyzes the difference between systematic patterns and degrees of random distribution. Unlike the KC complexity measure, in which the length of a program plays the major role, the LZ complexity measure depends on two operations on the binary digits, copy and insert, and on the number of distinct substrings formed along the length of the sequence and the rate of their occurrence [3].

The LZ complexity algorithm and its corollaries are used in the development of application software and of dictionary-based lossless compression tools such as WINZIP. It is also extensively employed in biomedical applications to estimate the complexity of discrete-time signals [3]. Apart from the above applications, Lempel and Ziv in 1976 [2] noted that the algorithm overcomes the restriction of interpreting complexity through characteristic statistical quantities. Moreover, since the calculation of characteristic quantities requires longer data sequences, other algorithms can only identify whether a system is complicated or not, whereas the Lempel-Ziv complexity measure shows the degree of system complexity [2].

Fisher et al. used the Lempel-Ziv complexity measure to obtain the randomness in a dataset [3]. However, their approach is unable to provide accurate values of the complexity measure for small values of n, as the parameter epsilon (ε_n) is disregarded. As a result, it not only overestimates the complexity measure but is also unable to differentiate noticeably between random errors and systematic errors when the dataset size is small. The methodology proposed in this paper enhances the Lempel-Ziv algorithm by considering the parameter ε_n and calculates the value of the complexity measure appropriately. It differentiates between random and systematic errors for smaller dataset sizes, starting from a data size of 5,000. It also determines the dataset with the highest percentage of errors among the given datasets of a particular data size.

III. METHODOLOGY

The LZ complexity measure is a prominent approach for differentiating random and systematic errors. The word randomness is used intuitively in day-to-day life to describe a lack of regularity in a pattern. Sequences which do not look random cast doubt on the random nature of the generating process [7].

A. Lempel-Ziv Algorithm:
The steps for obtaining the normalized complexity in the Lempel-Ziv algorithm are given below.
1. Divide the sequence into consecutive disjoint words such that the next word is the shortest template not seen before.
2. The number of disjoint words is taken as the complexity measure c(n) of the sequence, which is also the number of steps required to parse the complete sequence.
3. The asymptotic value of b(n) is calculated as b(n) = n / log2(n).

4. The Lempel-Ziv complexity measure C(n) is evaluated as C(n) = c(n) / b(n).

The measure C(n) represents the rate of occurrence of new substrings, and its value varies between 0 (for systematic sequences) and 1 (for a totally random sequence). When calculating the randomness in database files, the measure is easily computable for large values of n [3].

B. Enhanced Lempel-Ziv Algorithm:
The high-level architecture of the proposed system is shown in Figure 2. A sample database is given as input to the system, and it is already known whether each data value in the sample database is correct or incorrect. The system then maps the data cell values into a binary sequence, where 1s indicate the cells with errors and 0s indicate the cells without errors. After the binary sequence is generated, the analyzer applies the Enhanced LZ complexity algorithm to generate unique substrings from the sequence. Based on the created substrings, the normalized LZ complexity is calculated for each dataset, which indicates whether the errors in the data are systematic or random.

Fig 2. Architecture of Proposed Methodology

Although the original LZ complexity objectively and quantitatively estimates system complexity, it has the drawback of overvaluing the normalized complexity for short series. Fisher et al. stated that the C(n) value goes close to zero for deterministic sequences and approaches 1 for non-deterministic or random sequences [3]. However, this may not hold well enough for finite sequences to distinguish between random errors and systematic errors. In order to overcome these disadvantages and improve the complexity measure so that it can differentiate the randomness and prioritize the databases to be considered for restoring integrity, the Lempel-Ziv algorithm is modified.

Yong Tang et al. mentioned that the asymptotic value of b(n) is accurate only as n tends to infinity [2]. For every finite sequence there exists a value ε_n such that

    b(n) = n / ((1 − ε_n) log2(n)),

where the value of epsilon (ε_n) is given by

    ε_n = log2(log2(2n)) / log2(n).

The value of ε_n must be taken into account for small values of n: it varies between 0 and 1 and is still around 0.5 for n near 100. Hence, the value of ε_n cannot be ignored for finite sequences; otherwise the randomness measure is not accurate. Ignoring it means the upper limit b(n) is underestimated, which in turn means that the normalized complexity C(n) is overestimated [2].

Based on the above analysis, the following constraints are considered in order to evaluate an accurate Lempel-Ziv measure:
1. The critical value of n for which (1 − ε_n) log2(n) is greater than zero.
2. The assumption of sufficient sequence length.

The new LZ complexity measure is given by

    C(n) = (1 − ε_n) · c(n) · log2(n) / n.

Upon calculation of the new complexity measure, the randomness can be determined appropriately even for short series whose length is far from the asymptotic limit.
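To see why ε_n matters at practical dataset sizes, it can simply be tabulated for a few values of n; the short, illustrative sketch below shows that the correction stays well away from zero and decays only slowly:

    from math import log2

    # Illustrative check of the correction term eps_n = log2(log2(2n)) / log2(n):
    # it remains far from negligible for finite n, so ignoring it inflates C(n).
    for n in (100, 1000, 5000, 10000, 100000):
        eps = log2(log2(2 * n)) / log2(n)
        print(n, round(eps, 3))   # roughly 0.44 at n = 100 and 0.29 at n = 10000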
C. Illustration of obtaining the normalized complexity measure C(n):
To illustrate this procedure, the sequence of eleven (n = 11) symbols S = 01010010011 is considered. The Enhanced Lempel-Ziv algorithm parses the sequence into six substrings {0, 1, 01, 00, 10, 011}, rendering c(n = 11) = 6. The notation S(i) is used to identify the i-th bit in the string S. The algorithm parses S from left to right, looking for substrings that are not already present in the superset U.

As the algorithm proceeds and the superset grows, the first substring seen from left to right is S(1) = 0, so U = {0}. Then S(2) = 1 is parsed and added to U; hence U = {0, 1}. The next bit, S(3) = 0, is already present in U, so S(4) = 1 is appended to it, giving the substring 01. As 01 is not present in the superset, it is added: U = {0, 1, 01}. The next bit, S(5) = 0, is in U, so S(6) = 0 is appended to it. The resulting substring 00 is not present in U and is therefore added to the superset: U = {0, 1, 01, 00}. As the algorithm proceeds, the next two bits, S(7) = 1 and S(8) = 0, are parsed into the new substring 10, which is added to U. Similarly, S(9) = 0 and S(10) = 1 are parsed; as the resulting substring 01 is already part of U, S(11) = 1 is appended to it, rendering 011. That value is added to the superset, yielding U = {0, 1, 01, 00, 10, 011}. The size of the superset U is taken to be the complexity measure c(n = 11) = 6, which is also the number of steps required to construct U.
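The parsing just illustrated can be written compactly. The sketch below is an illustrative reconstruction of that procedure (not the authors' implementation): it keeps the superset U of previously parsed words, counts the number of words c(n), and applies the ε_n correction and the enhanced normalization from Section III-B. Run on S = 01010010011, it reproduces c(11) = 6:

    from math import log2

    # Illustrative reconstruction of the parsing described above: scan left to
    # right, growing the current word until it is not yet in the superset U.
    def lz_phrase_count(bits):
        seen, word = set(), ""
        for b in bits:
            word += b
            if word not in seen:
                seen.add(word)
                word = ""
        # Convention assumed here: a leftover word that is still in the set
        # counts as one final (incomplete) phrase.
        return len(seen) + (1 if word else 0)

    def epsilon(n):
        # Correction term for finite sequences: eps_n = log2(log2(2n)) / log2(n).
        return log2(log2(2 * n)) / log2(n)

    def enhanced_lz_complexity(bits):
        # Enhanced normalized complexity: C(n) = (1 - eps_n) * c(n) * log2(n) / n.
        n = len(bits)
        return (1 - epsilon(n)) * lz_phrase_count(bits) * log2(n) / n

    S = "01010010011"
    print(lz_phrase_count(S))                    # 6, matching the worked example
    print(round(enhanced_lz_complexity(S), 3))   # normalized score for this short S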

The Enhanced Lempel-Ziv complexity measure c(n) of a binary random sequence is, in the limit, equal to b(n) = n / ((1 − ε_n) log2(n)). Dividing c(n) by b(n) gives the normalized Enhanced Lempel-Ziv complexity measure, C(n) = c(n) / b(n). The normalized C(n) represents the rate of occurrence of new substrings in the sequence. C(n) values range from close to 0 (for deterministic/periodic sequences) up to about 0.4 (for totally random sequences).

IV. TEST CASE AND RESULTS

A. Test Data
Three different types of datasets are generated for testing and for obtaining comparative results: random datasets, datasets with systematic errors in rows, and datasets with systematic errors in columns. All three types of datasets, with different percentages of errors, are used to obtain the test results. For each data type and percentage of errors, 10 different datasets are generated. Specifically, for the size of 5,000 samples, 30 different test datasets are generated for each percentage of errors, varying from 5% to 20%, and another 90 datasets with 50% errors are considered, covering random errors, systematic errors in rows, and systematic errors in columns. The same strategy is applied for the 10,000-sample data size.

B. Test Results
With the algorithm, a total of more than 120 different datasets is analyzed for each sample size. Each type of data is tested with 10 different sets for each sample size and percentage of error. Figure 3 shows the average C(n) scores for the different types of datasets at different sample sizes, with 10% errors in each dataset. It can be clearly seen from the figure that both systematic types of errors (rows and columns) have lower C(n) scores than the random type of errors. From the experiment, it is also clear that the selected sample sizes should have a minimum value of n >= 5,000, as the differentiation is quite difficult for lower values of n.

The same experiment is conducted on the datasets having 5%, 15%, and 20% errors, respectively. Unlike the dataset with 5% errors, where the random errors are not clearly differentiated, as shown in Figure 4, the algorithm distinguishes between random and systematic errors in the datasets with 15% and 20% errors. For the dataset with 15% errors, the average values of C(n) are 0.19, 0.13, and 0.25 for systematic errors in rows, systematic errors in columns, and random errors, respectively. The average values of C(n) for the dataset having 20% errors are 0.19, 0.14, and 0.28 for systematic errors in rows, columns, and random errors, respectively. The standard deviation of these values is on the order of 10^-4, showing that the algorithm is consistent. This is a clear indication that the Enhanced Lempel-Ziv complexity can differentiate the random errors from the systematic errors. As expected, with increasing data size, the C(n) value for the systematic types of data varies from 0.1 to 0.2, while for random errors the value is greater than 0.23 across the different datasets.

As the percentage of errors increases to a very high value beyond 50%, the algorithm still distinguishes between random errors and systematic errors, but with different threshold values. In tests on datasets having 50% errors, the average value of C(n) for systematic errors in columns is nearly 0.07, while systematic errors in rows have an average value of 0.2. For the random errors, the average value of C(n) is 0.37. Though random errors still have the highest value, the dataset can hardly be kept under control once the errors increase beyond 50%.
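A minimal version of this experiment can be scripted by combining the earlier sketches: generate several tables per error pattern, flatten each table row by row into a 0/1 sequence, and average the enhanced complexity per pattern. The driver below is illustrative only and reuses the hypothetical error_table and enhanced_lz_complexity helpers defined in the previous sketches; the table size, error rate, and number of trials shown are arbitrary choices, not the paper's exact setup:

    from statistics import mean

    # Illustrative experiment driver reusing the earlier hypothetical helpers
    # (error_table, enhanced_lz_complexity). Tables are flattened row by row
    # into 0/1 strings before scoring.
    def to_bits(table):
        return "".join("1" if flag else "0" for row in table for flag in row)

    def average_complexity(rows, cols, rate, pattern, trials=10):
        scores = []
        for seed in range(trials):
            table = error_table(rows, cols, rate, pattern, seed=seed)
            scores.append(enhanced_lz_complexity(to_bits(table)))
        return mean(scores)

    for pattern in ("row", "column", "random"):
        print(pattern, round(average_complexity(100, 50, 0.10, pattern), 3))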
Fig 3. Test Results (C(n) for the different datasets having 10% of errors)

Fig 4. Test Results (C(n) for the different datasets having 5% of errors)

The Enhanced complexity measure also differentiates between percentages of errors, giving the highest complexity measure for the highest percentage of errors. As shown in Figure 5, the databases with a high percentage of random errors have a higher complexity measure than the databases with a low percentage of errors, for the different dataset sizes. Hence, with these results it becomes easier to answer the managerial question of prioritizing the databases that may require the most immediate processing.

Fig 5. Test Results (C(n) for the different datasets having different error percentages)
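The prioritization step itself then reduces to ranking datasets by their complexity score, as in the following small, illustrative sketch (the dataset names and scores are made up for the example):

    # Illustrative sketch: rank datasets for cleanup by their enhanced complexity
    # score, so the most randomly corrupted (hardest to fix) datasets come first.
    def prioritize(scores):
        """scores: mapping of dataset name -> enhanced C(n) value."""
        return sorted(scores, key=scores.get, reverse=True)

    print(prioritize({"customers": 0.28, "orders": 0.13, "inventory": 0.19}))
    # ['customers', 'inventory', 'orders']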

Figure 6 clearly reflects that, at any given percentage of errors, the random errors have the highest complexity, and that for any given type of error the complexity measure is higher for a higher percentage of errors in datasets with the same number of data cells. But as the percentage of errors approaches or exceeds 50%, the dataset is dominated by errors, and it becomes hard to prioritize the datasets in such a scenario.

Fig 6. Test Results (c(n) for the different datasets having 10,000 data cells)

V. CONCLUSION

The Enhanced Lempel-Ziv algorithm is an effective and efficient way to obtain the degree of randomness in a dataset without overestimating the normalized complexity C(n) for moderate data sizes. It is a simple and fast algorithm that renders an intuitive measure of the degree of randomness of the data errors. It is well founded in conceptual principles and has broad practical applications; for example, Lempel-Ziv complexity is among the measures used by the Computer Security Division of the National Institute of Standards and Technology (NIST) to assess random number generators, a key component of all modern cryptographic systems. Tests have also been performed on sample sizes from 500 to 4,000, where the random errors cannot be differentiated from the systematic errors due to the data size constraint. The proposed algorithm reliably differentiates between the error types when the minimum sample size is 5,000. The proposed method is a significant step in comparing databases. In the near future, a probability distribution function as a means of characterizing randomly distributed errors may help to monitor and benchmark the quality status of datasets and may assist in prioritizing datasets even for higher percentages of errors.

REFERENCES

[1] T. Redman, "Measuring Data Accuracy," in Information Quality, R. Y. Wang et al., Eds. Armonk, NY, 2005.
[2] Y. W. Lee, L. L. Pipino, J. D. Funk, and R. Y. Wang, Journey to Data Quality. Cambridge, MA: MIT Press, 2006.
[3] C. W. Fisher and E. J. M. Lauria, "An Accuracy Metric: Percentages, Randomness and Probabilities," ACM Journal of Data and Information Quality, vol. 1, no. 3, Article 16, December 2009.
[4] A. H. Lhadj, "Measuring the Complexity of Traces Using Shannon Entropy," Fifth International Conference on Information Technology: New Generations, 2008.
[5] F. Liu and Y. Tang, "Improved Lempel-Ziv Algorithm Based on Complexity Measurement of Short Time Series," Fourth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2007), IEEE, 2007.
[6] W. Kinser, "Single-Scale Measures for Randomness and Complexity," Proc. 6th IEEE Int. Conf. on Cognitive Informatics (ICCI'07), IEEE, 2007.
[7] R. Falk and C. Konold, "Making Sense of Randomness: Implicit Encoding as a Basis for Judgment," Psychological Review, vol. 104, 1997.
