An Effective Approach to Improve Storage Efficiency Using Variable bit Representation


Volume 114 No. 12 2017, 145-154
ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version)
url: http://www.ijpam.eu

R. Anoop, Subhadra G. Varma and V.R. Rajalakshmi
Department of Computer Science & IT, School of Arts and Sciences, Amrita University, Kochi.

Abstract

Compression techniques can be used to decrease the cost of data storage and transmission capacity by reducing the redundancy within a dataset. Data compression is applied by encoding information using fewer bits than the actual representation. These techniques may be either lossy or lossless. In lossless compression, the actual data can be completely reconstructed from the compressed data, whereas in lossy compression the data cannot be reverted completely to its initial state. In this paper, we suggest a lossless compression method using the concept of variable bits. Here, the distinct repetitive values of an attribute in a database are represented as binary values, where the data item with the highest frequency is assigned the lowest binary value (0), the subsequent data item is assigned the next value (1), and so on. Also, the actual bit length is reduced by removing any redundant leading zeroes whenever necessary. This ensures that the most repeated data item is assigned the lowest possible bit value with the lowest bit length.

Key Words: Database, compression, variable-bit, lossless, PL/SQL.

1. Introduction

A relational database is a set of data items systematically organized as tables from which data can be retrieved or reassembled without restructuring the tables. The data items can be large in number, with both unique and repetitive values, and so require a large amount of storage space at a high storage cost. The efficiency of storage in a database can be improved by incorporating data compression. Compression reduces the quantity of data used to represent a file without excessively compromising the quality of the original data.

Lossless compression: Lossless compression is applied in cases where it is essential that the data derived after decompression be exactly identical to the original data. That is, no portion of the data is lost during compression. Lossless compression focuses more on preserving the integrity of the data than on compression efficiency.

Lossy compression: In lossy data compression, the data extracted after decompression may not be exactly the same as the original, i.e. some fractions of the data might be lost (hence the name lossy), but the result is good enough to be useful for specific purposes. When the compressed message is decoded, it does not give back the original message intact. Lossy compression focuses more on maximal compression than on the integrity of the data.

2. Related Works

Nimisha et al [1] proposed a lossless compression method which makes use of binary values to represent every distinct attribute value in a database. In this method, the count of distinct attribute values was found and the number of bits (n) needed to represent them was calculated (using the general rule that with n bits, $2^n$ combinations can be represented). A new table was created with the unique attribute values and their corresponding bit values. The original table was updated with the corresponding bit values.
The frequency of the attribute values, however, was not taken into consideration, and an equal number of bits was used to represent every distinct attribute value. As an example, 4 unique attribute values can be represented with 2 bits ($2^2 = 4$). The combinations are 00, 01, 10, 11.

S.R. Kodituwakku et al [2] performed experimental comparisons of different lossless compression algorithms for text data. Although the algorithms were tested on different types of files, the main interest was on different test patterns. Considering the compression times, decompression times and saving percentages of all the algorithms, the Shannon Fano algorithm was considered the most efficient among the selected ones.
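The fixed-length rule used in [1], as described above, can be illustrated with a minimal sketch. The function names `fixed_bit_length` and `fixed_length_codes` are hypothetical and are not taken from [1]; the sketch only assumes that n is chosen as the smallest value for which $2^n$ covers the number of distinct values.

```python
def fixed_bit_length(distinct_count: int) -> int:
    """Smallest n such that 2**n >= distinct_count (at least 1 bit)."""
    if distinct_count < 1:
        raise ValueError("need at least one distinct value")
    # (k - 1).bit_length() equals ceil(log2(k)) for every k >= 1.
    return max(1, (distinct_count - 1).bit_length())


def fixed_length_codes(distinct_count: int) -> list[str]:
    """Equal-width binary codes, one per distinct value, in counting order."""
    n = fixed_bit_length(distinct_count)
    return [format(i, f"0{n}b") for i in range(distinct_count)]


print(fixed_bit_length(4))    # 2
print(fixed_length_codes(4))  # ['00', '01', '10', '11']
```

Every value receives the same width here regardless of how often it occurs, which is the property the variable-bit method in this paper relaxes.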

Amandeep Singh et al [3] developed a dynamic bit reduction algorithm to compress and decompress text data based on a lossless data compression approach. Various experiments were conducted on different datasets such as Random, Alphanumeric, Numeral and Special Characters datasets. The results obtained by the proposed system were compared with the existing data compression techniques, Bit Reduction and Huffman Coding, using the parameters Compression Ratio and Saving Percentage. It was observed that the proposed system shows very good compression results in terms of Compression Ratio and Saving Percentage.

Rupinder Singh et al [4] also proposed a new bit reduction algorithm for compression of text data based on existing compression algorithms. This algorithm employed the technique of saving bits. The compression algorithm took O(n) time, where n is the total number of characters in the file. Since the differential breaking follows a Divide and Conquer policy, it takes O(n log n) time. So, the total computation time required for this algorithm is proportional to O(n log n).

Shrusti Porwal et al [5] compared the lossless data compression techniques Huffman and arithmetic encoding and their performance. Stepwise algorithmic processes and various performance measures were applied to analyse which technique is better. The performance was calculated based on compression ratio, compression speed, decompression speed, memory space needed, compressed pattern matching and whether random access is permitted. It was observed that arithmetic encoding gives the best compression ratio compared to Huffman compression.

Figure 1: Lossless Compression
Figure 2: Lossy Compression

3. Methodology

A. PL/SQL

PL/SQL is a procedural language which encompasses SQL statements within its syntax. It was developed by Oracle to extend the features of SQL and to embed the features of procedural programming within SQL. There are generally six types of SQL commands:
a. Data Definition Language (DDL): DDL can be used for purposes like the creation of database objects and also to restructure them. Some common examples are CREATE TABLE, DROP TABLE, ALTER TABLE etc.
b. Data Manipulation Language (DML): DML commands are used to insert, delete and modify data. The DML commands are: INSERT, UPDATE, DELETE.
c. Data Query Language (DQL): DQL allows data to be accessed/retrieved from the database. The basic command used is SELECT.
d. Data Control Language (DCL): DCL commands provide privileges to users to regulate data access within the database. The commands used are: ALTER PASSWORD, GRANT, REVOKE, CREATE SYNONYM.
e. Data administration commands: These commands enable users to diagnose system performance by carrying out audits within the database. Examples are: START AUDIT, STOP AUDIT.
f. Transactional control commands (TCL): These commands are used for managing database transactions. A few of these commands are: COMMIT, ROLLBACK, SAVEPOINT, SET TRANSACTION.

B. PL/SQL Dynamic SQL

The dynamic SQL methodology enables creating and running SQL statements at run time. It is useful for the following:
a. Writing general-purpose and flexible programs like ad hoc query systems.
b. Writing programs that must run data definition language (DDL) statements.
c. When the data type/number of input and output variables is not known, or when the whole text of an SQL statement is unavailable during compilation.

4. Proposed Work

The proposed work consists of the compression steps (Steps 1 to 6) and the steps to be taken when data manipulation operations are performed.

Step 1: Select the count of distinct data items from the selected column.
Example: SELECT COUNT(DISTINCT col_name) FROM table_name;
Step 2: Calculate the count (n) of bits needed to represent the attribute (according to the general rule, with n bits we can represent $2^n$ unique combinations).
Step 3: Find the unique data items of every attribute in the table and sort them in decreasing order of their frequency.
Example: SELECT col_name, COUNT(col_name) FROM table_name GROUP BY col_name ORDER BY COUNT(col_name) DESC;
Step 4: Create a table (index table) with the data items and their frequency along with their corresponding binary values.
Attributes are assigned the bit values based on the decreasing order of their frequencies.
Step 5: Eliminate the leading zeroes from the binary values.
Step 6: Update the database by replacing the data items with their corresponding V-bit (variable bit) values.
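Steps 1 to 6 can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the authors' PL/SQL implementation: it uses Python's standard `sqlite3` module in place of Oracle, builds the statements at run time in the spirit of dynamic SQL, and breaks frequency ties alphabetically so the output is reproducible. The function name `compress_column`, the `<column>_index` naming convention and the `sales`/`category` sample table are all hypothetical.

```python
import sqlite3


def compress_column(conn: sqlite3.Connection, table: str, column: str) -> dict:
    """Apply Steps 1-6 to one column and return {data item: V-bit value}."""
    cur = conn.cursor()
    index_table = f"{column}_index"  # hypothetical naming convention

    # Step 1: count the distinct data items in the selected column.
    (distinct_count,) = cur.execute(
        f'SELECT COUNT(DISTINCT "{column}") FROM "{table}"'
    ).fetchone()

    # Step 2: smallest n with 2**n >= distinct_count.
    n_bits = max(1, (distinct_count - 1).bit_length())

    # Step 3: data items in decreasing order of frequency
    # (alphabetical tie-break is an assumption made for determinism).
    rows = cur.execute(
        f'SELECT "{column}", COUNT("{column}") FROM "{table}" '
        f'GROUP BY "{column}" ORDER BY COUNT("{column}") DESC, "{column}" ASC'
    ).fetchall()

    # Steps 4 and 5: assign binary values by rank, then drop leading zeroes.
    entries = []
    for rank, (item, frequency) in enumerate(rows):
        fixed = format(rank, f"0{n_bits}b")
        entries.append((item, frequency, fixed.lstrip("0") or "0"))

    cur.execute(
        f'CREATE TABLE "{index_table}" '
        "(item TEXT PRIMARY KEY, frequency INTEGER, vbit TEXT)"
    )
    cur.executemany(f'INSERT INTO "{index_table}" VALUES (?, ?, ?)', entries)

    # Step 6: replace every data item with its V-bit value in one statement.
    cur.execute(
        f'UPDATE "{table}" SET "{column}" = '
        f'(SELECT vbit FROM "{index_table}" WHERE item = "{table}"."{column}")'
    )
    conn.commit()
    return {item: vbit for item, _, vbit in entries}


# Hypothetical sample data: three distinct values with different frequencies.
sample = ["Office Supplies", "Furniture", "Office Supplies",
          "Technology", "Furniture", "Office Supplies"]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (category TEXT)")
conn.executemany("INSERT INTO sales VALUES (?)", [(c,) for c in sample])

index = compress_column(conn, "sales", "category")
compressed = [r[0] for r in conn.execute("SELECT category FROM sales ORDER BY rowid")]
print(index)
print(compressed)
```

The single correlated UPDATE in Step 6 is a design choice of this sketch: it reads each row's original value before writing, so a data item that happens to look like a bit string cannot be rewritten twice.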

INSERT: When new values are inserted, calculate the count of repeated values, recalculate the bit values and find the binary equivalent for the same.
DELETE: When values are deleted, the same process as for insertion is performed, that is, calculating the count, recalculating the bit values and finding the binary equivalent for the current data.
UPDATE: Update the index table with the new value.
SELECT: Values can be selected and checked from the index table instead of the original table.

Here, a table is created at run time with attribute values and their corresponding frequencies, along with an index table with attribute values and their corresponding binary values. The data item with the highest count is assigned the lowest binary value (0), the following item is allocated the next value (1), and so on. The leading zeroes are eliminated from the binary values (resulting in varying bit lengths) to reduce the storage space and thereby improve efficiency.

5. Experiments and Results

A sample dataset of Superstore Sales, which contains distinct repetitive values, is used for the experiments in this paper. The table structure is as follows:

Figure 3: Dataset: store sales

From the given dataset, we select the attributes which have repeating values and which would therefore require a considerable amount of storage space. Afterwards, we calculate the count of distinct repetitive values in each selected column. For example, the count of distinct values in the column Product Category is 3. According to the general rule, with n bits we can represent $2^n$ unique combinations. Here we have 3 distinct values, so we need a maximum of two bits to represent all of them. The fixed-length bit values for these are 00, 01, 10. By further applying the concept of variable bits, the values are truncated and represented simply as 0, 1, 10. We then substitute these values for the attribute values and create an index table with the attribute values and their corresponding bit values, which are assigned based on their frequency. If the frequencies are the same, the assignment is done sequentially. The index table thus created for the single attribute "Product Category" is shown in Table 1.

Table 1: Index table for Product-Category

Similarly, we generate index tables for every attribute in the dataset. These are also used as reference tables to re-create the original dataset during decompression. After creating the index tables for all columns, we update the original table by replacing the actual data values with their corresponding bit values. The updated table is shown in Figure 4.

In the original table, the column Product Category requires 28 bits for the value Office Supplies, 18 bits for Furniture, and 20 bits for Technology. In the compressed form, only 1 bit for Office Supplies, 1 bit for Furniture and 2 bits for Technology are required, which adds up to 4 bits in total.
This is a sizeable improvement over the similar work by Nimisha et al [1], which uses six bits in the same scenario because it generates binary values in the standard fashion without eliminating the leading zeroes. This refinement becomes more evident in situations where the number of data items is significantly high.
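The 4-bit versus 6-bit comparison, and the use of the index table for decompression, can be checked with a short sketch. It counts each distinct value once, as the text above does. The mapping of 0, 1 and 10 to the three categories is an assumption made for illustration (the contents of Table 1 are not reproduced here), and `decompress` is a hypothetical helper.

```python
# Assumed index table for "Product Category" (illustrative mapping only).
index_table = {"Office Supplies": "0", "Furniture": "1", "Technology": "10"}

# Fixed-length representation: every value uses the widest code (2 bits).
fixed_width = max(len(code) for code in index_table.values())
fixed_total = fixed_width * len(index_table)

# Variable-bit representation: each value uses only its own code length.
variable_total = sum(len(code) for code in index_table.values())

print(fixed_total, variable_total)  # 6 4

# Decompression: look each stored cell up in the reversed index table.
reverse_index = {code: item for item, code in index_table.items()}


def decompress(cells):
    """Map a list of per-cell V-bit values back to the original data items."""
    return [reverse_index[cell] for cell in cells]


print(decompress(["0", "10", "1"]))
```

Because the codes 0, 1 and 10 are not prefix-free, this lookup relies on each code being stored in its own table cell, as in the compressed table, rather than being concatenated into a single bit stream.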

Figure 4: Dataset updated with Bit value (Compressed dataset)
Figure 5: Bar chart representing the storage space needed before and after compression

6. Conclusion

In this paper, we propose the concept of introducing a variable-bit data type for the purpose of compression in a database. This is an extension of the work done previously by Nimisha et al [1], which uses a uniform-length bit representation, and improves on it by truncating the redundant leading zeroes in the binary values used during compression. Also, the assignment of the bits to the data items is based on their frequency of occurrence. This ensures that the lowest binary number is allotted to the data item with the highest count/frequency and, similarly, the highest binary value is allotted to the data item with the lowest count. This results in an overall reduction in the number of bits used during compression. This technique could be applied to any dataset and, considering the actual number of bits used, variable-bit is a much better alternative to the conventional bit representation for compression.

Acknowledgement

Special thanks to Rajalakshmi V R, Assistant Professor, Department of Computer Science & IT, Amrita School of Arts and Sciences, Kochi, for her guidance and helpful comments on database management and data compression. We also thank the anonymous reviewers for their helpful and constructive comments.

References

[1] Nimisha E., Shyama P., Rajalakshmi V.R., A New Approach to Increase the Storage Efficiency of Databases Using BIT Representation, Amrita Vishwa Vidyapeetham, Department of Computer Science and IT, Kochi, India (2016).
[2] Kodituwakku S.R., Amarasinghe U.S., Comparison of lossless data compression algorithms for text data, Indian Journal of Computer Science and Engineering 1(4) (2010), 416-425.
[3] Amandeep Singh Sidhu, Meenakshi Garg, Research Paper on Text Data Compression Algorithm using Hybrid Approach, International Journal of Computer Science and Mobile Computing (IJCSMC) 3(12) (2014), 01-10.
[4] Rupinder Singh Brar, Bikramjeet Singh, A Survey on Different Compression Techniques and Bit Reduction Algorithm for Compression of Text/Lossless Data, International Journal of Advanced Research in Computer Science and Software Engineering 3(3) (2013).
[5] Shrusti Porwal, Yashi Chaudhary, Jitendra Joshi, Manish Jain, Data Compression Methodologies for Lossless Data and Comparison between Algorithms, International Journal of Engineering Science and Innovative Technology (IJESIT) 2(2) (2013).
[6] Nishad P.M., Manicka Chezian R., Enhanced LZW (Lempel-Ziv-Welch) Algorithm by Binary Search with Multiple Dictionary to Reduce Time Complexity for Dictionary Creation in Encoding and Decoding, International Journal of Advanced Research in Computer Science and Software Engineering 2(3) (2012).
[7] Paul G. Howard, Jeffrey Scott Vitter, Practical Implementations of Arithmetic Coding, A shortened version appears in the proceedings of the International Conference on Advances in Communication and Control (1991).

[8] Haroon Altarawneh, Mohammad Altarawneh, Data Compression Techniques on Text Files: A Comparison Study, International Journal of Computer Applications 26(5) (2011).
[9] Aarti, Performance Analysis of Huffman Coding Algorithm, International Journal of Advanced Research in Computer Science and Software Engineering 3(50) (2013).
