INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY


Rashmi Gadbail, 2013; Volume 1(8): 783-791

A PATH FOR HORIZING YOUR INNOVATIVE WORK

EFFECTIVE XML DATABASE COMPRESSION TECHNIQUE BY USING ADAPTIVE HUFFMAN CODING ALGORITHM

RASHMI N. GADBAIL (1), SHILPA INGOLE (2)
1. Asst. Prof., IBSS COE, Amravati.
2. Asst. Prof., LAIT College, Badlapur, Mumbai.

Accepted Date: 27/02/2013
Publish Date: 01/04/2013

Keywords: Compression, Decompression, Efficient XML Compression and Decompression, Adaptive Huffman Coding

Corresponding Author: Ms. Rashmi N. Gadbail

Abstract: The Extensible Markup Language (XML) is one of the most important formats for data interchange on the Internet. XML documents are used for data exchange and for storing large amounts of data on the web. These documents are extremely verbose and require specific compression for efficient transmission. In this proposed work we enhance existing compressors by using adaptive Huffman coding. The approach is based on the principle of extracting data from the document and grouping it by semantics: the document is encoded as a sequence of integers, while the data grouping is based on XML tags, attributes and comments. The main disadvantage of XML documents is their large size, caused by the highly repetitive (sub)structures of those documents and their often long tag and attribute names. There is therefore a need to compress XML both efficiently and conveniently. The re-organized data is then compressed by adaptive Huffman coding. The special feature of the adaptive Huffman coding algorithm is that it achieves highly effective compression and eliminates the repetition of dictionary-based words in the XML database. Using the adaptive Huffman algorithm, symbol probabilities are derived that change dynamically with the incoming data, through binary-tree construction.

INTRODUCTION

The Extensible Markup Language (XML) is one of the most important formats for data interchange on the Internet. XML documents are used for data exchange and for storing large amounts of data on the web. These documents are extremely verbose and require specific compression for efficient transmission. In this proposed work we enhance existing compressors by using adaptive Huffman coding. The approach is based on the principle of extracting data from the document and grouping it by semantics [1]. The document is encoded as a sequence of integers, while the data grouping is based on XML tags, attributes and comments. The main disadvantage of XML documents is their large size, caused by the highly repetitive (sub)structures of those documents and their often long tag and attribute names. There is therefore a need to compress XML both efficiently and conveniently.

The design goal of effective compression of an XML database by using adaptive Huffman coding is to provide extremely efficient, accurate compression of XML documents while supporting "online" usage. In this context, "online" usage means: (a) only one pass through the document is required to compress it; (b) compressed data is sent to the output stream incrementally as the document is read; and (c) decompression can begin as soon as compressed data is available to the decompressor. Thus transmission of a document across heterogeneous systems can begin as soon as the compressor produces its first output, and, consequently, the decompressor can start decompression shortly thereafter, resulting in a compression scheme that is well suited to transmitting XML documents over a wide-area network.

Why is there a need to compress XML? XML representations are very large and can be up to ten times as large as equivalent binary representations. Meaningful tags make an XML document self-describing, but they also increase the document's size. In most cases, an element contains at most one data item; in other words, a data item usually comes with a start tag and an end tag. XML is the verbose lingua franca of web services, necessitating large volumes of XML data to be sent over networks, and reducing data size helps conserve network bandwidth. Consider the invoice record in Example 1.1.

<?xml version="1.0" encoding="utf-8"?>
<Invoice ID="ST87302" date="2003-03-17" paid="true">
  <Customer ID="2376">
    <Name>
      <FirstName>Wally</FirstName>
      <LastName>Robertson</LastName>
    </Name>
    <BillingAddress>
      <Street>200 University Ave.</Street>
      <City>Waterloo</City>
      <Province>ON</Province>
    </BillingAddress>
  </Customer>
  <ProductList>
    <Product ID="HI0941">
      <Name>MP3 Player</Name>
      <Price>139.99</Price>
      <Units>2</Units>
    </Product>
    <Product ID="XP6015">
      <Name>CD Writer</Name>
      <Price>79.99</Price>
      <Units>4</Units>
    </Product>
  </ProductList>
</Invoice>

Example 1.1: An invoice record in XML format.

This document contains 19 data items and its length is 688 bytes. The data itself, stored in plain text as in Example 1.2, occupies only 144 bytes, and this value can be reduced still further by using a binary format [7]. The repeating tags make the XML version almost 5 times longer than the plain-text version.

ST87302 2003-03-17 true 2376 Wally Robertson 200 University Ave. Waterloo ON HI0941 MP3 Player 139.99 2 XP6015 CD Writer 79.99 4

Example 1.2: An invoice record in plain text.

XML Compression: XML compression is a lossless compression technique. A compression approach is lossless only if it is possible to exactly reconstruct the original data from the compressed version; no information is lost during the compression process. Lossless compression is also called reversible compression, since the original data may be recovered perfectly by decompression [16]. XML is a text representation of tree-structured data, so a straightforward logical approach to compressing XML documents is to use traditional general-purpose text compression tools. In the compression process, the XML data symbols, tags and the frequencies of those symbols are collected and maintained dynamically according to the source file.

XML Decompression: The decompression process is obvious, or can easily be derived from the compression process; by the nature of data compression, no compression algorithm is useful unless a means of decompression is also provided. XML decompression is exactly the opposite of XML compression [16]. Compressed XML data is sent to the output stream incrementally as the document is read, and decompression can begin as soon as compressed data is available to the decompressor. Thus transmission of a document across heterogeneous systems can begin as soon as the compressor produces its first output, and, consequently, the decompressor can start decompression shortly thereafter, resulting in a compression scheme that is well suited to transmitting XML documents over a wide-area network.

Fig 1.3: Compression & Decompression.

II. RELATED WORK

Various sophisticated algorithms have been proposed for lossless text compression. A very promising development in the field of lossless data compression is the Burrows-Wheeler Compression Algorithm (BWCA), introduced in 1994 by Michael Burrows and David Wheeler. The algorithm has received considerable attention because of its Lempel-Ziv-like execution speed and its compression performance, which is close to that of state-of-the-art PPM algorithms. A preprocessing method is performed on the source text before applying an existing compression algorithm; the transformation is designed to make the source file easier to compress. Star encoding is generally used for this type of preprocessing transformation of the source text.
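To make the "general-purpose text compression" baseline concrete, the sketch below (an illustrative aside, not part of the paper's experiments) compresses a small invoice-like fragment with Python's zlib and reports the compression ratio used later in the paper (compressed size / uncompressed size). The sample document is hypothetical:

```python
import zlib

# A small invoice-like XML fragment (hypothetical sample data).
xml_doc = b"""<Invoice ID="ST87302" date="2003-03-17" paid="true">
  <Customer ID="2376">
    <Name><FirstName>Wally</FirstName><LastName>Robertson</LastName></Name>
  </Customer>
  <ProductList>
    <Product ID="HI0941"><Name>MP3 Player</Name><Price>139.99</Price><Units>2</Units></Product>
    <Product ID="XP6015"><Name>CD Writer</Name><Price>79.99</Price><Units>4</Units></Product>
  </ProductList>
</Invoice>"""

compressed = zlib.compress(xml_doc, level=9)

# Compression Ratio = (Compressed Size) / (Uncompressed Size)
ratio = len(compressed) / len(xml_doc)
print(f"original: {len(xml_doc)} B, compressed: {len(compressed)} B, ratio: {ratio:.2f}")

# Lossless: decompression recovers the document exactly.
assert zlib.decompress(compressed) == xml_doc
```

The repetitive tag names are exactly the kind of redundancy a dictionary-based compressor exploits, which is why even a generic tool achieves a ratio well below 1 on XML.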
XGrind: Several proposals, and the references therein, note that the pioneering work in this domain was XGrind, which is based on static Huffman coding. XGrind was the first XML-conscious compression scheme to support querying without full decompression [7].

XPRESS: Like XGrind, XPRESS features a homomorphic compression process requiring two passes over the document. XPRESS also supports querying of compressed data and claims to achieve better compression than XGrind [8]. However, it uses a semi-adaptive form of arithmetic coding, which likewise necessitates two passes over the original XML document.

III. IMPLEMENTATION

A.) Compression Techniques for XML Databases:

Lossless Compression: Lossless compression techniques provide exact recovery of the original data from its compressed version; no information contained in the original can be lost during compression and reconstruction. These techniques are widely used in applications to save storage space and network bandwidth.

Adaptive Huffman coding for XML database compression: XML simplifies data exchange among heterogeneous computers, but it is notoriously verbose and has spawned the development of many XML-specific compressors and binary formats. We present an XML test and a combined efficiency metric integrating compression ratio and execution speed, with the help of the adaptive Huffman compression technique. Adaptive Huffman compression is more efficient than static Huffman compression, which is an important consideration for lossless data compression. In computer science and information theory, Huffman coding is an entropy encoding algorithm used for lossless data compression [12]. In adaptive Huffman coding, an alphabet and the frequencies of its symbols are collected and maintained dynamically according to the source file on each iteration, and the Huffman tree is updated accordingly. When the encoder and decoder are at different locations, each independently maintains an identical Huffman tree at every step; therefore, there is no need to transfer the Huffman tree. The adaptive Huffman algorithm provides effective compression by transmitting just the node position in the tree rather than the entire code [12]. Unlike the static Huffman algorithm, the statistics of the source data need not be known before encoding it [12]. That is why adaptive Huffman coding is extremely accurate with respect to compression.

Compression & Decompression of an XML Database by Adaptive Huffman Coding: We propose here an efficient way of compressing XML documents by using adaptive Huffman coding. A high compression ratio and speed are equally important. We also require the transformation of the XML database to be fully reversible and decompressible, so that the decompressed document is a mirror image of the original. The adaptive algorithm proceeds in the following manner: the compressor and the decompressor start with an empty Huffman tree and modify it as symbols are read and processed. Adaptive Huffman coding thus provides extremely efficient and highly accurate compression of XML documents while supporting "online" usage, meaning that (a) only one pass through the document is required to compress it, (b) compressed data is sent to the output stream incrementally as the document is read, and (c) decompression can begin as soon as compressed data is available to the decompressor. Adaptive Huffman coding determines the mapping to code words using a running estimate of the source symbols' probabilities. Because the alphabet is encoded with fixed codes, keywords can be searched directly in the compressed data, and since the data volume is reduced, processing the compressed XML data may even be faster than processing the original data.

Fig: XML Compression & Decompression Process.
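The encoder/decoder synchronization described above can be sketched as follows. This is a deliberately simplified illustration, not the paper's implementation: instead of the incremental tree update of a true adaptive (FGK/Vitter-style) coder, it rebuilds the Huffman code table from the running frequency counts before each symbol, and it assumes the alphabet is known to both sides in advance, with every symbol starting at count 1. All names are illustrative.

```python
import heapq

def build_codes(freq):
    """Rebuild a prefix-free Huffman code table from the current counts."""
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    heap = [(f, sym, {sym: ""}) for sym, f in sorted(freq.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, k1, c1 = heapq.heappop(heap)
        f2, k2, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, min(k1, k2), merged))
    return heap[0][2]

def adaptive_encode(text, alphabet):
    """Encode text; the code table adapts as the symbol counts grow."""
    freq = {s: 1 for s in alphabet}          # both sides start identically
    bits = []
    for ch in text:
        bits.append(build_codes(freq)[ch])   # code from the counts so far
        freq[ch] += 1                        # update AFTER emitting
    return "".join(bits)

def adaptive_decode(bits, n_symbols, alphabet):
    """Mirror the encoder: same initial counts, same updates."""
    freq = {s: 1 for s in alphabet}
    out, i = [], 0
    for _ in range(n_symbols):
        inverse = {v: k for k, v in build_codes(freq).items()}
        j = i + 1
        while bits[i:j] not in inverse:      # extend until a codeword matches
            j += 1
        ch = inverse[bits[i:j]]
        out.append(ch)
        freq[ch] += 1
        i = j
    return "".join(out)

text = "<Invoice><Name>Wally</Name></Invoice>"
alphabet = sorted(set(text))
encoded = adaptive_encode(text, alphabet)
assert adaptive_decode(encoded, len(text), alphabet) == text  # lossless round trip
```

Because encoder and decoder apply identical count updates, their code tables stay in lockstep without ever transmitting a tree, which is the key property the section above relies on. (The per-symbol rebuild is O(k log k) in the alphabet size; the real adaptive algorithm avoids this with incremental tree updates.)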

XML PARSERS

A parser is a program or module that checks for well-formed syntax and provides the capability to manipulate XML data elements. In this project work we used DOM (Document Object Model) parsing [18].

DOM (Document Object Model): A standardized, platform- and language-independent, object-oriented interface defining a number of objects with which a document (in particular HTML or XML) can be described as a hierarchical tree structure. The standardized objects and methods are used to manipulate documents easily and to produce uniform, reusable programs.

Structure: DOM describes a document as a tree of nodes. In the tree structure, XML data consists of nodes and edges that represent elements, attributes and text. Parsing of XML data is done using a DOM parser, in which all data nodes are parsed according to node type [12].

Creating the hierarchy: DOM, in essence, is a collection of nodes. With different types of information potentially contained in a document, there are several different node types defined. In creating a hierarchy for an XML file, it is natural to produce something conceptually like the structure below. While that is an accurate depiction of the included data, it is not an accurate description of the data as represented by the DOM, because it represents the elements but not the nodes [12].

EXPERIMENTAL EVALUATION

Performance metrics: We measure and compare the performance of the XML compression tools using the following metrics.

Compression Ratio: the ratio between the sizes of the compressed and uncompressed XML documents, given by: Compression Ratio = (Compressed Size) / (Uncompressed Size).

Compression Time: the elapsed time during the compression process, i.e. the period between the start of program execution on a document and the moment all the data are written to disk.

Decompression Time: the elapsed time during the decompression process, i.e. the period between the start of program execution on reading the compressed format of the XML document and the delivery of the original document.

Experimental Results

Table 5.1: Comparative analysis

Performance Analysis: Comparative analysis. Experimental results are given for some standard XML datasets. The documents are selected to cover a wide range of sizes, where the smallest document is 16 KB and the biggest is 9 MB.

CONCLUSION

Adaptive Huffman coding encodes an alphabet with fixed codes, which allows us to search keywords directly in the compressed data [6]. Since the data volume is reduced, processing the compressed XML data may even be faster than processing the original data. The result of the proposed work is that an XML document is compressed with the help of the adaptive Huffman algorithm, and that compressed data can be decompressed as well, recovering the original XML document across heterogeneous systems.

REFERENCES

1. T. Bray et al., Extensible Markup Language (XML) 1.0, October 2000, http://www.w3.org/tr/rec-xml.
2. G. Girardot and N. Sundaresan, An encoding format for efficient representation and exchange of XML over the Web, http://www9.org/w9cdrom/154/154.html.
3. D. Huffman, A Method for the Construction of Minimum-Redundancy Codes, Proc. of the IRE, September 1952.

4. H. Liefke and D. Suciu, XMill: An Efficient Compressor for XML Data, in Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pages 153-164, May 2000.
5. P. G. Howard and J. S. Vitter, Analysis of Arithmetic Coding for Data Compression, in Proceedings of the IEEE Data Compression Conference, pages 3-12, April 1991.
6. P. M. Tolani and J. R. Haritsa, XGRIND: A query-friendly XML compressor, in Proc. 2002 Int'l Conf. on Database Eng., pages 225-234.
7. World Wide Web Consortium, Document Object Model (DOM) Level 1 Specification Version 1.0, W3C Recommendation, 1 October 1998.