A Case Study On Practical Usability Of Dependently Typed Languages - Deflate

Similar documents
Coding Trees in the Deflate format

Coding Trees in the Deflate format

Optimization of a verified Deflate implementation

Haskell Beats C using Generalized Stream Fusion

Dependent Types and Irrelevance

Category: Informational May DEFLATE Compressed Data Format Specification version 1.3

File Fragment Encoding Classification: An Empirical Approach

7. Archiving and compressing 7.1 Introduction

A Research Paper on Lossless Data Compression Techniques

DEFLATE COMPRESSION ALGORITHM

15 July, Huffman Trees. Heaps

RMNet function calls. Parameters: Usage: Micro Focus. RM/COBOL Development System - RMNET

zlib Update Jim Johnston, TPF Development Lab March 23, 2015 TPFUG Dallas, TX

Hyper Text Transfer Protocol Compression

Data Compression Techniques

7: Image Compression

[MS-MCI]: Microsoft ZIP (MSZIP) Compression and Decompression Data Structure

ENSC Multimedia Communications Engineering Topic 4: Huffman Coding 2

ETSI ES V1.1.1 ( )

File: Racket File Format Libraries

Lempel-Ziv-Welch (LZW) Compression Algorithm

File: PLT File Format Libraries

An Advanced Text Encryption & Compression System Based on ASCII Values & Arithmetic Encoding to Improve Data Security

STUDY OF VARIOUS DATA COMPRESSION TOOLS

Basic Compression Library

Simple variant of coding with a variable number of symbols and fixlength codewords.

HTTP/2: What You Need to Know. Robert

Network Working Group. Category: Informational August 1996

Brotli Compression Algorithm outline of a specification

File: Racket File Format Libraries

In the Compression Hornet's Nest: A Security Study of Data Compression in Network Services

Lecture 3. Essential skills for bioinformatics: Unix/Linux

Category: Informational December 1998

Compressing Data. Konstantin Tretyakov

Huffman Coding Implementation on Gzip Deflate Algorithm and its Effect on Website Performance

Data Compression for Bitmap Indexes. Y. Chen

A New Compression Method Strictly for English Textual Data

Data Blocks: Hybrid OLTP and OLAP on compressed storage

Program Construction and Data Structures Course 1DL201 at Uppsala University Autumn 2010 / Spring 2011 Homework 6: Data Compression

Compiler Construction I

Ch. 2: Compression Basics Multimedia Systems

Compiling Parallel Algorithms to Memory Systems

Epilog: Further Topics

CS/COE 1501

token string (required) A valid token that was provided by the gettoken() method

Parallel DEFLATE Decoding using GPGPU COMP Single Semester Project

S 1. Evaluation of Fast-LZ Compressors for Compacting High-Bandwidth but Redundant Streams from FPGA Data Sources

A Surfeit of SSH Cipher Suites

Data compression with Huffman and LZW

Abusing JSONP with Rosetta Flash

Scalable Compression and Transmission of Large, Three- Dimensional Materials Microstructures

Video Compression An Introduction

CSE 143 Lecture 22. Huffman Tree

Horn Formulae. CS124 Course Notes 8 Spring 2018

Internet Engineering Task Force (IETF) Request for Comments: 7541 Category: Standards Track. May 2015

User Commands GZIP ( 1 )

Data Representation. Reminders. Sound What is sound? Interpreting bits to give them meaning. Part 4: Media - Sound, Video, Compression

DICOM Correction Proposal

Multimedia Networking ECE 599

CS106B Handout 34 Autumn 2012 November 12 th, 2012 Data Compression and Huffman Encoding

Black Problem 2: Huffman Compression [75 points] Next, the Millisoft back story! Starter files

File: Racket File Format Libraries

Parallel LZ77 Decoding with a GPU. Emmanuel Morfiadakis Supervisor: Dr Eric McCreath College of Engineering and Computer Science, ANU

HTTP TRAFFIC CONSISTS OF REQUESTS AND RESPONSES. All HTTP traffic can be

CSE 421 Greedy: Huffman Codes

The Data Link Layer. 32 PART I Networking Basics

Improved Recovery and Reconstruction of DEFLATEd Files

GZIP is a software application used for file compression. It is widely used by many UNIX

Lossless Compression Algorithms

Lecture Coding Theory. Source Coding. Image and Video Compression. Images: Wikipedia

A Comparative Study of Lossless Compression Algorithm on Text Data

Data Representation. Types of data: Numbers Text Audio Images & Graphics Video

COSC431 IR. Compression. Richard A. O'Keefe

Essential Skills for Bioinformatics: Unix/Linux

Market Data Platform Real Time. SNAPSHOT DATA Futures & Options Market

Statistical Modeling of Huffman Tables Coding

Objectives CINS/F1-01

NAME SYNOPSIS. Perl version documentation - Compress::Zlib. Compress::Zlib - Interface to zlib compression library. use Compress::Zlib ;

Practical Knowledge Transfering, moving and exporting files Martin Dahlö

Avro Specification

PageSpeed Insights. Compressing resources with gzip or deflate can reduce the number of bytes sent over the network.

JPEG Joint Photographic Experts Group ISO/IEC JTC1/SC29/WG1 Still image compression standard Features

AN 831: Intel FPGA SDK for OpenCL

A Packet Header Compression Algorithm based on TCP Flow Clustering and Huffman Encoding

WIRE/WIRELESS SENSOR NETWORKS USING K-RLE ALGORITHM FOR A LOW POWER DATA COMPRESSION

RFC 803. Dacom 450/500 Facsimile Data Transcoding A. Agarwal, M. J. O Connor and D. L. Mills 2 November Introduction

Abusing JSONP with. Michele - CVE , CVE Pwnie Awards 2014 Nominated

Compression Bombs Strike Back

MILCOM October 2002 (Anaheim, California) Subject

Master Course Computer Networks IN2097

Graduate-Credit Programming Project

Designing Survivable Services from Independent Components with Basic Functionality

Vorlesung Advanced Topics in HCI (Mensch-Maschine-Interaktion 2)

Zip file extractor. Zip file extractor

High Efficiency Video Coding. Li Li 2016/10/18

Study of LZ77 and LZ78 Data Compression Techniques

Abstractness, Specificity, and Complexity in Software Design

Improved Videotransmission over Lossy. Channels using Parallelization. Dept. of Computer Science, University of Bonn, Germany.

HTTP Inspection Engine

Open ebook File Format 1.0. DRAFT VERSION 001 November 5, 1999

Transcription:

A Case Study On Practical Usability Of Dependently Typed Languages - Deflate Christoph-Simon Senjak Lehr- und Forschungseinheit für Theoretische Informatik Institut für Informatik Ludwig-Maximilians-Universität München Oettingenstr.67, 80538 München Oberseminarvortrag 19. Juli 2013 Christoph-Simon Senjak (LMU München) Deflate - A Case Study 2013-08-19 1 / 12

Foreword This talk gives an intermediate overview of work in progress. It mainly contains ideas and goals. It is limited to one special data format, but aims to find problems and solutions through its implementation. The ultimate goal is to provide techniques for dealing with low-level structures in a verifiable way. Christoph-Simon Senjak (LMU München) Deflate - A Case Study 2013-08-19 2 / 12

Why Deflate? The Web HTTP gzip Rar, Zip,... SSH ZLIB Deflate PNG Main implementation is the zlib (zlib.net). Though the first release was 1995, in 2005 there were still security vulnerabilities. And there are still a lot of bugfixes with every version. Christoph-Simon Senjak (LMU München) Deflate - A Case Study 2013-08-19 3 / 12

Deflate - Basics Specified in RFC 1951. RFC 1952 specifies the GZIP file format, and RFC 1950 specifies the ZLIB file format. Both are often confused with Deflate. Can make use of Huffman-Codings and Run-length encoding. Does not require any specific compression algorithm, even though some are recommended. Christoph-Simon Senjak (LMU München) Deflate - A Case Study 2013-08-19 4 / 12

Deflate - Overview D e f l a t e ::= ( 0 Block ) 1 Block ( 0 1 ) Block ::= 00 UncompressedBlock 01 DynamicallyCompressedBlock 10 S t a t i c a l l y C o m p r e s s e d B l o c k UncompressedBlock ::= l e n g t h l e n g t h b y t e s S t a t i c a l l y C o m p r e s s e d B l o c k ::= ( code!= 256 ) code f o r 256 DynamicallyCompressedBlock ::= header codes ( code!= 256 ) code f o r 256 Compressed Blocks may also contain backreferences to a 32KiB backbuffer. Christoph-Simon Senjak (LMU München) Deflate - A Case Study 2013-08-19 5 / 12

Codings - Definitions 1. Must be prefix-free, that is, no code prefixes another code, so they form a tree. 2. For Deflate, we additionally require that the shorter codes lexicographically precede longer codes. (RFC 1951, 3.2.2) 3. Furthermore, all codes of a given bit length have lexicographically consecutive values, in the same order as the symbols they represent. (RFC 1951, 3.2.2) - For every prefix-free coding one can find an equally good prefix-free coding satisfying (2.) and (3.) - Such a coding is uniquely defined by the sequence of code lengths. Christoph-Simon Senjak (LMU München) Deflate - A Case Study 2013-08-19 6 / 12

Codings - Example RFC 1951 gives the following example: Consider A 00, B 1, C 011, D 010. The tree looks like A 0 1 D 0 1 0 1 It does not satisfy (2.) and (3.), but the equally good A 10, B 0, C 110, D 111 does. It is uniquely determined by the sequence 2,1,3,3. B C Christoph-Simon Senjak (LMU München) Deflate - A Case Study 2013-08-19 7 / 12

Run-Length Encoding Data streams often contain duplicates of strings that have already been sent. Several algorithms exist to find these duplicates and eliminate them, using a backreference. #include <stdio.h> \n#include <unistd.h> #include <stdio.h> \n unistd.h> 10 Christoph-Simon Senjak (LMU München) Deflate - A Case Study 2013-08-19 8 / 12

Pitfalls The format is specified to operate on bytes. Bits have to be correctly extracted from them confusing rules about positions of lsb and msb. In some cases, byte-boundaries are relevant it is not possible to abstract away from the bytes and operate only on the resulting bitstream. Almost no implementation gives a pure Deflate stream, even though it claims so (for example java.util.deflateroutputstream) extracting them from gzip streams seems to be the easiest way. Christoph-Simon Senjak (LMU München) Deflate - A Case Study 2013-08-19 9 / 12

Goals Provide a formal specification of the Deflate format in Agda. Create an implementation of inflate (decompression of a Deflate-stream) which is both efficient and verified. Test it against many gzipped files, to make sure the specification is compliant with the standard. Christoph-Simon Senjak (LMU München) Deflate - A Case Study 2013-08-19 10 / 12

An Intermediate Result We made an implementation of gunzip in pure Haskell, to make sure that we understood the standard correctly, and see the problems that occur. The implementation takes about 2 minutes for about 6 MB, which is slow, but quite good regarding its purely functional source code. The main problem with efficiency is the 32K backbuffer for RLE. Implementation as a List is too slow. Current implementation with recursive slowdown is better, but still too slow. Christoph-Simon Senjak (LMU München) Deflate - A Case Study 2013-08-19 11 / 12

Plans Currently working on the formalization in Agda. Porting and verifying the current implementation from Haskell to Agda. Defining a low-level language with semantics formalized in Agda, in which the equivalence of an efficient implementation of Deflate can be shown. Christoph-Simon Senjak (LMU München) Deflate - A Case Study 2013-08-19 12 / 12