Fully Integrated Communication Terminal and Equipment. FlexWave II :Executive Summary

Similar documents
FPGA Provides Speedy Data Compression for Hyperspectral Imagery

Image data compression with CCSDS Algorithm in full frame and strip mode

Nios II Processor-Based Hardware/Software Co-Design of the JPEG2000 Standard

An FPGA Based Adaptive Viterbi Decoder

Optimized architectures of CABAC codec for IA-32-, DSP- and FPGAbased

Modified SPIHT Image Coder For Wireless Communication

SIGNAL COMPRESSION. 9. Lossy image compression: SPIHT and S+P

Design of 2-D DWT VLSI Architecture for Image Processing

Compression II: Images (JPEG)

Design and Implementation of Lossless Data Compression Coprocessor using FPGA

A New Configuration of Adaptive Arithmetic Model for Video Coding with 3D SPIHT

DCT Coefficients Compression Using Embedded Zerotree Algorithm

PINE TRAINING ACADEMY

IMAGE DATA COMPRESSION

Wavelet Based Image Compression Using ROI SPIHT Coding

An introduction to JPEG compression using MATLAB

Review and Implementation of DWT based Scalable Video Coding with Scalable Motion Coding.

The Standardization process

JPEG Joint Photographic Experts Group ISO/IEC JTC1/SC29/WG1 Still image compression standard Features

IMAGE COMPRESSION. October 7, ICSY Lab, University of Kaiserslautern, Germany

Comparative Study and Implementation of JPEG and JPEG2000 Standards for Satellite Meteorological Imaging Controller using HDL

SPIHT Image Compression on FPGAs

A SIMULINK-TO-FPGA MULTI-RATE HIERARCHICAL FIR FILTER DESIGN

A Public Domain Tool for Wavelet Image Coding for Remote Sensing and GIS Applications

ECE 533 Digital Image Processing- Fall Group Project Embedded Image coding using zero-trees of Wavelet Transform

FPGA Implementation of 2-D DCT Architecture for JPEG Image Compression

Multi-level Design Methodology using SystemC and VHDL for JPEG Encoder

An Optimum Approach for Image Compression: Tuned Degree-K Zerotree Wavelet Coding

A LOW-COMPLEXITY AND LOSSLESS REFERENCE FRAME ENCODER ALGORITHM FOR VIDEO CODING

CSEP 521 Applied Algorithms Spring Lossy Image Compression

High Speed Arithmetic Coder Architecture used in SPIHT

TKT-2431 SoC design. Introduction to exercises. SoC design / September 10

JPEG. Wikipedia: Felis_silvestris_silvestris.jpg, Michael Gäbler CC BY 3.0

Standard Codecs. Image compression to advanced video coding. Mohammed Ghanbari. 3rd Edition. The Institution of Engineering and Technology

13.6 FLEXIBILITY AND ADAPTABILITY OF NOAA S LOW RATE INFORMATION TRANSMISSION SYSTEM

FPGA Implementation of Rate Control for JPEG2000

FPGA Implementation of Image Compression Using SPIHT Algorithm

Implication of variable code block size in JPEG 2000 and its VLSI implementation

JPEG 2000 A versatile image coding system for multimedia applications

Dynamically Reconfigurable Processing Module (DRPM), Application on Instruments and Qualification of FPGA Package

An Implementation of CCSDS B-1 Recommended Standard

Image Compression Algorithms using Wavelets: a review

Comparison of EBCOT Technique Using HAAR Wavelet and Hadamard Transform

CS 335 Graphics and Multimedia. Image Compression

Design of Coder Architecture for Set Partitioning in Hierarchical Trees Encoder

Low-Memory Packetized SPIHT Image Compression

FAST AND EFFICIENT SPATIAL SCALABLE IMAGE COMPRESSION USING WAVELET LOWER TREES

JPEG 2000 Implementation Guide

HIGH LEVEL SYNTHESIS OF A 2D-DWT SYSTEM ARCHITECTURE FOR JPEG 2000 USING FPGAs

LogiCORE IP AXI DataMover v3.00a

Agenda. Introduction FPGA DSP platforms Design challenges New programming models for FPGAs

Hardware Acceleration of the Embedded Zerotree Wavelet Algorithm

The S6000 Family of Processors

Imaging Sensor with Integrated Feature Extraction Using Connected Component Labeling

CMPT 365 Multimedia Systems. Media Compression - Image

Integrated Circuit ORB (ICO) White Paper V1.1

Wireless Communication

April 9, 2000 DIS chapter 1

Low-complexity video compression based on 3-D DWT and fast entropy coding

ESA IP Cores Service Current status, activities, and future plans. Kostas Marinis ESTEC/TEC-EDM

Universal Serial Bus Host Interface on an FPGA

Wavelet Transform (WT) & JPEG-2000

The Xilinx XC6200 chip, the software tools and the board development tools

Efficient and Low-Complexity Image Coding with the Lifting Scheme and Modified SPIHT

UNIVERSAL SPACEWIRE INTERFACE TO/FROM VME AND TO/FROM PCI

Lecture 8 JPEG Compression (Part 3)

Design and Tradeoff Analysis of JPEG-2000 on Hardware-Reconfigurable Systems

New Perspectives on Image Compression

Virtex-5 GTP Aurora v2.8

Image Compression Algorithm for Different Wavelet Codes

DRPM architecture overview

JPEG Descrizione ed applicazioni. Arcangelo Bruna. Advanced System Technology

Analysis and Comparison of EZW, SPIHT and EBCOT Coding Schemes with Reduced Execution Time

A qualitative analysis of the benefits of LUTs, Processors, embedded memory and interconnect in MPSoC platforms. Kees Vissers.

PERFORMANCE ANAYSIS OF EMBEDDED ZERO TREE AND SET PARTITIONING IN HIERARCHICAL TREE

Module 6 STILL IMAGE COMPRESSION STANDARDS

Keywords - DWT, Lifting Scheme, DWT Processor.

JPEG Baseline JPEG Pros and Cons (as compared to JPEG2000) Advantages. Disadvantages

A SCALABLE COMPUTING AND MEMORY ARCHITECTURE FOR VARIABLE BLOCK SIZE MOTION ESTIMATION ON FIELD-PROGRAMMABLE GATE ARRAYS. Theepan Moorthy and Andy Ye

Massively Parallel Computing on Silicon: SIMD Implementations. V.M.. Brea Univ. of Santiago de Compostela Spain

Fast FPGA Implementation of EBCOT block in JPEG2000 Standard

The Next Generation of Compression JPEG 2000

Optimized Progressive Coding of Stereo Images Using Discrete Wavelet Transform

Embedded Descendent-Only Zerotree Wavelet Coding for Image Compression

A 3-D Virtual SPIHT for Scalable Very Low Bit-Rate Embedded Video Compression

DIGITAL VS. ANALOG SIGNAL PROCESSING Digital signal processing (DSP) characterized by: OUTLINE APPLICATIONS OF DIGITAL SIGNAL PROCESSING

Image compression. Stefano Ferrari. Università degli Studi di Milano Methods for Image Processing. academic year

A Coprocessor for Accelerating Visual Information Processing

A Image Comparative Study using DCT, Fast Fourier, Wavelet Transforms and Huffman Algorithm

Overview of ROCCC 2.0

Design and Implementation of 3-D DWT for Video Processing Applications

Parallel FIR Filters. Chapter 5

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow

Pipelined Fast 2-D DCT Architecture for JPEG Image Compression

IMAGE COMPRESSION USING EMBEDDED ZEROTREE WAVELET

Three-D DWT of Efficient Architecture

Design of Transport Triggered Architecture Processor for Discrete Cosine Transform

DESIGN OF DCT ARCHITECTURE USING ARAI ALGORITHMS

CHAPTER 4 BLOOM FILTER

Virtex-II Architecture. Virtex II technical, Design Solutions. Active Interconnect Technology (continued)

Transcription:

Fully Integrated Communication Terminal and Equipment FlexWave II :Executive Specification : Executive, D36B Authors : J. Bormans Document no. : Status : Issue Date : July 2005 ESTEC Contract : 376/99/NL/FM(SC) ESTEC Technical Management: R. Weigand European Space Agency Contract Report The work described in this report was done under ESA contract. Responsibility for the contents resides with the authors or organization preparing it.

Author: J. Bormans Page of 2005/08/0 Contacts: ESA- ESTEC: R. Weigand Microeletronics Section TEC-EDM E-Mail: Roland.Weigand[at]esa.int IMEC : E. Blokken E-Mail.: Eddy.Blokken[at]imec.be : J. Roggen E-Mail: Jean.Roggen[at]imec.be

Author: J. Bormans Page 2 of 2005/08/0 FlexWave-II overview The FlexWave-II has been developed as a dedicated image compression component for spaceborne applications, enabling a multitude of application scenarios, including lossless and lossy compression. The FlexWave-II provides scalable compression, allowing gradual enhancement or degradation of the image quality in a programmable way. A wavelet-based compression scheme has been selected because of the intrinsic scalable characteristics. Moreover the compression criteria can be tuned separately for optimal measurement and visual data compression. The FlexWave-II provides full scalability features and high processing performance. It supports push-broom image processing. The wavelet transform engine is capable of computing up to 5 levels of wavelet transform with 5/3-, 9/3- or 9/7-tap wavelet filters, for image sizes as large as k k pixels. On an FPGA implementation, clocked on 4 MHz, a processing performance of up to 0 Mpixels/second was measured. The wavelet compression engine allows two compression modes: a fixed compression ratio mode optimised for userdefined criteria and a fixed quantisation mode with user defined quantisation tables. The FlexWave-II implements a complete wavelet-based image compression engine, including (see Figure ): A programmable, block-based wavelet transform; Scaling and thresholding of the wavelet coefficients; A DC-coefficients stuffing unit; 4 parallel (to achive high throughput) AC-coefficient compression units containing a Shapiro-based EZT encoder, a 4-symbol arithmetic coder for the dominant symbols, a serial-parallel conversion unit for the subordinate symbols, packet collectors for both the dominant and the subordinate data streams.

Author: J. Bormans Page 3 of 2005/08/0 Address Control Data Address DataIn DataOut Control External Interface ObjFifo Embedded Zero Tree coder Scalar Fifo Scalar Fifo Arithmetic coder Serial parallel convertor Scalar Fifo Scalar Fifo Packet Collector Packet Collector WT coefficients coder (2) WT coefficients coder (3) INPUT Local Wavelet Transform ObjFifo Quantisation Unit WT coefficients coder (4) Scalar Fifo DC-image coder Packet Assembler Figure : Top-level block diagram of the FlexWave-II Functional Flexibility The FlexWave-II was designed with flexibility in mind to be able to deal with a great variety of image compression contexts. The main enabling element is a programmable Local Wavelet Transformation engine (see below) that performs a block-based wavelet computation, which is completed by parameterisable functional blocks. The FlexWave-II allows for:. A programmable blocksize (4 4, 8 8, 6 6 or 32 32) 2. A programmable number of transformations (2, 3, 4 or 5) 3. A programmable wavelet filter (5/3, 9/3 or 9/7) all implemented with the lifting scheme (= lossless transform) 4. Programmable wavelet transform allowing strip-based processing required for pushbroom operation. 5. Programmable input pixel accuracy (up to 2 bit) 6. Programmable thresholding and scaling of the wavelet coefficients 7. Programmable quantisation of the wavelet coefficients Only the computation itself is block-based, the algorithmic behaviour is kept frame-based, i.e. no tiling artefacts are introduced.

Author: J. Bormans Page 4 of 2005/08/0 Local Wavelet Transform Engine FlexWave-II s Local Wavelet Transform (LWT) is a superscalar architecture with dedicated processing modules, interchanging their data along very localised data transfer busses for high-speed processing. The architecture has been tailored to an almost 00% utilisation factor of the DWT filter. Table : Memory requirements in function of the image dimensions for the Row Column (RC) and the Local Wavelet Transform (LWT) It is important to note that this block processing provides functionally/algorithmically identical results as classical wavelet transform techniques, e.g., the Row-Column (RC) wavelet transform. As a consequence, no tiling or blocking artefacts occur, while achieving very significant memory gains, as confirmed by Table, comparing the LWT to RC with respect to memory requirements. Because the LWT processes rows of blocks, the memory requirements are only proportional to the image width and are independent of the image height. Consequently, images with an infinite height (= push-broom processing) can be dealt with.

Author: J. Bormans Page 5 of 2005/08/0 Figure 2: High-Level LWT architecture The LWT data transfer centric implementation is constructed around three main parts shown in Figure 2: the hierarchical memory architecture, the hierarchical controller architecture and the filter block. FlexWave-II Implementation Results & Tests The FlexWave-II is available as a VHDL design kit, including all the necessary test benches and the programming environment for the LWT engine (to generate the appropriate LWT instructions for a given set of parameters). Figure 3: Alpha Data ADM-XRC board For testing the FlexWave-II implementation an Alpha Data ADM-XRC board has been used (see Figure 3). The features that are present on this board are: Xilinx Virtex 2000E FPGA, Four external memories of 52K 36 bits,

Author: J. Bormans Page 6 of Access to the PCI bus, and Two independent and programmable clocks. 2005/08/0 Two external memories are used for temporary storage of intermediate results of the LWT block, the two remaining memories are used for data IO between the FPGA and the PC, one containing the input image, the other containing the compressed data. The mapping results, including the interface logic to the PCI bus and the external memories, are listed in Table 2. Additionally, a study proved the feasibility of an ASIC implementation in the UMC EuroPractice 0.8mm technology. Table 2: FlexWave-II FPGA implementation report Table 3 gives an overview of the synthesis results obtained for a clock period of 20 ns (50 MHz clock) and a RAM read delay of 5 ns. Synthesis was performed using worst case operating conditions and estimated RAM delays. Table 3: Synthesis results for 0.8mm UMC technology. The numbers are obtained for a clock period of 20 ns, the RAM delays were set to 5 ns. The figures listed here are obtained using inaccurate RAM models and do not take layout back annotation into account.

Author: J. Bormans Page 7 of 2005/08/0 After synthesis, timing analysis has been performed for the available operating conditions (Best Case, Typical and Worst Case). These timing results, summarised in Table 4, were obtained without layout back annotation. Table 4: Timing analysis results for the available operating conditions. Figure 4 shows the configuration used for testing the FlexWave-II implementation. The FlexWave-II core together with an interface has been implemented on the board s FPGA. The interface is allowing the software to perform the following actions: Access the parameters of the FlexWave-II trough the PCI Bus. Access to the two of the four memories either from software, trough the PCI Bus, or di-rectly from the FlexWave-II. These memories are used as the input and output memories. The clocks are also controlled from the software and set to the desired frequencies. Input Memory Output Memory Software PCI Bus Interface FlexWave II Xilinx Virtex 2000 E Board Alpha Data ADM-XRC Figure 5 FlexWave-II Test Configuration

Author: J. Bormans Page 8 of The following procedure for testing is applied in each case:. The input image is converted from row wise into block wise (Figure 2). 2005/08/0 2. The FPGA is programmed, clocks are set, and the FlexWave-II settings are downloaded. 3. Process Input Image. - The blocked image is downloaded into the Input Memory - Send to the FlexWave-II component the start signal - Wait for the processing to finish. FlexWave-II are written into the Output Memory - Retrieve the compressed data and the processing information 4. Decode and Compare processed data. - Sort the incoming data from the board. - Decode the sorted data. - Compare the decoded image with the original image. Both images should be identical. Figure 6 shows a FlexWave-II demonstration set-up whereby images from a single source are compressed in a scalable way by the FlexWave-II and sent to two terminals with different decoding capabilities. As a consequence, the high-end terminal is able to decode the images at a higher quality than the low-end terminal without the need for the encoding process to be aware of this. FlexWave-II and CCSDS IMEC has supported ESA during the standardisation of the Consultative Committee for Space Data Systems (CCSDS) Image Data Compression activity. This initiative aimed at extending the range of available tools to compress images beyond lossless only techniques (such as the Rice algorithm), so as to be suited for a wide range of applications.

Author: J. Bormans Page 9 of 2005/08/0 FlexWave-II developments have helped convincing CCSDS to adopt a wavelet-based scheme (Figure 7). Figure 6: FlexWave-II demonstration set-up Input Data Discrete Wavelet Transform Bit-Plane Encoder Coded Data Figure 7: General Schematic of the CCSDS Coder

Author: J. Bormans Page 0 of 2005/08/0 The entropy coder is not a zero-tree coder as is the case in FlexWave-II, but an NASA originated Bit-Plane Encoder (BPE). IMEC has tuned the LWT processor so that it can seamlessly interface with the BPE, by producing blocks of 8x8 (or possibly 6x6) wavelet transformed coefficients. The BPE encoder is then able to produce the (scalable) compressed bitstream.