HARDWARE BASED BINARY ARITHMETIC ENGINE

1 N. SARANYA, 2 DHIRENDRA KUMAR TRIPATHI, 3 DR. R. MUTHAIAH
1 Associate System Engineer, IBM India Pvt. Ltd., Bangalore, India - 560066
2 Assistant Professor, Department of Information Technology, SASTRA University, Thanjavur, India - 613401
3 Associate Professor, Department of Information Technology, SASTRA University, Thanjavur, India - 613401
E-mail: saran11sastra@gmail.com, dkt@core.sastra.edu, sjamuthaiah@core.sastra.edu

ABSTRACT

Context Based Binary Arithmetic Coding (CBAC) is part of the JZ profile of the Audio Video Coding Standard (AVS). The goal of this paper is to present an efficient hardware-based binary arithmetic coder, the coding stage that operates on the binarised symbols in CBAC of AVS, and to explain the arithmetic coding involved in video transcoding. The JZ profile of AVS targets movie compression for high-density storage, where a relatively high computational complexity can be tolerated at the encoder side in exchange for higher video quality. The proposed arithmetic coder avoids slow multiplication: the traditional arithmetic calculation is transformed into the logarithmic domain. The arithmetic engine gives high compression gain and an efficient coding design for handling high-resolution video sequences, supports effective pipelining, and increases the overall processing speed. The arithmetic coder is implemented within a CABAC (Context Adaptive Binary Arithmetic Coding) style architecture while satisfying the CBAC features. This paper describes the process of efficient arithmetic coding (AC) and the hardware-based arithmetic engine in CBAC. The CBAC engine is implemented on a Xilinx Virtex-5 ML501 board, and synthesis and simulation results are presented. The proposed architecture can operate at a rate of 31.384 MHz.

Keywords: AC, CBAC, CABAC, AVS, Video transcoding, Binarisation, Q-coder

1. INTRODUCTION

Many different techniques, with different degrees of complexity, are used in data compression applications, but they share a common sequence of processes: numerical processing, logical processing, source modeling and entropy coding [1]. First the original data is taken as input and numerical processing is applied, which includes operations such as predictive coding and linear transforms for images. Logical processing then changes the data into a form suitable for compression. Source modeling accounts for the variation in the statistical properties of the data; it is responsible for identifying data contexts and gathering the corresponding statistics, which makes the source model accurate. The final process is entropy coding, which represents the data in a compact form. Among the different entropy coding methods, arithmetic coding offers the best compression effectiveness.

CBAC in AVS belongs to the Q-coder family [2] and to the JZ profile of AVS [3]. It offers an average of 13% bit saving compared with CABAC [4]. It adopts a simpler binarisation method and context selection scheme, and in addition the traditional multiplication is transformed to the logarithmic domain; these points differentiate it from CABAC.

2. CODING STAGES OF CABAC

Figure 1 shows the CABAC architecture. The coding stages are:

1. Before arithmetic coding, a non-binary valued symbol is converted into a binary valued symbol.
2. The binary valued symbol is encoded by the arithmetic coder before transmission.
3. The probability model for the binary value of the symbol is given as a context model.
4. Depending on the recently coded symbols, a particular model is selected for the coding process; it stores the value as 1 or 0.
5. The arithmetic coder then encodes the binary value using the selected model.
6. The selected context model is updated depending on the actual coded value.
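The paper's engine itself is described in VHDL (Section 8). Purely as an illustration of the bin-by-bin flow listed above, a minimal behavioral sketch in C might look as follows; the helper routines (binarize, select_context, encode_bin, update_context) and the two-field context structure are assumptions introduced for this sketch, not part of the paper.

    #include <stdint.h>

    /* Illustrative two-field context model: the current most probable
     * symbol (MPS) and an index into a probability/state table. */
    typedef struct {
        uint8_t mps;        /* current most probable symbol, 0 or 1 */
        uint8_t prob_idx;   /* probability state of the LPS          */
    } context_t;

    /* Hypothetical primitives assumed to exist elsewhere. */
    extern void       binarize(int symbol, uint8_t *bins, int *num_bins);
    extern context_t *select_context(int bin_position);
    extern void       encode_bin(uint8_t bin, context_t *ctx);  /* arithmetic coder core */
    extern void       update_context(context_t *ctx, uint8_t bin);

    /* One syntax element through the bin-by-bin pipeline of Section 2:
     * binarise, pick a context per bin, encode the bin, adapt the context. */
    void encode_symbol(int symbol)
    {
        uint8_t bins[32];
        int num_bins = 0;

        binarize(symbol, bins, &num_bins);          /* stage 1        */
        for (int i = 0; i < num_bins; i++) {
            context_t *ctx = select_context(i);     /* stages 3 and 4 */
            encode_bin(bins[i], ctx);                /* stages 2 and 5 */
            update_context(ctx, bins[i]);            /* stage 6        */
        }
    }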

Figure 1: CABAC Architecture

3. PURPOSE OF ARITHMETIC CODING

Arithmetic coding generates non-block codes, in contrast to Variable Length Coding: a single arithmetic code word is assigned to the entire sequence of symbols. Huffman coding produces rounding errors because it restricts code lengths to multiples of a bit; the deviation caused by these errors is much higher than the inaccuracies of arithmetic coding.

4. PRINCIPLE OF ARITHMETIC CODING

The principle of arithmetic coding is the recursive subdivision of intervals. Every binary symbol is assigned to a specific context model, which is updated throughout the coding process for adaptive probability estimation. The current interval is first subdivided into two subintervals, rlps and rmps, where rlps is the range assigned to the least probable symbol (LPS) and rmps is the range assigned to the most probable symbol (MPS); PLPS is the probability estimate of the LPS. Depending on whether the bin to be encoded is the MPS or the LPS, the corresponding subinterval is selected as the new current interval. The coding process continuously updates two registers: the range register R holds the width of the interval and the code register C holds the lower bound of the interval.

Arithmetic decoding is the reverse of encoding. First the context model of the bin to be decoded is selected, then the bin value is decoded by the arithmetic decoder engine, and the context models are continuously updated during the whole decoding process, just as in the encoding engine. In the decoder, the register R holds the current interval range and the offset register holds the position within the input bit stream. Whether the decoded binary value is the MPS or the LPS is decided by whether the offset falls into the rmps or the rlps range. These two registers are continuously updated.

5. PREVIOUS AC ENGINE

In CABAC [5], the arithmetic calculation of rlps and the probability estimation updates are performed using look-up tables. This requires large memory and has lower coding efficiency. The proposed method optimizes the area-versus-speed trade-off in the following way. To reduce the synthesized FPGA area, only a limited number of probability estimation units are implemented in hardware and reused repeatedly, which saves area compared with a fully parallel architecture for probability estimation. The second optimization is to implement the logarithmic engine using hardware multipliers, which makes the logarithm calculation much faster and therefore increases the overall speed of the CABAC engine.

6. STEPS OF AC ENGINE

1. The rlps and rmps subintervals are calculated.
2. Finite precision is maintained throughout the coding process using renormalization.
3. The probability estimate is updated adaptively.

Step 1: rlps and rmps are calculated by

rlps = R * PLPS    (1)
rmps = R - rlps    (2)

or

rmps = R * PMPS    (3)
rlps = R - rmps    (4)
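The subdivision in equations (1)-(4) and the MPS/LPS selection of Section 4 can be summarized in a short behavioral C sketch. This is a model under stated assumptions, not the authors' VHDL: the probability is taken as an 8-bit fixed-point fraction, and output-bit generation and carry handling during renormalization are left out.

    #include <stdint.h>

    #define RANGE_BITS 16
    #define HALF       (1u << (RANGE_BITS - 1))
    #define PROB_BITS  8                 /* probability as an 8-bit fraction (assumed) */

    typedef struct {
        uint32_t range;                  /* register R: width of the current interval   */
        uint32_t low;                    /* register C: lower bound of the interval     */
    } ac_state_t;

    /* One encoding step: subdivide the interval (eqs 1-4), keep the MPS or LPS
     * subinterval, then renormalize so the MSB of R stays significant.
     * rlps occupies the upper part of the interval and rmps the lower part,
     * matching the convention used in Section 6. */
    void ac_encode_bin(ac_state_t *s, int bin, int mps, uint32_t p_lps)
    {
        uint32_t rlps = (s->range * p_lps) >> PROB_BITS;   /* eq (1) */
        uint32_t rmps = s->range - rlps;                   /* eq (2) */

        if (bin == mps) {
            s->range = rmps;               /* keep lower subinterval, low unchanged */
        } else {
            s->low  += rmps;               /* jump over rmps ...                    */
            s->range = rlps;               /* ... and keep the upper subinterval    */
        }

        /* Renormalization (Step 2): left-shift until the MSB of R is 1.
         * Output-bit generation and carry propagation are omitted here. */
        while (s->range < HALF) {
            s->range <<= 1;
            s->low   <<= 1;
        }
    }

The multiplication in equation (1) is exactly the operation that the logarithmic domain of Section 7 is introduced to avoid.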

Here rlps is assigned to the upper subinterval and rmps to the lower subinterval.

Step 2: During encoding, the register R is left shifted so that the MSB of the new interval R is always 1. During decoding, the range and offset registers are left shifted in the same way so that the MSB is always 1. The renormalization process contains the leading-one detection circuit and is responsible for the critical path delay.

Step 3: In the third step, the probability of the binary source is estimated. Let p be the probability of symbol 1 and q the probability of symbol 0, and let N be the parameter that adjusts the adaptation speed. If pk and qk are the probability estimates of 1 and 0 after k events, the estimates after the (k+1)-th event are

p(k+1) = (N * p(k)) / (N + 1)   (if 0 occurs)   (5)
q(k+1) = (N * q(k)) / (N + 1)   (if 1 occurs)   (6)

If PLPS and PMPS denote the probabilities of the LPS and the MPS, this can be written with f = N / (N + 1) as

PMPS_new = f * PMPS_old   (if LPS occurs)   (7)
PLPS_new = f * PLPS_old   (if MPS occurs)   (8)

7. HARDWARE BASED BINARY ARITHMETIC ENGINE

In CBAC of AVS the logarithm is used to transform the slow multiplication into addition and subtraction. The hardware-based arithmetic engine is designed using the following steps. When the most probable symbol (MPS) occurs, the range value in the logarithmic domain is updated with a subtraction. The size of the current subinterval is denoted Range and the offset position is denoted Offset [6][7].

MPS:
Range_new  = Range * PMPS   (9)
Offset_new = Offset         (10)

Using logarithms, these become

Lg_Range_new = Lg_Range - Lg_PMPS   (11)
Offset_new   = Offset               (12)

Here PMPS is the probability of the MPS and Lg_x denotes the logarithmic value associated with x. When the least probable symbol (LPS) occurs, the value is transformed from the logarithmic domain back to the original domain [1]:

LPS:
Range_new  = Range - (PMPS * Range) = Range - rmps     (13)
Offset_new = Offset + (Range * PMPS) = Offset + rmps   (14)

where rmps is the size of the subinterval associated with the MPS.

Figure 2: Range interval subdivision (u(0) and l(0) bound the interval; rmps and rlps are the MPS and LPS subintervals).

The adaptive probability estimation process is then given by [5]:

Initial values: l(0) = 0, u(0) = 1

where l(0) is the initial lower bound of the interval (the least value, on the LPS side) and u(0) is the initial upper bound (the greatest value, on the MPS side). These values are updated adaptively using

Interval update:
l(n) = l(n-1) + [u(n-1) - l(n-1)] * Fx(x(n) - 1)   (15)
u(n) = l(n-1) + [u(n-1) - l(n-1)] * Fx(x(n))       (16)

where Fx is the cumulative distribution of the source symbols. During renormalization, the range value and the offset value are left shifted to make the MSB equal to 1. The difference between the numbers of bits shifted for the range and for the offset is stored and used in the consecutive decoding cycles.

8. IMPLEMENTATION AND RESULTS

The proposed arithmetic engine is simulated and synthesized using the Xilinx ISE 12.1 tool [8] and is described in the VHDL language. A Xilinx Virtex-5 ML501 board is used for the implementation [9]. The arithmetic engine operates at a frequency of 31.834 MHz, and the output latency after the clock edge is 7.430 ns.
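Before turning to the number representation used in the implementation, the log-domain update of Section 7 can be illustrated with a rough fixed-point C sketch: in the MPS case the range update collapses to a subtraction on Lg_Range (eqs 11-12), while in the LPS case the value is converted back to the linear domain to update range and offset (eqs 13-14). The helper functions fx_log2/fx_exp2, the field widths and FRAC_BITS are assumptions for illustration only, not the authors' VHDL design.

    #include <stdint.h>

    #define FRAC_BITS 8   /* fractional bits of the fixed-point log values (assumed) */

    /* Hypothetical fixed-point helpers (assumed, not from the paper):
     * fx_log2 maps a linear-domain value to its base-2 logarithm and
     * fx_exp2 is its inverse, both with FRAC_BITS fractional bits.
     * In the paper these roles are played by the multiplier-based
     * logarithmic engine described in Section 8. */
    extern uint32_t fx_log2(uint32_t linear_value);
    extern uint32_t fx_exp2(uint32_t log_value);

    typedef struct {
        uint32_t lg_range;   /* Lg_Range: log of the current range, fixed point */
        uint32_t offset;     /* Offset register, linear domain                  */
    } cbac_state_t;

    /* lg_pmps holds -log2(PMPS) in the same fixed-point format, so that
     * multiplying the range by PMPS becomes a subtraction in the log domain. */
    void cbac_update(cbac_state_t *s, int bin_is_mps, uint32_t lg_pmps)
    {
        if (bin_is_mps) {
            /* MPS: Range_new = Range * PMPS -> Lg_Range_new = Lg_Range - Lg_PMPS
             * (eq 11); the offset is unchanged (eq 12). */
            s->lg_range -= lg_pmps;
        } else {
            /* LPS: convert back to the linear domain to update range and offset. */
            uint32_t range = fx_exp2(s->lg_range);            /* Range             */
            uint32_t rmps  = fx_exp2(s->lg_range - lg_pmps);  /* Range * PMPS       */
            s->offset  += rmps;                               /* eq (14)            */
            s->lg_range = fx_log2(range - rmps);              /* eq (13): new range */
        }
        /* Renormalization (left-shifting range and offset) is omitted here. */
    }

With this formulation the frequent MPS path needs only a fixed-point subtraction, which is how the engine avoids a slow multiplication on every bin.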

To synthesize the CABAC engine, a fixed-point binary number system with eight bits is adopted; for example, 0.45 is converted to 0.45 x 255 ≈ 116. The most critical part of the CABAC engine is the logarithmic engine. In the literature two popular ways of implementing a log engine in hardware are given: a CORDIC-based approach and a series approximation [10]. The CORDIC-based approach can be implemented with fewer FPGA resources, but its major limitation is that it takes more time to calculate a log value, which is serious in real-time processing. The second approach requires more multipliers but is much faster than CORDIC. Since the FPGA has many hardware multipliers available in its fabric, the second approach is chosen for the hardware implementation of the logarithmic engine. Using the arithmetic engine, the upper and lower limits of the interval range of the input are produced as outputs; the interval update takes place by analyzing whether the input lies in the LPS or the MPS range.

The post-layout simulation results are shown in Figures 3 to 9. Figure 3 shows the log value of the given 8-bit number. Figure 4 shows that the given input lies in the MPS range, and Figures 5 and 6 show the further update of the lower and upper intervals for the MPS. Figure 7 shows that the given input lies in the LPS range, and Figures 8 and 9 show the further update of the lower and upper intervals for the LPS.

Table 1: Device Utilisation Summary

Logic Utilisation      Used   Available   % Used
Slice Registers         212       13312       1%
4-input LUTs            373       26624       1%
FF pairs                140       26624       0%
Bonded IOBs              61         487      12%
Multipliers              13          32      40%
Global clocks (Gclk)      1           8      12%

9. CONCLUSION

In this paper we present a hardware-based arithmetic engine that combines the original and logarithmic domains. This improves the coding efficiency and increases the overall processing speed. The device utilization summary (Table 1) gives the details of the efficient use of device resources. Using the adaptive interval update of the hardware-based binary arithmetic engine, a specified sub-block can be located efficiently within the whole image block, which helps in obtaining higher-resolution images. In the future this arithmetic engine can be designed with a hardware/software co-design approach. Further research on AC is currently underway, targeting mainly two directions: (1) building more accurate statistical models of the source, which is important for coding gain, and (2) exploring cost-effective VLSI architectures for AC codecs. This arithmetic coder will be developed further to handle multiple bins per cycle, as demanded by the encoding and decoding of high-resolution pictures; this will further increase video quality at high resolution.

REFERENCES

[1] Amir Said, "Introduction to Arithmetic Coding - Theory and Practice", Imaging Systems Laboratory, HP Laboratories Palo Alto, HPL-2004-76, April 21, 2004.
[2] "Study and Comparison of H.264/MPEG-4 Part 10 AVC Main Profile with AVS P2 Jizhun Profile", presentation.
[3] Ian H. Witten, Radford M. Neal and John G. Cleary, "Arithmetic Coding for Data Compression"; Junhao Zheng, Wen Gao, David Wu, Don Xie, "An Efficient VLSI Architecture for CBAC of AVS HDTV Decoder".
[4] W. B. Pennebaker, J. L. Mitchell, "An Overview of the Basic Principles of the Q-Coder Adaptive Binary Arithmetic Coder", IBM Journal of Research and Development 32 (1988) 717-726.
[5] W. Yu, Y. He, "Arithmetic Codec on Logarithm Domain", Picture Coding Symposium, 2006.
[6] D. Marpe, H. Schwarz, T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard", IEEE Transactions on Circuits and Systems for Video Technology 13 (2003) 620-636.
[7] Xilinx, "Virtex-5 Family Overview", DS100 (v5.0), February 6, 2009.
[8] Xilinx, "ISim User Guide", UG660 (v11.3), September 16, 2009.
[9] Uwe Meyer-Baese, "Digital Signal Processing with Field Programmable Gate Arrays", Springer, 2007.
[10]

AUTHOR PROFILES:

N. SARANYA received the B.E. degree in Electronics & Communication Engineering. She received her M.Tech. in VLSI Design from SASTRA University, Thanjavur, India. Currently she is with Tata Consultancy Services as an embedded system designer. Her interests include VLSI-based arithmetic coding, image processing and embedded systems.

DHIRENDRA KUMAR TRIPATHI received the B.E. in Electronics & Communication Engineering from UPTU in 2005. He completed his M.Tech. in VLSI at NIT Trichy, India. Currently he is an Assistant Professor at SASTRA University, Thanjavur, India. His interests are in MIMO, SDR and Cognitive Radio.

Dr. R. MUTHAIAH received the B.E. degree in Electronics & Instrumentation Engineering from Annamalai University in 1989. He received the M.E. in Power Electronics & Industrial Drives and the Ph.D. degree in Image Processing from SASTRA University. Currently he is an Associate Professor at the School of Computing, SASTRA University, Thanjavur, India. His research interests include VLSI-based image processing and signal processing.

Figure 3: Log value of the input
Figure 4: Input lies in the range of MPS
Figure 5: Updation of the lower interval for MPS
Figure 6: Updation of the upper interval for MPS

Figure 7: Input lies in the range of LPS
Figure 8: Updation of the lower interval for LPS
Figure 9: Updation of the upper interval for LPS