Parallel Inverse Halftoning by Look-Up Table (LUT) Partitioning


Umar F. Siddiqi and Sadiq M. Sait
umar@ccse.kfupm.edu.sa, sadiq@kfupm.edu.sa
KFUPM Box: Department of Computer Engineering, King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia
Telephone: +9--0 099 Fax: +9--0 9

Abstract

The Look-Up Table (LUT) method for inverse halftoning requires little computation, is fast, and yields good results. It employs a single LUT that is stored in a ROM and contains pre-computed contone (gray level) values for the inverse halftone operation. This paper proposes an algorithm that performs parallel inverse halftoning by partitioning the single LUT into N smaller Look-Up Tables (s-LUTs). Up to k (k ≤ N) pixels can then be fetched concurrently from the halftone image, and their contone values can be fetched concurrently from separate s-LUTs. The parallelization increases the speed of inverse halftoning by up to k times, while the total number of entries in all s-LUTs remains equal to the number of entries in the single LUT of the serial LUT method. Some degradation in image quality is possible due to pixel loss during parallel fetching: a contone value cannot be fetched in a given cycle when another contone value is being fetched from the same s-LUT. The complete implementation of the algorithm requires two CPLD devices for the computational portion, plus external content addressable memories (CAMs) and static RAMs to store the s-LUTs.

Keywords: Inverse Halftoning, Hardware Implementation, Look-Up Table Inverse Halftoning, Complex Programmable Logic Devices (CPLD), Image Processing, Parallelizing.

Introduction

The rendition of continuous tone pictures on media on which only two levels can be displayed is defined as halftoning []. The problem has been important since the time of the printing press, when attempts were made to print images on paper by adjusting the size of dots according to the local print intensity. This process is termed analog halftoning. With the availability and adoption of bi-level devices such as fax machines and plasma displays, digital halftoning has become important []. The input to a digital halftoning system is a gray level image in which pixels have more than two levels, and the result of the halftoning process is an image that has only two levels, i.e., 1 or 0.

Inverse halftoning, on the other hand, is the reconstruction of gray level images from halftone images. It finds application in areas where processing is required on printed images. The images are first scanned and inverse halftoned, and then operations such as zooming, rotation and transformation are applied. Standard compression techniques cannot process halftones directly, so inverse halftoning is required before printed images can be compressed [].

Look-Up Table (LUT) inverse halftoning is a fast, low-computation method []. LUT inverse halftoning was first introduced by Netravali and Bowen [4], but it requires information that is not always available for halftone images. Subsequently, Ting and Riskin [5] proposed another LUT method, but it did not aim at good image quality. More recently, a computation-free LUT method was proposed by Mese and Vaidyanathan [1, 3]. It provides fast inverse halftoning with good image quality and can be applied to several different halftones. Two more LUT inverse halftoning methods [6, 7] were suggested by Kuo-Liang Chung et al. and P. C. Chang et al.; they can give better image quality, but they are not completely computation free and require computation in addition to the LUT access.

In the method of Mese et al., one template, consisting of the pixel to be inverse halftoned and the pixels in its neighborhood, is fetched from the halftone image as a p-bit vector (p depends on the template type) and is used to form the address for the LUT. The pre-computed contone value is fetched from this address of the LUT. However, this method is serial and can inverse halftone only one template at a time. In this paper, we present an algorithm that performs parallel inverse halftoning by partitioning the single LUT of the method of Mese et al. into N smaller Look-Up Tables (s-LUTs). The N s-LUTs together contain as many entries as the single LUT of the serial LUT method. In this way, the proposed algorithm provides a significant speed advantage for inverse halftoning and at the same time saves memory. In the proposed algorithm, k templates are fetched concurrently from the halftone image and their contone values are obtained from the s-LUTs.

The rest of this paper is organized as follows. First the serial LUT method is described; then the parallelization of the LUT method for inverse halftoning, which is based on partitioning the LUT, is described in detail. This is followed by the simulation of the proposed algorithm and a discussion of its performance. The last section discusses implementation details of the proposed algorithm using CPLD devices.

Look-Up Table (LUT) Method for Inverse Halftoning

In the LUT method for inverse halftoning, a template, denoted t, is a group of pixels consisting of the pixel to be inverse halftoned and the pixels in its neighborhood. The LUT method uses three types of templates, namely 16pels, 19pels and Rect, which differ in the number of pixels they contain. The templates are fetched from the halftone image in raster-scan order, i.e., from left to right within a row and from top to bottom. One pixel with its surrounding pixels (a template t) is fetched and inverse halftoned before the next template is fetched.

The LUT stores pre-computed contone values for a large number of templates. The templates stored in the LUT are obtained from a training set of images that comprises halftone images and the corresponding continuous tone images. The templates are fetched from the halftone images and their contone values are taken from the corresponding continuous tone images. When a template occurs more than once, its contone value is the mean of all contone values that correspond to that template in the training set.

Inverse halftoning proceeds as follows: a template t is fetched from the halftone image and sent to the LUT. If the LUT holds a contone value for t, it returns that value; otherwise the contone value of t is estimated by one of two methods: (a) low pass filtering, or (b) best linear estimator []. When the same halftoning algorithm is used for the training set images and for the images being inverse halftoned, all templates always find a corresponding contone value in the LUT, and the method becomes completely computation free. The LUT method for inverse halftoning can also be applied to color halftones, with a separate LUT for each of the color planes R, G and B.

Parallel Look-Up Table (LUT) Inverse Halftoning

To perform parallel LUT inverse halftoning, two or more templates must be fetched from the halftone image and the LUT inverse halftone operation applied to them at the same time. The main problems in parallelizing the LUT method for inverse halftoning are the following:

(a) The Look-Up Table (LUT) is a single memory block that does not allow simultaneous access to more than one location. Therefore, templates fetched in parallel cannot fetch their contone values at the same time.
(b) If the LUT method for inverse halftoning is parallelized as it is, the memory requirements grow very large, because one LUT must be stored for each template that is fetched in parallel.

The next section presents an algorithm that parallelizes the LUT method for inverse halftoning while solving the above problems.

Algorithm to Perform Parallel LUT Inverse Halftoning

This section presents the algorithms that perform parallel inverse halftoning by enhancing the serial LUT method of Mese and Vaidyanathan []. In the proposed algorithm, N smaller Look-Up Tables (s-LUTs) are used in place of the single LUT of the serial LUT method. The proposed algorithm also introduces circuitry that distinguishes the k templates fetched concurrently from the halftone image by means of unique numbers. As a result of these two modifications, k templates can be fetched concurrently and inverse halftoned in parallel using the N s-LUTs, so their contone values are obtained simultaneously. The proposed parallel inverse halftoning using s-LUTs consists of two parts: (1) an algorithm to generate the N s-LUTs, and (2) an algorithm to send k concurrently fetched templates to distinct s-LUTs. The rest of this section describes the algorithms in detail.

Idea behind the Proposed Algorithm

The proposed algorithm for parallel inverse halftoning is based on the idea of partitioning the single LUT into N smaller Look-Up Tables (s-LUTs). The partitioning can be done linearly or with a more sophisticated technique. In linear partitioning, the contents of the training set are assigned to s-LUTs according to some fixed criterion, such as an equal number of entries in all N s-LUTs. The problem with this approach is that, during inverse halftoning, it is difficult to determine which s-LUT holds a given template. The algorithm proposed in this paper instead partitions the LUT into N s-LUTs using a new approach. The approach is motivated by the observation, made on several halftone images, that adjacent templates (top-bottom or left-right) differ from each other in the number of ones they contain. The paper defines a function that takes the XOR of the fetched template and m, where m is the mean of all template values present in the training set. The bits of the XOR result are then added to count the number of ones. At this point a unique result is obtained for each concurrently fetched template, i.e., k unique values are obtained. However, these values are not confined to the range of s-LUT numbers, of which there are N. In the next step, a mod N operation is applied, yielding a number in the range 0 to N-1 for every concurrently fetched template. The graph in Figure 1 shows how well this approach distinguishes concurrently fetched templates. Taking mod N is computation free when N is a power of 2: the mod N operation then amounts to keeping only the least significant log2(N) bits of the input.

Figure 1: Graph showing the performance of the proposed approach in distinguishing concurrently fetched templates (x-axis: number of s-LUTs, N; y-axis: percentage of times the proposed approach is unable to distinctly represent all concurrently fetched templates; one curve per value of k).
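The mapping described above (XOR with m, count the ones, reduce mod N) can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the function name `slut_index`, the encoding of a template as an integer, and the sample 20-bit values are assumptions made for the example.

```python
def slut_index(template: int, m: int, n_sluts: int) -> int:
    """Map a template (encoded as an int) to an s-LUT number in [0, n_sluts)."""
    # N must be a power of two so that "mod N" reduces to a bit mask.
    if n_sluts <= 0 or n_sluts & (n_sluts - 1):
        raise ValueError("n_sluts must be a power of two")
    ones = bin(template ^ m).count("1")  # XOR with m, then add up the bits
    return ones & (n_sluts - 1)          # keep the least significant log2(N) bits


# Hypothetical 20-bit mean value and templates, chosen only for illustration.
m = 0b10101010101010101010
templates = [
    0b10101010101010101010,  # equal to m: zero bits differ
    0b10101010101010101011,  # one bit differs
    0b00000000000000000000,  # differs in the ten positions where m has a one
]
indices = [slut_index(t, m, 8) for t in templates]
print(indices)
```

With N = 8 the mask is `0b111`, so a count of ten ones maps to s-LUT 2, which is the same result as `10 % 8`.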

The N s-LUTs are stored in N external memories, and the templates fetched from the halftone image act as input addresses to the memories. The distribution of templates among the N s-LUTs should be uniform so that memories of equal size can be used; however, when the s-LUTs are not of equal size, large s-LUTs can be stored in more than one memory, in which case the number of memories exceeds N. Two other approaches can also be used: (1) add the bits of the template and then take mod N, and (2) take mod N of the fetched template directly. However, taking the XOR with m first yields the best image quality of the three, so the algorithm proposed in this paper uses only that approach.

Algorithm to Generate N Smaller Look-Up Tables (s-LUTs)

The N s-LUTs must be generated before inverse halftoning is performed, as in the serial LUT method. The s-LUTs are numbered from 0 to N-1, where N must be a power of 2. The algorithm is shown in Figure 2. It starts by building Training_set, which consists of continuous tone images and their halftone versions. Next, a template t is fetched from a halftone image. The fetched template t is then XORed with m, where m is the mean of all template values present in the training set; the bits of the result are added, and the mod N of the sum is taken by keeping only its least significant log2(N) bits. The template t is then sent to the s-LUT whose number equals this result. These steps, from fetching a template to storing it, are repeated with further templates from the training set until all templates in the training set have been fetched and stored in s-LUTs.
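The generation procedure can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: `build_sluts` and `slut_index` are hypothetical names, the training set is assumed to be already reduced to (template, contone) pairs, m is rounded to an integer, and repeated templates are averaged as described for the serial LUT method.

```python
from collections import defaultdict


def slut_index(template, m, n_sluts):
    # XOR with m, count the ones, keep the low log2(N) bits (N a power of two).
    return bin(template ^ m).count("1") & (n_sluts - 1)


def build_sluts(pairs, n_sluts):
    """Partition training (template, contone) pairs into n_sluts dictionaries."""
    pairs = list(pairs)
    # m: mean of all template values in the training set (rounded, an assumption).
    m = round(sum(t for t, _ in pairs) / len(pairs))
    # Accumulate contone values so repeated templates store their mean.
    totals = defaultdict(lambda: [0, 0])
    for template, contone in pairs:
        totals[template][0] += contone
        totals[template][1] += 1
    sluts = [dict() for _ in range(n_sluts)]
    for template, (total, count) in totals.items():
        sluts[slut_index(template, m, n_sluts)][template] = round(total / count)
    return m, sluts


# Tiny hypothetical training set with 4-bit templates, for illustration only.
training = [(0b0011, 100), (0b0011, 120), (0b1100, 40), (0b1111, 200)]
m, sluts = build_sluts(training, 4)
print(m, sluts)
```

Every distinct template lands in exactly one s-LUT, so the s-LUTs together hold as many entries as a single LUT built from the same training set.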

Figure 2: Algorithm to generate smaller Look-Up Tables (s-LUTs).

Algorithm to Send k Concurrently Fetched Templates to Distinct s-LUTs

This algorithm assigns unique numbers in the range 0 to N-1 to k concurrently fetched templates and then sends them to distinct s-LUTs using those numbers. In this way it performs parallel inverse halftoning using s-LUTs. The algorithm is shown in Figure 3. It starts by fetching k templates from the halftone image, numbered from 1 to k. Next, the mod N operation is performed for each template by keeping only the least significant log2(N) bits. If two or more templates receive the same value, only the one with the highest template number is kept and the others with that value are discarded. The templates that are not discarded are sent to the s-LUTs whose numbers equal their computed values. Templates that were discarded, or that do not find a contone value in their s-LUT, are then assigned contone values copied from their neighbors. Finally, the contone values of all k fetched templates are delivered to the output. This process repeats until all templates in the halftone image have been inverse halftoned. The algorithm is pipelined, so each step can operate in parallel on different inputs and k new templates can be fetched from the halftone image on every clock cycle.

Figure 3: Algorithm to perform parallel inverse halftoning using s-LUTs.

Simulation

This section presents a simulation of the proposed algorithm, implemented in the Java programming language. It starts by building a training set of gray level images and the corresponding halftone images. A value of N is then chosen and the s-LUTs are generated. Parallel inverse halftoning is performed for different values of k. This section first shows the results of generating the s-LUTs; then some halftone images that are not in the training set are inverse halftoned and shown together with their image quality in terms of Peak Signal to Noise Ratio (PSNR).

Generation of s-LUTs

The s-LUTs are generated using the training set. Figures 4 and 5 show the partitioning of templates among the N s-LUTs for two values of N. For the value of N in Figure 4, each s-LUT stores an almost equal number of templates; for the value of N in Figure 5, the s-LUT sizes vary widely and some s-LUTs remain empty.

Figure 4: Distribution of templates to s-LUTs (when N=) (x-axis: smaller Look-Up Tables (s-LUTs); y-axis: number of templates present in s-LUTs).

Figure 5: Distribution of templates to s-LUTs (when N=) (x-axis: smaller Look-Up Tables (s-LUTs); y-axis: number of templates present in s-LUTs).

Parallel Inverse Halftoning

This section presents the simulation of the proposed algorithm performing parallel inverse halftoning using s-LUTs, for different values of k and N. The graph in Figure 6 shows the average image quality, in terms of PSNR, of the images obtained from the proposed algorithm. It contains one curve per value of k, showing how image quality varies as N increases; N must be a power of 2. The results are averaged over the images Boat, Peppers and Clock. Sample images obtained from the serial LUT method and from the proposed algorithm are shown in Figures 7 to 15, along with their original continuous tone versions.

Figure 6: Performance of the proposed algorithm in terms of image quality for different values of k and N (x-axis: number of smaller Look-Up Tables (s-LUTs), i.e., N; y-axis: image quality in terms of PSNR in dB; one curve per value of k).
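One cycle of the parallel operation simulated in this section (fetch k templates, resolve collisions, look up, copy from neighbors) can be sketched as follows. This is a hedged sketch, not the authors' Java simulation: the function name is hypothetical, the s-LUT number is assumed to come from the XOR-with-m rule described earlier, and the copy rule (nearest higher-numbered resolved template, otherwise the last resolved one) is one possible reading of "copying contone values from their neighbors".

```python
def inverse_halftone_group(templates, m, sluts):
    """Return contone values for k concurrently fetched templates."""
    n = len(sluts)  # assumed to be a power of two
    winner = {}     # s-LUT number -> position of the template that keeps it
    for pos, t in enumerate(templates):
        idx = bin(t ^ m).count("1") & (n - 1)
        winner[idx] = pos  # on a collision the higher-numbered template wins
    out = [None] * len(templates)
    for idx, pos in winner.items():
        out[pos] = sluts[idx].get(templates[pos])  # None when not stored
    # Discarded or missing templates copy a value from a resolved neighbor.
    resolved = [i for i, v in enumerate(out) if v is not None]
    for i in range(len(out)):
        if out[i] is None and resolved:
            higher = [j for j in resolved if j > i]
            out[i] = out[higher[0] if higher else resolved[-1]]
    return out


# Hypothetical s-LUTs for N = 4 and m = 0b1000, for illustration only.
sluts = [{}, {0b1100: 40}, {}, {0b0011: 110, 0b1111: 200}]
result = inverse_halftone_group([0b0011, 0b1100, 0b1111, 0b1100], 0b1000, sluts)
print(result)
```

In this example the first two templates collide with later ones and are discarded, so they receive copied values; this is the pixel loss that the PSNR curves quantify.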

Figure 7: Original continuous tone image named peppers.
Figure 8: Peppers obtained from the serial LUT method, PSNR= 9. dB.
Figure 9: Peppers obtained from inverse halftoning using the proposed algorithm with k= and N=, PSNR= 9.0 dB.

Figure 10: Original continuous tone image named clock.
Figure 11: Clock obtained from the serial LUT method, PSNR= 0. dB.
Figure 12: Clock obtained from inverse halftoning using the proposed algorithm with k= and N=, PSNR= 0.0 dB.

Figure 13: Original continuous tone image named boat.
Figure 14: Boat obtained from the serial LUT method, PSNR=.0 dB.
Figure 15: Boat obtained from inverse halftoning using the proposed algorithm with k= and N=, PSNR=.9 dB.
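The PSNR values quoted in the captions compare an inverse halftoned image with its original continuous tone version. The paper does not spell out the formula, so the sketch below uses the standard definition, PSNR = 10 log10(peak^2 / MSE); the peak value of 255 (8-bit images) and the flat-list image representation are assumptions.

```python
import math


def psnr(original, reconstructed, peak=255):
    """Peak Signal to Noise Ratio in dB between two equally sized pixel lists."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


# Hypothetical 4-pixel images, each pixel off by one gray level.
value = psnr([255, 255, 255, 255], [254, 254, 254, 254])
print(round(value, 2))
```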

Integrated Circuit Design

An integrated circuit that implements the proposed algorithm with parameters k=4 and N=8 is designed. The target platform is Field Programmable Gate Array (FPGA) devices. The circuit is divided into blocks; each block is described by Boolean equations and can work independently on different inputs. The blocks are detailed below.

Block 1: In this stage four templates $I_0$, $I_1$, $I_2$ and $I_3$ are fetched from the halftone image and stored in registers $t_0$, $t_1$, $t_2$ and $t_3$ respectively. The Boolean equations representing the logic in this block are:

\[ t_i(0 \ldots p-1) = I_i(0 \ldots p-1), \quad i = 0, 1, 2, 3 \]

Block 2: In this block the bits of each template are added using a Carry Save Adder (CSA) tree. The operation in this block is given below, and the CSA tree is shown in Figure 16:

\[ S_i = \mathrm{CSA\_TREE}(t_i(0 \ldots p-1)), \quad i = 0, 1, 2, 3 \]

Figure 16: Carry Save Adder (CSA) Tree when p=0 and N=.
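The bit-adding step of Block 2 can be modelled in software with full adders used as 3:2 compressors, which is the building block of a carry save adder tree. The sketch below is only a behavioural model under assumptions: it does not reproduce the tree structure of Figure 16, and the function names are hypothetical.

```python
def full_adder(a, b, c):
    """3:2 compressor: three bits of one weight -> sum bit and carry bit."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def csa_popcount(bits):
    """Add up single bits column by column using full and half adders."""
    columns = {0: list(bits)}  # weight exponent -> bits of that weight
    result, w = 0, 0
    while w in columns:
        col = columns[w]
        while len(col) >= 3:  # compress three bits into sum + carry
            s, c = full_adder(col.pop(), col.pop(), col.pop())
            col.append(s)
            columns.setdefault(w + 1, []).append(c)
        if len(col) == 2:     # half adder for a remaining pair
            a, b = col
            col[:] = [a ^ b]
            columns.setdefault(w + 1, []).append(a & b)
        if col:
            result |= col[0] << w
        w += 1
    return result


# Hypothetical 10-bit template, given as a list of bits.
count = csa_popcount([1, 0, 1, 1, 0, 1, 1, 0, 0, 1])
print(count)
```

Each column is reduced until a single bit remains, and carries are passed to the next higher weight, so the result equals the number of ones in the input.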

Block 3: In this block templates $t_0$, $t_1$, $t_2$ and $t_3$ are appended with the sequence numbers 001, 010, 011 and 100 respectively. The Boolean expressions representing the operations in this block are (where & denotes concatenation):

\[
\begin{aligned}
t_0'(0 \ldots p+2) &= t_0(0 \ldots p-1) \,\&\, 001, &\quad t_1'(0 \ldots p+2) &= t_1(0 \ldots p-1) \,\&\, 010, \\
t_2'(0 \ldots p+2) &= t_2(0 \ldots p-1) \,\&\, 011, &\quad t_3'(0 \ldots p+2) &= t_3(0 \ldots p-1) \,\&\, 100
\end{aligned}
\]

Block 4: This block consists of four 1×8 de-multiplexers. With $slut_i(2), slut_i(1), slut_i(0)$ denoting the three bits of the s-LUT number selected for template $t_i$, the Boolean expressions that represent the logic in this block are:

\[
\begin{aligned}
A_i[0](0 \ldots p+2) &= \overline{slut_i(2)}\;\overline{slut_i(1)}\;\overline{slut_i(0)}\; t_i'(0 \ldots p+2) \\
A_i[1](0 \ldots p+2) &= \overline{slut_i(2)}\;\overline{slut_i(1)}\;slut_i(0)\; t_i'(0 \ldots p+2) \\
A_i[2](0 \ldots p+2) &= \overline{slut_i(2)}\;slut_i(1)\;\overline{slut_i(0)}\; t_i'(0 \ldots p+2) \\
A_i[3](0 \ldots p+2) &= \overline{slut_i(2)}\;slut_i(1)\;slut_i(0)\; t_i'(0 \ldots p+2) \\
A_i[4](0 \ldots p+2) &= slut_i(2)\;\overline{slut_i(1)}\;\overline{slut_i(0)}\; t_i'(0 \ldots p+2) \\
A_i[5](0 \ldots p+2) &= slut_i(2)\;\overline{slut_i(1)}\;slut_i(0)\; t_i'(0 \ldots p+2) \\
A_i[6](0 \ldots p+2) &= slut_i(2)\;slut_i(1)\;\overline{slut_i(0)}\; t_i'(0 \ldots p+2) \\
A_i[7](0 \ldots p+2) &= slut_i(2)\;slut_i(1)\;slut_i(0)\; t_i'(0 \ldots p+2)
\end{aligned}
\]

In the above equations i = 0 to 3: i=0 for the first de-multiplexer, i=1 for the second de-multiplexer, and so on. $A_i[0]$ to $A_i[7]$ are the outputs of de-multiplexer i.

Block 5: This block consists of eight 4×1 multiplexers that are connected to the s-LUTs. Writing $v_i = A_i[j](p) + A_i[j](p+1) + A_i[j](p+2)$ for the OR of the sequence-number bits that template i presents to multiplexer j, the Boolean expression that represents the logic in this block is:

\[
g_j(0 \ldots p+2) = v_3\, A_3[j](0 \ldots p+2) + \overline{v_3}\, v_2\, A_2[j](0 \ldots p+2) + \overline{v_3}\,\overline{v_2}\, v_1\, A_1[j](0 \ldots p+2) + \overline{v_3}\,\overline{v_2}\,\overline{v_1}\, v_0\, A_0[j](0 \ldots p+2)
\]

In the above equation, j = 0 to 7: j=0 refers to the first multiplexer, j=1 refers to the second multiplexer, and so on. The output $g_0$ is from the first multiplexer, $g_1$ is from the second multiplexer, and so on.

Block 6: In this block the templates fetch their gray level values from the s-LUTs. In hardware, it contains an implementation of eight smaller Look-Up Tables (s-LUTs) using Content Addressable Memory (CAM) and Read Only Memory (ROM) pairs. A CAM-ROM combination is used because each s-LUT stores only a very small fraction of the $2^p$ possible values when templates are p bits wide. The block diagram in Figure 17 shows the implementation of one s-LUT. The CAM stores the templates that are assigned to the s-LUT and the ROM stores the gray level values. The Boolean equations illustrating the operations in this block are:

\[
x_j(0 \ldots d-1) = \mathrm{CAM}_j(g_j(0 \ldots p-1)), \qquad c_j = \mathrm{ROM}_j(x_j(0 \ldots d-1)), \qquad f_j(0 \ldots p+2) = g_j(0 \ldots p+2)
\]

Here $x_j$ is the d-bit ROM address returned by the CAM and $c_j$ is the gray level value read from the ROM. In the above expressions, j=0 for s-LUT number 0, j=1 for s-LUT number 1, and so on; j varies from 0 to 7. When the contents of an s-LUT are too large to fit in a single memory module, more than one memory module or CAM-ROM pair should be used. Figure 18 shows one s-LUT implemented using two CAM-ROM pairs. A CAM returns the zero ROM address for entries not present in it, so the outputs of all ROMs can be ORed to obtain the single valid result of that s-LUT.

Block 7: In this block the gray level values of non-discarded templates are copied to the templates that were discarded. A discarded template is assigned the gray level value of the non-discarded template that has the nearest higher template number. In the expressions below, j = 0 to 7, "+" denotes OR, and a single bit multiplying a gray level value gates every bit of that value.

For template $t_0$ (sequence number 001):

\[
\begin{aligned}
a_j &= 1 \text{ when } f_j(p \ldots p+2) = 001, \text{ and } 0 \text{ otherwise} \\
a_8 &= a_0 c_0 + a_1 c_1 + \cdots + a_7 c_7 \\
a_9 &= a_0 + a_1 + \cdots + a_7 \\
a_{10} &= a_9\, a_8 + \overline{a_9}\, b_{10} \\
\mathrm{Gray\_level}_{t_0} &= a_{10}
\end{aligned}
\]

where $\mathrm{Gray\_level}_{t_0}$ is the gray level value corresponding to template $t_0$.

For template $t_1$ (sequence number 010):

\[
\begin{aligned}
b_j &= 1 \text{ when } f_j(p \ldots p+2) = 010, \text{ and } 0 \text{ otherwise} \\
b_8 &= b_0 c_0 + b_1 c_1 + \cdots + b_7 c_7 \\
b_9 &= b_0 + b_1 + \cdots + b_7 \\
b_{10} &= b_9\, b_8 + \overline{b_9}\, d_{10} \\
\mathrm{Gray\_level}_{t_1} &= b_{10}
\end{aligned}
\]

where $\mathrm{Gray\_level}_{t_1}$ is the gray level value corresponding to template $t_1$.

For template $t_2$ (sequence number 011):

\[
\begin{aligned}
d_j &= 1 \text{ when } f_j(p \ldots p+2) = 011, \text{ and } 0 \text{ otherwise} \\
d_8 &= d_0 c_0 + d_1 c_1 + \cdots + d_7 c_7 \\
d_9 &= d_0 + d_1 + \cdots + d_7 \\
d_{10} &= d_9\, d_8 + \overline{d_9}\, e_8 \\
\mathrm{Gray\_level}_{t_2} &= d_{10}
\end{aligned}
\]

where $\mathrm{Gray\_level}_{t_2}$ is the gray level value corresponding to template $t_2$.

For template $t_3$ (sequence number 100):

\[
\begin{aligned}
e_j &= 1 \text{ when } f_j(p \ldots p+2) = 100, \text{ and } 0 \text{ otherwise} \\
e_8 &= e_0 c_0 + e_1 c_1 + \cdots + e_7 c_7 \\
\mathrm{Gray\_level}_{t_3} &= e_8
\end{aligned}
\]

where $\mathrm{Gray\_level}_{t_3}$ is the gray level value corresponding to template $t_3$.

The four gray level values $\mathrm{Gray\_level}_{t_0}$, $\mathrm{Gray\_level}_{t_1}$, $\mathrm{Gray\_level}_{t_2}$ and $\mathrm{Gray\_level}_{t_3}$ are stored at the correct (row, column) coordinates in the output gray level image. The algorithm is pipelined, with each step working independently on different inputs.

Figure 17: Smaller Look-Up Table (s-LUT) implemented in terms of CAM and ROM.

Figure 18: One s-LUT implemented using two ROMs and CAMs.
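The CAM-ROM arrangement of Block 6, including the OR-combination of several pairs that form one s-LUT, can be modelled as below. This is a behavioural sketch under assumptions: the class and function names are hypothetical, and ROM address 0 is assumed to be reserved and to hold the value 0 so that a miss contributes nothing to the OR.

```python
class CamRom:
    """One CAM-ROM pair: the CAM maps a template to a ROM address."""

    def __init__(self, entries):
        # entries: dict mapping template -> gray level.
        # Address 0 is reserved for "not present" and holds 0 (assumption).
        self.cam = {}
        self.rom = [0]
        for template, gray in entries.items():
            self.cam[template] = len(self.rom)
            self.rom.append(gray)

    def lookup(self, template):
        address = self.cam.get(template, 0)  # miss -> zero ROM address
        return self.rom[address]


def slut_lookup(pairs, template):
    """OR the ROM outputs of all CAM-ROM pairs that make up one s-LUT."""
    out = 0
    for pair in pairs:
        out |= pair.lookup(template)
    return out


# One hypothetical s-LUT split over two CAM-ROM pairs.
slut = [CamRom({0b0011: 110}), CamRom({0b1111: 200})]
values = [slut_lookup(slut, t) for t in (0b0011, 0b1111, 0b0111)]
print(values)
```

Because a template is stored in at most one pair, at most one ROM output is non-zero and the OR returns it unchanged. In this simple model a stored gray level of 0 cannot be told apart from a miss.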

Hardware Implementation

The computational part of the proposed algorithm with k=4 and N=8 is implemented in VHDL on Altera Complex Programmable Logic Devices (CPLDs). It occupies two CPLDs, and external CAMs and SRAMs are used to store the s-LUTs. Figure 19 illustrates the system block diagram. The CPLDs used are Altera [9] MAX II devices; the CAMs and SRAMs are implemented in Altera APEX FPGA devices but can be replaced with discrete devices in future designs. CPLD I contains the proposed parallelization algorithm and CPLD II contains the pixel compensation circuit. The assignment of template numbers to incoming 19pels templates is split between CPLD I and CPLD II in order to fit the design within the MAX II pin count and to reduce the fitting complexity of CPLD I.

Figure 19: Block diagram of the algorithm implementation.

In Figure 19, CPLD I accepts 19pels templates from the halftone image and sends them, according to the values returned by the XM function, to four of its eight output ports. The ports of CPLD I are connected to CAMs, which are connected to SRAMs. The gray level values from the SRAMs go to CPLD II, which contains the circuits for gray level value copying. CPLD II delivers the gray level values in the correct sequence, i.e., each output G corresponds to the contone value of the matching input P. The results of the CPLD implementation, obtained from the Fitter and Timing Analyzer tools of Altera Quartus II, are tabulated in Table I.

Table I: Results of CPLD implementations

Device              Area                    I/O pins   Clock frequency
CPLD I (EPM0GFI)    Logic elements: 09/0    /          . MHz
CPLD II (EPM0GFI)   Logic elements: /0      /          . MHz

Conclusion

The parallelization of LUT inverse halftoning has been presented. It has the following advantages: (a) instead of one pixel, up to k pixels can now be fetched and inverse halftoned simultaneously; (b) N smaller Look-Up Tables (s-LUTs) are used in place of a single LUT, while the total number of entries in all s-LUTs remains equal to the number of entries in the single LUT of the serial LUT method; and (c) fast parallel inverse halftoning can be performed in fast hardware instead of in embedded hardware-software.

Acknowledgements

The authors acknowledge King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia, for all support.

References

[1] Murat Mese and P. P. Vaidyanathan, Recent Advances in Digital Halftoning and Inverse Halftoning Method, IEEE Trans. Circuits and Systems I, June 00.
[2] Ping Wah Wong and Nasir D. Memon, Image Processing for Halftoning, IEEE Signal Processing Magazine, vol. 0, July 00.
[3] Murat Mese and P. P. Vaidyanathan, Lookup Table (LUT) Method for Inverse Halftoning, IEEE Transactions on Image Processing, vol. 0, October 00.
[4] A. N. Netravali and E. G. Bowen, Display of Dithered Images, Proc. SID, vol., pp. -90, 9.
[5] M. Y. Ting and E. A. Riskin, Error-diffused Image Compression using a binary to gray scale decoder and predictive pruned tree structured vector quantization, IEEE Trans. Image Processing, vol., pp. -, 99.
[6] P. C. Chang, C. S. Yu and T. H. Lee, Hybrid LMS-MMSE Inverse Halftoning Technique, IEEE Transactions on Image Processing, vol. 0, January 00.
[7] Kuo-Liang Chung; Shih-Tung Wu, Inverse Halftoning Algorithm using Edge-Based Lookup Table Approach, IEEE Trans. Image Processing, Volume, Issue 0, Oct. 00, pp. 9.
[8] R. Floyd and L. Steinberg, An Adaptive Algorithm for Spatial Grey-scale, Proc. SID, pp. -, 9.
[9] http://www.altera.com