DSP-Based Parallel Processing Model of Image Rotation

Similar documents
Speech Recognition Based on Efficient DTW Algorithm and Its DSP Implementation

Design and Development of a High Speed Sorting System Based on Machine Vision Guiding

Data Processing System to Network Supported Collaborative Design

Contents Systems of Linear Equations and Determinants

Implementation of a backprojection algorithm on CELL

Research on Sine Dynamic Torque Measuring System

Design of three-dimensional photoelectric stylus micro-displacement measuring system

Research on software development platform based on SSH framework structure

Application of Two-dimensional Periodic Cellular Automata in Image Processing

The GPU-based Parallel Calculation of Gravity and Magnetic Anomalies for 3D Arbitrary Bodies

Optimizing industry robot for maximum speed with high accuracy

Available online at ScienceDirect. Procedia Engineering 111 (2015 )

View-dependent fast real-time generating algorithm for large-scale terrain

Available online at ScienceDirect. Procedia Engineering 99 (2015 )

Available online at ScienceDirect. Energy Procedia 69 (2015 )

An algorithm of lips secondary positioning and feature extraction based on YCbCr color space SHEN Xian-geng 1, WU Wei 2

The IIC interface based on ATmega8 realizes the applications of PS/2 keyboard/mouse in the system

Skew Detection and Correction of Document Image using Hough Transform Method

Hybrid ant colony optimization algorithm for two echelon vehicle routing problem

Design of New Oscillograph based on FPGA

Computer Graphics. Chapter 1 (Related to Introduction to Computer Graphics Using Java 2D and 3D)

[10] Industrial DataMatrix barcodes recognition with a random tilt and rotating the camera

Out-of-Plane Rotated Object Detection using Patch Feature based Classifier

A Hand Gesture Recognition Method Based on Multi-Feature Fusion and Template Matching

Available online at Procedia Engineering 7 (2010) Procedia Engineering 00 (2010)

Design of three-dimensional reconstruction system for farm production

Available online at ScienceDirect. Procedia Computer Science 89 (2016 )

Implementation and Optimization of LZW Compression Algorithm Based on Bridge Vibration Data

A Rapid Automatic Image Registration Method Based on Improved SIFT

Implementation Of Harris Corner Matching Based On FPGA

2D rendering takes a photo of the 2D scene with a virtual camera that selects an axis aligned rectangle from the scene. The photograph is placed into

Study on Image Position Algorithm of the PCB Detection

An Efficient AC Algorithm with GPU

Chapter 18. Geometric Operations

Cache Performance Research for Embedded Processors

Design and Simulation Based on Pro/E for a Hydraulic Lift Platform in Scissors Type

Citation for the original published paper (version of record):

FAST REGISTRATION OF TERRESTRIAL LIDAR POINT CLOUD AND SEQUENCE IMAGES

Research on monitoring technology of Iu-PS interface in WCDMA network

Semi-Passive Solar Tracking Concentrator

EE 584 MACHINE VISION

Available online at ScienceDirect. Procedia Engineering 161 (2016 ) Bohdan Stawiski a, *, Tomasz Kania a

Comparative Study of ROI Extraction of Palmprint

Parallel 3D Images Surface Texture Editing

Memory Performance Characterization of SPEC CPU2006 Benchmarks Using TSIM1

A parallel algorithm for generating chain code of objects in binary images

ScienceDirect. The use of Optical Methods for Leak Testing Dampers

Available online at Procedia Engineering 29 (2012) 69 73

Research of Fault Diagnosis in Flight Control System Based on Fuzzy Neural Network

Investigation of Algorithms for Calculating Target Region Area

An Improved PageRank Method based on Genetic Algorithm for Web Search

Application of Geometry Rectification to Deformed Characters Recognition Liqun Wang1, a * and Honghui Fan2

Computer Vision. Coordinates. Prof. Flávio Cardeal DECOM / CEFET- MG.

Lecture 17: Recursive Ray Tracing. Where is the way where light dwelleth? Job 38:19

Numerical Recognition in the Verification Process of Mechanical and Electronic Coal Mine Anemometer

The Design of the Embedded WEB Server Based on ENC28J60

Research on Embedded CNC Device Based on ARM and FPGA

Pharmacy college.. Assist.Prof. Dr. Abdullah A. Abdullah

An effective algorithm for mining sequential generators

Research on the Application of Digital Images Based on the Computer Graphics. Jing Li 1, Bin Hu 2

ScienceDirect. Forming of ellipse heads of large-scale austenitic stainless steel pressure vessel

An Extended Byte Carry Labeling Scheme for Dynamic XML Data

The Analysis and Detection of Double JPEG2000 Compression Based on Statistical Characterization of DWT Coefficients

A Fast Algorithm of Neighborhood Coding and Operations in Neighborhood Coding Image. Synopsis. 1. Introduction

Achieve Significant Throughput Gains in Wireless Networks with Large Delay-Bandwidth Product

Stripe Noise Removal from Remote Sensing Images Based on Stationary Wavelet Transform

Performance Study of GPUs in Real-Time Trigger Applications for HEP Experiments

A Fast Video Illumination Enhancement Method Based on Simplified VEC Model

A 3D Point Cloud Registration Algorithm based on Feature Points

Available online at ScienceDirect. Procedia Computer Science 45 (2015 )

Available online at ScienceDirect. Procedia Computer Science 59 (2015 )

Pupil Localization Algorithm based on Hough Transform and Harris Corner Detection

Interpolation is a basic tool used extensively in tasks such as zooming, shrinking, rotating, and geometric corrections.

Forward interpolation on the GPU

Robust Control of Bipedal Humanoid (TPinokio)

An 8-Bit Scientific Calculator Based Intel 8086 Virtual Machine Emulator

Available online at ScienceDirect. Procedia Engineering 90 (2014 )

FSRM Feedback Algorithm based on Learning Theory

Localized Topology Control for Minimizing Interference under Physical Model

High Capacity Reversible Watermarking Scheme for 2D Vector Maps

Edge-directed Image Interpolation Using Color Gradient Information

A Novel Optimization Method of Optical Network Planning. Wu CHEN 1, a

Keywords - DWT, Lifting Scheme, DWT Processor.

Implementation of Step Motor Control under Embedded Linux Based on S3C2440

Digital Image Fundamentals II

Code Transformation of DF-Expression between Bintree and Quadtree

Numerical simulation of 3-D seepage field in tailing pond and its practical application

THE STUDY FOR MATCHING ALGORITHMS AND MATCHING TACTICS ABOUT AREA VECTOR DATA BASED ON SPATIAL DIRECTIONAL SIMILARITY

A Partition Method for Graph Isomorphism

An Adaptive Threshold LBP Algorithm for Face Recognition

Application of Log-polar Coordinate Transform in Image Processing

An Overview of Optical Label Switching Technology

Parallel-computing approach for FFT implementation on digital signal processor (DSP)

Face Hallucination Based on Eigentransformation Learning

Cell Clustering Using Shape and Cell Context. Descriptor

Research on Technologies in Smart Substation

Unknown Malicious Code Detection Based on Bayesian

A Sub-pixel Disparity Refinement Algorithm Based on Lagrange Interpolation

A Comparison of Color Models for Color Face Segmentation

Implementation and Improvement of DSR in Ipv6

Transcription:

Available online at www.sciencedirect.com Procedia Engineering 5 (20) 2222 2228 Advanced in Control Engineering and Information Science DSP-Based Parallel Processing Model of Image Rotation ZHANG Shuang,2a,2b, JIN Gang 2a,2b,QIN Yu-ping,3 a.the engineering & technical college of chengdu university of technology, leshan, 64000,china; 2.a. The Institute of Optics and Electronics; b. The Key Laboratory of Beam Control, Chinese Academy of Sciences, Chengdu,60209, China; 3.College of Mathematics &Software Science, Sichuan Normal University, Chengdu,60068,china; Abstract In the DSP real-time image processing, large amounts of data exchange and memory operation limit efficiency of the program implementation. In order to improve the efficiency of DSP, a model of blocking and parallel processing is proposed in this paper; firstly the large binary image is divided into blocks according to the memory size of DSP, the block diagram is rotated by DSP in accordance with the mode of readout and processing and storing, and finally they are put together. So the data processing amount in DSP is not only reduced, the efficiency of the implementation process is but also improved. 20 Published by Elsevier Ltd. Open access under CC BY-NC-ND license. Selection and/or peer-review under responsibility of [CEIS 20] Keywords: DSP; image block; image rotation; parallel processing mode. Introduction The image orientation is widely used in digital image processing. With the development of technology, the application level is continually improved in the image processing, ile higher efficiency, higher accuracy requirements of image rotation are put forward. For example, in the aviation field, the display processing of the highly efficient and accurate digital map need the embedded platforms with limited resources to achieve real-time rotation of the large-format map image because the existing * Corresponding author. Tel.: +354386694. E-mail address: zhangshuanghua@26.com. 877-7058 20 Published by Elsevier Ltd. doi:0.06/j.proeng.20.08.46 Open access under CC BY-NC-ND license.

ZHANG Shuang et al. / Procedia Engineering 5 (20) 2222 2228 2223 graphics chips has no the image rotation function. Using DSP platform is a way to achieve the specific implementations, ile two aspects must be carefully considered, one is a small selection of the rotation of computational algorithms, and second, give full play to the powerful DSP platform, parallel computing capacity. However, the traditional rotation algorithm is rotating the ole large image, so it is relatively difficult to achieve image rotation because of the lesser data memory in DSP. Likewise is proposed in some algorithms, although the problem of an amount of data is solved, it also the processing speed of DSP is greatly reduced. Therefore, an image rotation model with DSP-based blocking and parallel processing is presented to improve greatly image processing efficiency and accuracy in this paper. 2. Traditional image rotation transformation algorithm In the Cartesian coordinate system, a point is rotated clockwiseα degree, and the effect after rotation is shown in Figure, γ is the distance from the point to the origin. The length of γ remains the same during rotation, and β is the separation angle formed between the link line from the point and the origin and X Axis. Fig. Rotation transformation formula: Before rotation: x0 = r cos( β ) y0 = r sin( β ) After rotation: x = r cos( β a) y = r sin( β a) Because cos( β a) = cos( β)cos( a) + sin( β)sin( a) sin( β a) = sin( β)cos( a) cos( β)sin( a) So simplified formula is x = x0 cos( a) + y0 sin( a) y = x0 sin( a) + y0 cos( a) The expression above is expressed as matrix expression:

2224 ZHANG Shuang et al. / Procedia Engineering 5 (20) 2222 2228 The simplified formula is: x cos( a) sin( a) 0 x 0 y = sin( a) cos( a) 0 y0 0 0 x cos( a) sin( a) x0 = y sin( a) cos( a) y0 Fig.2 image example with zoom and rotation Fig.3 image example without zoom and rotation Here, we must transform all the points of the rotated image into the original image buffer points, and thus an in crease in processing time is expected. ( x, y+ ) ( xy+, ) ( x+, y+ ) ( x, y) ( x, y ) ( x +, y) ( x, y ) ( xy, ) ( x+, y ) Fig.4 neighboring 8-connected pixels However, for removing such holes, Cheng et al. (990)first choose the two neighboring 8-connected pixels located at the upper side for a reference black pixel, and calculate the Euclidean distances between the pairs of the rotated positions. They found that the distance between the new points can be 0,, 2,2 and 5 and that en the distance is 2 or 5, holes can be generated. They propose that the value of midpoint be filled with to eliminate holes. However this involves not only calculation of coordinate mapping but also calculation of distance between the 8-connected pixels and thus quite long processing time is inevitable. Danielsson and Hammerin (992) [4] use a 3-pass algorithm, in ich rotation is accomplished by shifting an image three times using the equation below: a a x tan( ) 0 tan( ) 0 2 x 2 = y sin( a) y0 0 0 This method successfully removes the holes but is generally slower than the general method. Thus inverse mapping method, Cheng s method and the 3-pass algorithm could eliminate holes, but each algorithm involves more than one transform of points of black pixels. Thus, adopting these methods requires long calculation time. 3. DSP parallel rotation algorithm Through analysis of the traditional algorithm, it is known that the traditional method is based on the rotation of the ole image. However, through practice it is found that in the image rotation process implemented by DSP, because the data of the ole image is too large, the processing is particularly slow or even impossible. Therefore, a parallel algorithm for block processing is presented in this paper.

ZHANG Shuang et al. / Procedia Engineering 5 (20) 2222 2228 2225 3.. Algorithm for Image division The large binary image processing is involved in engineering applications. Because the storage space and memory occupied by the binary images are great, and sometimes even more than the hardware's storage capacity, it is necessary to break the large image up into blocks in order to implement image rotation function better in the DSP. Fig.5 Former image Fig. 6 Divided into 2 2 image Sub-block division and refinement The large image is logically divided into sub-block of the same size required to meet memory space of the DSP, it is shown in Figure 7, and sub-block elements without the source pixel are set to 0: F ( x + i w, y + j h) 0 x + i w w, 0 y + i h h C = 0 other Each piece of the image is refined by the above algorithm. In the actual project, refinement order of the sub-block can be designed for DSP applications according to actual demand. For example, in the interactive application of vector, the foreground displays the current refined sub-blocks and interact operation, ile in the background sub-block refinement is finished. The usable order is C,( C C C ), L 2 22 2 ich is shown in Figure 7., C C... 2 C2 C... 22 C...Cn C... n2 Fig. 7 Block of image Each sub-block C is refined in the same array space, and before refinement the corresponding image data of each sub-block is required to be assigned to an array. After refining the results will be saved to the data buffer zone for refinement. As detailed data before and after refinement were stored by bits, therefore, take the image data and refinement results are to be identified to write bytes and bits. Bytes and bits need to be ascertained en extracting image data and writing refinement results. 3.2. Piece-Point Algorithm for Image Rotation G 3.2.. Input image and rotated sub-images: and R Input image G consists of h w pixels. Each pixel ( x, y ) is associated with an intensity level. The set of all those pixels is denoted by G and its bounding rectangle by G. C m 2m C nm

2226 ZHANG Shuang et al. / Procedia Engineering 5 (20) 2222 2228 Rotated sub-image R consists of h W pixel, ich form a set R of pixel. Intensity levels at each pixel ( X, Y ) is calculated by interpolation using intensity levels in the neighborhood. 3.2.2. Output image and location function An interpolation value calculated at a pixel ( X, Y) R in the rotated sub-image is stored at some pixel sxy (, ) R in the original input image. the function ( ) s determining the location is referred to as location function. A simple function sxy (, ) = ( XY, ) ich maps a pixel ( X, Y ) in pixel ( X, Y ) in G R to a.we could use different location functions, but this simple function seems best for row-major and column-major raster scans. So, we implicitly fix the function. Then, an output image after correcting rotation is a range of the function. It is rather easy to move the output image to the center position of the original rectangle G in an in-place manner. 3.2.3. Correspondence between two coordinate systems Let ( 0, 0) pixel( X, Y ) in x y be the xy coordinates of the lower left corner of the rotated document. Now, a R is a point (, ) x y in the rectangle G with x = x0 + X cos( a) Ysin( a) y = y0 + X cos( a) + Ysin( a) The corresponding point ( x, y ) defined above is denoted by p( XY, ). 3.2.4. Scan order σ ( X, Y ) Let σ be a scanning over the pixels in R, It is a mapping from R to a set of integers {0,, L }, that is, σ ( X, Y) = i means that the pixel ( X, Y ) is scanned in the i th order. If σ is a row-major raster scan, σ ( X, Y) = X + Y W ere X = 0, L, W and Y = 0, L, H. A column-major raster order is symmetrically characterized by σ ( X, Y) = Y + X H 4. DSP-DM642 memory analysis on data access To improve the data processing capability of DSP-DM4, several ways can usually be used as follows: first, making full use of the processor's internal cache, and putting the data accessed frequently into the cache; second, DMA operations are applied so that the processor and memory work in parallel; third, memory package instructions and optimization procedures are put into use. In specific applications, three methods are usually used in combination, nevertheless the blocking partition and parallel processing is laid emphasis upon in this paper. 4.. DSP-DM642-chip storage structures DSP-DM642-chip has a two-level CACHE structure, and its mechanism is transparent. The first level includes the program CACHE and data CACHE ich are independent each other and their memory size are 6kB, known as the LP and LD. The second level I 2 includes a common cache for program and data, and its capacity is 256kB, nevertheless a part of the cache can be divided as common

ZHANG Shuang et al. / Procedia Engineering 5 (20) 2222 2228 2227 storage, known as L2SRAM. In addition to on-chip the other is external store ich can be expanded greatly to meet the requirement. 4.2. DSP-DM642 of DMA In the DSP, DMA controller is a peripheral ich CPU can access, and other integrated (on-chip) serial ports, host interface, memory interface chip lie on peripheral bus in the system. DMA is background data transfer and can work independent of the CPU and transfer data with the speed of CPU clock speed. CSI Produced by TI (Chip Support Library) supports well the use of DMA. The dedicated DMA modules for each of the DMA help to control registers and decrease the vast operation on registers for the developers, besides, there is a DAT module. 4.3. DSP-DM642 image rotation parallel processing mode To improve the data processing capabilities of DSP-DM642, the large image need to be divided into blocks according to memory demand, the process is shown in Fig.8. To save memory space and reduce CPU load, the images after block partition are saved out of DSP chip and are processed one by one en real-time rotation. Fig. 8 Block partition of large image by memory DSP-DM642 image rotation based on parallel processing mode During data processing, in advance the large binary images have been divided into n m according to memory processing requirement of DSP-DM642 and the divided images have been stored in the storage area out of the chip, and furthermore the DSP-DM642 processes the data with the "reading + processing + storage " Parallel mode. Because we have a substantial pre-binary image processing by the memory requirements of DSP- DM642 divided, will be divided into good image data storage area on the chip, and DSP-DM642 in accordance with the "reading + processing + storage " Parallel mode of processing the data. Compared to single-line data processing the processing capacity is greatly improved and time is greatly reduced. In Single-line data processing time:set the time of reading a block diagram as ; the time of processing a pro Out block diagram ast ; the time of storing a block diagram as the time of processing a large binary image is: n m In pro Out T = ( t + t + t ) overall i= j= t t ; i n, j m and i, j Z Usually, during blocking the image, each sub-block is divided into the portion with the same data amount as possible in order to ensure the maximum utilization of DSP, and therefore the time function on the parallel processing data can be obtained.. So

2228 ZHANG Shuang et al. / Procedia Engineering 5 (20) 2222 2228 In Time parallel processing of data:set the time of reading a block diagram as t ; the time of processing a pro block diagram ast ; the time of storing a block diagram as Out the time of processing a large binary image is: n m In pro Out T = ( t + t + t ) overall 3 i= j= t ; i n, j m and i, j Z For example, a binary image is divided into 2 2, and A, B, C, D to represent each sub-block respectively, then the process of parallel processing are shown in Tab. and Fig.9. Read in data Process data Output data Cyc A Cyc2 B A Cyc3 C B A Cyc4 D C B Cyc5 D C Cyc6 D Tab. The process of sub-block 2 2parallel processing. So 5. Conclusion Fig.9 Parallel processing of data model Through experiments it was found that the parallel processing doubles its efficiency during processing the large binary image and the processing quality is better than the ole single-line processing. However, because the model of data processing also involves image segmentation, it is inevitable that there are some gaps in the image stitching after image blocking. Therefore, in order to get better results, the algorithms on the non-destructive image segmentation and seamless stitching must be achieved. In general, the processing model can meet the requirement of astronavigation and remote sensing on the ole. References [] WANG Si-guo, WANG Yan. An Improved A Algorithm for Image Revolving[J].TRANSACTIONS OF SHENYANG LIGONG UNIVERSITY, vol.28 No.5:pp22-25,2009 [2]Cheng, H.D., Tang, Y.Y., Suen, C.Y.. Parallel image transformation and its VLSI implementation[j]. Pattern Recognition, vol. 23 NO.0:pp3 29.990