Application of Improved Lzc Algorithm in the Discrimination of Photo and Text ChengJing Ye 1, a, Donghai Zeng 2,b

Similar documents
Autonomous System Network Topology Discovery Algorithm Based On OSPF Protocol

An adaptive container code character segmentation algorithm Yajie Zhu1, a, Chenglong Liang2, b

Evaluation of texture features for image segmentation

A Novel Image Super-resolution Reconstruction Algorithm based on Modified Sparse Representation

Fault Diagnosis of Wind Turbine Based on ELMD and FCM

Data Cleaning for Power Quality Monitoring

An Adaptive Threshold LBP Algorithm for Face Recognition

5th International Conference on Information Engineering for Mechanics and Materials (ICIMM 2015)

Pedestrian Detection with Improved LBP and Hog Algorithm

Design of Fault Diagnosis System of FPSO Production Process Based on MSPCA

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. Research on motion tracking and detection of computer vision ABSTRACT KEYWORDS

Adaptive Zoom Distance Measuring System of Camera Based on the Ranging of Binocular Vision

Video Inter-frame Forgery Identification Based on Optical Flow Consistency

A Novel Image Classification Model Based on Contourlet Transform and Dynamic Fuzzy Graph Cuts

The Comparative Study of Machine Learning Algorithms in Text Data Classification*

An Integrated Face Recognition Algorithm Based on Wavelet Subspace

Adaptive Boundary Effect Processing For Empirical Mode Decomposition Using Template Matching

Application Research of Wavelet Fusion Algorithm in Electrical Capacitance Tomography

Rotation Invariant Finger Vein Recognition *

Texture Image Segmentation using FCM

Research on QR Code Image Pre-processing Algorithm under Complex Background

Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach

An Inspection Method of Rice Milling Degree Based on Machine Vision and Gray-Gradient Co-occurrence Matrix

Image and Video Quality Assessment Using Neural Network and SVM

Research on Intrusion Detection Algorithm Based on Multi-Class SVM in Wireless Sensor Networks

Qiqihar University, China *Corresponding author. Keywords: Highway tunnel, Variant monitoring, Circle fit, Digital speckle.

The Population Density of Early Warning System Based On Video Image

Face recognition based on improved BP neural network

Bayesian Spam Detection System Using Hybrid Feature Selection Method

Research on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a

Edge Detection Algorithm Based on the Top-hat Operator Ying-Li WANG a, Shan-Shan MU b

AN IMPROVED K-MEANS CLUSTERING ALGORITHM FOR IMAGE SEGMENTATION

An Improved Algorithm for Image Fractal Dimension

Network Traffic Classification Based on Deep Learning

Image segmentation based on gray-level spatial correlation maximum between-cluster variance

Structure-adaptive Image Denoising with 3D Collaborative Filtering

Time Series Clustering Ensemble Algorithm Based on Locality Preserving Projection

A Data Classification Algorithm of Internet of Things Based on Neural Network

Click here, type the title of your paper, Capitalize first letter

The Research and Design of the Android-Based Facilities Environment Multifunction Remote Monitoring System*

Prediction of traffic flow based on the EMD and wavelet neural network Teng Feng 1,a,Xiaohong Wang 1,b,Yunlai He 1,c

Temperature Calculation of Pellet Rotary Kiln Based on Texture

Research on the New Image De-Noising Methodology Based on Neural Network and HMM-Hidden Markov Models

Compression of RADARSAT Data with Block Adaptive Wavelets Abstract: 1. Introduction

Image Enhancement Techniques for Fingerprint Identification

Design of student information system based on association algorithm and data mining technology. CaiYan, ChenHua

A New Spatial q-log Domain for Image Watermarking

Traffic Flow Prediction Based on the location of Big Data. Xijun Zhang, Zhanting Yuan

Research on adaptive network theft Trojan detection model Ting Wu

A New Evaluation Method of Node Importance in Directed Weighted Complex Networks

Mobile Camera Based Calculator

Stripe Noise Removal from Remote Sensing Images Based on Stationary Wavelet Transform

Image Denoising Methods Based on Wavelet Transform and Threshold Functions

Iris Recognition Using Curvelet Transform Based on Principal Component Analysis and Linear Discriminant Analysis

Application of partial differential equations in image processing. Xiaoke Cui 1, a *

Research on Full-text Retrieval based on Lucene in Enterprise Content Management System Lixin Xu 1, a, XiaoLin Fu 2, b, Chunhua Zhang 1, c

RETRACTED ARTICLE. Web-Based Data Mining in System Design and Implementation. Open Access. Jianhu Gong 1* and Jianzhi Gong 2

Color Image Segmentation

Research and Realization on the key technology of. Motion Image Matching Based on UAV

Research on Evaluation Method of Video Stabilization

Journal of Chemical and Pharmaceutical Research, 2015, 7(3): Research Article

Critique: Efficient Iris Recognition by Characterizing Key Local Variations

ADVANCED IMAGE PROCESSING METHODS FOR ULTRASONIC NDE RESEARCH C. H. Chen, University of Massachusetts Dartmouth, N.

The Afresh Transform Algorithm Based on Limited Histogram Equalization of Low Frequency DCT Coefficients

A Modular k-nearest Neighbor Classification Method for Massively Parallel Text Categorization

Separation of Surface Roughness Profile from Raw Contour based on Empirical Mode Decomposition Shoubin LIU 1, a*, Hui ZHANG 2, b

Research of Traffic Flow Based on SVM Method. Deng-hong YIN, Jian WANG and Bo LI *

A Hand Gesture Recognition Method Based on Multi-Feature Fusion and Template Matching

An Abnormal Data Detection Method Based on the Temporal-spatial Correlation in Wireless Sensor Networks

IMPROVED FACE RECOGNITION USING ICP TECHNIQUES INCAMERA SURVEILLANCE SYSTEMS. Kirthiga, M.E-Communication system, PREC, Thanjavur

Warship Power System Survivability Evaluation Based on Complex Network Theory Huiying He1,a, Hongjiang Li1, Shaochang Chen1 and Hao Xiong1,2,b

A Real-time Detection for Traffic Surveillance Video Shaking

Geometric Rectification Using Feature Points Supplied by Straight-lines

Mobile Cloud Multimedia Services Using Enhance Blind Online Scheduling Algorithm

Research on friction parameter identification under the influence of vibration and collision

Estimating Noise and Dimensionality in BCI Data Sets: Towards Illiteracy Comprehension

manufacturing process.

Computer Aided Drafting, Design and Manufacturing Volume 26, Number 2, June 2016, Page 18

MOTION STEREO DOUBLE MATCHING RESTRICTION IN 3D MOVEMENT ANALYSIS

A Method and System for Thunder Traffic Online Identification

An Inspection Method of Rice Milling Degree Based on Machine Vision and Gray-gradient Co-occurrence Matrix

An Effective Denoising Method for Images Contaminated with Mixed Noise Based on Adaptive Median Filtering and Wavelet Threshold Denoising

Modeling Body Motion Posture Recognition Using 2D-Skeleton Angle Feature

Diagonal Principal Component Analysis for Face Recognition

Convolution Neural Networks for Chinese Handwriting Recognition

A New Method in Shape Classification Using Stationary Transformed Wavelet Features and Invariant Moments

Performance Degradation Assessment and Fault Diagnosis of Bearing Based on EMD and PCA-SOM

FUZZY C-MEANS ALGORITHM BASED ON PRETREATMENT OF SIMILARITY RELATIONTP

Face Recognition Technology Based On Image Processing Chen Xin, Yajuan Li, Zhimin Tian

Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging

Study on Image Position Algorithm of the PCB Detection

Open Access Reconstruction Technique Based on the Theory of Compressed Sensing Satellite Images

Face Hallucination Based on Eigentransformation Learning

Research on Design and Application of Computer Database Quality Evaluation Model

A Study on Different Challenges in Facial Recognition Methods

Based on correlation coefficient in image matching

Leaf Image Recognition Based on Wavelet and Fractal Dimension

An algorithm of lips secondary positioning and feature extraction based on YCbCr color space SHEN Xian-geng 1, WU Wei 2

Fast Face Recognition Based on 2D Fractional Fourier Transform

Clustering Analysis based on Data Mining Applications Xuedong Fan

Transcription:

2016 International Conference on Information Engineering and Communications Technology (IECT 2016) ISBN: 978-1-60595-375-5 Application of Improved Lzc Algorithm in the Discrimination of Photo and Text ChengJing Ye 1, a, Donghai Zeng 2,b 1 Guangdong Vocational Institute of Science and Technology, Guangzhou 510640, Guangdong, China 2 Guangdong Vocational Institute of Science and Technology, Guangzhou 510640, Guangdong, China a yecj2000@163.com, b zdhwjy2000@163.com Keywords: Lempel-Ziv complexity Multi-scale LZC Photo and text Abstract. For the commonly used image two value method, there is a lack of precision and accuracy of the processing image containing noise or uneven illumination and so on, A new method for two valued difference image of complexity at different scales. Divide the signal sequence into several areas and do multi-scale binaryzation process to the sequence. Whith no increasing the number of symbols, get more meticulous and more accuracy complexity by the traditional LZC algorithm. The experiments show that the new method has advantages of certain anti-noise abilities, can be regarded as the feature of photo and text. Therefore, We can apply the improved LZC algorithm to the distinction between photo and text. Introduction It is not an easy thing to extract effective information from nonlinear complex signals. Therefore, we have to develop various methods to extract characteristic quantity of signals, of which the LZC algorithm developed by Lempel and Ziv is a simple and fast algorithm to measure the complexity of signals. Today the LZC algorithm has been applied in various aspects to extract the features of complex signals. To automatically find the content we need from a lot of files and images is an example of extracting features of complex signals. And of course the LZC algorithm can be applied in this aspect. However, the coarse graining method of the LZC algorithm has flaws, and if it is directly used, it may cause serious consequences. In this article, the author will improve the LZC algorithm with respect to the flaws and apply the improved LZC algorithm to distinguish photo and text. Lempel-Ziv Complexity Algorithm In 1976, Lempel and Ziv proposed a complexity algorithm to measure the increase of sequence and the increasing speed of new models with time. We call this complexity Lempel-Ziv Complexity. Its algorithm is as follows: (1) Reconstruction of time sequence: first, obtain the average value of the subject sequence; then reconstruct a symbol sequence with the subject sequence. Let the x which is bigger than the average value be 1, and the x smaller than the average value be 0. (2) Behind an existing substring in the ( 0, 1 ) sequence, add a character Q or a character string Q, and we obtain a new symbol sequence S. Let Sv be the sequence after removing the last character of the character string S. Then check whether Q belongs to the substring that already shows in Sv. If it does, adding Q calls copy. Then we can extend Q, which is to increase k, and then repeat the steps above until Q no longer shows in Sv. If Q does not show, it is called insert ; when it is insert, place a. behind Q; then consider all characters before. a target sequence, and repeat the steps above until the sequence ends. (3) After (2) is completed, we can obtain a character string sectioned by.. The total number of the sections is defined as complexity c(n);

(4) According to Lempel and Ziv s study, the c (n) of almost all s { 0, 1 } tends to a constant value: n lim c(n)=b(n)= (1) n log 2n Therefore b(n) is an asymptotic behavior of the sequence, which can be used to normalize c(n), becoming relative complexity measure : c( n) Clz = 0 Clz 1 (2) b( n) From the description above we can see that: the LemPel-Ziv Complexity (LZC) algorithm can reflect the degree of disorder of sequences. Therefore, the dynamics-based LZC algorithm is often used as a measurement. For example, (1) the LZC algorithm can be used to calculate the EEG (electroencephalogram) signals under different states and then distinguish different states according to relative difference, and based on that, we may conduct further studies on some mechanisms of the brain; (2) the LZC algorithm can be used to distinguish text and image, which will be described in detail below. Improvement of the LZC Algorithm According to its introduction, the features of the LZC algorithm are apparent. Merits: (1) the data needed to calculate Lempel-Ziv complexity are relatively short and the quantities of calculation are not large; and (2) without machine learning theory, relevant data features and laws can be extracted. However, the first step of calculating Lempel-Ziv complexity is to reconstruct the original time sequence into (0,1) sequence. With such coarse graining processing method, it will cause loss of information and even change the dynamic characteristics of original signals completely. Therefore, the author has improved the LZC algorithm. The LZC algorithm can be improved from two aspects: (1) preprocessing of signals; and (2) improvement of the coarse graining method. This article mainly focuses on the research of the improvement of the coarse graining method. With respect to the preprocessing of signals, use the wavelet decomposition algorithm to decompose the low frequency of signals, or use the wavelet packet decomposition algorithm to decompose the high frequency of signals, to filter singular value and remove noise. As the application of the wavelet decomposition algorithm and wavelet packet decomposition algorithm is mature, they will not be discussed further in the article. Traditional Coarse Graining Method The premise and key to the calculation of Lempel-Ziv complexity is to coarse-grain original signals, while the traditional coarse graining method of original signals is binaryzation. Suppose the known time sequence is {x(i) i=1,2,...,n}. Reconstruct x(i) using the binaryzation process. Set n x ave =( x( i) )/n (3) i 1 where is the average value of the original sequence x(i). Let {S(i) i=1,2,...,n} denote the signal sequence after x(i) is reconstructed. When the element x(j) in the original sequence is <, assign the symbol 0 to S(j); otherwise, assign 1 to S(j). When j is 1,2,...,n, a symbol sequence S(i) composed of 0 and 1 is thus established. Multi-scale Binaryzation Process The multi-scale binaryzation process is an improved method of the traditional coarse graining process. On the basis of the traditional coarse graining process, a more reliable sequence can be obtained with the multi-scale process. The multi-scale binaryzation process is described as follows: (1) Divide original signals into several sections, which is equivalent to a coarse graining process. The degree of the coarse graining is determined by the number of the sections. The more the number of the

sections is, the lower the degree of the coarse graining process is. In this article, original signals are equally divided into four sections, that is, calculate the average value of the whole signal sequence first, then divide it into two sections with the average value as the dividing line; and then obtain the average values of the two sections respectively and divide the two sections into to another two sections based on their respective average value. Finally the signals are divided into four sections. (2) If the value of the first point of the signal sequence is larger than the average value, indicate the point as 1; otherwise indicate it as 0. The value setting method of the first point is the same as that in the traditional binaryzation process. (3) With respect to the second point of the signal sequence and the points behind, the binaryzation result is determined by comparing with the previous point. If it increases to another section, the indicate the binaryzation value of the point as 1; if it decreases to another section, indicate the binaryzation value of the point as 0; it is in the same section, the binaryzation value of the point is the same as that of the previous point. 1 1 1 1 0 0 0 0 1 1 0 1 0 1 1 0 (a) (b) Figure 1. Comparison between two binaryzation methods. Take Figure 1 for example. Figure (a), The sequence is divided into two sections. The sequence (11110000) is obtained through reconstruction with the traditional binaryzation method, Figure (b), The sequence is divided into four sections. The sequence (11010110) is reconstructed with the multi-scale binaryzation method. According to the traditional binaryzation method, all of the four points A, B, C and D are larger than the average value, so they are indicated as 1; while E, F, G and H are smaller than the average value, so they are indicated as 0. According to the multi-scale binaryzation method, A for the first point, the value is larger than the average, so it is indicated as 1; B increases to another section, so it is indicated as 1; C decreases to another section, so it is indicated as 0; and so forth. The traditional coarse graining method can only show whether the signal sequence is larger or smaller than the average value, while the multi-scale coarse graining method can not only show the change of the signal sequence more accurately with 0 and 1, but also indicate a certain scale of rising trend and extension of the signal sequence with 1 and a certain scale of declining trend and extension of the signal sequence with 0. By comparison we can see that the multi-scale coarse graining method describes a changing process in a certain extent and scope, and it is more elaborate and logic than the traditional coarse graining method. Discrimination Photo and Text with the LZC Algorithm The LZC algorithm is used to analyze one dimensional sequence, while photo and text are two dimensional graphics. They seem to have no relation. However, photo and text can be scanned into one dimensional sequence by rows or lines, and then the LZC algorithm can be used to analyze them. Therefore, LZC complexity can also be used in two dimensional graphics. Based on past experiences, when calculating LZC complexity, the sequence with a length of 2000 is more suitable. Therefore, we have chosen the 50 50 square area in image and changed the image area into one dimensional sequence to analyze the difference in complexity of photo and text under difference scales. The experiment results show that the complexity of image and text increases with the decrease of the

computation scale. However, the increase of the complexity of photo is larger than that of text. Based on that difference, we can distinguish photo and text. The specific method of using the LZC algorithm to distinguish photo and text is as follows: calculate the complexity of photo under four scales S1, S2, S3 and S4 (from large to small) and obtain LZ1, LZ2, LZ3 and LZ4; obtain the slope K that indicates the change of complexity with the change of scale using the least square method, and distinguish text and photo based on the difference in k. In specific calculation, regarding the photo with a relatively small size, choose 50 50 image area only; regarding the photo with a relatively large size, choose several areas according to the photo size and take the average value of the k values of the several areas as the feature of the image. In the article, 50 photo and 50 texts have been used to do the experiment. Of the images, some were downloaded from the Internet and some were taken by digital camera, and the smallest image size is 50 50 while the largest is 874 585. All of the photo and texts are 8-digit grayscale images converted by scanning or being taken pictures of by digital camera, As shown in Figure 2, part of the experimental sample map. The red square area in the graph is a random selection of image complexity calculation area. Figure a Figure b Figure c Figure 2. photo and texts. Figure d

Table 1 shows the number of k values of the texts and photo in different sections. Table 1. k value of the photo and texts. photo texts [0,0.02) 0 3 [0.02,0.04) 0 7 [0.04,0.06) 1 1 [0.06,0.08) 0 1 [0.08,0.10) 2 0 [0.10,0.12) 4 1 [0.12,0.14) 6 0 [0.14,0.16) 6 0 [0.16,0.18) 7 0 [0.18,0.20) 6 0 [0.20,0.22) 4 0 The experiment results show that the k value of 98% (49) of the photo is larger than or equal to 0.08, and that of 86% (43) is larger than or equal to 0.12. The k value of 96% (48) of the texts is smaller than 0.08, and that of 88% (44) is smaller than 0.06. From the experiment results we can see that the method proposed in this article can be used to distinguish photo and text, and has the following characteristics: For text, if the picture is serious noise, the computational complexity will be affected, so before the computational complexity of image denoising the appropriate. For photo, if the photo in the background gray value is single, will also affect the computational complexity, as the scale decreases and little change. Summary Traditional LZC algorithm is a classic image of the two values of the algorithm, But it is the problem of the fine degree difference and the accuracy of the input image with noise or uneven illumination. So this article is aimed at these problems, the author improved it using the multi-scale binaryzation algorithm to make it more elaborate and logic; then the author discrimination photo and text with the improved LZC algorithm. The results show that the accuracy of the discrimination is very high (over 95%), There is no need to increase the number of symbols when using the method. Moreover, the method has higher accuracy and can be used to distinguish photo and text and expands the application scope of the LZC algorithm. Later we will conduct further studies on the following aspects: (1) how to use the method to discrimination image and photo, and image and text; (2) take the complexity method under different scales as a feature, combine the feature with other features of image, use existing classification tools to classify the image, compensate for the deficiencies of different classification features, improve classification accuracy and enhance its practicability. References [1] J. van der Geer, J.A.J. Hanraads, R.A. Lupton, The art of writing a scientific article, J. Sci. Commun. 163 (2000) 51-59. [1] Zhang Lianyi, Effect of coarse graining coding on Lempel-Ziv complexity, J. Journal of Shanghai Dianji University. 16 (2013) 193-197. [2] Hu Haiqing, Tan Jianlong, Zhu Yatao, etc. Application of improved SIFT algorithm in text and image matching, J. Computer Engineering. 39 (2013) 239-243.

[3] Luo Zhizeng, Cao Ming, Analysis of features of motor imagery electroencephalogram based on multi-scale Lempel-Ziv complexity, J. Chinese Journal of Sensors and Actuators. 24 (2011) 1033-1037. [4] Zhang DianZhong, Research on the correlation between the mutual information and Lempel-Ziv complexity of nonlinear time series, J. Acta Phys Sin. 56 (2007) 3152-3157. [5] Jin Ningde, Dong Fang, Zhou Shu, The complexity measure analysis of conductance fluctuation signals of gas-liquid two-phase flow and its flow pattern characterization, J. Acta Phys Sin. 56 (2007) 720-729. [6] Zhang Dianzhong, Analysis and improvement of the coarse graining method in Lempel-Ziv complexity algorithm, J. Chinese Journal of Computational Physics. 25 (2008) 499-504. [7] He Weixing, Chen Xiaoping, Analysis and improvement of the binaryzation method in Lempel-Ziv complexity algorithm, J. Journal of Jiangsu University (Natural Science Edition). 25 (2004) 261-264. [8] Song Haohao, Wang Xin, An improved LZC image coding algorithm, J. Journal of Image and Graphics. 9 (2004) 460-464. [9] Wang. J.P, Stochastic relaxation on partitions with connected components and its application to age segmentation, IE EE-PAMI. 20 (1998) 619-636. [10] Gupta L., Sortrakul. T.A, Gaussian-mixture-based image segmentation algorithm. Pattern Recognition. 31(1998) 315-326.