An Efficient RFID Data Cleaning Method Based on Wavelet Density Estimation

Yaozong LIU 1*, Hong ZHANG 1, Fawang HAN 2, Jun TAN 3
1 School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
2 Information Technology Department, Nanjing Forest Police College, Nanjing, China
3 Department of Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, TX, USA
new025@126.com

Journal of Digital Information Management, Volume 13, Number 1, February 2015

ABSTRACT: Raw RFID data usually contain a large amount of noise and must be cleaned before further processing. Outlier detection is an effective method for RFID data cleaning. In this paper, a point probability data model is proposed to describe uncertain RFID data streams. A wavelet density threshold is incorporated into the method to adaptively detect outliers in a sliding window, exploiting the multi-scale and multi-granularity characteristics of wavelet density estimation. The process of outlier detection for RFID data streams is discussed in depth. Compared with the existing kernel density estimation algorithm, our method achieves higher efficiency and precision when cleaning uncertain RFID data streams.

Categories and Subject Descriptors: H.2.8 [Database Applications]: Data Mining
General Terms: Data mining, RFID data processing
Keywords: RFID data streams, Data cleaning, Outlier, Uncertain, Wavelet Density Estimation (WDE)
Received: 20 August 2014, Revised: 28 September 2014, Accepted: 3 October 2014

1. Introduction
RFID applications produce massive data streams that often contain anomalous data (such as missing, inconsistent, or duplicate readings). Such data reduce the accuracy of RFID applications and make RFID data streams uncertain [1]. To avoid these problems, uncertain RFID data streams must be cleaned by detecting outliers.
Outlier detection is a subject of intense study in data mining research. Outlier detection methods fall into two major categories: distance-based methods and density-based methods. Most outlier detection work on uncertain data or data streams focuses on the former, where outliers are further categorized as local or global outliers. Outlier detection is an important anomaly detection technique that has gained wide attention and has been extensively adopted for cleaning RFID data streams [2-3]. Outlier detection on uncertain data is more difficult than on traditional certain data, and the biggest differences between RFID data streams and traditional data streams are uncertainty and very high data volume. Therefore, new outlier detection methods are needed.

A tag may not be located at a reader, yet for various reasons fall within that reader's monitoring range and be read by it, a phenomenon known as cross-reads. A cross-read can be viewed as the reader accidentally reading the tag once or twice. Cross-reads occur with very small probability, but their impact is large, and they are an important cause of uncertainty in RFID data streams. Liao et al. [2] treated cross-reads as outliers and used kernel density estimation to compute the spatial density distribution of tagged objects; the true position of a tagged object is taken to lie within the detection range of the reader with the highest density value.

2. Related Works
Few studies address outlier detection for RFID data streams. Masciari [4] proposed a framework for outlier detection in RFID data; however, the framework is not suitable for processing RFID data streams. To handle the massive, variable, unreliable, and distributed nature of RFID data streams, Liao et al. [3] proposed a distance-based local stream outlier detection algorithm (LSOD) and an approximation-based global stream outlier detection algorithm (GSOD), but they did not address the uncertainty of RFID data. Density estimation methods are well suited to outlier detection. Yang et al. [5] used kernel density estimation (KDE) to detect global outliers in distributed data streams, but KDE cannot perform multi-scale, multi-granularity analysis of data streams. Because wavelets capture local detail, the wavelet density estimator (WDE) can analyze local signal characteristics, which makes it suitable for outlier detection on uncertain data streams; in theory, WDE is therefore better suited to this task than traditional KDE. Liu et al. [6] applied WDE to local outlier detection on traditional data streams and achieved higher detection accuracy and efficiency than the KDE method.
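The multi-scale argument can be made concrete with a minimal one-dimensional sketch of a linear wavelet density estimator. It assumes the data have been rescaled to [0, 1) and uses the Haar wavelet for brevity, not the Daubechies wavelet with 4 vanishing moments used in the experiments; `haar_wde` and its parameters are hypothetical names, and this is not the authors' implementation. The estimate combines coarse scaling coefficients at level j0 with detail coefficients at levels j0 to J - 1, each an empirical average of the basis function over the samples; detail coefficients at or below an optional threshold are dropped.

```python
from typing import Callable, Dict, List, Tuple


def haar_phi(x: float) -> float:
    """Haar scaling function: indicator of [0, 1)."""
    return 1.0 if 0.0 <= x < 1.0 else 0.0


def haar_psi(x: float) -> float:
    """Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    if 0.0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1.0:
        return -1.0
    return 0.0


def haar_wde(samples: List[float], j0: int, J: int,
             threshold: float = 0.0) -> Callable[[float], float]:
    """Haar wavelet density estimate on [0, 1), coarse level j0, details up to level J - 1.

    Detail coefficients with |d_jk| <= threshold are dropped (hard thresholding).
    """
    n = len(samples)
    # Empirical scaling coefficients c_{j0,k} = mean over samples of phi_{j0,k}(X_i)
    coarse: Dict[int, float] = {}
    for k in range(2 ** j0):
        s = sum(haar_phi(2 ** j0 * x - k) for x in samples)
        coarse[k] = 2 ** (j0 / 2) * s / n
    # Empirical detail coefficients d_{j,k} for each finer scale j
    detail: Dict[Tuple[int, int], float] = {}
    for j in range(j0, J):
        for k in range(2 ** j):
            s = sum(haar_psi(2 ** j * x - k) for x in samples)
            d = 2 ** (j / 2) * s / n
            if abs(d) > threshold:
                detail[(j, k)] = d

    def density(x: float) -> float:
        # Reconstruct: coarse approximation plus the retained local details
        value = sum(c * 2 ** (j0 / 2) * haar_phi(2 ** j0 * x - k)
                    for k, c in coarse.items())
        value += sum(d * 2 ** (j / 2) * haar_psi(2 ** j * x - k)
                     for (j, k), d in detail.items())
        return value

    return density
```

Without thresholding, the Haar estimate at finest level J reduces to a histogram with bin width 2^-J; each added detail level refines the estimate locally, which is the multi-scale behavior that a single-bandwidth KDE lacks. For example, `haar_wde([0.1, 0.2, 0.3, 0.7], 0, 2)` evaluates to 2.0 at x = 0.1, matching two samples in a bin of width 0.25.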
Aggarwal et al. [7] proposed a density-based outlier detection technique in which each uncertain data record is distributed over an uncertainty region according to a probability density function (PDF) that integrates to 1 over that region. Their algorithm defines an η-probability to quantify the probability that an uncertain data object lies in a dense region. Searching over multiple subspaces, an object whose η-probability is less than a given threshold δ is called a (δ, η)-outlier. A known problem of density estimation is that, as the data dimension increases, the data become sparse in the space, and normal points and outliers can no longer be distinguished effectively. Since RFID data are spatio-temporal and have a high spatial dimension, density-based outlier detection for data streams needs to be improved. Building on previous studies, this paper proposes an uncertain outlier detection algorithm for cleaning RFID data streams based on wavelet density estimation.

3. Uncertain RFID Data Streams Cleaning Method Based on Outlier Detection
When an object attached to a tag moves into a reader's sensing area, it produces an RFID reading. The RFID reader collects tag data at a fixed frequency (e.g., every 10 ms), so the data set can be regarded as a chronologically arriving data stream (a time-series stream). For RFID applications, we usually convert the uncertain stream data into probability data streams.

Figure 1. Cleaning model for RFID data streams based on Wavelet Density Estimation

Figure 1 shows the cleaning model for RFID data streams based on WDE. Before outlier detection, uncertain RFID data streams must be preprocessed. An RFID reading can be represented as O_i(time, tag_id, location), where location indicates the position of the tag. This position is uncertain and can usually be represented by a tuple ((x, y, z), p), where p is the presence probability that measures the uncertainty of the position.
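The reading tuple O_i(time, tag_id, location) with presence probability p can be sketched as a small record type. The field names, the per-epoch aggregation, and the rule p = (reads of the tag) / (reader polls in the epoch) are assumptions chosen for illustration; the paper does not specify how p is computed during preprocessing.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Position = Tuple[float, float, float]


@dataclass(frozen=True)
class RFIDReading:
    """One uncertain reading O_i(time, tag_id, location) with presence probability p."""
    time: float
    tag_id: str
    location: Position  # (x, y, z)
    p: float            # presence probability in [0, 1]


def to_probability_stream(raw_reads: List[Tuple[float, str, Position]],
                          polls: int, epoch_start: float) -> List[RFIDReading]:
    """Collapse the raw reads of one epoch into one point-probability tuple per tag.

    Hypothetical estimate: p = reads of the tag / reader polls in the epoch
    (capped at 1 to absorb duplicate reads); location = mean reported position.
    """
    if polls <= 0:
        raise ValueError("polls must be positive")
    seen: Dict[str, List[Position]] = {}
    for _time, tag_id, loc in raw_reads:
        seen.setdefault(tag_id, []).append(loc)
    stream: List[RFIDReading] = []
    for tag_id, locs in sorted(seen.items()):
        mean_loc = tuple(sum(axis) / len(locs) for axis in zip(*locs))
        p = min(1.0, len(locs) / polls)
        stream.append(RFIDReading(epoch_start, tag_id, mean_loc, p))
    return stream
```

With four polls in an epoch, a tag answering twice gets p = 0.5 and one answering once gets p = 0.25; a cross-read tag that appears only sporadically therefore enters the stream with a low presence probability.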
Existing uncertain data models fall into two categories: value uncertainty and presence uncertainty. Value uncertainty assumes that an attribute value of a database tuple follows a probability distribution over a discrete or continuous domain, such as a normal distribution; such attribute uncertainty is usually described by a probability density function (PDF) or by statistical parameters (e.g., variance). Tuple uncertainty can also be described in various ways. In the point probability model, the attribute values of a tuple and its presence uncertainty are represented by a probability value in [0, 1]. In typical RFID applications, misreads and duplicate reads by the reader make the data uncertain, so the point probability model is well suited to describing RFID data streams [8].

Definition 1. Point probability data streams. An uncertain data stream can be viewed as a point probability data stream composed of uncertain tuples. An uncertain data stream S (also called a point probability stream) is a k-dimensional sequence of independent uncertain tuples:

$$S = \{(X_1, p_1), \ldots, (X_i, p_i), \ldots, (X_n, p_n)\}, \quad 0 \le p_i \le 1,$$

where X_i is the value of the i-th tuple and p_i is its presence probability. RFID data streams can be represented as probability data streams [8]. Assuming that each tuple can take multiple values in a discrete domain [a, b], the value of each tuple follows a probability density function over this domain.
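Definition 1 can be illustrated under the usual tuple-independence reading, which the phrase "independent uncertainty tuples" suggests: each (X_i, p_i) is present with probability p_i, independently of the others. The helper names below are hypothetical. The expected number of present tuples is the sum of the p_i, and a possible world can be sampled by keeping each tuple with its own probability.

```python
import random
from typing import List, Sequence, Tuple

# One point-probability tuple (X_i, p_i): attribute vector and presence probability.
PointTuple = Tuple[Tuple[float, ...], float]


def validate(stream: Sequence[PointTuple]) -> None:
    """Check the constraint 0 <= p_i <= 1 from Definition 1."""
    for _x, p in stream:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"presence probability out of range: {p}")


def expected_size(stream: Sequence[PointTuple]) -> float:
    """Expected number of tuples actually present: sum of p_i (linearity of expectation)."""
    return sum(p for _x, p in stream)


def sample_world(stream: Sequence[PointTuple], rng: random.Random) -> List[Tuple[float, ...]]:
    """Draw one possible world: keep each X_i independently with probability p_i."""
    return [x for x, p in stream if rng.random() < p]
```

A tuple with p_i = 1 appears in every sampled world and one with p_i = 0 in none, so the stream [((1.0,), 1.0), ((2.0,), 0.0), ((3.0,), 0.5)] has expected size 1.5.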

For an RFID data stream tuple X_i = (X_1, ..., X_d) with d attribute dimensions, assuming the attributes are independent and identically distributed, the joint probability of the attributes of X is:

$$P(X_1 \in [a_1, b_1], \ldots, X_d \in [a_d, b_d]) = \int_{a_1}^{b_1} \cdots \int_{a_d}^{b_d} f(x_1, \ldots, x_d)\, dx_1 \cdots dx_d \qquad (1)$$

Since data streams are unbounded, it is not possible to store all the data before starting the calculation. An appropriate sample size must therefore be determined, typically with a sliding window mechanism; a moderately sized sliding window both preserves the completeness of the sample statistics and reduces the storage cost of the data stream.

To collect RFID data stream samples, let p_i^avg be the observation probability of tag i in one gap. If the number of gaps n_i in the sliding window satisfies

$$n_i \ge \frac{\ln(1/\rho)}{p_i^{avg}} \qquad (2)$$

then tag i is read at least once within the window with probability at least 1 − ρ, because the probability of missing it in every gap is (1 − p_i^avg)^{n_i} ≤ e^{−n_i p_i^avg} ≤ ρ.

Definition 2. Adaptive sliding window. To detect outliers in uncertain RFID data streams, the window size used for reading the stream adapts to changes. The input is the average observation probability of the tag and the confidence parameter ρ; the output is the window size W_i:

$$W_i = \mathrm{ComputingWindow}(p_i^{avg}, \rho) \qquad (3)$$

Let N denote the number of readings of tag i in the window; then N follows the binomial distribution B(W_i, p_i^avg) [9]. For confidence parameter ρ, Equation (2) gives a sufficient condition for completeness. When the window is smaller than W_i, it must be expanded; the condition for changing the window size is

$$N - W_i p_i^{avg} > 2\sqrt{W_i p_i^{avg}(1 - p_i^{avg})}$$

The size of the sliding window should be adjusted adaptively as tag information changes, so that samples of the uncertain RFID data stream are still collected.
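The window rule can be sketched as follows, assuming p_i^avg and ρ are already known. `computing_window` returns the smallest integer n_i satisfying Equation (2), and `deviation_exceeds` is the one-sided 2σ test on N ~ B(W_i, p_i^avg) as stated above. Growing the window in steps of BW (the window adjustment cardinality in Table 2) is an assumed expansion rule, since the text does not say by how much the window grows; all function names are hypothetical.

```python
import math


def computing_window(p_avg: float, rho: float) -> int:
    """Smallest integer window n_i with n_i >= ln(1/rho) / p_avg (Equation 2)."""
    if not (0.0 < p_avg <= 1.0 and 0.0 < rho < 1.0):
        raise ValueError("need 0 < p_avg <= 1 and 0 < rho < 1")
    return math.ceil(math.log(1.0 / rho) / p_avg)


def miss_probability(p_avg: float, n: int) -> float:
    """Probability that a tag read with probability p_avg per gap is missed in all n gaps."""
    return (1.0 - p_avg) ** n


def deviation_exceeds(n_read: int, w: int, p_avg: float) -> bool:
    """One-sided 2-sigma test from the text: N - W*p > 2*sqrt(W*p*(1-p))."""
    mean = w * p_avg
    std = math.sqrt(w * p_avg * (1.0 - p_avg))
    return n_read - mean > 2.0 * std


def adjust_window(w: int, p_avg: float, rho: float, bw: int = 1) -> int:
    """Grow the window by bw gaps (assumed step size) until it satisfies Equation 2."""
    required = computing_window(p_avg, rho)
    while w < required:
        w += bw
    return w
```

For p_i^avg = 0.5 and ρ = 0.05, Equation (2) gives ln 20 / 0.5 ≈ 5.99, so W_i = 6, and the tag is missed in all six gaps with probability 0.5^6 ≈ 0.016. The bound is conservative because (1 − p)^n ≤ e^{−np}.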
3.1 Sampling over RFID data streams
Current sampling methods over RFID data streams are mainly based on the K-L distance, combined with finite mixture densities to construct the sampling function. For applications that track the positions of moving objects, Wang et al. [10] used the K-L distance to adaptively determine the number of samples needed to follow the evolution of multi-tag RFID data. In KDE-based approaches, however, the choice of kernel function and window width strongly affects accuracy.

Algorithm 1. Sampling over RFID data streams
Step 1: Collect samples of the original RFID data streams using Equation (2).
Step 2: Use Equation (1) to calculate the probability density of each data stream tuple in the sliding window, forming the RFID stream tuples into a point probability data stream.
Step 3: Repeat Steps 1 and 2.

3.2 Local outliers in RFID data streams
The outliers in RFID data streams produced by RFID application systems can be divided into local outliers and global outliers [3]. This paper focuses on cleaning local outliers from RFID data streams, while global outliers are usually handled as abnormal events.

Definition 2. RFID outliers. During RFID tag signal acquisition and recording, severe noise interference, signal loss, reader failures, and other causes can produce signals that are too high or too low; these are called RFID outliers.

Definition 3. Local outlier. When only the probability of a data point appearing in a certain region is of interest, a point whose probability of appearing tends to 0 or is less than a given threshold is an outlier. For probability data streams, such outliers can be measured with the wavelet probability density function and are called WDE outliers.

Definition 4. WDE outlier. Let f denote a wavelet density function and let x ∈ [a, b]. If

$$P\left(X \in \left[x - \frac{\varepsilon(b-a)}{2},\; x + \frac{\varepsilon(b-a)}{2}\right]\right) \le \eta, \quad (\varepsilon, \eta) \in (0, 1),$$

then x is an outlier.
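A direct reading of Definition 4 can be sketched for any one-dimensional density estimate f (for example, a wavelet estimate): integrate f over the interval of width ε(b − a) centred on x and compare the mass with η. The midpoint-rule integration, the clipping of the interval to [a, b], and the function names are assumptions of this sketch.

```python
from typing import Callable


def interval_probability(f: Callable[[float], float], lo: float, hi: float,
                         steps: int = 1000) -> float:
    """Midpoint-rule approximation of the integral of density f over [lo, hi]."""
    if hi <= lo:
        return 0.0
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


def is_wde_outlier(f: Callable[[float], float], x: float, a: float, b: float,
                   eps: float, eta: float, steps: int = 1000) -> bool:
    """Definition 4: x is an outlier if P(X in [x - eps(b-a)/2, x + eps(b-a)/2]) <= eta."""
    if not (0.0 < eps < 1.0 and 0.0 < eta < 1.0):
        raise ValueError("eps and eta must lie in (0, 1)")
    half = eps * (b - a) / 2.0
    lo, hi = max(a, x - half), min(b, x + half)  # clip the interval to the domain [a, b]
    return interval_probability(f, lo, hi, steps) <= eta
```

With the toy density f(x) = 2 on [0, 0.5) and 0 elsewhere on [0, 1], and ε = 0.15, η = 0.18 as in Example 1, the point 0.25 has interval mass 0.3 and is kept, while 0.9 has mass 0 and is flagged.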
The admissible range of ε depends on the range of η. Example 1 shows how to determine whether a point is an outlier.

Example 1: Let ε = 0.15 and η = 0.18, so that with b − a = 1 the half-width of the interval is ε(b − a)/2 = 0.075, and suppose

P(X ∈ [x_1 − 0.075, x_1 + 0.075]) = 0.06; (a)
P(X ∈ [x_2 − 0.075, x_2 + 0.075]) = 0.19. (b)

The probability in (a) is well below η, while the probability in (b) is greater than η. Hence x_1 is an outlier and x_2 is not.

3.3 Outlier detection for RFID data streams
Exploiting the multi-scale and multi-granularity features of wavelet density estimation, we use a probability threshold together with density estimation to determine whether a data point in the current sliding window of the data stream is an outlier; the detection applies a retention policy to threshold coefficients [11].
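The retention policy for threshold coefficients mentioned above can be illustrated with randomized rounding, in the spirit of probabilistic wavelet synopses: each nonzero coefficient d_i is kept with a probability y_i and, when kept, stored as d_i / y_i, so that E[D_i] = y_i · d_i / y_i = d_i. How the retention probabilities are chosen is not recoverable from the text; making y_i proportional to |d_i| under a total budget is a hypothetical rule, and both function names are illustrative.

```python
import random
from typing import List, Sequence


def retention_probabilities(coeffs: Sequence[float], budget: float) -> List[float]:
    """Hypothetical rule: y_i proportional to |d_i|, capped at 1, with sum(y_i) <= budget."""
    total = sum(abs(d) for d in coeffs)
    if total == 0.0:
        return [0.0] * len(coeffs)
    return [min(1.0, budget * abs(d) / total) for d in coeffs]


def probabilistic_threshold(coeffs: Sequence[float], probs: Sequence[float],
                            rng: random.Random) -> List[float]:
    """Keep d_i as D_i = d_i / y_i with probability y_i, else D_i = 0, so E[D_i] = d_i.

    Unbiasedness needs y_i > 0 for every nonzero d_i; the rule above guarantees this
    whenever budget > 0.
    """
    out: List[float] = []
    for d, y in zip(coeffs, probs):
        if d != 0.0 and y > 0.0 and rng.random() < y:
            out.append(d / y)  # retained: scaled up so the expectation stays d_i
        else:
            out.append(0.0)    # removed from the coefficient sequence
    return out
```

Averaged over many draws, a coefficient of 3.0 retained with probability 0.25 contributes about 3.0, even though each single draw is either 12.0 or 0.0; the synopsis is unbiased but noisier when retention probabilities are small.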

To keep the threshold for uncertain data streams within a reasonable range, we chose an adaptive wavelet threshold selection method. The wavelet probability thresholding works as follows: for each nonzero wavelet coefficient d_i, a random variable D_i is kept in the wavelet coefficient sequence with a certain probability and set to 0 (that is, removed from the sequence) with the complementary probability, subject to the condition E[D_i] = d_i; the retained value must therefore equal d_i divided by its retention probability. The uncertainty of the probability data stream tuples is measured by p_i in Definition 1. We used the threshold as a lower bound on p_i, with its value taken from a subinterval of (0, 1] whose lower endpoint lies strictly between 0 and 1.

Algorithm 2. Outlier Detection for RFID data streams
Step 1: Calculate the WDE threshold of the current sliding window for the RFID data stream tuples sampled by Algorithm 1, and derive the parameters ε and η.
Step 2: Based on Definition 4 and the parameters ε and η, determine whether each data point in the current window is an outlier.
Step 3: If a point is an outlier, remove it and go to Step 1.

4. Experimental Results
To validate the effectiveness of our outlier detection algorithm, we compared its detection accuracy with that of three other local outlier detection algorithms for data streams on four standard data sets (Table 1). All four data sets, of which KDD-CUP99 is an uncertain data stream, were read in the form of data streams. The results show that on the uncertain data stream (KDD-CUP99) the detection accuracy of WDE (0.95) is far higher than that of LOF (0.61) and LoOP (0.74) and slightly higher than that of KDE (0.92). On the other data streams, the accuracies of the methods are close. Therefore, the WDE algorithm is well suited to outlier detection in uncertain data streams.
Table 1. Outlier detection for four data sets

DataSet  | KDD-CUP99 | Shuttle | Ann-Thyroid1 | Ann-Thyroid2
Outliers | U2R       | Class7  | Class1       | Class2
Size     | 60839     | 43500   | 3428         | 3428
LOF      | 0.61      | 0.83    | 0.97         | 0.77
LoOP     | 0.74      | 0.91    | 0.96         | 0.81
KDE      | 0.92      | 0.95    | 0.96         | 0.82
WDE      | 0.95      | 0.96    | 0.97         | 0.88

LOF: Identifying Density-Based Local Outliers. LoOP: Local Outlier Probabilities. KDE: Kernel Density Estimation. WDE: Wavelet Density Estimation.

To test the performance of our algorithm on realistic RFID data streams, we used simulated data generated from distributions observed in real data collected by an SR2240 reader with 2.4 GHz active tags; the outliers follow a uniform distribution [3]. Table 2 lists the experimental parameters. The time granularity of the experiments is one second, and the time window size is W = 2 s. Figure 2 shows the effect of different window sizes on algorithm performance. The RFID data streams were generated from a constant distribution with N = 20, that is, the same tag was observed 20 times in 2 s.

In this section, we use examples to verify the accuracy and efficiency of the proposed method. Density estimation was performed with the Daubechies wavelet with 4 vanishing moments. We compared the results with the KDE-based sampling method, whose parameters were chosen with the methods of Liao et al. [2]. The main parameters of the experiment and their ranges are shown in Table 2. The main performance indicators are throughput and precision:

Throughput = total number of observations / total processing time
Precision = number of correctly identified cross-reads / total number of cross-reads

Table 2. Experimental parameters and ranges

Parameter | Description                                           | Value range
M         | Number of tags                                        | 10-1000
F         | Sampling frequency                                    | 5-20 events/sec
W         | Sliding window size                                   | 10-100 sec
Wi        | Sliding window number                                 | 10-50
BW        | Window adjustment cardinality                         | 1-5
NorN      | Number of normal tags observed in sliding window      | M*W*F*95% to M*W*F
FalN      | Number of cross-read tags observed in sliding window  | 0 to M*W*F

Figure 2. Comparison of the detection efficiency of the two methods under different window sizes
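The two performance indicators can be computed from per-run counters as sketched below; the record type and its field names are hypothetical, while the formulas are the ones given in the text.

```python
from dataclasses import dataclass


@dataclass
class CleaningRun:
    observations: int          # total number of observations processed
    processing_seconds: float  # total processing time in seconds
    cross_reads_total: int     # total number of cross-reads in the stream
    cross_reads_correct: int   # cross-reads the cleaner identified correctly


def throughput(run: CleaningRun) -> float:
    """Throughput = total number of observations / total processing time."""
    return run.observations / run.processing_seconds


def precision(run: CleaningRun) -> float:
    """Precision = correctly identified cross-reads / total cross-reads (requires total > 0)."""
    return run.cross_reads_correct / run.cross_reads_total
```

For instance, a run that processes 10,000 observations in 2.5 s and correctly identifies 46 of 50 cross-reads has a throughput of 4,000 observations per second and a precision of 0.92.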

As Figure 2 shows, the throughput of the WDE method is higher than that of KDE for every window size, and the margin grows with the window size. For a more comprehensive comparison of our WDE data cleaning algorithm with the KDE data cleaning algorithm, we used three data sets: DataSet1, DataSet2, and DataSet3 contain 1000, 5000, and 10,000 raw data records, respectively. The three data sets are then mixed with 10%, 20%, and 30% noise data, respectively. Figure 3 compares the performance of the two cleaning methods on the three data sets. As the proportion of noise increases, the cleaning time also increases. At all three noise ratios, the cleaning time of WDE is lower than that of KDE. Overall, while maintaining cleaning accuracy, WDE has a lower time cost than KDE, especially when noise makes up a higher proportion of the RFID data.

Figure 3. Comparison of the accuracy of read-rate

Figure 4. Comparison of normal-reads and cross-reads

Figure 4 shows the relative amounts of observation data for normal reads and cross-reads. Analysis of the real data shows that FalN < NorN in about 95% of the cases and FalN ≥ NorN in about 5% of the cases.

5. Conclusions and Future Work
We have proposed an outlier detection method for RFID data streams that incorporates wavelet density estimation and an improved point probability stream data model. Compared with the existing kernel density estimation (KDE) method, our method is more comprehensive and powerful: it can be applied to cleaning uncertain RFID data streams and achieves better performance. Future work will explore this method further and apply it to cleaning distributed RFID data streams.
Acknowledgement
This work was supported by the Fundamental Research Funds for the Central Universities (No. LGYB201412) and the National Natural Science Foundation of China (No. 61300053).

References
[1] Kanagal, B., Deshpande, A. (2008). Online filtering, smoothing and probabilistic modeling of streaming data. In: Proc. of the 24th Int'l Conf. on Data Engineering, p. 1160-1169.
[2] Liao, Guoqiong., Li, Jing., Wan, Changxuan. (2010). Method over RFID Streams Based on Kernel Density Estimation. Journal of Computer Research and Development, 47 (Suppl) 337-341. (In Chinese).
[3] Liao, Guoqiong., Li, Jing. (2010). Distance Based Outlier Detection for Distributed RFID Data Streams. Journal of Computer Research and Development, 47 (5) 930-939. (In Chinese).
[4] Masciari, Elio. (2007). A framework for outlier mining in RFID data. In: Proc. of the 11th Int'l Database Engineering and Applications Symposium, p. 263-267.
[5] Yang, Yidong., Sun, Zhihui., Zhang, Jing. (2005). Finding Outliers in Distributed Data Streams Based on Kernel Density Estimation. Journal of Computer Research and Development, 42 (9) 1498-1504. (In Chinese).
[6] Liu, Yao-zong., Zhang, Hong., Meng, Jin., Han, Fawang. (2013). Outliers Detection in Data Streams Based on Wavelet Density Estimation. Computer Engineering, 39 (2) 178-181. (In Chinese).
[7] Aggarwal, C. C., Yu, P. S. (2008). Outlier detection with uncertain data. In: Proc. of the 7th SIAM, p. 483-493.
[8] Zhang, Chen., Jin, Che-Qing., Zhou, Ao-Ying. (2010). Clustering Algorithm over Uncertain Data Streams. Journal of Software, 21 (9) 2173-2182. (In Chinese).
[9] Jeffery, S. R., Garofalakis, M., Franklin, M. J. (2006). Adaptive cleaning for RFID data streams. In: Proc. of the 32nd International Conference on Very Large Data Bases, p. 167-174.
[10] Wang, Yongli., Qian, Jiangbo. (2012). Measuring the uncertainty of RFID data based on particle filter and particle swarm optimization. Wireless Networks, 18 (12) 307-318.
[11] Heinz, C., Seeger, B. (2007). Adaptive Wavelet Density Estimators over Data Streams. In: Proc. of the 19th International Conference on Scientific and Statistical Database Management, p. 35-35.