A CALCULATION METHOD OF DEEP WEB ENTITIES RECOGNITION

Similar documents
BRDPHHC: A Balance RDF Data Partitioning Algorithm based on Hybrid Hierarchical Clustering

Cluster Analysis of Electrical Behavior

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Parallelism for Nested Loops with Non-uniform and Flow Dependences

A Binarization Algorithm specialized on Document Images and Photos

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc.

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

Available online at Advanced in Control Engineering and Information Science

Classifying Acoustic Transient Signals Using Artificial Intelligence

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

The Research of Support Vector Machine in Agricultural Data Classification

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and Accessibility (Gravity with Competition) Indices

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

A Deflected Grid-based Algorithm for Clustering Analysis

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

Performance Assessment and Fault Diagnosis for Hydraulic Pump Based on WPT and SOM

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

An Image Fusion Approach Based on Segmentation Region

A new segmentation algorithm for medical volume image based on K-means clustering

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like:

Support Vector Machines

A Novel Adaptive Descriptor Algorithm for Ternary Pattern Textures

Chinese Word Segmentation based on the Improved Particle Swarm Optimization Neural Networks

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm

Research on Categorization of Animation Effect Based on Data Mining

Semantic Image Retrieval Using Region Based Inverted File

FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

Network Intrusion Detection Based on PSO-SVM

Query Clustering Using a Hybrid Query Similarity Measure

UB at GeoCLEF Department of Geography Abstract

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

The Comparison of Calibration Method of Binocular Stereo Vision System Ke Zhang a *, Zhao Gao b

A CO-TRAINING METHOD FOR IDENTIFYING THE SAME PERSON ACROSS SOCIAL NETWORKS

Professional competences training path for an e-commerce major, based on the ISM method

Machine Learning 9. week

SCALABLE AND VISUALIZATION-ORIENTED CLUSTERING FOR EXPLORATORY SPATIAL ANALYSIS

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

An Optimal Algorithm for Prufer Codes *

Maximum Variance Combined with Adaptive Genetic Algorithm for Infrared Image Segmentation

The Shortest Path of Touring Lines given in the Plane

Mining User Similarity Using Spatial-temporal Intersection

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

BIN XIA et al: AN IMPROVED K-MEANS ALGORITHM BASED ON CLOUD PLATFORM FOR DATA MINING

Image Emotional Semantic Retrieval Based on ELM

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Hierarchical Image Retrieval by Multi-Feature Fusion

Suppression for Luminance Difference of Stereo Image-Pair Based on Improved Histogram Equalization

A Feature-Weighted Instance-Based Learner for Deep Web Search Interface Identification

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Kernel Collaborative Representation Classification Based on Adaptive Dictionary Learning

Performance Evaluation of Information Retrieval Systems

An Improved Image Segmentation Algorithm Based on the Otsu Method

A Method of Hot Topic Detection in Blogs Using N-gram Model

Modular PCA Face Recognition Based on Weighted Average

Mathematics 256 a course in differential equations for engineering students

A NOTE ON FUZZY CLOSURE OF A FUZZY SET

Clustering Algorithm of Similarity Segmentation based on Point Sorting

Related-Mode Attacks on CTR Encryption Mode

Face Recognition Method Based on Within-class Clustering SVM

USING GRAPHING SKILLS

THE PATH PLANNING ALGORITHM AND SIMULATION FOR MOBILE ROBOT

Discriminative Dictionary Learning with Pairwise Constraints

Fast Feature Value Searching for Face Detection

The Man-hour Estimation Models & Its Comparison of Interim Products Assembly for Shipbuilding

Hierarchical clustering for gene expression data analysis

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide

A Robust Webpage Information Hiding Method Based on the Slash of Tag

A NEW LINEAR APPROXIMATE CLUSTERING ALGORITHM BASED UPON SAMPLING WITH PROBABILITY DISTRIBUTING

Keyword-based Document Clustering

Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search

Pictures at an Exhibition

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques

A Clustering Algorithm for Key Frame Extraction Based on Density Peak

Experiments in Text Categorization Using Term Selection by Distance to Transition Point

Fuzzy C-Means Initialized by Fixed Threshold Clustering for Improving Image Retrieval

Private Information Retrieval (PIR)

Ontology Generator from Relational Database Based on Jena

Support Vector Machines

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z.

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

High-Boost Mesh Filtering for 3-D Shape Enhancement

Object-Based Techniques for Image Retrieval

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Programming in Fortran 90 : 2017/2018

SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE

ApproxMGMSP: A Scalable Method of Mining Approximate Multidimensional Sequential Patterns on Distributed System

A Similarity Measure Method for Symbolization Time Series

Fuzzy Rough Neural Network and Its Application to Feature Selection

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

A Method of Query Expansion Based on Event Ontology

Module Management Tool in Software Development Organizations

A KIND OF ROUTING MODEL IN PEER-TO-PEER NETWORK BASED ON SUCCESSFUL ACCESSING RATE

An Image Compression Algorithm based on Wavelet Transform and LZW

MODULE DESIGN BASED ON INTERFACE INTEGRATION TO MAXIMIZE PRODUCT VARIETY AND MINIMIZE FAMILY COST

Finite Element Analysis of Rubber Sealing Ring Resilience Behavior Qu Jia 1,a, Chen Geng 1,b and Yang Yuwei 2,c

Remote Sensing Image Retrieval Algorithm based on MapReduce and Characteristic Information

Transcription:

A CALCULATION METHOD OF DEEP WEB ENTITIES RECOGNITION

1 FENG YONG, 2 DANG XIAO-WAN, 3 XU HONG-YAN
School of Information, Liaoning University, Shenyang, Liaoning
E-mail: 1 fyxuhy@163.com, 2 dangxiaowan@163.com, 3 xuhongyan_lndx@163.com

ABSTRACT

A great amount of duplicated entities exist in the heterogeneous data sources of the Web, and identifying these entities is the most important prerequisite for pattern matching and data integration. To overcome the deficiencies of current entity recognition methods, such as a low level of automation and poor adaptability, a Deep Web entity recognition method based on the BP neural network is proposed in this paper. To make full use of the autonomous learning characteristic of the BP neural network, the similarities of semantic blocks are used as its input; a correct entity recognition model is obtained through training, which accomplishes automated entity recognition across heterogeneous data sources. Finally, the feasibility of the proposed method is verified via experiments. The proposed method reduces manual intervention and improves the efficiency and the accuracy of entity recognition simultaneously.

Keywords: Deep Web; BP Neural Network; Entity Identification; Similarity

1. INTRODUCTION

With the development of Web technology and the popularization of its applications, Internet information resources have increased sharply. Web data sources can be divided into the Surface Web and the Deep Web according to the amount of information they contain. Usually, a Web data source which cannot be searched by a traditional search engine is defined as Deep Web [1]. In the study of the Deep Web, entity recognition is the primary premise for pattern matching and data integration, so how to recognize the same entities accurately and effectively in the Deep Web has become a research focus in the field.

In the field of Deep Web entity recognition, relevant scholars have carried out in-depth research and obtained some representative results. Tejada, Knoblock and Minton put forward a series of methods based on pattern matching and public attribute weights, which derive string conversion rules and apply them to the public attributes to recognize entities. The method increases the accuracy of entity recognition, but it needs pattern matching as a prerequisite and requires a lot of manual intervention [2]. Cui, Xiao and Ding put forward an adaptive Web record matching method based on distance; this method increases the efficiency of entity recognition, but it is difficult to apply to badly structured Web data [3]. Sarawagi put forward a method that finds equivalent units in a given table model and then compares the similarities of the corresponding attributes of the units to conduct entity recognition. This kind of method shortens the recognition time and improves the recall rate, but its extensibility is poor [4].

Aiming to solve the problems of the existing research results, namely the low level of automation and the poor adaptability to heterogeneous Web data sources, BP neural network technology is applied to entity recognition in the Deep Web.

The structure of this paper is as follows. Section 2 introduces the theory of the BP neural network. Section 3 presents the Deep Web entity recognition method based on the BP neural network in detail. Section 4 gives a specific experiment to verify the practicability of the proposed method. Section 5 summarizes the whole paper and points out the advantages of the proposed method.

2. BP NEURAL NETWORK OVERVIEW

An Artificial Neural Network (ANN) is a mathematical model built on the basis of autonomous learning to imitate the structure and function of a biological neural network. It can complete extremely complex pattern extraction or trend analysis through the analysis of a large number of complex data.

The BP neural network learning process is divided into two stages, namely the information forward propagation stage and the error back propagation stage. In the information forward propagation stage, the input samples enter from the input layer and are transported to the output layer after processing by the hidden layer. If there is a large error between the actual output of the output layer and the expected output, the network enters the error back propagation stage, whose main task is correcting the threshold values of all neurons and the weights of their connections. Information forward propagation and error back propagation are repeated until the error value meets the requirements; the learning process is then over and a correct neural network model is obtained. This two-stage procedure is sketched below.
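As an illustration of this learning procedure, the following is a minimal sketch of a single-hidden-layer BP network trained with sigmoid activations, mean squared error and batch gradient descent. These particular choices, the default hyperparameters and the helper names are assumptions of the sketch, not details specified by the paper.

```python
# Minimal sketch of the two-stage BP learning process: information forward
# propagation, then error back propagation that corrects weights and thresholds.
# Sigmoid activations, mean squared error and a fixed learning rate are assumed.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_bp(X, Y, n_hidden=10, lr=0.2, target_error=1e-3, max_epochs=100000):
    rng = np.random.default_rng(0)
    n_in, n_out = X.shape[1], Y.shape[1]
    W1 = rng.uniform(-0.5, 0.5, (n_in, n_hidden))   # input -> hidden weights
    b1 = np.zeros(n_hidden)                         # hidden layer thresholds
    W2 = rng.uniform(-0.5, 0.5, (n_hidden, n_out))  # hidden -> output weights
    b2 = np.zeros(n_out)                            # output layer thresholds
    for _ in range(max_epochs):
        # Stage 1: information forward propagation
        H = sigmoid(X @ W1 + b1)
        O = sigmoid(H @ W2 + b2)
        error = 0.5 * np.mean(np.sum((Y - O) ** 2, axis=1))
        if error < target_error:
            break
        # Stage 2: error back propagation, correcting weights and thresholds
        dO = (O - Y) * O * (1 - O)
        dH = (dO @ W2.T) * H * (1 - H)
        W2 -= lr * (H.T @ dO) / len(X)
        b2 -= lr * dO.mean(axis=0)
        W1 -= lr * (X.T @ dH) / len(X)
        b1 -= lr * dH.mean(axis=0)
    return W1, b1, W2, b2
```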

3. DEEP WEB ENTITIES RECOGNITION METHOD BASED ON BP NEURAL NETWORK

In heterogeneous Web data sources, the same entity appears in different forms. Deep Web entity recognition detects the same entities across different data sources, in order to lay a solid foundation for eliminating data redundancy and comparing the same entity. Combining the BP neural network with Deep Web entity recognition makes full use of the autonomous learning of the ANN to reduce manual intervention and improve the accuracy of recognition. The Deep Web entity recognition method based on the BP neural network consists of three main steps: first, divide the entities into blocks; second, calculate the semantic block similarities; third, train the entity recognition model. The detailed introduction is as follows.

3.1 Divide the Entities into Blocks

The entities are divided into blocks, where each semantic block corresponds to an attribute or to a combination of attributes with related meaning. Through the analysis of the entities, it can be found that an entity not only contains content information but also includes metadata information which can influence entity recognition, so it is necessary to preprocess the content of the entity information before training the entity recognition model. First, the content information is divided into blocks. Second, each semantic block is processed as a text. Third, each semantic block of the entity is compared with the blocks of other entities. Finally, the similarity values of the semantic blocks are calculated and used as the input of the BP neural network.

This paper uses a partition method based on Web page layout tags to obtain the semantic structure. In the Deep Web sites of a given field, the semantic block contents of the entities are roughly similar; only a few individual semantic blocks are unique to a single site. Therefore, all learning sample entities are divided into semantic blocks, each divided block belongs to a known attribute, and the attribute text labels corresponding to these semantic blocks can be collected. Taking the union of these labels forms a set which includes all attribute text labels corresponding to the semantic blocks, recorded as A = {A1, A2, ..., An}, where n is the number of text labels corresponding to the semantic blocks. After the semantic block division of any entity in the data source, the entity can be represented by a semantic block set S = {S1, S2, ..., Sn}, where Si is the semantic block with attribute text label Ai; if no block with that attribute text label is found, Si is empty. A small sketch of this block representation is given below.
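As an illustration of this block representation, the sketch below builds the attribute label set A from a group of sample entities and aligns each entity's semantic block set S to it. The book attributes and the dictionary form of an entity are hypothetical; in the paper the blocks are extracted from the Web page layout tags.

```python
# Illustrative sketch: build the attribute text label set A and represent each
# entity as a semantic block set S aligned to A (missing blocks stay empty).
from typing import Dict, List, Optional

def build_label_set(sample_entities: List[Dict[str, str]]) -> List[str]:
    """Union set A of all attribute text labels seen in the learning samples."""
    labels: List[str] = []
    for entity in sample_entities:
        for label in entity:
            if label not in labels:
                labels.append(label)
    return labels

def to_block_set(entity: Dict[str, str], labels: List[str]) -> List[Optional[str]]:
    """Semantic block set S: one slot per label in A, None if the block is missing."""
    return [entity.get(label) for label in labels]

# Hypothetical book entities parsed from two Deep Web sources
e1 = {"ISBN": "9787111213826", "title": "Data Mining", "price": "45.00"}
e2 = {"ISBN": "9787111213826", "title": "Data Mining", "press": "China Machine Press"}
A = build_label_set([e1, e2])            # ['ISBN', 'title', 'price', 'press']
S1, S2 = to_block_set(e1, A), to_block_set(e2, A)
```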
3.2 Calculate Semantic Block Similarity

After dividing an entity into blocks, since each block makes a different contribution to entity recognition, the similarity between the semantic blocks of one entity and another entity must be calculated; these similarities are used as the input of the neural network training.

The specific steps are: divide entity A into n blocks to form the corresponding semantic block set S = {S1, S2, ..., Sn}; calculate the similarity between each block and entity B to get a group of similarity values; and express them as a vector, recorded as T = (Sim(S1, B), Sim(S2, B), ..., Sim(Sn, B)), where Sim(Sx, B) is the similarity of a block of entity A with entity B. Because of the poor structure of HTML documents, introducing pattern matching would increase the complexity of the calculation, so in this paper entity A is divided first and then the similarities between its blocks and entity B are calculated. During the calculation, different methods are used according to the different feature attribute types.

(1) Non-string type. For the properties of non-string type, such as numeric and currency types, the similarities can be calculated according to the range distance between them. A block of entity A is recorded as non-string t1 and one from entity B is recorded as non-string t2 (more than one non-string may be extracted from entity B according to its visual characteristics). The similarity of each non-string of entity B with t1 is calculated using formula (1), and the largest similarity is taken as the corresponding neural network input:

sim(t1, t2) = 1 - ((t1 - t̄)^2 + (t2 - t̄)^2) / t̄^2    (1)

where t̄ is the average value of t1 and t2.

(2) String type. For the properties of string type, the similarity between a semantic block and the string part of entity B can be calculated with the string edit distance method. The method counts the minimum number of insert, delete and replace operations needed to convert the source string m_i into the target string m_j, and formula (2) converts the resulting edit distance into a similarity:

sim_edit(m_i, m_j) = max(0, (min(m_i, m_j) - D(m_i, m_j)) / min(m_i, m_j))    (2)

D(m_i, m_j) = min{ D(m_i-1, m_j-1) + C(m_i -> m_j), D(m_i-1, m_j) + C(m_i -> ε), D(m_i, m_j-1) + C(ε -> m_j) }

where m_i is the string of a semantic block of entity A, m_j is the string of the corresponding text part of entity B, D(m_i, m_j) is the edit distance, C(m_i -> m_j) is the cost of replacing a character of m_i with a character of m_j, C(m_i -> ε) is the cost of deleting a character of m_i, and C(ε -> m_j) is the cost of inserting a character of m_j. Both measures are sketched in code below.
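Both similarity measures can be sketched as follows, assuming unit costs for the insert, delete and replace operations (the paper leaves the cost terms C(·) abstract) and reading min(m_i, m_j) in formula (2) as the length of the shorter string.

```python
# Sketch of the two block-level similarity measures of Section 3.2.

def sim_numeric(t1: float, t2: float) -> float:
    """Formula (1): range-distance similarity of two non-string values."""
    t_bar = (t1 + t2) / 2.0
    if t_bar == 0:
        return 1.0 if t1 == t2 else 0.0
    return 1.0 - ((t1 - t_bar) ** 2 + (t2 - t_bar) ** 2) / (t_bar ** 2)

def edit_distance(mi: str, mj: str) -> int:
    """D(mi, mj): minimum number of insert/delete/replace operations (unit costs)."""
    d = [[0] * (len(mj) + 1) for _ in range(len(mi) + 1)]
    for x in range(len(mi) + 1):
        d[x][0] = x
    for y in range(len(mj) + 1):
        d[0][y] = y
    for x in range(1, len(mi) + 1):
        for y in range(1, len(mj) + 1):
            replace_cost = 0 if mi[x - 1] == mj[y - 1] else 1
            d[x][y] = min(d[x - 1][y - 1] + replace_cost,  # replace C(mi -> mj)
                          d[x - 1][y] + 1,                 # delete  C(mi -> ε)
                          d[x][y - 1] + 1)                 # insert  C(ε -> mj)
    return d[len(mi)][len(mj)]

def sim_string(mi: str, mj: str) -> float:
    """Formula (2): edit distance converted to a similarity in [0, 1]."""
    shorter = min(len(mi), len(mj))
    if shorter == 0:
        return 0.0
    return max(0.0, (shorter - edit_distance(mi, mj)) / shorter)

# e.g. sim_numeric(45.0, 45.0) == 1.0 and sim_string("abcd", "abcf") == 0.75
```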
3.3 Train Entities Recognition Model

This step uses the BP neural network to identify Deep Web entities. Firstly, the entity recognition model is established with the BP neural network. The advantage of the BP neural network is that the weight of each attribute need not be calculated by hand: whether two Deep Web entities match is determined by the inner relationship among the attributes learned through the self-study of the BP neural network. Then the sample entities are divided into two categories, namely the set of matching entities and the set of unmatched entities; the two types of entities are input to the BP neural network respectively to train the weights and thresholds, and the entity recognition model is generated. Finally, the entity data that need to be matched are input into the trained entity recognition model, which generates the result. The specific training steps are as follows.

(1) The entity recognition model is established based on the BP neural network. There are n nodes in the input layer, corresponding to the similarities of the n semantic blocks with another entity; if there is no corresponding similarity, 0 is entered. The output layer has two nodes: an output vector of (1, 0) represents matched entities, while (0, 1) means the entities are unmatched. The number of hidden layer nodes is determined by formula (3):

n = sqrt(i + j) + d    (3)

where n is the number of hidden layer nodes, i is the number of input layer nodes, j is the number of output layer nodes and d is a constant in [1, 10].

(2) The sample entities are divided into the matching entity set M(r) and the unmatched entity set U(r). The similarities of the blocks of entity A with entity B are calculated and denoted as Sim(A, B).

(3) The similarities Sim(A, B) of the M(r) entity set are used as the input of the entity recognition model. The attribute text labels of the semantic blocks are matched to the input nodes, and the similarity values Sim(A, B) are fed to the nodes corresponding to the semantic blocks; if there is no corresponding node, the input value is 0. The desired output is (1, 0). If there is a large error between the actual output and the desired output, the network conducts an error back propagation phase to adjust its weights and thresholds until the neural network converges and the error precision meets the requirement.

(4) The similarities Sim(A, B) of the U(r) entity set are used as input, and the desired output is (0, 1). The process is the same as in step (3), until the neural network converges and the error precision meets the requirement.

(5) The similarity Sim(A, B) to be tested is input to the completed entity recognition model and the output S is obtained.

(6) If S falls within the error range of the M(r) target mode, the tested entities are matched; if S falls within the error range of the U(r) target mode, the tested entities are unmatched. The target mode range is defined by the upper and lower limits of the training set. In the experiments of this paper, for example, the lower limit of the target mode M(r) is (0.852, 0.148), and an output S in the range ([0.852, 1], [0, 0.148]) means the entities are matched; the upper limit of the target mode U(r) is (0.308, 0.692), and an output S in the range ([0, 0.308], [0.692, 1]) means the entities are unmatched. If the output falls in ([0.308, 0.852], [0.148, 0.692]), it belongs to neither the matched set nor the unmatched set, and an expert is needed for manual recognition. These training and decision steps are sketched in code below.
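The training and decision procedure above can be sketched as follows, reusing the sigmoid and train_bp helpers from the Section 2 sketch. The value of d in formula (3), the randomly generated similarity vectors and the test vector are illustrative assumptions; the target-mode limits used as defaults are the ones reported in this paper's experiment.

```python
# Sketch of Section 3.3: size the hidden layer with formula (3), train on the
# M(r)/U(r) similarity vectors, and apply the target-mode decision of step (6).
# Assumes the sigmoid and train_bp helpers from the Section 2 sketch are in scope.
import math
import numpy as np

def hidden_nodes(i: int, j: int, d: int = 6) -> int:
    """Formula (3): n = sqrt(i + j) + d, with d a constant in [1, 10]."""
    return int(round(math.sqrt(i + j) + d))

def predict(x, W1, b1, W2, b2):
    """Forward pass of the trained entity recognition model."""
    return sigmoid(sigmoid(x @ W1 + b1) @ W2 + b2)

def decide(output, m_lower=(0.852, 0.148), u_upper=(0.308, 0.692)):
    """Step (6): compare the output S with the M(r) and U(r) target mode ranges."""
    s1, s2 = output
    if s1 >= m_lower[0] and s2 <= m_lower[1]:
        return "matched"
    if s1 <= u_upper[0] and s2 >= u_upper[1]:
        return "unmatched"
    return "manual recognition needed"

# Hypothetical similarity vectors (12 block similarities per entity pair).
rng = np.random.default_rng(1)
sim_vectors_M = rng.uniform(0.7, 1.0, (5, 12))   # matching pairs, desired output (1, 0)
sim_vectors_U = rng.uniform(0.0, 0.35, (5, 12))  # unmatched pairs, desired output (0, 1)
X = np.vstack([sim_vectors_M, sim_vectors_U])
Y = np.vstack([np.tile([1.0, 0.0], (len(sim_vectors_M), 1)),
               np.tile([0.0, 1.0], (len(sim_vectors_U), 1))])

W1, b1, W2, b2 = train_bp(X, Y, n_hidden=hidden_nodes(12, 2))  # 10 hidden nodes
test_vector = rng.uniform(0.6, 1.0, 12)
result = decide(predict(test_vector, W1, b1, W2, b2))
```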

4. EXPERIMENTS

The experimental data is derived from two large websites, namely Amazon and Dangdang. Similar entities are obtained from the two websites by submitting queries to the query interfaces of the sites, and these entities are divided into a training set and a testing set.

Firstly, the entities are divided into semantic blocks. This experiment obtains 12 attribute text labels corresponding to the semantic blocks as a collection, denoted K = {ISBN, title, author, market price, price, discounts, press, paperback, barcode, ASIN, weight, size}. The similarities between each semantic block of any entity and another entity are calculated on the training set and used as the input for training the entity recognition model. The similarities of the training samples are shown in Table 1.

Table 1. The Similarity Of The Training Samples (one column per attribute text label in K; the rows are the similarity vectors of training samples 1-10, the first group being similarities of M(r) and the second group similarities of U(r))

Finally, the entity recognition model is established on the basis of the BP neural network. The maximum number of iterations of the neural network is not limited, so training continues until the error meets the requirement; the objective error is 0.001 and the learning rate is 0.2. The neural network has 12 input nodes and 2 output nodes, and the number of nodes in the hidden layer is 10 according to formula (3). After training the entity recognition model, the lower limit of the M(r) target mode is (0.852, 0.148) and the upper limit of the U(r) target mode is (0.308, 0.692).

A pair of matched entities, A and B, and a pair of unmatched entities, A and C, taken from the test set are used for testing. Entity A is divided into blocks and the similarities are computed with entity B and with entity C respectively. The results of the similarity computation are shown in Table 2.

Table 2. Similarities Of The Test Entities (one column per attribute text label in K; one row gives the similarities of the semantic blocks of A with entity B, the other row the similarities of the semantic blocks of A with entity C)

The test result for entity A and entity B is (0.962, 0.038), which is located in the target mode range ([0.852, 1], [0, 0.148]), so entity A and entity B match. The test result for entity A and entity C is (0.006, 0.994), which is located in the target mode range ([0, 0.308], [0.692, 1]), so entity A and entity C are unmatched.

With an increasing number of samples, the accuracy of the model increases correspondingly, but too large an amount of training samples results in increased time complexity.
This paper uses the F value as the evaluation index, calculated by formula (4):

F = (2 * accuracy rate * recall rate) / (accuracy rate + recall rate)    (4)

The performance of the proposed method compared with the methods given by literature [5, 6] is shown in Figure 1. The F value of the method given by literature [5] is only 77% and the F value of the method given by literature [6] is 83%, while the F value of the method given in this paper is 93%. The experimental results indicate that the proposed method has better performance.

Figure 1. F Value Comparison (F values of the method of literature [5], the method of literature [6], and the method of this paper)
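Formula (4) is the usual F-measure combination of the accuracy rate and the recall rate; a small sketch with hypothetical rates follows.

```python
# Formula (4): F value combining the accuracy rate and the recall rate.
def f_value(accuracy_rate: float, recall_rate: float) -> float:
    if accuracy_rate + recall_rate == 0:
        return 0.0
    return 2 * accuracy_rate * recall_rate / (accuracy_rate + recall_rate)

# Hypothetical example: an accuracy rate of 0.90 and a recall rate of 0.84
# give an F value of about 0.87.
print(round(f_value(0.90, 0.84), 2))
```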

5. CONCLUSION

The development of the great wealth of Internet resources places higher demands on the effectiveness and performance of entity recognition. This paper makes full use of the advantages of the BP neural network and proposes a Deep Web entity matching method on its basis. The method does not require pattern matching as a premise: it reduces the complexity of entity recognition by dividing entities into blocks, while the similarities of each semantic block with another entity are used as the input of the neural network training. The performance of entity recognition is improved by the independent training of the BP neural network. The contributions of the proposed method are to improve the accuracy and the efficiency of Deep Web entity recognition, and at the same time to reduce manual intervention and solve the problem of a low level of automation. These contributions are verified via the experiments.

ACKNOWLEDGMENTS

This work was supported in part by the Social Science Youth Foundation of the Ministry of Education of China under Grant No. 12YJCZH048, the Natural Science Foundation of Liaoning Province of China under Grant No. 20102083, and the Hundred, Thousand and Ten Thousand Talent Project of Liaoning Province of China.

REFERENCES:

[1] WANG YAN, SONG BAO-YAN, ZHANG JIA-YANG, et al. Deep Web query interface identification approach based on label coding [J]. Journal of Computer Applications, 2011, 31(5): 1351-1354.
[2] TEJADA SHEILA, KNOBLOCK CRAIG A, MINTON STEVEN. Learning domain-independent string transformation weights for high accuracy object identification [C]//Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM Press, 2002: 350-359.
[3] CUI JIAN-JUN, XIAO HONG-YU, DING LI-XIN. Distance-based adaptive record matching for Web databases [J]. Journal of Wuhan University, 2012, 58(1): 89-94.
[4] SARAWAGI SUNITA, BHAMIDIPATY ANURADHA. Interactive deduplication using active learning [C]//Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM Press, 2002: 269-278.
[5] LI WEN-SYAN, CLIFTON CHRIS. SEMINT: A tool for identifying attribute correspondences in heterogeneous databases using neural networks [J]. Data and Knowledge Engineering, 2000, 33: 49-84.
[6] QIANG BAO-HUA, CHEN LING, YU JIAN-QIAO, et al. Research on attribute matching approach based on neural network [J]. Journal of Computer Science, 2006, 33(1): 49-51.