A Multi-Analyzer Machine Learning Model for Marine Heterogeneous Data Schema Mapping

Similar documents
Research Article A Multianalyzer Machine Learning Model for Marine Heterogeneous Data Schema Mapping

Temperature Calculation of Pellet Rotary Kiln Based on Texture

Research on Design and Application of Computer Database Quality Evaluation Model

FSRM Feedback Algorithm based on Learning Theory

Research on Mining Cloud Data Based on Correlation Dimension Feature

CHAPTER 3 MAINTENANCE STRATEGY SELECTION USING AHP AND FAHP

STUDYING OF CLASSIFYING CHINESE SMS MESSAGES

Wavelength Estimation Method Based on Radon Transform and Image Texture

Improving the Efficiency of Fast Using Semantic Similarity Algorithm

Research on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a

FUZZY C-MEANS ALGORITHM BASED ON PRETREATMENT OF SIMILARITY RELATIONTP

The Comparative Study of Machine Learning Algorithms in Text Data Classification*

Quality Assessment of Power Dispatching Data Based on Improved Cloud Model

Remote Monitoring System of Ship Running State under Wireless Network

Performance Degradation Assessment and Fault Diagnosis of Bearing Based on EMD and PCA-SOM

RETRACTED ARTICLE. Web-Based Data Mining in System Design and Implementation. Open Access. Jianhu Gong 1* and Jianzhi Gong 2

Privacy-Preserving of Check-in Services in MSNS Based on a Bit Matrix

Open Access Research on the Data Pre-Processing in the Network Abnormal Intrusion Detection

Using Text Learning to help Web browsing

Multi-Source Spatial Data Distribution Model and System Implementation

Image retrieval based on bag of images

Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing Environments

A Modular k-nearest Neighbor Classification Method for Massively Parallel Text Categorization

Yunfeng Zhang 1, Huan Wang 2, Jie Zhu 1 1 Computer Science & Engineering Department, North China Institute of Aerospace

Semi-supervised learning and active learning

Video Inter-frame Forgery Identification Based on Optical Flow Consistency

Feature-Guided K-Means Algorithm for Optimal Image Vector Quantizer Design

An Abnormal Data Detection Method Based on the Temporal-spatial Correlation in Wireless Sensor Networks

Accuracy of Matching between Probe-Vehicle and GIS Map

Research on Evaluation Method of Product Style Semantics Based on Neural Network

Evaluation of Meta-Search Engine Merge Algorithms

EFFICIENT ATTRIBUTE REDUCTION ALGORITHM

Design of student information system based on association algorithm and data mining technology. CaiYan, ChenHua

Two Algorithms of Image Segmentation and Measurement Method of Particle s Parameters

An adaptive container code character segmentation algorithm Yajie Zhu1, a, Chenglong Liang2, b

Data Mining Technology Based on Bayesian Network Structure Applied in Learning

2 Ontology evolution algorithm based on web-pages and users behavior logs

Boosting Algorithms for Parallel and Distributed Learning

Research on Remote Sensing Image Template Processing Based on Global Subdivision Theory

RESEARCH ON INTER-SATELLITE COMMUNICATION SYSTEM OF FRACTIONATED SPACECRAFT

ZigBee Routing Algorithm Based on Energy Optimization

Background Motion Video Tracking of the Memory Watershed Disc Gradient Expansion Template

Research on Construction of Road Network Database Based on Video Retrieval Technology

Spatial Variation of Sea-Level Sea level reconstruction

Nodes Energy Conserving Algorithms to prevent Partitioning in Wireless Sensor Networks

Improving Suffix Tree Clustering Algorithm for Web Documents

ON THEORY OF INTUITIONISTIC FUZZY SETS (OR VAGUE SETS)

The application of OLAP and Data mining technology in the analysis of. book lending

CLUSTER ANALYSIS. V. K. Bhatia I.A.S.R.I., Library Avenue, New Delhi

Research on Transmission Based on Collaboration Coding in WSNs

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining.

Intelligent management of on-line video learning resources supported by Web-mining technology based on the practical application of VOD

Tag Based Image Search by Social Re-ranking

Research on Distributed Video Compression Coding Algorithm for Wireless Sensor Networks

INFORMATION RETRIEVAL SYSTEM USING FUZZY SET THEORY - THE BASIC CONCEPT

A Fuzzy C-means Clustering Algorithm Based on Pseudo-nearest-neighbor Intervals for Incomplete Data

Fuzzy Partitioning with FID3.1

Tendency Mining in Dynamic Association Rules Based on SVM Classifier

INF4820 Algorithms for AI and NLP. Evaluating Classifiers Clustering

Information Retrieval. Information Retrieval and Web Search

Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating

Automatic Domain Partitioning for Multi-Domain Learning

CSC411 Fall 2014 Machine Learning & Data Mining. Ensemble Methods. Slides by Rich Zemel

Creating Time-Varying Fuzzy Control Rules Based on Data Mining

Restricted Nearest Feature Line with Ellipse for Face Recognition

Selection of Best Web Site by Applying COPRAS-G method Bindu Madhuri.Ch #1, Anand Chandulal.J #2, Padmaja.M #3

Research on Social Relationship Network System based on MongoDB

Study on A Recommendation Algorithm of Crossing Ranking in E- commerce

A New Approach for Automatic Thesaurus Construction and Query Expansion for Document Retrieval

Top-k Keyword Search Over Graphs Based On Backward Search

HFCT: A Hybrid Fuzzy Clustering Method for Collaborative Tagging

A Game Map Complexity Measure Based on Hamming Distance Yan Li, Pan Su, and Wenliang Li

A Bagging Method using Decision Trees in the Role of Base Classifiers

European Network on New Sensing Technologies for Air Pollution Control and Environmental Sustainability - EuNetAir COST Action TD1105

Cluster Validity Classification Approaches Based on Geometric Probability and Application in the Classification of Remotely Sensed Images

HIGH RESOLUTION REMOTE SENSING IMAGE SEGMENTATION BASED ON GRAPH THEORY AND FRACTAL NET EVOLUTION APPROACH

The Design and Application of GIS Mathematical Model Database System with Meta-algorithm Li-Zhijiang

IMPROVING THE RELEVANCY OF DOCUMENT SEARCH USING THE MULTI-TERM ADJACENCY KEYWORD-ORDER MODEL

OPTIMIZATION OF BAGGING CLASSIFIERS BASED ON SBCB ALGORITHM

Alberto Messina, Maurizio Montagnuolo

IMPROVING INFORMATION RETRIEVAL BASED ON QUERY CLASSIFICATION ALGORITHM

INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering

Keywords Clustering, Goals of clustering, clustering techniques, clustering algorithms.

A Novel Image Classification Model Based on Contourlet Transform and Dynamic Fuzzy Graph Cuts

Machine Learning Techniques for Data Mining

Cost-sensitive C4.5 with post-pruning and competition

A Proposed Model For Forecasting Stock Markets Based On Clustering Algorithm And Fuzzy Time Series

manufacturing process.

TEVI: Text Extraction for Video Indexing

A New Distance Independent Localization Algorithm in Wireless Sensor Network

Unsupervised Learning and Clustering

Slides for Data Mining by I. H. Witten and E. Frank

Integration of Fuzzy Shannon s Entropy with fuzzy TOPSIS for industrial robotic system selection

EFFICIENT ADAPTIVE PREPROCESSING WITH DIMENSIONALITY REDUCTION FOR STREAMING DATA

A Novel Method for Activity Place Sensing Based on Behavior Pattern Mining Using Crowdsourcing Trajectory Data

Reading group on Ontologies and NLP:

Conceptual Review of clustering techniques in data mining field

Information modeling and reengineering for product development process

An Unsupervised Approach for Combining Scores of Outlier Detection Techniques, Based on Similarity Measures

Research on-board LIDAR point cloud data pretreatment

Transcription:

A Multi-Analyzer Machine Learning Model for Marine Heterogeneous Data Schema Mapping Wang Yan 1, 2 Le Jiajin 3, Zhang Yun 2 1 Glorious Sun School of Business and Management Donghua University 2 College of Information Technology Shanghai Ocean University 3 School of Computer Science and Technology Donghua University Shanghai, China ABSTRACT: In heterogeneous data integration, an effective machine learning model plays an important role in schema mapping. Schema mapping machine learning model and its probability learning improvement are analyzed in this paper firstly, and then the concept of multi -analyzer model with the method of fuzzy comprehensive evaluation is put forward to improve machine learning results efficiency and accuracy. Comprehensive experiments on multi-analyzer model confirm the effectiveness of multi -analyzer model. Keywords: Marine Data, Heterogeneity, Machine Learning, Schema Mapping, Fuzzy Comprehensive Evaluation Received: 30 May 2015, Revised 2 July 2015, Accepted 11 July 2015 2015 DLINE. All Rights Reserved 1. Introduction 1.1 Heterogeneity of Marine Data and Integration Mappings Marine data is typical heterogeneous data with multi-scale, multi-temporal and multi-semantic features. Marine rich data include marine resource, marine geography, marine biology, marine chemistry, and many other fields. Due to the difference between acquisition equipments, information processing platforms and data storage formats, even in the same field heterogeneous data still exist. In schema mapping, marine heterogeneous data s features are described as follows: 1 Large-scale Large-scale marine data come from marine large monitor sensor systems which transmit huge amount of marine data. 2 Distribution Uncertainty Distribution uncertainty includes both physical distribution and logical distribution uncertainty. The physical distribution uncertainty is that the data source is not physically concentrated but distributed in multi regions which connected by network between pluralities of data sources. And logical uncertainty is that different properties may even exist in a same data cube s different data sources. 3 Semantic Heterogeneity 110 Journal of Multimedia Processing and Technologies Volume 6 Number 4 December 2015

Naming rules and data types in each marine data source are more or little different. And there is also data semantic heterogeneity in marine databases. When a solid data model is built, different granularity division, different relations between the different entities, as well as entities semantic description difference between different data sources result in heterogeneous semantic data. 4 Semi-Standardized Although semantic heterogeneity of schema and data exist in marine database, the descriptions of marine information have certain specifications and standards, they have some commonality, for example hydrographic tables contain tide, water temperature, and wave height data, while times, location, and wave field in the tables have significant structural features. These loose constraints are called as semi-standardized. Marine heterogeneous integration is an organic integration of marine data in different formats, source, nature and characteristics of logically or physically. The goal is to conversion, adjust, decompose and merge marine data s different formal features (such as units, format, scale etc. and internal features (such as property, etc. and final form compatible seamless marine data sets [1]. In the process of marine heterogeneous data integrating, how to improve the efficiency of heterogeneous data schema mapping should be considered. Schema mapping relation is the combination of elements mapping between heterogeneous data models in the same field or heterogeneous island. Schema mapping problem is a research hotspot in heterogeneous data integration and transformation. Instantiation process of schema mapping is a process of that users select some data or data sets by operate the data entry, determine the data flow direction, and select target database and mapping table to map. How to implement automatic or semiautomatic schema mapping is the problem for machine mapping. 1.2 Automatic Schema Mapping based on Machine Learning Automatic mapping between heterogeneous data can be achieved by the structure of the learning machine. A large number of researches indicate multi-strategy machine learning usually has higher accuracy than single one. Therefore, while increasing the complexity of the system, a lower error rate can be obtained. Currently, the researches of multi- strategy learning machine focus on learning model combination and single learning output approach. In [2, 3] BayesIDF learning and grammar learning method are combined effectively, and multi- strategy approach is proposed for information extraction. The multi-strategy learning model in [4] refers to the framework of combination the result of multiple learning methods, which is also known as meta-learning [5]. These analyses theoretically prove that a result combination of learning is more accurate than the best results of the individual learner. It is combined rule-based and based on maximum entropy model in [6], proposed a hybrid approach to determine an appropriate time contact between a pair of entities. A new multi-domain online learning framework based on parameters is proposed in [7]. EnsembleMatrix is proposed in [8] which can help users understand the relative merits of various classifiers and analyzers, and allow users to explore and establish a communication interact model by direct visualization. Some other integration methods are presented, such as boosting [9], stacking [10], and bagging [11], these methods repeat single learning training, and apply the result to different parts of one problem, and combine these results to obtain performance improvement. However, because of the huge difference between marine samples and heterogeneous data, it is more difficult to determine and give an accurate judgment from the individual output of the learning. Firstly, the paper describes characteristics of machine learning schema mapping model for marine heterogeneous data, and discusses how to improve the efficiency of automatic schema mapping model by probability learn. Then in order to evaluate output results of the automatic schema mapping more accurately, a concept of multi-learning and multi-analyzer is put forward, and the fuzzy comprehensive evaluation is applied to multi-analyzer model to quantify various factors of learner output during heterogeneous sampling and get more accurate combination learning results. Finally, the multi-analyzer model is verified in multi-source marine observation data conversion experiments. 2. Automatic Schema Mapping Based on Machine Learning Automatic mapping model of machine learning generally contains input interface, learner, analyzer and human interface as shown in Figure 1. The input interface checks data, and filters and selects the sample. Learner is responsible for statistical analysis and feature extraction. By training and learning, input data and sample object model generates an internal data model. Journal of Multimedia Processing and Technologies Volume 6 Number 4 December 2015 111

After analysis the learning result, the analyzer can adjust the parameters of the learning machine and reconfigures strategies to improve the efficiency of the schema mapping and real-time matching. Man-machine interface refers to the users operate UI and input feedback adjustment information and parameters to form positive feedback. Figure 1. Flow diagram of automatic schema mapping based on machine learning In the training phase, training data - are set by default, learning machine extracts the character parameters of data sets, and then internal schema mapping prediction model is generated. The analyzer analyzes output data of learning machine and feedbacks parameters. In the schema mapping phase, according to the prediction model, the learner analyzes data source and monitor schema mapping results in real time. In this process, users can interfere with the schema mapping results by UI interface. In the self-learning process, selection of learning sample is an important part. Sample sets should be of the maximum global matching attributes. Set sample set β, β attributes {b 1, b 2,, b n }. As global learning samples, γ has the global data set {b f1, b f2,, b fn }. Where n represents the number of global properties, if there is min ({b 1, b 2,, b n },{b f1, b f2,,b fn } sample set β, can be called the best sample set. For marine data system, assuming there are two kinds of data, data source SOD and target data DOD, some schema mapping constraint and schema mapping constraint relationships in the user interface. We can establish schema mapping with all these things. For example, there are sites, latitudes and longitudes of sensor nodes, sensing time, tidal waves in marine source data. After the target schema field and specified schema relationship are defined a training process table in detail can be obtained. If the samples are extracted according to the probability distribution model, probability properties distribution of the large sample could be calculated and the data selection function of input interface could be optimized. Though enough of samples is required, the training process cannot cover have all the instance sets. If there is a distribution property of statistical sample, prospective study can be obtained in advance. When the sample data sets are selected, attribute tags in the sample data can be previously identified, and these tags are recorded in the real environments, which can mark the distribution density of the attribute, as shown in figure 2. Samples can be searched according to tags in high probability distribution interval, and sorted according to the probability distribution. We can get various properties data sets, including highly matched collection sets, moderate matched collection sets and lowly matched ones. Samples in various properties data sets are selected randomly and the training results then are modifies by manual intervention interface during training. According to the probability distribution learning results are transferred into the learners and analyzers, comprehensive analysis modules in the analyzer feedback the judgment information, inquiry learner s pre-judgments for training. The analyzer s field judgments can be mathematically expressed as follows: 112 Journal of Multimedia Processing and Technologies Volume 6 Number 4 December 2015

Figure 2. The sample filter based on the probability distribution p = m (f (x s, p, t, x d = { 1, p < 0.4 ( Default probability 1 threshold 0, p > 0.8 ( Default probability 0 threshold For any instance of group x s, we get schema mapping values by schema mapping function f, and compare these values to the known target instances, 0 or 1 will be obtained by calculate the probability of matching values can compare they to the probability thresholds. Then learner gives the scores of the schema mapping effect. If there are multi-learners, you need to compare and rank multiple learner output. Finally, the analyzer will output record information table, which covers sample properties and the parameters proposals of learning machine, and dynamically adjust the parameters according to current input and response results. After training, analyzer can begin to the map process for new data sources and new data. An ocean observation data mapping process is shown in Figure 3. The data which matching distribution probability is extracted from instance of observed data source, and used for detect and judgment the information unit. And then the correlation matching probability of learning machine is given for analysis module s 0/1 judgment according to predefine threshold value. In the whole process users give the real-time feedback modify by operating UI. 3. Heterogeneous Data Schema Mapping Optimization basing on Multi-analyzer 3.1 Multi-analyzer Concept The analyzer can selects the best learning path from the output of the learners, so it must take consider analysis theme domain subdivision and analysis major subject identification, and object sorting firstly. When we select subject field, universality, completeness and independency must be considered firstly, so that the selected subject field cover other subject fields. Secondly, evaluation granularity affects the difficulty and parallelism of judgment and calculation. Finally, the selection of the appropriate evaluation method must base on the samples feature. We can select the probability or fuzzy evaluation methods. In automatic machine schema mapping model, multi-analyzer will improve the schema mapping accuracy. In Figure 4, multianalyzer will build a score matrix for the matching correlation of the learner s output results. Multi-analyzer use multi-score board strategy to match different data or learning strategies. Multi-analyzer can expand the match and synthesize process, introduce multi-stage matching and integration, and select the best Analysis Results from the output. Journal of Multimedia Processing and Technologies Volume 6 Number 4 December 2015 113

Figure 3. Machine schema mapping process of ocean observation data Figure 4. Multi-Analyzer automatic schema mapping Structure P n = m 1 ( f ( x s, p,t, x d m 2 ( f ( x s, p,t, x d m 3 ( f ( x s, p,t, x d... m n ( f ( x s, p,t, x d Field analysis can be expressed as a multidimensional function to extend the determination mode. But for multidimensional analysis results, how to convert multi-outputs to single output, and how to select an appropriate function to achieve dimension reduction processing and simplify the analysis are difficult and important. There are huge differences in massive samples and heterogeneous data, so the final output and determination of multi-analyzer be more difficult and confused. 114 Journal of Multimedia Processing and Technologies Volume 6 Number 4 December 2015

So the fuzzy comprehensive evaluation is introduced for multi-analyzer s result to simplify the result by the fuzzy dimension reduction. 3.2 Fuzzy Comprehensive Evaluation Method The basic idea of fuzzy comprehensive evaluation is to consider the various factors associated with objects, and make a reasonable evaluation with the fuzzy linear transformation theory and the maximum membership degree principle. There are m factors related to evaluating objet, it is called as the factors set and denote as below: U = {u 1, u 2,..., u m } There are n comments which are called evaluation set and recorded as below: V = {v 1, v 2,..., v m } Firstly, each factor u i in factors set U is single evaluation factor which determines the membership degree value r ij of the reviews v j and forms a single-actor evaluation set of u i, r i = (r i1, r i2,..., r im, r i μ (V So we can get a fuzzy sets of the reviews set V. Then a fuzzy-value schema mapping function is shown below, f: U F(V, u i f(u i Where f (u i = r ij = (r i1, r i2,..., r in is the comments fuzzy vector on the factors set u i, and r ij is the relationship factor between u j, and v j. Then all the single-factor evaluation factors are integrated together to build a general evaluation of matrix and a fuzzy comprehensive evaluation matrix R from U to V: In other words, fuzzy value function get the relationship from U to V, where matrix., R f is the comprehensive evaluation As the effects of all the factors for evaluation are different, strong or weak, a fuzzy weight factor sets is necessary for matrix U, which is defined below. Where a i is the measure of the impact in the evaluation for the factors u i (i = 1,2,..., m,and A is called importance fuzzy subset on the matrix U, and a i is called importance coefficient for factor u i. Then fuzzy comprehensive evaluation model is given to calculate the fuzzy comprehensive evaluation set. When the factors of importance fuzzy sets A and comprehensive evaluation matrix (fuzzy relations R are known, fuzzy subsets A on evaluation matrix V is got by linear transform from R. Journal of Multimedia Processing and Technologies Volume 6 Number 4 December 2015 115

Where indicates generalized fuzzy and operation, and indicates generalized fuzzy or operation. And B is called as the set of fuzzy comprehensive evaluation set for V, and the formula is known as the comprehensive evaluation model, and denoted as. Finally, according to the maximum principle of membership degree, the maximum largest membership b j in the set (b l, b 2,, b n is the result of comprehensive evaluation, where review element v j is corresponded to b j in fuzzy comprehensive evaluation set. Compared to single-factor ui, the membership factor of evaluation ui for review factor v i is r ij (j = 1, 2,,n, while the results of operations (a i r ij (denoted by r ij * can reflect all the influence factors and obtain more accurate evaluations. That is the fuzzy comprehensive evaluation method s merit. 3.3 Multi-analyzer with Fuzzy Comprehensive Evaluation In order to evaluate the learner s output, during the process of output quantization, fuzzy comprehensive evaluation method can be applied to learning machine for massive heterogeneous marine data schema mapping and take all dynamic, ambiguous, and real-time factors into account. In the schema mapping system with multi-analyzer, fuzzy comprehensive evaluation method will improve automatic map s accuracy and sensitivity. (1 The evaluation factors set of learning model is: U = {U 1, U 2,, U n },the weight set is A ={A 1, A 2,, A n }, and A i is the weight of evaluation factor U i,. And U i ={U i1,u i2,,u n }, n is the number of specific performance factor basing on design factors. The weight set A i ={a i1,a i2,,a in }, where a ij is the weighing of u ij, where.. The performance factors set of the automatic schema mapping learning machine model is shown in Table 1. Objective Level 1 Level 2 evaluation of learner soutput Correlation model evaluation U 1 Correlation model evaluation U 11 Correlation model evaluation U 12 Performance analysis evaluation U 2 Performance analysis evaluation U 21 Performance analysis evaluation U 22 Adaptability analysis evaluation U 3 Adaptability analysis evaluation U 31 Adaptability analysis evaluation U 32 Table 1. Evaluation factor hierarchical table Weight set is determined by the way of matching degree evaluation according to the expert points-scoring system s evaluation for all kind of performance indexes. The single factor performance evaluation of learning model evaluate the membership index of each factor, and a fuzzy schema 116 Journal of Multimedia Processing and Technologies Volume 6 Number 4 December 2015

mapping function is given: :U V. For each U i, R i can be shown as the fuzzy matrix: Where r jk represents the membership grade of factors u ij to the k level evaluation V k. The value of r jk can be determined by expert points-scoring system. The number of v i level reviews for u ij is s i, so we get : The membership vector B i of the evaluation set V, B i =A i R i = (b i1, b i2,, b im, is the summary result of performance single factor fuzzy evaluation result on the factor U i. After the single factor evaluation, fuzzy comprehensive evaluation can extend the evaluation to all levels learning indexes and constitute fuzzy matrix B by a single factor B i. And after the fuzzy performance matrix operations to R, the membership vector B is given for factors set U to comment V level: B = A R = (b 1,b 2,,b m If the normalization expression is According to the principle of maximum membership, performance reviews V k corresponded to the maximum membership degree b k in B is the performance level element in model design stage. The decision-maker determines the performance index of the model design According to the critical index threshold. It should be noted that the weight of the model needs to be adjusted for the different learning machine models and learning strategies. So for multi learning strategies weight parameters sets may be loaded dynamically. 4. Experiments The factors and weights of fuzzy comprehensive evaluation can be adjusted according to the actual requirements during the marine heterogeneous data schema mapping process. For example, in the process of multi-source marine tidal data, the evaluation factors and the weights range are shown in Table 2. We compare the schema mapping conversion accuracy and efficiency of 20-group multi-source marine data with single-analyzer and multi-analyzer based on fuzzy comprehensive evaluation. There are eight types conversion information, including location Journal of Multimedia Processing and Technologies Volume 6 Number 4 December 2015 117

Multi strategies learning machine model is the research hotspot of automatic data schema mapping. Because of marine data s heterogeneity and large capacity, the outputs of learner become more difficult to judge and select accurately. In order to evaluate the learner s output more effectively and accurately, this paper presents the concept of multi-analyzer and evaluates multicoordinates, timestamp information, wave heights, tidal time, observation instrument information, data packet length, data backhaul IP addresses, temperature information. Schema mapping error rate is defined as the ratio of errors and the total number of schema mapping after learning process which measures the schema mapping accuracy and model quality. Different machine learning models are built with Matlab to map 100,000 heterogeneity data filed. We can get the following results in the experiment: Objective Level1 Level2 Weights Evaluat-ion of mul Correlation model evaluation Conversion coefficient score of the 15%~20% source marinetidal data s outputs of tide data in time t 1 ~t n learning machin-e output Matching score of same dimensions 10%~15% tidal data (surface data / underwater data conversion output Output distribution probability matching 5%~10% of key data (dynamic location coordinates, time information, wave height information, tide time information High tide, low tide limit spatial 0%~5% matching degree conversion performance Response time(max0min0avg under 5%~10% analysis evaluation typical long term observation Response time(max0min0avg under 5%~10% typical short term observation Response time(max0min0avg under 0%~5% long term observation tidal day Response time(max0min0avg under 5%~10% short time observation tidal day Adaptability analysis evaluation Data fluctuation under long 5%~10% term observation Output fluctuation of same dimensions 5%~10% (surface data / underwater data tidal data Table 2. Tidal data evaluation factors hierarchy partition Y-axis in Figure 5 and Figure 6 is schema mapping error rates. They drop 5% from single analyzer strategy to multi analyzer strategy. And there is 3% average error rate in multi analyzer with fuzzy comprehensive evaluation method. 5 Conclusion 118 Journal of Multimedia Processing and Technologies Volume 6 Number 4 December 2015

Figure 5. Comparative analysis strategy of single learner Figure 6. Comparative analysis of multi-learner strategies output with fuzzy comprehensive evaluation method and quantify the various factors of output under the mass and heterogeneous samples. At last, we compare three schema mapping machine methods in the observing data conversion experiments. Experimental results show the accuracy improvement of multi-analyzer significantly. Acknowledgement This work is supported by the National Natural Science Foundation of China (Grant No. 41376178, and the Shanghai Science and Technology Committee (Project NO. 11510501300. Journal of Multimedia Processing and Technologies Volume 6 Number 4 December 2015 119

References [1] Huang Dongmei., Zhang Chi., Du Jipeng. (2012. Integration of massive multi-source heterogeneous space-time data in digital sea, Marine Environmental Science, 31 (001, 11-113. [2] Michalski, R. S., Carbonell, J. G., Mitchell,T. M. (1985. Machine Learning: An ArtificialIntelligence Approach. Morgan Kaufmann. [3] Domingos, P. (1996. Unifying Instance-Based and Rule-Based Induction. MachineLearning, 24 (2, 141-168. [4] Freitag, D. (2000. Machine Learning for Information Extraction in Informal Domains. Machine Learning, 39 (2, 169-202. [5] Chan, P. K., Stolfo, S. J. (1993. Experiments on multistrategy learning by meta-learning. In : Proceedings of the second international conference on Information and knowledge management, 314-323. [6] Chang, Y. C., Dai, H. J., Wu, J.C.Y. (2013. et al. TEMPTing System: a hybrid method of rule and machine learning for temporal relation extraction in patient discharge summaries, Journal of BiomedicalInformatics. [7] Dredze, M., Kulesza, A., Crammer, K. (2010. Multi-domain learning by confidence-weighted parameter combination, Machine Learning,79 (1-2, 123-149. [8] Talbot, J., Lee, B., Kapoor, A. (2009. et al. EnsembleMatrix: interactive visualization to support machine learning with multiple classifiers, In : Proceedings of the 27 th international conference on Human factors in computing systems. ACM, 1283-1292. [9] Freund, Y.,Schapire, R.E.(1996. Experiments with a New Boosting Algorithm. MACHINE LEARNING-INTERNATIONAL WORKSHOP THENCONFERENCE, 148-156. 10] Wolpert,D.H. (1992. Stacked generalization, Neural Networks, 5 (2, 241-259. [11] Breiman, L. (1996. Bagging Predictors. Machine Learning, 24 (2, 123-140. 120 Journal of Multimedia Processing and Technologies Volume 6 Number 4 December 2015