A Multi-Analyzer Machine Learning Model for Marine Heterogeneous Data Schema Mapping

Size: px
Start display at page:

Download "A Multi-Analyzer Machine Learning Model for Marine Heterogeneous Data Schema Mapping"

Transcription

1 A Multi-Analyzer Machine Learning Model for Marine Heterogeneous Data Schema Mapping Wang Yan 1, 2 Le Jiajin 3, Zhang Yun 2 1 Glorious Sun School of Business and Management Donghua University 2 College of Information Technology Shanghai Ocean University 3 School of Computer Science and Technology Donghua University Shanghai, China ABSTRACT: In heterogeneous data integration, an effective machine learning model plays an important role in schema mapping. Schema mapping machine learning model and its probability learning improvement are analyzed in this paper firstly, and then the concept of multi -analyzer model with the method of fuzzy comprehensive evaluation is put forward to improve machine learning results efficiency and accuracy. Comprehensive experiments on multi-analyzer model confirm the effectiveness of multi -analyzer model. Keywords: Marine Data, Heterogeneity, Machine Learning, Schema Mapping, Fuzzy Comprehensive Evaluation Received: 30 May 2015, Revised 2 July 2015, Accepted 11 July DLINE. All Rights Reserved 1. Introduction 1.1 Heterogeneity of Marine Data and Integration Mappings Marine data is typical heterogeneous data with multi-scale, multi-temporal and multi-semantic features. Marine rich data include marine resource, marine geography, marine biology, marine chemistry, and many other fields. Due to the difference between acquisition equipments, information processing platforms and data storage formats, even in the same field heterogeneous data still exist. In schema mapping, marine heterogeneous data s features are described as follows: 1 Large-scale Large-scale marine data come from marine large monitor sensor systems which transmit huge amount of marine data. 2 Distribution Uncertainty Distribution uncertainty includes both physical distribution and logical distribution uncertainty. The physical distribution uncertainty is that the data source is not physically concentrated but distributed in multi regions which connected by network between pluralities of data sources. And logical uncertainty is that different properties may even exist in a same data cube s different data sources. 3 Semantic Heterogeneity 110 Journal of Multimedia Processing and Technologies Volume 6 Number 4 December 2015

2 Naming rules and data types in each marine data source are more or little different. And there is also data semantic heterogeneity in marine databases. When a solid data model is built, different granularity division, different relations between the different entities, as well as entities semantic description difference between different data sources result in heterogeneous semantic data. 4 Semi-Standardized Although semantic heterogeneity of schema and data exist in marine database, the descriptions of marine information have certain specifications and standards, they have some commonality, for example hydrographic tables contain tide, water temperature, and wave height data, while times, location, and wave field in the tables have significant structural features. These loose constraints are called as semi-standardized. Marine heterogeneous integration is an organic integration of marine data in different formats, source, nature and characteristics of logically or physically. The goal is to conversion, adjust, decompose and merge marine data s different formal features (such as units, format, scale etc. and internal features (such as property, etc. and final form compatible seamless marine data sets [1]. In the process of marine heterogeneous data integrating, how to improve the efficiency of heterogeneous data schema mapping should be considered. Schema mapping relation is the combination of elements mapping between heterogeneous data models in the same field or heterogeneous island. Schema mapping problem is a research hotspot in heterogeneous data integration and transformation. Instantiation process of schema mapping is a process of that users select some data or data sets by operate the data entry, determine the data flow direction, and select target database and mapping table to map. How to implement automatic or semiautomatic schema mapping is the problem for machine mapping. 1.2 Automatic Schema Mapping based on Machine Learning Automatic mapping between heterogeneous data can be achieved by the structure of the learning machine. A large number of researches indicate multi-strategy machine learning usually has higher accuracy than single one. Therefore, while increasing the complexity of the system, a lower error rate can be obtained. Currently, the researches of multi- strategy learning machine focus on learning model combination and single learning output approach. In [2, 3] BayesIDF learning and grammar learning method are combined effectively, and multi- strategy approach is proposed for information extraction. The multi-strategy learning model in [4] refers to the framework of combination the result of multiple learning methods, which is also known as meta-learning [5]. These analyses theoretically prove that a result combination of learning is more accurate than the best results of the individual learner. It is combined rule-based and based on maximum entropy model in [6], proposed a hybrid approach to determine an appropriate time contact between a pair of entities. A new multi-domain online learning framework based on parameters is proposed in [7]. EnsembleMatrix is proposed in [8] which can help users understand the relative merits of various classifiers and analyzers, and allow users to explore and establish a communication interact model by direct visualization. Some other integration methods are presented, such as boosting [9], stacking [10], and bagging [11], these methods repeat single learning training, and apply the result to different parts of one problem, and combine these results to obtain performance improvement. However, because of the huge difference between marine samples and heterogeneous data, it is more difficult to determine and give an accurate judgment from the individual output of the learning. Firstly, the paper describes characteristics of machine learning schema mapping model for marine heterogeneous data, and discusses how to improve the efficiency of automatic schema mapping model by probability learn. Then in order to evaluate output results of the automatic schema mapping more accurately, a concept of multi-learning and multi-analyzer is put forward, and the fuzzy comprehensive evaluation is applied to multi-analyzer model to quantify various factors of learner output during heterogeneous sampling and get more accurate combination learning results. Finally, the multi-analyzer model is verified in multi-source marine observation data conversion experiments. 2. Automatic Schema Mapping Based on Machine Learning Automatic mapping model of machine learning generally contains input interface, learner, analyzer and human interface as shown in Figure 1. The input interface checks data, and filters and selects the sample. Learner is responsible for statistical analysis and feature extraction. By training and learning, input data and sample object model generates an internal data model. Journal of Multimedia Processing and Technologies Volume 6 Number 4 December

3 After analysis the learning result, the analyzer can adjust the parameters of the learning machine and reconfigures strategies to improve the efficiency of the schema mapping and real-time matching. Man-machine interface refers to the users operate UI and input feedback adjustment information and parameters to form positive feedback. Figure 1. Flow diagram of automatic schema mapping based on machine learning In the training phase, training data - are set by default, learning machine extracts the character parameters of data sets, and then internal schema mapping prediction model is generated. The analyzer analyzes output data of learning machine and feedbacks parameters. In the schema mapping phase, according to the prediction model, the learner analyzes data source and monitor schema mapping results in real time. In this process, users can interfere with the schema mapping results by UI interface. In the self-learning process, selection of learning sample is an important part. Sample sets should be of the maximum global matching attributes. Set sample set β, β attributes {b 1, b 2,, b n }. As global learning samples, γ has the global data set {b f1, b f2,, b fn }. Where n represents the number of global properties, if there is min ({b 1, b 2,, b n },{b f1, b f2,,b fn } sample set β, can be called the best sample set. For marine data system, assuming there are two kinds of data, data source SOD and target data DOD, some schema mapping constraint and schema mapping constraint relationships in the user interface. We can establish schema mapping with all these things. For example, there are sites, latitudes and longitudes of sensor nodes, sensing time, tidal waves in marine source data. After the target schema field and specified schema relationship are defined a training process table in detail can be obtained. If the samples are extracted according to the probability distribution model, probability properties distribution of the large sample could be calculated and the data selection function of input interface could be optimized. Though enough of samples is required, the training process cannot cover have all the instance sets. If there is a distribution property of statistical sample, prospective study can be obtained in advance. When the sample data sets are selected, attribute tags in the sample data can be previously identified, and these tags are recorded in the real environments, which can mark the distribution density of the attribute, as shown in figure 2. Samples can be searched according to tags in high probability distribution interval, and sorted according to the probability distribution. We can get various properties data sets, including highly matched collection sets, moderate matched collection sets and lowly matched ones. Samples in various properties data sets are selected randomly and the training results then are modifies by manual intervention interface during training. According to the probability distribution learning results are transferred into the learners and analyzers, comprehensive analysis modules in the analyzer feedback the judgment information, inquiry learner s pre-judgments for training. The analyzer s field judgments can be mathematically expressed as follows: 112 Journal of Multimedia Processing and Technologies Volume 6 Number 4 December 2015

4 Figure 2. The sample filter based on the probability distribution p = m (f (x s, p, t, x d = { 1, p < 0.4 ( Default probability 1 threshold 0, p > 0.8 ( Default probability 0 threshold For any instance of group x s, we get schema mapping values by schema mapping function f, and compare these values to the known target instances, 0 or 1 will be obtained by calculate the probability of matching values can compare they to the probability thresholds. Then learner gives the scores of the schema mapping effect. If there are multi-learners, you need to compare and rank multiple learner output. Finally, the analyzer will output record information table, which covers sample properties and the parameters proposals of learning machine, and dynamically adjust the parameters according to current input and response results. After training, analyzer can begin to the map process for new data sources and new data. An ocean observation data mapping process is shown in Figure 3. The data which matching distribution probability is extracted from instance of observed data source, and used for detect and judgment the information unit. And then the correlation matching probability of learning machine is given for analysis module s 0/1 judgment according to predefine threshold value. In the whole process users give the real-time feedback modify by operating UI. 3. Heterogeneous Data Schema Mapping Optimization basing on Multi-analyzer 3.1 Multi-analyzer Concept The analyzer can selects the best learning path from the output of the learners, so it must take consider analysis theme domain subdivision and analysis major subject identification, and object sorting firstly. When we select subject field, universality, completeness and independency must be considered firstly, so that the selected subject field cover other subject fields. Secondly, evaluation granularity affects the difficulty and parallelism of judgment and calculation. Finally, the selection of the appropriate evaluation method must base on the samples feature. We can select the probability or fuzzy evaluation methods. In automatic machine schema mapping model, multi-analyzer will improve the schema mapping accuracy. In Figure 4, multianalyzer will build a score matrix for the matching correlation of the learner s output results. Multi-analyzer use multi-score board strategy to match different data or learning strategies. Multi-analyzer can expand the match and synthesize process, introduce multi-stage matching and integration, and select the best Analysis Results from the output. Journal of Multimedia Processing and Technologies Volume 6 Number 4 December

5 Figure 3. Machine schema mapping process of ocean observation data Figure 4. Multi-Analyzer automatic schema mapping Structure P n = m 1 ( f ( x s, p,t, x d m 2 ( f ( x s, p,t, x d m 3 ( f ( x s, p,t, x d... m n ( f ( x s, p,t, x d Field analysis can be expressed as a multidimensional function to extend the determination mode. But for multidimensional analysis results, how to convert multi-outputs to single output, and how to select an appropriate function to achieve dimension reduction processing and simplify the analysis are difficult and important. There are huge differences in massive samples and heterogeneous data, so the final output and determination of multi-analyzer be more difficult and confused. 114 Journal of Multimedia Processing and Technologies Volume 6 Number 4 December 2015

6 So the fuzzy comprehensive evaluation is introduced for multi-analyzer s result to simplify the result by the fuzzy dimension reduction. 3.2 Fuzzy Comprehensive Evaluation Method The basic idea of fuzzy comprehensive evaluation is to consider the various factors associated with objects, and make a reasonable evaluation with the fuzzy linear transformation theory and the maximum membership degree principle. There are m factors related to evaluating objet, it is called as the factors set and denote as below: U = {u 1, u 2,..., u m } There are n comments which are called evaluation set and recorded as below: V = {v 1, v 2,..., v m } Firstly, each factor u i in factors set U is single evaluation factor which determines the membership degree value r ij of the reviews v j and forms a single-actor evaluation set of u i, r i = (r i1, r i2,..., r im, r i μ (V So we can get a fuzzy sets of the reviews set V. Then a fuzzy-value schema mapping function is shown below, f: U F(V, u i f(u i Where f (u i = r ij = (r i1, r i2,..., r in is the comments fuzzy vector on the factors set u i, and r ij is the relationship factor between u j, and v j. Then all the single-factor evaluation factors are integrated together to build a general evaluation of matrix and a fuzzy comprehensive evaluation matrix R from U to V: In other words, fuzzy value function get the relationship from U to V, where matrix., R f is the comprehensive evaluation As the effects of all the factors for evaluation are different, strong or weak, a fuzzy weight factor sets is necessary for matrix U, which is defined below. Where a i is the measure of the impact in the evaluation for the factors u i (i = 1,2,..., m,and A is called importance fuzzy subset on the matrix U, and a i is called importance coefficient for factor u i. Then fuzzy comprehensive evaluation model is given to calculate the fuzzy comprehensive evaluation set. When the factors of importance fuzzy sets A and comprehensive evaluation matrix (fuzzy relations R are known, fuzzy subsets A on evaluation matrix V is got by linear transform from R. Journal of Multimedia Processing and Technologies Volume 6 Number 4 December

7 Where indicates generalized fuzzy and operation, and indicates generalized fuzzy or operation. And B is called as the set of fuzzy comprehensive evaluation set for V, and the formula is known as the comprehensive evaluation model, and denoted as. Finally, according to the maximum principle of membership degree, the maximum largest membership b j in the set (b l, b 2,, b n is the result of comprehensive evaluation, where review element v j is corresponded to b j in fuzzy comprehensive evaluation set. Compared to single-factor ui, the membership factor of evaluation ui for review factor v i is r ij (j = 1, 2,,n, while the results of operations (a i r ij (denoted by r ij * can reflect all the influence factors and obtain more accurate evaluations. That is the fuzzy comprehensive evaluation method s merit. 3.3 Multi-analyzer with Fuzzy Comprehensive Evaluation In order to evaluate the learner s output, during the process of output quantization, fuzzy comprehensive evaluation method can be applied to learning machine for massive heterogeneous marine data schema mapping and take all dynamic, ambiguous, and real-time factors into account. In the schema mapping system with multi-analyzer, fuzzy comprehensive evaluation method will improve automatic map s accuracy and sensitivity. (1 The evaluation factors set of learning model is: U = {U 1, U 2,, U n },the weight set is A ={A 1, A 2,, A n }, and A i is the weight of evaluation factor U i,. And U i ={U i1,u i2,,u n }, n is the number of specific performance factor basing on design factors. The weight set A i ={a i1,a i2,,a in }, where a ij is the weighing of u ij, where.. The performance factors set of the automatic schema mapping learning machine model is shown in Table 1. Objective Level 1 Level 2 evaluation of learner soutput Correlation model evaluation U 1 Correlation model evaluation U 11 Correlation model evaluation U 12 Performance analysis evaluation U 2 Performance analysis evaluation U 21 Performance analysis evaluation U 22 Adaptability analysis evaluation U 3 Adaptability analysis evaluation U 31 Adaptability analysis evaluation U 32 Table 1. Evaluation factor hierarchical table Weight set is determined by the way of matching degree evaluation according to the expert points-scoring system s evaluation for all kind of performance indexes. The single factor performance evaluation of learning model evaluate the membership index of each factor, and a fuzzy schema 116 Journal of Multimedia Processing and Technologies Volume 6 Number 4 December 2015

8 mapping function is given: :U V. For each U i, R i can be shown as the fuzzy matrix: Where r jk represents the membership grade of factors u ij to the k level evaluation V k. The value of r jk can be determined by expert points-scoring system. The number of v i level reviews for u ij is s i, so we get : The membership vector B i of the evaluation set V, B i =A i R i = (b i1, b i2,, b im, is the summary result of performance single factor fuzzy evaluation result on the factor U i. After the single factor evaluation, fuzzy comprehensive evaluation can extend the evaluation to all levels learning indexes and constitute fuzzy matrix B by a single factor B i. And after the fuzzy performance matrix operations to R, the membership vector B is given for factors set U to comment V level: B = A R = (b 1,b 2,,b m If the normalization expression is According to the principle of maximum membership, performance reviews V k corresponded to the maximum membership degree b k in B is the performance level element in model design stage. The decision-maker determines the performance index of the model design According to the critical index threshold. It should be noted that the weight of the model needs to be adjusted for the different learning machine models and learning strategies. So for multi learning strategies weight parameters sets may be loaded dynamically. 4. Experiments The factors and weights of fuzzy comprehensive evaluation can be adjusted according to the actual requirements during the marine heterogeneous data schema mapping process. For example, in the process of multi-source marine tidal data, the evaluation factors and the weights range are shown in Table 2. We compare the schema mapping conversion accuracy and efficiency of 20-group multi-source marine data with single-analyzer and multi-analyzer based on fuzzy comprehensive evaluation. There are eight types conversion information, including location Journal of Multimedia Processing and Technologies Volume 6 Number 4 December

9 Multi strategies learning machine model is the research hotspot of automatic data schema mapping. Because of marine data s heterogeneity and large capacity, the outputs of learner become more difficult to judge and select accurately. In order to evaluate the learner s output more effectively and accurately, this paper presents the concept of multi-analyzer and evaluates multicoordinates, timestamp information, wave heights, tidal time, observation instrument information, data packet length, data backhaul IP addresses, temperature information. Schema mapping error rate is defined as the ratio of errors and the total number of schema mapping after learning process which measures the schema mapping accuracy and model quality. Different machine learning models are built with Matlab to map 100,000 heterogeneity data filed. We can get the following results in the experiment: Objective Level1 Level2 Weights Evaluat-ion of mul Correlation model evaluation Conversion coefficient score of the 15%~20% source marinetidal data s outputs of tide data in time t 1 ~t n learning machin-e output Matching score of same dimensions 10%~15% tidal data (surface data / underwater data conversion output Output distribution probability matching 5%~10% of key data (dynamic location coordinates, time information, wave height information, tide time information High tide, low tide limit spatial 0%~5% matching degree conversion performance Response time(max0min0avg under 5%~10% analysis evaluation typical long term observation Response time(max0min0avg under 5%~10% typical short term observation Response time(max0min0avg under 0%~5% long term observation tidal day Response time(max0min0avg under 5%~10% short time observation tidal day Adaptability analysis evaluation Data fluctuation under long 5%~10% term observation Output fluctuation of same dimensions 5%~10% (surface data / underwater data tidal data Table 2. Tidal data evaluation factors hierarchy partition Y-axis in Figure 5 and Figure 6 is schema mapping error rates. They drop 5% from single analyzer strategy to multi analyzer strategy. And there is 3% average error rate in multi analyzer with fuzzy comprehensive evaluation method. 5 Conclusion 118 Journal of Multimedia Processing and Technologies Volume 6 Number 4 December 2015

10 Figure 5. Comparative analysis strategy of single learner Figure 6. Comparative analysis of multi-learner strategies output with fuzzy comprehensive evaluation method and quantify the various factors of output under the mass and heterogeneous samples. At last, we compare three schema mapping machine methods in the observing data conversion experiments. Experimental results show the accuracy improvement of multi-analyzer significantly. Acknowledgement This work is supported by the National Natural Science Foundation of China (Grant No , and the Shanghai Science and Technology Committee (Project NO Journal of Multimedia Processing and Technologies Volume 6 Number 4 December

11 References [1] Huang Dongmei., Zhang Chi., Du Jipeng. (2012. Integration of massive multi-source heterogeneous space-time data in digital sea, Marine Environmental Science, 31 (001, [2] Michalski, R. S., Carbonell, J. G., Mitchell,T. M. (1985. Machine Learning: An ArtificialIntelligence Approach. Morgan Kaufmann. [3] Domingos, P. (1996. Unifying Instance-Based and Rule-Based Induction. MachineLearning, 24 (2, [4] Freitag, D. (2000. Machine Learning for Information Extraction in Informal Domains. Machine Learning, 39 (2, [5] Chan, P. K., Stolfo, S. J. (1993. Experiments on multistrategy learning by meta-learning. In : Proceedings of the second international conference on Information and knowledge management, [6] Chang, Y. C., Dai, H. J., Wu, J.C.Y. (2013. et al. TEMPTing System: a hybrid method of rule and machine learning for temporal relation extraction in patient discharge summaries, Journal of BiomedicalInformatics. [7] Dredze, M., Kulesza, A., Crammer, K. (2010. Multi-domain learning by confidence-weighted parameter combination, Machine Learning,79 (1-2, [8] Talbot, J., Lee, B., Kapoor, A. (2009. et al. EnsembleMatrix: interactive visualization to support machine learning with multiple classifiers, In : Proceedings of the 27 th international conference on Human factors in computing systems. ACM, [9] Freund, Y.,Schapire, R.E.(1996. Experiments with a New Boosting Algorithm. MACHINE LEARNING-INTERNATIONAL WORKSHOP THENCONFERENCE, ] Wolpert,D.H. (1992. Stacked generalization, Neural Networks, 5 (2, [11] Breiman, L. (1996. Bagging Predictors. Machine Learning, 24 (2, Journal of Multimedia Processing and Technologies Volume 6 Number 4 December 2015

Research Article A Multianalyzer Machine Learning Model for Marine Heterogeneous Data Schema Mapping

Research Article A Multianalyzer Machine Learning Model for Marine Heterogeneous Data Schema Mapping e Scientific World Journal, Article ID 248467, 8 pages http://dxdoiorg/101155/2014/248467 Research Article A Multianalyzer Machine Learning Model for Marine Heterogeneous Data Schema Mapping Wang Yan,

More information

Temperature Calculation of Pellet Rotary Kiln Based on Texture

Temperature Calculation of Pellet Rotary Kiln Based on Texture Intelligent Control and Automation, 2017, 8, 67-74 http://www.scirp.org/journal/ica ISSN Online: 2153-0661 ISSN Print: 2153-0653 Temperature Calculation of Pellet Rotary Kiln Based on Texture Chunli Lin,

More information

Research on Design and Application of Computer Database Quality Evaluation Model

Research on Design and Application of Computer Database Quality Evaluation Model Research on Design and Application of Computer Database Quality Evaluation Model Abstract Hong Li, Hui Ge Shihezi Radio and TV University, Shihezi 832000, China Computer data quality evaluation is the

More information

FSRM Feedback Algorithm based on Learning Theory

FSRM Feedback Algorithm based on Learning Theory Send Orders for Reprints to reprints@benthamscience.ae The Open Cybernetics & Systemics Journal, 2015, 9, 699-703 699 FSRM Feedback Algorithm based on Learning Theory Open Access Zhang Shui-Li *, Dong

More information

Research on Mining Cloud Data Based on Correlation Dimension Feature

Research on Mining Cloud Data Based on Correlation Dimension Feature 2016 4 th International Conference on Advances in Social Science, Humanities, and Management (ASSHM 2016) ISBN: 978-1-60595-412-7 Research on Mining Cloud Data Based on Correlation Dimension Feature Jingwen

More information

CHAPTER 3 MAINTENANCE STRATEGY SELECTION USING AHP AND FAHP

CHAPTER 3 MAINTENANCE STRATEGY SELECTION USING AHP AND FAHP 31 CHAPTER 3 MAINTENANCE STRATEGY SELECTION USING AHP AND FAHP 3.1 INTRODUCTION Evaluation of maintenance strategies is a complex task. The typical factors that influence the selection of maintenance strategy

More information

STUDYING OF CLASSIFYING CHINESE SMS MESSAGES

STUDYING OF CLASSIFYING CHINESE SMS MESSAGES STUDYING OF CLASSIFYING CHINESE SMS MESSAGES BASED ON BAYESIAN CLASSIFICATION 1 LI FENG, 2 LI JIGANG 1,2 Computer Science Department, DongHua University, Shanghai, China E-mail: 1 Lifeng@dhu.edu.cn, 2

More information

Wavelength Estimation Method Based on Radon Transform and Image Texture

Wavelength Estimation Method Based on Radon Transform and Image Texture Journal of Shipping and Ocean Engineering 7 (2017) 186-191 doi 10.17265/2159-5879/2017.05.002 D DAVID PUBLISHING Wavelength Estimation Method Based on Radon Transform and Image Texture LU Ying, ZHUANG

More information

Improving the Efficiency of Fast Using Semantic Similarity Algorithm

Improving the Efficiency of Fast Using Semantic Similarity Algorithm International Journal of Scientific and Research Publications, Volume 4, Issue 1, January 2014 1 Improving the Efficiency of Fast Using Semantic Similarity Algorithm D.KARTHIKA 1, S. DIVAKAR 2 Final year

More information

Research on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a

Research on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a International Conference on Education Technology, Management and Humanities Science (ETMHS 2015) Research on Applications of Data Mining in Electronic Commerce Xiuping YANG 1, a 1 Computer Science Department,

More information

FUZZY C-MEANS ALGORITHM BASED ON PRETREATMENT OF SIMILARITY RELATIONTP

FUZZY C-MEANS ALGORITHM BASED ON PRETREATMENT OF SIMILARITY RELATIONTP Dynamics of Continuous, Discrete and Impulsive Systems Series B: Applications & Algorithms 14 (2007) 103-111 Copyright c 2007 Watam Press FUZZY C-MEANS ALGORITHM BASED ON PRETREATMENT OF SIMILARITY RELATIONTP

More information

The Comparative Study of Machine Learning Algorithms in Text Data Classification*

The Comparative Study of Machine Learning Algorithms in Text Data Classification* The Comparative Study of Machine Learning Algorithms in Text Data Classification* Wang Xin School of Science, Beijing Information Science and Technology University Beijing, China Abstract Classification

More information

Quality Assessment of Power Dispatching Data Based on Improved Cloud Model

Quality Assessment of Power Dispatching Data Based on Improved Cloud Model Quality Assessment of Power Dispatching Based on Improved Cloud Model Zhaoyang Qu, Shaohua Zhou *. School of Information Engineering, Northeast Electric Power University, Jilin, China Abstract. This paper

More information

Remote Monitoring System of Ship Running State under Wireless Network

Remote Monitoring System of Ship Running State under Wireless Network Journal of Shipping and Ocean Engineering 7 (2017) 181-185 doi 10.17265/2159-5879/2017.05.001 D DAVID PUBLISHING Remote Monitoring System of Ship Running State under Wireless Network LI Ning Department

More information

Performance Degradation Assessment and Fault Diagnosis of Bearing Based on EMD and PCA-SOM

Performance Degradation Assessment and Fault Diagnosis of Bearing Based on EMD and PCA-SOM Performance Degradation Assessment and Fault Diagnosis of Bearing Based on EMD and PCA-SOM Lu Chen and Yuan Hang PERFORMANCE DEGRADATION ASSESSMENT AND FAULT DIAGNOSIS OF BEARING BASED ON EMD AND PCA-SOM.

More information

RETRACTED ARTICLE. Web-Based Data Mining in System Design and Implementation. Open Access. Jianhu Gong 1* and Jianzhi Gong 2

RETRACTED ARTICLE. Web-Based Data Mining in System Design and Implementation. Open Access. Jianhu Gong 1* and Jianzhi Gong 2 Send Orders for Reprints to reprints@benthamscience.ae The Open Automation and Control Systems Journal, 2014, 6, 1907-1911 1907 Web-Based Data Mining in System Design and Implementation Open Access Jianhu

More information

Privacy-Preserving of Check-in Services in MSNS Based on a Bit Matrix

Privacy-Preserving of Check-in Services in MSNS Based on a Bit Matrix BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 15, No 2 Sofia 2015 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.1515/cait-2015-0032 Privacy-Preserving of Check-in

More information

Open Access Research on the Data Pre-Processing in the Network Abnormal Intrusion Detection

Open Access Research on the Data Pre-Processing in the Network Abnormal Intrusion Detection Send Orders for Reprints to reprints@benthamscience.ae 1228 The Open Automation and Control Systems Journal, 2014, 6, 1228-1232 Open Access Research on the Data Pre-Processing in the Network Abnormal Intrusion

More information

Using Text Learning to help Web browsing

Using Text Learning to help Web browsing Using Text Learning to help Web browsing Dunja Mladenić J.Stefan Institute, Ljubljana, Slovenia Carnegie Mellon University, Pittsburgh, PA, USA Dunja.Mladenic@{ijs.si, cs.cmu.edu} Abstract Web browsing

More information

Multi-Source Spatial Data Distribution Model and System Implementation

Multi-Source Spatial Data Distribution Model and System Implementation Communications and Network, 2013, 5, 93-98 http://dx.doi.org/10.4236/cn.2013.51009 Published Online February 2013 (http://www.scirp.org/journal/cn) Multi-Source Spatial Data Distribution Model and System

More information

Image retrieval based on bag of images

Image retrieval based on bag of images University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2009 Image retrieval based on bag of images Jun Zhang University of Wollongong

More information

Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing Environments

Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing Environments Send Orders for Reprints to reprints@benthamscience.ae 368 The Open Automation and Control Systems Journal, 2014, 6, 368-373 Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing

More information

A Modular k-nearest Neighbor Classification Method for Massively Parallel Text Categorization

A Modular k-nearest Neighbor Classification Method for Massively Parallel Text Categorization A Modular k-nearest Neighbor Classification Method for Massively Parallel Text Categorization Hai Zhao and Bao-Liang Lu Department of Computer Science and Engineering, Shanghai Jiao Tong University, 1954

More information

Yunfeng Zhang 1, Huan Wang 2, Jie Zhu 1 1 Computer Science & Engineering Department, North China Institute of Aerospace

Yunfeng Zhang 1, Huan Wang 2, Jie Zhu 1 1 Computer Science & Engineering Department, North China Institute of Aerospace [Type text] [Type text] [Type text] ISSN : 0974-7435 Volume 10 Issue 20 BioTechnology 2014 An Indian Journal FULL PAPER BTAIJ, 10(20), 2014 [12526-12531] Exploration on the data mining system construction

More information

Semi-supervised learning and active learning

Semi-supervised learning and active learning Semi-supervised learning and active learning Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Combining classifiers Ensemble learning: a machine learning paradigm where multiple learners

More information

Video Inter-frame Forgery Identification Based on Optical Flow Consistency

Video Inter-frame Forgery Identification Based on Optical Flow Consistency Sensors & Transducers 24 by IFSA Publishing, S. L. http://www.sensorsportal.com Video Inter-frame Forgery Identification Based on Optical Flow Consistency Qi Wang, Zhaohong Li, Zhenzhen Zhang, Qinglong

More information

Feature-Guided K-Means Algorithm for Optimal Image Vector Quantizer Design

Feature-Guided K-Means Algorithm for Optimal Image Vector Quantizer Design Journal of Information Hiding and Multimedia Signal Processing c 2017 ISSN 2073-4212 Ubiquitous International Volume 8, Number 6, November 2017 Feature-Guided K-Means Algorithm for Optimal Image Vector

More information

An Abnormal Data Detection Method Based on the Temporal-spatial Correlation in Wireless Sensor Networks

An Abnormal Data Detection Method Based on the Temporal-spatial Correlation in Wireless Sensor Networks An Based on the Temporal-spatial Correlation in Wireless Sensor Networks 1 Department of Computer Science & Technology, Harbin Institute of Technology at Weihai,Weihai, 264209, China E-mail: Liuyang322@hit.edu.cn

More information

Accuracy of Matching between Probe-Vehicle and GIS Map

Accuracy of Matching between Probe-Vehicle and GIS Map Proceedings of the 8th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences Shanghai, P. R. China, June 25-27, 2008, pp. 59-64 Accuracy of Matching between

More information

Research on Evaluation Method of Product Style Semantics Based on Neural Network

Research on Evaluation Method of Product Style Semantics Based on Neural Network Research Journal of Applied Sciences, Engineering and Technology 6(23): 4330-4335, 2013 ISSN: 2040-7459; e-issn: 2040-7467 Maxwell Scientific Organization, 2013 Submitted: September 28, 2012 Accepted:

More information

Evaluation of Meta-Search Engine Merge Algorithms

Evaluation of Meta-Search Engine Merge Algorithms 2008 International Conference on Internet Computing in Science and Engineering Evaluation of Meta-Search Engine Merge Algorithms Chunshuang Liu, Zhiqiang Zhang,2, Xiaoqin Xie 2, TingTing Liang School of

More information

EFFICIENT ATTRIBUTE REDUCTION ALGORITHM

EFFICIENT ATTRIBUTE REDUCTION ALGORITHM EFFICIENT ATTRIBUTE REDUCTION ALGORITHM Zhongzhi Shi, Shaohui Liu, Zheng Zheng Institute Of Computing Technology,Chinese Academy of Sciences, Beijing, China Abstract: Key words: Efficiency of algorithms

More information

Design of student information system based on association algorithm and data mining technology. CaiYan, ChenHua

Design of student information system based on association algorithm and data mining technology. CaiYan, ChenHua 5th International Conference on Mechatronics, Materials, Chemistry and Computer Engineering (ICMMCCE 2017) Design of student information system based on association algorithm and data mining technology

More information

Two Algorithms of Image Segmentation and Measurement Method of Particle s Parameters

Two Algorithms of Image Segmentation and Measurement Method of Particle s Parameters Appl. Math. Inf. Sci. 6 No. 1S pp. 105S-109S (2012) Applied Mathematics & Information Sciences An International Journal @ 2012 NSP Natural Sciences Publishing Cor. Two Algorithms of Image Segmentation

More information

An adaptive container code character segmentation algorithm Yajie Zhu1, a, Chenglong Liang2, b

An adaptive container code character segmentation algorithm Yajie Zhu1, a, Chenglong Liang2, b 6th International Conference on Machinery, Materials, Environment, Biotechnology and Computer (MMEBC 2016) An adaptive container code character segmentation algorithm Yajie Zhu1, a, Chenglong Liang2, b

More information

Data Mining Technology Based on Bayesian Network Structure Applied in Learning

Data Mining Technology Based on Bayesian Network Structure Applied in Learning , pp.67-71 http://dx.doi.org/10.14257/astl.2016.137.12 Data Mining Technology Based on Bayesian Network Structure Applied in Learning Chunhua Wang, Dong Han College of Information Engineering, Huanghuai

More information

2 Ontology evolution algorithm based on web-pages and users behavior logs

2 Ontology evolution algorithm based on web-pages and users behavior logs ISSN 1749-3889 (print), 1749-3897 (online) International Journal of Nonlinear Science Vol.18(2014) No.1,pp.86-91 Ontology Evolution Algorithm for Topic Information Collection Jing Ma 1, Mengyong Sun 1,

More information

Boosting Algorithms for Parallel and Distributed Learning

Boosting Algorithms for Parallel and Distributed Learning Distributed and Parallel Databases, 11, 203 229, 2002 c 2002 Kluwer Academic Publishers. Manufactured in The Netherlands. Boosting Algorithms for Parallel and Distributed Learning ALEKSANDAR LAZAREVIC

More information

Research on Remote Sensing Image Template Processing Based on Global Subdivision Theory

Research on Remote Sensing Image Template Processing Based on Global Subdivision Theory www.ijcsi.org 388 Research on Remote Sensing Image Template Processing Based on Global Subdivision Theory Xiong Delan 1, Du Genyuan 1 1 International School of Education, Xuchang University Xuchang, Henan,

More information

RESEARCH ON INTER-SATELLITE COMMUNICATION SYSTEM OF FRACTIONATED SPACECRAFT

RESEARCH ON INTER-SATELLITE COMMUNICATION SYSTEM OF FRACTIONATED SPACECRAFT RESEARCH ON INTER-SATELLITE COMMUNICATION SYSTEM OF FRACTIONATED SPACECRAFT Qing Chen (1), Jinxiu Zhang (2), Shengchang Lan (3) and Zhigang Zhang (4) (1)(2)(4) Research Center of Satellite Technology,

More information

ZigBee Routing Algorithm Based on Energy Optimization

ZigBee Routing Algorithm Based on Energy Optimization Sensors & Transducers 2013 by IFSA http://www.sensorsportal.com ZigBee Routing Algorithm Based on Energy Optimization Wangang Wang, Yong Peng, Yongyu Peng Chongqing City Management College, No. 151 Daxuecheng

More information

Background Motion Video Tracking of the Memory Watershed Disc Gradient Expansion Template

Background Motion Video Tracking of the Memory Watershed Disc Gradient Expansion Template , pp.26-31 http://dx.doi.org/10.14257/astl.2016.137.05 Background Motion Video Tracking of the Memory Watershed Disc Gradient Expansion Template Yao Nan 1, Shen Haiping 2 1 Department of Jiangsu Electric

More information

Research on Construction of Road Network Database Based on Video Retrieval Technology

Research on Construction of Road Network Database Based on Video Retrieval Technology Research on Construction of Road Network Database Based on Video Retrieval Technology Fengling Wang 1 1 Hezhou University, School of Mathematics and Computer Hezhou Guangxi 542899, China Abstract. Based

More information

Spatial Variation of Sea-Level Sea level reconstruction

Spatial Variation of Sea-Level Sea level reconstruction Spatial Variation of Sea-Level Sea level reconstruction Biao Chang Multimedia Environmental Simulation Laboratory School of Civil and Environmental Engineering Georgia Institute of Technology Advisor:

More information

Nodes Energy Conserving Algorithms to prevent Partitioning in Wireless Sensor Networks

Nodes Energy Conserving Algorithms to prevent Partitioning in Wireless Sensor Networks IJCSNS International Journal of Computer Science and Network Security, VOL.17 No.9, September 2017 139 Nodes Energy Conserving Algorithms to prevent Partitioning in Wireless Sensor Networks MINA MAHDAVI

More information

Improving Suffix Tree Clustering Algorithm for Web Documents

Improving Suffix Tree Clustering Algorithm for Web Documents International Conference on Logistics Engineering, Management and Computer Science (LEMCS 2015) Improving Suffix Tree Clustering Algorithm for Web Documents Yan Zhuang Computer Center East China Normal

More information

ON THEORY OF INTUITIONISTIC FUZZY SETS (OR VAGUE SETS)

ON THEORY OF INTUITIONISTIC FUZZY SETS (OR VAGUE SETS) International Journal of Fuzzy Systems On Theory and Rough of Intuitionistic Systems Fuzzy Sets (Or Vague Sets) 113 4(2), December 2011, pp. 113-117, Serials Publications, ISSN: 0974-858X ON THEORY OF

More information

The application of OLAP and Data mining technology in the analysis of. book lending

The application of OLAP and Data mining technology in the analysis of. book lending 2nd International Conference on Automation, Mechanical Control and Computational Engineering (AMCCE 2017) The application of OLAP and Data mining technology in the analysis of book lending Xiao-Han Zhou1,a,

More information

CLUSTER ANALYSIS. V. K. Bhatia I.A.S.R.I., Library Avenue, New Delhi

CLUSTER ANALYSIS. V. K. Bhatia I.A.S.R.I., Library Avenue, New Delhi CLUSTER ANALYSIS V. K. Bhatia I.A.S.R.I., Library Avenue, New Delhi-110 012 In multivariate situation, the primary interest of the experimenter is to examine and understand the relationship amongst the

More information

Research on Transmission Based on Collaboration Coding in WSNs

Research on Transmission Based on Collaboration Coding in WSNs Research on Transmission Based on Collaboration Coding in WSNs LV Xiao-xing, ZHANG Bai-hai School of Automation Beijing Institute of Technology Beijing 8, China lvxx@mail.btvu.org Journal of Digital Information

More information

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining.

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining. About the Tutorial Data Mining is defined as the procedure of extracting information from huge sets of data. In other words, we can say that data mining is mining knowledge from data. The tutorial starts

More information

Intelligent management of on-line video learning resources supported by Web-mining technology based on the practical application of VOD

Intelligent management of on-line video learning resources supported by Web-mining technology based on the practical application of VOD World Transactions on Engineering and Technology Education Vol.13, No.3, 2015 2015 WIETE Intelligent management of on-line video learning resources supported by Web-mining technology based on the practical

More information

Tag Based Image Search by Social Re-ranking

Tag Based Image Search by Social Re-ranking Tag Based Image Search by Social Re-ranking Vilas Dilip Mane, Prof.Nilesh P. Sable Student, Department of Computer Engineering, Imperial College of Engineering & Research, Wagholi, Pune, Savitribai Phule

More information

Research on Distributed Video Compression Coding Algorithm for Wireless Sensor Networks

Research on Distributed Video Compression Coding Algorithm for Wireless Sensor Networks Sensors & Transducers 203 by IFSA http://www.sensorsportal.com Research on Distributed Video Compression Coding Algorithm for Wireless Sensor Networks, 2 HU Linna, 2 CAO Ning, 3 SUN Yu Department of Dianguang,

More information

INFORMATION RETRIEVAL SYSTEM USING FUZZY SET THEORY - THE BASIC CONCEPT

INFORMATION RETRIEVAL SYSTEM USING FUZZY SET THEORY - THE BASIC CONCEPT ABSTRACT INFORMATION RETRIEVAL SYSTEM USING FUZZY SET THEORY - THE BASIC CONCEPT BHASKAR KARN Assistant Professor Department of MIS Birla Institute of Technology Mesra, Ranchi The paper presents the basic

More information

A Fuzzy C-means Clustering Algorithm Based on Pseudo-nearest-neighbor Intervals for Incomplete Data

A Fuzzy C-means Clustering Algorithm Based on Pseudo-nearest-neighbor Intervals for Incomplete Data Journal of Computational Information Systems 11: 6 (2015) 2139 2146 Available at http://www.jofcis.com A Fuzzy C-means Clustering Algorithm Based on Pseudo-nearest-neighbor Intervals for Incomplete Data

More information

Fuzzy Partitioning with FID3.1

Fuzzy Partitioning with FID3.1 Fuzzy Partitioning with FID3.1 Cezary Z. Janikow Dept. of Mathematics and Computer Science University of Missouri St. Louis St. Louis, Missouri 63121 janikow@umsl.edu Maciej Fajfer Institute of Computing

More information

Tendency Mining in Dynamic Association Rules Based on SVM Classifier

Tendency Mining in Dynamic Association Rules Based on SVM Classifier Send Orders for Reprints to reprints@benthamscienceae The Open Mechanical Engineering Journal, 2014, 8, 303-307 303 Open Access Tendency Mining in Dynamic Association Rules Based on SVM Classifier Zhonglin

More information

INF4820 Algorithms for AI and NLP. Evaluating Classifiers Clustering

INF4820 Algorithms for AI and NLP. Evaluating Classifiers Clustering INF4820 Algorithms for AI and NLP Evaluating Classifiers Clustering Murhaf Fares & Stephan Oepen Language Technology Group (LTG) September 27, 2017 Today 2 Recap Evaluation of classifiers Unsupervised

More information

Information Retrieval. Information Retrieval and Web Search

Information Retrieval. Information Retrieval and Web Search Information Retrieval and Web Search Introduction to IR models and methods Information Retrieval The indexing and retrieval of textual documents. Searching for pages on the World Wide Web is the most recent

More information

Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating

Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating Dipak J Kakade, Nilesh P Sable Department of Computer Engineering, JSPM S Imperial College of Engg. And Research,

More information

Automatic Domain Partitioning for Multi-Domain Learning

Automatic Domain Partitioning for Multi-Domain Learning Automatic Domain Partitioning for Multi-Domain Learning Di Wang diwang@cs.cmu.edu Chenyan Xiong cx@cs.cmu.edu William Yang Wang ww@cmu.edu Abstract Multi-Domain learning (MDL) assumes that the domain labels

More information

CSC411 Fall 2014 Machine Learning & Data Mining. Ensemble Methods. Slides by Rich Zemel

CSC411 Fall 2014 Machine Learning & Data Mining. Ensemble Methods. Slides by Rich Zemel CSC411 Fall 2014 Machine Learning & Data Mining Ensemble Methods Slides by Rich Zemel Ensemble methods Typical application: classi.ication Ensemble of classi.iers is a set of classi.iers whose individual

More information

Creating Time-Varying Fuzzy Control Rules Based on Data Mining

Creating Time-Varying Fuzzy Control Rules Based on Data Mining Research Journal of Applied Sciences, Engineering and Technology 4(18): 3533-3538, 01 ISSN: 040-7467 Maxwell Scientific Organization, 01 Submitted: April 16, 01 Accepted: May 18, 01 Published: September

More information

Restricted Nearest Feature Line with Ellipse for Face Recognition

Restricted Nearest Feature Line with Ellipse for Face Recognition Journal of Information Hiding and Multimedia Signal Processing c 2012 ISSN 2073-4212 Ubiquitous International Volume 3, Number 3, July 2012 Restricted Nearest Feature Line with Ellipse for Face Recognition

More information

Selection of Best Web Site by Applying COPRAS-G method Bindu Madhuri.Ch #1, Anand Chandulal.J #2, Padmaja.M #3

Selection of Best Web Site by Applying COPRAS-G method Bindu Madhuri.Ch #1, Anand Chandulal.J #2, Padmaja.M #3 Selection of Best Web Site by Applying COPRAS-G method Bindu Madhuri.Ch #1, Anand Chandulal.J #2, Padmaja.M #3 Department of Computer Science & Engineering, Gitam University, INDIA 1. binducheekati@gmail.com,

More information

Research on Social Relationship Network System based on MongoDB

Research on Social Relationship Network System based on MongoDB Research on Social Relationship Network System based on MongoDB Yingyan Long School of Educational Sciences Shaanxi University of Technology Han Zhong, Shaanxi, China Abstract The relationship between

More information

Study on A Recommendation Algorithm of Crossing Ranking in E- commerce

Study on A Recommendation Algorithm of Crossing Ranking in E- commerce International Journal of u-and e-service, Science and Technology, pp.53-62 http://dx.doi.org/10.14257/ijunnesst2014.7.4.6 Study on A Recommendation Algorithm of Crossing Ranking in E- commerce Duan Xueying

More information

A New Approach for Automatic Thesaurus Construction and Query Expansion for Document Retrieval

A New Approach for Automatic Thesaurus Construction and Query Expansion for Document Retrieval Information and Management Sciences Volume 18, Number 4, pp. 299-315, 2007 A New Approach for Automatic Thesaurus Construction and Query Expansion for Document Retrieval Liang-Yu Chen National Taiwan University

More information

Top-k Keyword Search Over Graphs Based On Backward Search

Top-k Keyword Search Over Graphs Based On Backward Search Top-k Keyword Search Over Graphs Based On Backward Search Jia-Hui Zeng, Jiu-Ming Huang, Shu-Qiang Yang 1College of Computer National University of Defense Technology, Changsha, China 2College of Computer

More information

HFCT: A Hybrid Fuzzy Clustering Method for Collaborative Tagging

HFCT: A Hybrid Fuzzy Clustering Method for Collaborative Tagging 007 International Conference on Convergence Information Technology HFCT: A Hybrid Fuzzy Clustering Method for Collaborative Tagging Lixin Han,, Guihai Chen Department of Computer Science and Engineering,

More information

A Game Map Complexity Measure Based on Hamming Distance Yan Li, Pan Su, and Wenliang Li

A Game Map Complexity Measure Based on Hamming Distance Yan Li, Pan Su, and Wenliang Li Physics Procedia 22 (2011) 634 640 2011 International Conference on Physics Science and Technology (ICPST 2011) A Game Map Complexity Measure Based on Hamming Distance Yan Li, Pan Su, and Wenliang Li Collage

More information

A Bagging Method using Decision Trees in the Role of Base Classifiers

A Bagging Method using Decision Trees in the Role of Base Classifiers A Bagging Method using Decision Trees in the Role of Base Classifiers Kristína Machová 1, František Barčák 2, Peter Bednár 3 1 Department of Cybernetics and Artificial Intelligence, Technical University,

More information

European Network on New Sensing Technologies for Air Pollution Control and Environmental Sustainability - EuNetAir COST Action TD1105

European Network on New Sensing Technologies for Air Pollution Control and Environmental Sustainability - EuNetAir COST Action TD1105 European Network on New Sensing Technologies for Air Pollution Control and Environmental Sustainability - EuNetAir COST Action TD1105 A Holistic Approach in the Development and Deployment of WSN-based

More information

Cluster Validity Classification Approaches Based on Geometric Probability and Application in the Classification of Remotely Sensed Images

Cluster Validity Classification Approaches Based on Geometric Probability and Application in the Classification of Remotely Sensed Images Sensors & Transducers 04 by IFSA Publishing, S. L. http://www.sensorsportal.com Cluster Validity ification Approaches Based on Geometric Probability and Application in the ification of Remotely Sensed

More information

HIGH RESOLUTION REMOTE SENSING IMAGE SEGMENTATION BASED ON GRAPH THEORY AND FRACTAL NET EVOLUTION APPROACH

HIGH RESOLUTION REMOTE SENSING IMAGE SEGMENTATION BASED ON GRAPH THEORY AND FRACTAL NET EVOLUTION APPROACH HIGH RESOLUTION REMOTE SENSING IMAGE SEGMENTATION BASED ON GRAPH THEORY AND FRACTAL NET EVOLUTION APPROACH Yi Yang, Haitao Li, Yanshun Han, Haiyan Gu Key Laboratory of Geo-informatics of State Bureau of

More information

The Design and Application of GIS Mathematical Model Database System with Meta-algorithm Li-Zhijiang

The Design and Application of GIS Mathematical Model Database System with Meta-algorithm Li-Zhijiang 4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015) The Design and Application of GIS Mathematical Model Database System with Meta-algorithm Li-Zhijiang Yishui College,

More information

IMPROVING THE RELEVANCY OF DOCUMENT SEARCH USING THE MULTI-TERM ADJACENCY KEYWORD-ORDER MODEL

IMPROVING THE RELEVANCY OF DOCUMENT SEARCH USING THE MULTI-TERM ADJACENCY KEYWORD-ORDER MODEL IMPROVING THE RELEVANCY OF DOCUMENT SEARCH USING THE MULTI-TERM ADJACENCY KEYWORD-ORDER MODEL Lim Bee Huang 1, Vimala Balakrishnan 2, Ram Gopal Raj 3 1,2 Department of Information System, 3 Department

More information

OPTIMIZATION OF BAGGING CLASSIFIERS BASED ON SBCB ALGORITHM

OPTIMIZATION OF BAGGING CLASSIFIERS BASED ON SBCB ALGORITHM OPTIMIZATION OF BAGGING CLASSIFIERS BASED ON SBCB ALGORITHM XIAO-DONG ZENG, SAM CHAO, FAI WONG Faculty of Science and Technology, University of Macau, Macau, China E-MAIL: ma96506@umac.mo, lidiasc@umac.mo,

More information

Alberto Messina, Maurizio Montagnuolo

Alberto Messina, Maurizio Montagnuolo A Generalised Cross-Modal Clustering Method Applied to Multimedia News Semantic Indexing and Retrieval Alberto Messina, Maurizio Montagnuolo RAI Centre for Research and Technological Innovation Madrid,

More information

IMPROVING INFORMATION RETRIEVAL BASED ON QUERY CLASSIFICATION ALGORITHM

IMPROVING INFORMATION RETRIEVAL BASED ON QUERY CLASSIFICATION ALGORITHM IMPROVING INFORMATION RETRIEVAL BASED ON QUERY CLASSIFICATION ALGORITHM Myomyo Thannaing 1, Ayenandar Hlaing 2 1,2 University of Technology (Yadanarpon Cyber City), near Pyin Oo Lwin, Myanmar ABSTRACT

More information

INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering

INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering Erik Velldal University of Oslo Sept. 18, 2012 Topics for today 2 Classification Recap Evaluating classifiers Accuracy, precision,

More information

Keywords Clustering, Goals of clustering, clustering techniques, clustering algorithms.

Keywords Clustering, Goals of clustering, clustering techniques, clustering algorithms. Volume 3, Issue 5, May 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Survey of Clustering

More information

A Novel Image Classification Model Based on Contourlet Transform and Dynamic Fuzzy Graph Cuts

A Novel Image Classification Model Based on Contourlet Transform and Dynamic Fuzzy Graph Cuts Appl. Math. Inf. Sci. 6 No. 1S pp. 93S-97S (2012) Applied Mathematics & Information Sciences An International Journal @ 2012 NSP Natural Sciences Publishing Cor. A Novel Image Classification Model Based

More information

Machine Learning Techniques for Data Mining

Machine Learning Techniques for Data Mining Machine Learning Techniques for Data Mining Eibe Frank University of Waikato New Zealand 10/25/2000 1 PART VII Moving on: Engineering the input and output 10/25/2000 2 Applying a learner is not all Already

More information

Cost-sensitive C4.5 with post-pruning and competition

Cost-sensitive C4.5 with post-pruning and competition Cost-sensitive C4.5 with post-pruning and competition Zilong Xu, Fan Min, William Zhu Lab of Granular Computing, Zhangzhou Normal University, Zhangzhou 363, China Abstract Decision tree is an effective

More information

A Proposed Model For Forecasting Stock Markets Based On Clustering Algorithm And Fuzzy Time Series

A Proposed Model For Forecasting Stock Markets Based On Clustering Algorithm And Fuzzy Time Series Journal of Multidisciplinary Engineering Science Studies (JMESS) A Proposed Model For Forecasting Stock Markets Based On Clustering Algorithm And Fuzzy Time Series Nghiem Van Tinh Thai Nguyen University

More information

manufacturing process.

manufacturing process. Send Orders for Reprints to reprints@benthamscience.ae The Open Automation and Control Systems Journal, 2014, 6, 203-207 203 Open Access Identifying Method for Key Quality Characteristics in Series-Parallel

More information

TEVI: Text Extraction for Video Indexing

TEVI: Text Extraction for Video Indexing TEVI: Text Extraction for Video Indexing Hichem KARRAY, Mohamed SALAH, Adel M. ALIMI REGIM: Research Group on Intelligent Machines, EIS, University of Sfax, Tunisia hichem.karray@ieee.org mohamed_salah@laposte.net

More information

A New Distance Independent Localization Algorithm in Wireless Sensor Network

A New Distance Independent Localization Algorithm in Wireless Sensor Network A New Distance Independent Localization Algorithm in Wireless Sensor Network Siwei Peng 1, Jihui Li 2, Hui Liu 3 1 School of Information Science and Engineering, Yanshan University, Qinhuangdao 2 The Key

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)

More information

Slides for Data Mining by I. H. Witten and E. Frank

Slides for Data Mining by I. H. Witten and E. Frank Slides for Data Mining by I. H. Witten and E. Frank 7 Engineering the input and output Attribute selection Scheme-independent, scheme-specific Attribute discretization Unsupervised, supervised, error-

More information

Integration of Fuzzy Shannon s Entropy with fuzzy TOPSIS for industrial robotic system selection

Integration of Fuzzy Shannon s Entropy with fuzzy TOPSIS for industrial robotic system selection JIEM, 2012 5(1):102-114 Online ISSN: 2013-0953 Print ISSN: 2013-8423 http://dx.doi.org/10.3926/jiem.397 Integration of Fuzzy Shannon s Entropy with fuzzy TOPSIS for industrial robotic system selection

More information

EFFICIENT ADAPTIVE PREPROCESSING WITH DIMENSIONALITY REDUCTION FOR STREAMING DATA

EFFICIENT ADAPTIVE PREPROCESSING WITH DIMENSIONALITY REDUCTION FOR STREAMING DATA INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 EFFICIENT ADAPTIVE PREPROCESSING WITH DIMENSIONALITY REDUCTION FOR STREAMING DATA Saranya Vani.M 1, Dr. S. Uma 2,

More information

A Novel Method for Activity Place Sensing Based on Behavior Pattern Mining Using Crowdsourcing Trajectory Data

A Novel Method for Activity Place Sensing Based on Behavior Pattern Mining Using Crowdsourcing Trajectory Data A Novel Method for Activity Place Sensing Based on Behavior Pattern Mining Using Crowdsourcing Trajectory Data Wei Yang 1, Tinghua Ai 1, Wei Lu 1, Tong Zhang 2 1 School of Resource and Environment Sciences,

More information

Reading group on Ontologies and NLP:

Reading group on Ontologies and NLP: Reading group on Ontologies and NLP: Machine Learning27th infebruary Automated 2014 1 / 25 Te Reading group on Ontologies and NLP: Machine Learning in Automated Text Categorization, by Fabrizio Sebastianini.

More information

Conceptual Review of clustering techniques in data mining field

Conceptual Review of clustering techniques in data mining field Conceptual Review of clustering techniques in data mining field Divya Shree ABSTRACT The marvelous amount of data produced nowadays in various application domains such as molecular biology or geography

More information

Information modeling and reengineering for product development process

Information modeling and reengineering for product development process ISSN 1 746-7233, England, UK International Journal of Management Science and Engineering Management Vol. 2 (2007) No. 1, pp. 64-74 Information modeling and reengineering for product development process

More information

An Unsupervised Approach for Combining Scores of Outlier Detection Techniques, Based on Similarity Measures

An Unsupervised Approach for Combining Scores of Outlier Detection Techniques, Based on Similarity Measures An Unsupervised Approach for Combining Scores of Outlier Detection Techniques, Based on Similarity Measures José Ramón Pasillas-Díaz, Sylvie Ratté Presenter: Christoforos Leventis 1 Basic concepts Outlier

More information

Research on-board LIDAR point cloud data pretreatment

Research on-board LIDAR point cloud data pretreatment Acta Technica 62, No. 3B/2017, 1 16 c 2017 Institute of Thermomechanics CAS, v.v.i. Research on-board LIDAR point cloud data pretreatment Peng Cang 1, Zhenglin Yu 1, Bo Yu 2, 3 Abstract. In view of the

More information