Study of Attribute Word Extraction in Chinese Product Reviews

Size: px
Start display at page:

Download "Study of Attribute Word Extraction in Chinese Product Reviews"

Transcription

1 The 5th Annual IEEE International Conference on Cyber Technology in Automation, Control and Intelligent Systems June 8-12, 2015, Shenyang, China Study of Extraction in Chinese Product Reviews Ma Rui University of Chinese Academy of Sciences Wuxi CAS Ubiquitous Technology R&D Center CO., LTD Beijing, China Pan Fucheng, Zhou Xiaofeng Shenyang Institute of Automation Chinese Academy of Sciences Shenyang, China Abstract As e-commerce develops quickly and online product reviews dramatically increases, extraction of attribute words in the product review is of significance. In this paper, authors presented an efficient extraction method, with categorization by synonyms, based on ontology library, of product reviews attribute words under the big data environments. The experiment validated the approach and it improves the extraction efficiency of attribute words. Keywords ontology; text mining; product reviews; attribute word extraction I. INTRODUCTION With the rise of the Internet industry, businesses involved in the e-commerce have increased every year, while consumers increasingly incline to online shopping. From 2011 to 2014, China's B2C e-commerce sales grew from $5.537 billion to $11.74 billion [1]. Only on November 11th, 2014, Alibaba China e-commerce platform achieved more than 6.8 million orders [2]. The spring up of shopping online generates massive product reviews that help consumers get quality and service information of some online goods, and help businesses and producers benefit from the feedbacks. But the amount of produce reviews is of such a mass that consumers and businesses is difficult to read one by one. There is a need to extract useful information from the large number of comments. Currently a great quantity of researches are on methods for English text information extraction. With Pointwise Mutual Information method Kamps J et al. identified evaluative words by calculating correlations between all the adjectives in Net and two categories of representative words including seed commendatory term good and seed derogatory term bad [3]. Pancholi G used semi-supervised learning techniques to extract attribute word-evaluative word pairs [4]. Many scholars studied text information extraction in Chinese. Yu Xinsheng, Li Jiyun et al. who used the association rules to extract high-frequency attribute words, used the Pruning Rules to improve accuracy and coverage rate, and then found low-frequency attribute words, which were finally added into attribute word list, by using the adjectives neighboring them [5-6]. Yang Qin made a visual prototype system of evaluation information, in which attribute words, which were separated into two group made of general ones and special ones, were obtained with TF-IDF mothed [7]. Fan Yuehua et al. extracted attribute words through unsupervised methods such as preprocessing, clustering, similarity calculation, noun phrases gather, trim, etc. [8]. However, relevant researches are still undergoing exploratory stage which have not yet progressed to standardized methods of research and experiment. On the account of comparative dispersion on project design, technology utilization, absence of integrated platform of experiment and resources especially in the area of ontology construction regarding product reviews [9], together with a deficiency of corresponding researches under the circumstance of very large data, under which many researches have high computational complexity, therefore, it is necessary to develop methods of extracting information in Chinese product reviews from the large amount of data. This paper proposes a method of classification and extraction of attribute words in product reviews from the Internet, based on a sort of ontology, under the background of the big data. words are the core of commodities commented, recognition and classification of attribute words are important parts of sentiment label extraction and sentiment analysis. Firstly, the method of constructing ontology is introduced; then elaborate how to extract attribute words base on the ontology and how to classify them by synonyms; next, based on this method experiments are carried out and finally conclusions are given. From another perspective, this paper first discusses the method in relation to a single type of commodity, which can be seen as a uniform method for the massive comments of different types of commodities. Reviews in different sorts of commodities are processed using a parallel processing way which are shown in Fig. 1. The dotted line in the Fig. 1 shows the whole procedure depends on the ontology. II. CONSTRUCT THE ATTRIBUTE WORD ONTOLOGY Construction of Chinese ontology research lacks of uniform methods and standards. Different research purposes and different applications have different construction methods. The ontology structure concerned in this paper is shown in Fig. 2. The features of comments between different kinds of online goods vary greatly, so the attribute word libraries are divided according to product categories. This work is supported by the National High Technology Research and Development Program of China (No. 2013AA ) /15/$ IEEE 1730

2 Reviews of Product Type 1 Reviews of Product Type n Ontology Library Parallel Process Fig. 1. Parallel process of attribute word extraction Set 1 Set n To facilitate subsequent work, the attribute word thesauruses are constructed following different product categories. Extract attribute words in the mass product comments from currently popular e-commerce websites with the help of Chinese word segmentation, POS tagging and association rules [6]. To build an attribute word thesaurus, in which new words are added with manual work, categorize all the reviews by product types, such as cellphone women's clothing. Due to the limited number of product attributes, attribute words can easily got through artificial way. Each word library has some characteristics. According to the concept of objects in reality world, each thesaurus should have a hierarchical structure. Node on the upper level contains concept of its underlying child nodes, such as the lower subnode of cellphone can be shell, screen, color and so on; lower child node of screen can be color resolution, etc., as shown in Fig. 3. Besides, attribute words also have synonyms. s in a certain synonym class, which have a center word each, are in the same level. A center word can represent a synonym class in that it has a high frequency in the commodity comments. An example of synonyms is shown in Fig. 4, in which appearance is the center word, and texture, shell, design, model are synonymous. Categorizing words based on similarity can rely on the same center word. Part of the word records in Fig. 3 are shown in TABLE I. Using different coding to distinguish the same word in different classes or levels. Material and material* are two different attribute words in the different levels. Material is a child node of cellphone shown in Fig. 3 whereas material* is a child node of screen. III. CATEGORIZE AND EXTRACT ATTRIBUTE WORDS A. Process on a Single Kind of Product After Chinese word segmentation, stop words removing and POS tagging, attribute word candidate sets are composed by the nouns in the sentences. words, which are more generally nouns, are recognized by matching nouns in the sentences with the words that are of the same to the nouns in the ontology. Ontology Library Library 1 Library 2 Fig. 2. Structure of ontology library Library n Cellphone Shell Screen Color Color Resolution Fig. 3. Hierarchical structure of attribute word library Texture Shell Appearance Fig. 4. Synonymous and their center word Design Model On the basis of the ontology, categorize attribute words in the product reviews regarding a single kind of commodity. Categorization process is divided into two operations: classification and merging. Classification operation put the word into a categories of its center word by traversing all the attribute word candidate sets of a single kind of commodity; merge operation: if two classes both contain fewer words than a certain threshold, then merge them. The purpose of the merging operation is to make sure each class reaches a certain size in order to identify high-frequency attribute words. The overall process of classification and merging is shown in Fig. 5. If a sentence contains multiple attribute words, e.g., there are three attribute words cellphone, screen and color" in The screen of the cellphone is colorful, then use the word color, which is on the lowest level of concept hierarchy in the ontology, instead of the other attribute words in the sentence, to provide a basis for the follow-up merging operation. Merging process is as follows: If the center words of two classes A and B are at the same level and have the same father node in the ontology, and the numbers of words contained in A and B are less than a certain threshold k, then merge them, and the new center word w of the new class is the word in the upper level to the center words of A and B; if the center word of A is the ancestor of the center word of B, and the size class B is less than a certain threshold k, then A and B are merged, the new center word w is the center word of A. Merge until each size of the classes meet the minimum threshold or conditions are not met for the next merging. To formalize the descriptions above, some operations are defined in TABLE II. According to the sub-operations, classification operation can be expressed as: U(C(w), w) Merging operation can be expressed in Fig. 6. Wherein, in line 4 w satisfies the following equation: GT(w, E(A)) AND GT(w, E(B)) = TRUE And in line 7 w satisfies the following formula: w = E(A) 1731

3 TABLE I. TABLE II. EXAMPLES OF ATTRIBUTE WORD RECORD Upper Center Center Cellphone NULL Cellphone Appearance Cellphone Appearance Material Cellphone Appearance Material* Screen Material* Operation C(w) U(A, w) M(A,B) Q(A) L(w1, w2) GT(w1, w2) E(A) S(w1, w2) SUB-OPERATIONS OF CLASSIFICATION AND MERGING Definition Get the center word of the word w Put word w into class A Merge class A and class B Get the size of class A If the words w1 and w2 are on the same level If word w1 is ancestor of word w2 Get the center word of class A If word w1 and w2 have the same father node B. Parallel Process The method described above is how to extract and categorize attribute words in comments in relation to a single type of product. In practice, to process different types of largescale product comments we can make use of the way of parallel computing. In 2004, Google first proposed MapReduce [10] technology, as a parallel computing model for large data analysis and processing, which has aroused widespread concern in industry and academia. MapReduce parallel programming model for the calculation process is divided into two main phases, namely Phase Map and Reduce phases. Map function handles the Key/Value pairs, generating a series of intermediate Key/Value pair; Reduce function is used to merge all intermediate values with the same Key of key-value pairs to calculate the final result. Following the aforementioned method, perform on comments associated with different sorts of commodity simultaneously taking advantage of MapReduce platform to make them parallelized. IV. EXPERIMENTS We show how we get the dataset for experiments and the experiment evaluation indicators we choose. Then the general procedure of experiment, in which we make comparisons based on the indicators, is proposed in the part of experimental result and finally the result is shown. Classify Merge Conditions A. Get the Source Data Use Breadth-first method to obtain a large number of web pages containing product comments from the main Chinese e- commerce sites. Breadth-first approach: Fig. 5. Classification and merging procedure PROCEDURE MERGING IF L( E(A), E(B) ) = TRUE IF S( E(A), E(B) ) = TRUE AND Q(A)<k AND Q(B)<k THEN w = M(A, B) END IF ELSE IF GT( E(A), E(B) ) = TURE AND Q(B)<k THEN w = M(A, B) END IF END PROCEDURE Fig. 6. Merging operation In Fig. 7 (a), screen, color, resolution are three word nodes shown in Fig. 3. Color is the center word of the synonym class of color. Merge color and resolution finally get a class screen ; Merge color and screen eventually get screen as well. In both cases the final result of the merging is shown in Fig. 7 (b). Starting from a vertex V 0, access this vertex; Starting from V 0, access all V 0 s never visited neighbors W 1,,, individually; Starting from W 1,,, successively, access their neighbors that have not been accessed; Repeat the step above until all vertices have been visited, or until the termination condition is satisfied. View a web page as a vertex, then get pages and comments from the initial resource URL, and then by breadth-first method acquire web page URLs to have access to other pages and gain comment resources. Before the experiments the data of reviews linked to different brands of goods, such as digital cameras, and skin care products, are obtained. B. Experimental Evaluation Indicators In the field of big data mining, due to a variety of intended purposes, data sets and evaluation methods used are not exactly the same, it is difficult to seek a totally precise evaluation technique. In this paper, Recall Rate, Precision Rate, their Macro averages and parallel Speedup are used to measure results. 1) Recall and Precision Rates Provided that a comment text with an attribute word that can be extracted actually belongs to a certain class in which the word is included. 1732

4 Hue Color (a) Screen Fig. 7. An example of merging operation Resolution Screen Recall Rate is the ratio of the number of the comment texts that are correctly extracted attribute words and the quantity of texts included in actual classes. It reflects the degree of agreement with the actual categories, regarding avoiding undetected related text. Precision refers to the ratio of the count of the comment texts that are properly disposed and the number of texts that are processed. It reflects the degree of accurate of the method, mainly in connection with avoiding error detection and noise. There is a text set D, in which there are the amount of text processed correctly recorded as TT, the number of text that is not processed properly but belongs to a certain class recorded as FT, the count of text that does not belong to any class but is classified to a class recorded as TF, and the number of text that does not belong to any class and not classified to any class recorded as FF. The formulas of the two evaluation measurements are as follows: TT Recall = TT + FT TT Precision = TT + TF The numerators of the two formulas above are the same, so in order to improve Recall and Precision, we must make FT, TF as small as possible. The ideal situation is to make Recall = 1 (FT = 0), Precision = 1 (TF = 0). Although the formulas have been proposed, it does not mean that those are the most accurate evaluation ways. These two indicators we have presented are only in relate to the results of comments processing regarding a single type of commodity. Because each product attribute word thesaurus in the ontology library is obtained from product reviews of one kind of commodity category rather than just one kind of commodity, we need to consider the Macro average. Macro average is considered when the results of processing about each type of commodity affect the overall results related to the whole category of commodity. Assume that a certain category containing n types of goods, then for each type of goods calculate parameter values, and make a sum. As to Recall, first calculate Recall i for each sort of product reviews, then compute the Macro average for the whole commodity category. The formula is as follows: i Recall Macro-Recall = i n For accuracy, calculate Precision i for each type of product reviews, then by commodity category we get the Macro average of Precision. The formula is as follows: (b) i Precision Macro-Precision = i n 2) Speedup Rate Speedup is the ratio of time consumed, running the same task, in a single processing system and a parallel processing system. It is used to measure performance of parallel systems and program parallelization effects. Formula as follows: S p = T 1 / T p S p is Speedup; T 1 is the running time under a single processing system; T p is the time consumed in a parallel system with p processing units. C. Experimental Results The results of the experiments are divided into two parts: First, based on small-scale review data, compare automatic processing results to manual tagging results then calculate the Macro average for the results regarding each category of goods; In the other part, based on large-scale review data, observe the results by statistical sampling, and then calculate the Speedup of parallel method. 1) Small-scale Datasets Experimental Results The overall amount of reviews collected manually is shown in TABLE III. After these three categories of product reviews are conducted classification and extraction of attribute words, results are shown in TABLE IV. As can be seen from the results, the Macro average Recall and Precision rates for small-scale datasets are all more than 80%, which has been able to basically meet the demands and purpose of the paper. 2) The Results of Large-scale Datasets Perform data acquisition and attribute words classification and extraction of product reviews on Hadoop MapReduce parallel processing platform. In order to carry out the largescale data analysis experiment, we get one billion product reviews, which is equivalent to the amount about 115GB. Liu Hongyu et al. studied extraction technique of objects evaluated in product reviews based on syntactic analysis [11]. In the large-scale experiment, on the basis of the same datasets, we make comparisons on both Macro averages and Speedup. TABLE III. THE AMOUNT OF REVIEWS COLLECTED MANUALLY Type of Product Review Count Skin care 1000 Women s clothing 1900 Digital camera 1500 TABLE IV. RESULT OF SMALL-SCALE DATASETS EXPERIMENT Category of Product Macro-Recall Macro-Precision Skin care Women s clothing Digital camera

5 After sampling part of the processing results (each specimen, which comes from a certain category of product, has a volume of 1000) of the product reviews, calculate the Macro average Precision and Recall rates with comparison of machine processing method to manual annotations. The result of the method this paper propose is shown in Fig. 8. From the sampling results, we can see the values of Macro-Recall and Macro-Precision are all above 0.7. However, the average value is slightly lower than the small-scale datasets, and the variance is greater. The reason may be under the condition of smallscale datasets, the selected goods are relatively hot commodity, of which the number of comments is more, with higher quality; what s more in the large-scale data condition, may be due to the presence of some of the not so popular categories of goods with lower average quality product reviews. In the large-scale data processing, the Speedup rates with the same amount of data running are shown in Fig. 10. Based on our method, the Speedup rate is close to 5.5 when the number of task nodes reaches eight. The experiment shows that the method in this paper can be effectively fast in big data environments, and be capable of analyzing large data processing tasks. To make a comparison, we make a test through the method, presented in [11], of which results are shown in Fig. 9 and Fig. 10. The method we proposed has a better performance of accuracy, mainly as a result of the ontology library, which is precision enough, specialized for extraction of attribute words in Chinese product reviews. It also has a greater Speedup rate under large-scale background, because the use of ontology as well, which makes algorithms more simple. V. CONCLUSION This paper presents a method on attribute words classification and extraction of Internet commodity comments under large data environment, in which the core approach is to build the ontology library. Take account of the simple and subject-related features of product reviews, it is a good way to construct an ontology to go a step further to decrease the computational complex. After verification of the experiments, in terms of reliability and efficiency the results are good enough. It is to be tested in practical applications. Fig. 8. Macro averages of the method used by the authors Fig. 9. Results of method compared Fig. 10. Speedup rates comparison under large-scale experiments REFERENCES [1] China e-commerce research center, [2] Official microblog of Alibaba, [3] Kamps J., Marx M., Mokken R.J., Using Net to measure semantic orientation of adjectives, In: Calzolari N, et al., eds. Proc. of the LREC, 2013, pp [4] Pancholi G., Jacob J., Malavalli K, Distributed computing environment using DCOM, Arlington, USA: University of Texas at Arlington, [5] Yu Xinsheng, Rong Zhiwei, Application of binary-tree under multicommunication network, computer engineering, vol. 32, pp , May [6] Li Jiyun, Dong Xiaoshe, Tong Duan, Research on distributed objects middleware, Computer Engineering and design, vol. 25, pp , February [7] Yang Qin, The three-layer client/server model based on COM, Computer Application and Research, 2011, pp [8] Fan Yuehua, Liu Bailin, Architecture of distribute software system with object oriented component, Journal of Xi an Institute of Technology, vol. 22, pp , March [9] Yu Shuhao, Su Shoubao, Guo Xuejun, Methodology of domainspecific ontology construction, Journal of West Anhui University, vol. 21, pp , April [10] J. Dean, S. Ghemawat, MapReduce: simplified data processing on large clusters, Communications of the ACM, 2008 vol. 51, pp [11] Liu Hongyu, Zhao Yanyan, Qin Bing, Liu Ting, Comment target extraction and sentiment classification, Journal of Chinese Information Processing, vol.24, pp , January

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISSN : 2456-3307 Some Issues in Application of NLP to Intelligent

More information

A Comparative study of Clustering Algorithms using MapReduce in Hadoop

A Comparative study of Clustering Algorithms using MapReduce in Hadoop A Comparative study of Clustering Algorithms using MapReduce in Hadoop Dweepna Garg 1, Khushboo Trivedi 2, B.B.Panchal 3 1 Department of Computer Science and Engineering, Parul Institute of Engineering

More information

The Establishment of Large Data Mining Platform Based on Cloud Computing. Wei CAI

The Establishment of Large Data Mining Platform Based on Cloud Computing. Wei CAI 2017 International Conference on Electronic, Control, Automation and Mechanical Engineering (ECAME 2017) ISBN: 978-1-60595-523-0 The Establishment of Large Data Mining Platform Based on Cloud Computing

More information

Improving Suffix Tree Clustering Algorithm for Web Documents

Improving Suffix Tree Clustering Algorithm for Web Documents International Conference on Logistics Engineering, Management and Computer Science (LEMCS 2015) Improving Suffix Tree Clustering Algorithm for Web Documents Yan Zhuang Computer Center East China Normal

More information

Discovering Advertisement Links by Using URL Text

Discovering Advertisement Links by Using URL Text 017 3rd International Conference on Computational Systems and Communications (ICCSC 017) Discovering Advertisement Links by Using URL Text Jing-Shan Xu1, a, Peng Chang, b,* and Yong-Zheng Zhang, c 1 School

More information

CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING

CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING Amol Jagtap ME Computer Engineering, AISSMS COE Pune, India Email: 1 amol.jagtap55@gmail.com Abstract Machine learning is a scientific discipline

More information

Obtaining Rough Set Approximation using MapReduce Technique in Data Mining

Obtaining Rough Set Approximation using MapReduce Technique in Data Mining Obtaining Rough Set Approximation using MapReduce Technique in Data Mining Varda Dhande 1, Dr. B. K. Sarkar 2 1 M.E II yr student, Dept of Computer Engg, P.V.P.I.T Collage of Engineering Pune, Maharashtra,

More information

Classifying Twitter Data in Multiple Classes Based On Sentiment Class Labels

Classifying Twitter Data in Multiple Classes Based On Sentiment Class Labels Classifying Twitter Data in Multiple Classes Based On Sentiment Class Labels Richa Jain 1, Namrata Sharma 2 1M.Tech Scholar, Department of CSE, Sushila Devi Bansal College of Engineering, Indore (M.P.),

More information

Improvements and Implementation of Hierarchical Clustering based on Hadoop Jun Zhang1, a, Chunxiao Fan1, Yuexin Wu2,b, Ao Xiao1

Improvements and Implementation of Hierarchical Clustering based on Hadoop Jun Zhang1, a, Chunxiao Fan1, Yuexin Wu2,b, Ao Xiao1 3rd International Conference on Machinery, Materials and Information Technology Applications (ICMMITA 2015) Improvements and Implementation of Hierarchical Clustering based on Hadoop Jun Zhang1, a, Chunxiao

More information

Preliminary Research on Distributed Cluster Monitoring of G/S Model

Preliminary Research on Distributed Cluster Monitoring of G/S Model Available online at www.sciencedirect.com Physics Procedia 25 (2012 ) 860 867 2012 International Conference on Solid State Devices and Materials Science Preliminary Research on Distributed Cluster Monitoring

More information

AN EFFECTIVE DETECTION OF SATELLITE IMAGES VIA K-MEANS CLUSTERING ON HADOOP SYSTEM. Mengzhao Yang, Haibin Mei and Dongmei Huang

AN EFFECTIVE DETECTION OF SATELLITE IMAGES VIA K-MEANS CLUSTERING ON HADOOP SYSTEM. Mengzhao Yang, Haibin Mei and Dongmei Huang International Journal of Innovative Computing, Information and Control ICIC International c 2017 ISSN 1349-4198 Volume 13, Number 3, June 2017 pp. 1037 1046 AN EFFECTIVE DETECTION OF SATELLITE IMAGES VIA

More information

Comment Extraction from Blog Posts and Its Applications to Opinion Mining

Comment Extraction from Blog Posts and Its Applications to Opinion Mining Comment Extraction from Blog Posts and Its Applications to Opinion Mining Huan-An Kao, Hsin-Hsi Chen Department of Computer Science and Information Engineering National Taiwan University, Taipei, Taiwan

More information

Remotely Sensed Image Processing Service Automatic Composition

Remotely Sensed Image Processing Service Automatic Composition Remotely Sensed Image Processing Service Automatic Composition Xiaoxia Yang Supervised by Qing Zhu State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University

More information

Text clustering based on a divide and merge strategy

Text clustering based on a divide and merge strategy Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 55 (2015 ) 825 832 Information Technology and Quantitative Management (ITQM 2015) Text clustering based on a divide and

More information

Study on A Recommendation Algorithm of Crossing Ranking in E- commerce

Study on A Recommendation Algorithm of Crossing Ranking in E- commerce International Journal of u-and e-service, Science and Technology, pp.53-62 http://dx.doi.org/10.14257/ijunnesst2014.7.4.6 Study on A Recommendation Algorithm of Crossing Ranking in E- commerce Duan Xueying

More information

An Improved Frequent Pattern-growth Algorithm Based on Decomposition of the Transaction Database

An Improved Frequent Pattern-growth Algorithm Based on Decomposition of the Transaction Database Algorithm Based on Decomposition of the Transaction Database 1 School of Management Science and Engineering, Shandong Normal University,Jinan, 250014,China E-mail:459132653@qq.com Fei Wei 2 School of Management

More information

Improving the Efficiency of Fast Using Semantic Similarity Algorithm

Improving the Efficiency of Fast Using Semantic Similarity Algorithm International Journal of Scientific and Research Publications, Volume 4, Issue 1, January 2014 1 Improving the Efficiency of Fast Using Semantic Similarity Algorithm D.KARTHIKA 1, S. DIVAKAR 2 Final year

More information

A Novel Categorized Search Strategy using Distributional Clustering Neenu Joseph. M 1, Sudheep Elayidom 2

A Novel Categorized Search Strategy using Distributional Clustering Neenu Joseph. M 1, Sudheep Elayidom 2 A Novel Categorized Search Strategy using Distributional Clustering Neenu Joseph. M 1, Sudheep Elayidom 2 1 Student, M.E., (Computer science and Engineering) in M.G University, India, 2 Associate Professor

More information

Improvement of SURF Feature Image Registration Algorithm Based on Cluster Analysis

Improvement of SURF Feature Image Registration Algorithm Based on Cluster Analysis Sensors & Transducers 2014 by IFSA Publishing, S. L. http://www.sensorsportal.com Improvement of SURF Feature Image Registration Algorithm Based on Cluster Analysis 1 Xulin LONG, 1,* Qiang CHEN, 2 Xiaoya

More information

Text Clustering Incremental Algorithm in Sensitive Topic Detection

Text Clustering Incremental Algorithm in Sensitive Topic Detection International Journal of Information and Communication Sciences 2018; 3(3): 88-95 http://www.sciencepublishinggroup.com/j/ijics doi: 10.11648/j.ijics.20180303.12 ISSN: 2575-1700 (Print); ISSN: 2575-1719

More information

Improvement of Web Search Results using Genetic Algorithm on Word Sense Disambiguation

Improvement of Web Search Results using Genetic Algorithm on Word Sense Disambiguation Volume 3, No.5, May 24 International Journal of Advances in Computer Science and Technology Pooja Bassin et al., International Journal of Advances in Computer Science and Technology, 3(5), May 24, 33-336

More information

Research and Application of E-Commerce Recommendation System Based on Association Rules Algorithm

Research and Application of E-Commerce Recommendation System Based on Association Rules Algorithm Research and Application of E-Commerce Recommendation System Based on Association Rules Algorithm Qingting Zhu 1*, Haifeng Lu 2 and Xinliang Xu 3 1 School of Computer Science and Software Engineering,

More information

Research Article Apriori Association Rule Algorithms using VMware Environment

Research Article Apriori Association Rule Algorithms using VMware Environment Research Journal of Applied Sciences, Engineering and Technology 8(2): 16-166, 214 DOI:1.1926/rjaset.8.955 ISSN: 24-7459; e-issn: 24-7467 214 Maxwell Scientific Publication Corp. Submitted: January 2,

More information

Making Sense Out of the Web

Making Sense Out of the Web Making Sense Out of the Web Rada Mihalcea University of North Texas Department of Computer Science rada@cs.unt.edu Abstract. In the past few years, we have witnessed a tremendous growth of the World Wide

More information

News Filtering and Summarization System Architecture for Recognition and Summarization of News Pages

News Filtering and Summarization System Architecture for Recognition and Summarization of News Pages Bonfring International Journal of Data Mining, Vol. 7, No. 2, May 2017 11 News Filtering and Summarization System Architecture for Recognition and Summarization of News Pages Bamber and Micah Jason Abstract---

More information

AUTOMATIC VISUAL CONCEPT DETECTION IN VIDEOS

AUTOMATIC VISUAL CONCEPT DETECTION IN VIDEOS AUTOMATIC VISUAL CONCEPT DETECTION IN VIDEOS Nilam B. Lonkar 1, Dinesh B. Hanchate 2 Student of Computer Engineering, Pune University VPKBIET, Baramati, India Computer Engineering, Pune University VPKBIET,

More information

2 Proposed Methodology

2 Proposed Methodology 3rd International Conference on Multimedia Technology(ICMT 2013) Object Detection in Image with Complex Background Dong Li, Yali Li, Fei He, Shengjin Wang 1 State Key Laboratory of Intelligent Technology

More information

Chapter 27 Introduction to Information Retrieval and Web Search

Chapter 27 Introduction to Information Retrieval and Web Search Chapter 27 Introduction to Information Retrieval and Web Search Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 27 Outline Information Retrieval (IR) Concepts Retrieval

More information

PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets

PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets 2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets Tao Xiao Chunfeng Yuan Yihua Huang Department

More information

Analysis on the technology improvement of the library network information retrieval efficiency

Analysis on the technology improvement of the library network information retrieval efficiency Available online www.jocpr.com Journal of Chemical and Pharmaceutical Research, 2014, 6(6):2198-2202 Research Article ISSN : 0975-7384 CODEN(USA) : JCPRC5 Analysis on the technology improvement of the

More information

Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing Environments

Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing Environments Send Orders for Reprints to reprints@benthamscience.ae 368 The Open Automation and Control Systems Journal, 2014, 6, 368-373 Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing

More information

Keyword Extraction by KNN considering Similarity among Features

Keyword Extraction by KNN considering Similarity among Features 64 Int'l Conf. on Advances in Big Data Analytics ABDA'15 Keyword Extraction by KNN considering Similarity among Features Taeho Jo Department of Computer and Information Engineering, Inha University, Incheon,

More information

Keywords Data alignment, Data annotation, Web database, Search Result Record

Keywords Data alignment, Data annotation, Web database, Search Result Record Volume 5, Issue 8, August 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Annotating Web

More information

FSRM Feedback Algorithm based on Learning Theory

FSRM Feedback Algorithm based on Learning Theory Send Orders for Reprints to reprints@benthamscience.ae The Open Cybernetics & Systemics Journal, 2015, 9, 699-703 699 FSRM Feedback Algorithm based on Learning Theory Open Access Zhang Shui-Li *, Dong

More information

Ontology Extraction from Heterogeneous Documents

Ontology Extraction from Heterogeneous Documents Vol.3, Issue.2, March-April. 2013 pp-985-989 ISSN: 2249-6645 Ontology Extraction from Heterogeneous Documents Kirankumar Kataraki, 1 Sumana M 2 1 IV sem M.Tech/ Department of Information Science & Engg

More information

TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES

TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES Mu. Annalakshmi Research Scholar, Department of Computer Science, Alagappa University, Karaikudi. annalakshmi_mu@yahoo.co.in Dr. A.

More information

RiMOM Results for OAEI 2010

RiMOM Results for OAEI 2010 RiMOM Results for OAEI 2010 Zhichun Wang 1, Xiao Zhang 1, Lei Hou 1, Yue Zhao 2, Juanzi Li 1, Yu Qi 3, Jie Tang 1 1 Tsinghua University, Beijing, China {zcwang,zhangxiao,greener,ljz,tangjie}@keg.cs.tsinghua.edu.cn

More information

The Comparative Study of Machine Learning Algorithms in Text Data Classification*

The Comparative Study of Machine Learning Algorithms in Text Data Classification* The Comparative Study of Machine Learning Algorithms in Text Data Classification* Wang Xin School of Science, Beijing Information Science and Technology University Beijing, China Abstract Classification

More information

Ontology-Based Web Query Classification for Research Paper Searching

Ontology-Based Web Query Classification for Research Paper Searching Ontology-Based Web Query Classification for Research Paper Searching MyoMyo ThanNaing University of Technology(Yatanarpon Cyber City) Mandalay,Myanmar Abstract- In web search engines, the retrieval of

More information

Research and Design of Key Technology of Vertical Search Engine for Educational Resources

Research and Design of Key Technology of Vertical Search Engine for Educational Resources 2017 International Conference on Arts and Design, Education and Social Sciences (ADESS 2017) ISBN: 978-1-60595-511-7 Research and Design of Key Technology of Vertical Search Engine for Educational Resources

More information

Correlation Based Feature Selection with Irrelevant Feature Removal

Correlation Based Feature Selection with Irrelevant Feature Removal Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,

More information

SQL Query Optimization on Cross Nodes for Distributed System

SQL Query Optimization on Cross Nodes for Distributed System 2016 International Conference on Power, Energy Engineering and Management (PEEM 2016) ISBN: 978-1-60595-324-3 SQL Query Optimization on Cross Nodes for Distributed System Feng ZHAO 1, Qiao SUN 1, Yan-bin

More information

AN ENHANCED ATTRIBUTE RERANKING DESIGN FOR WEB IMAGE SEARCH

AN ENHANCED ATTRIBUTE RERANKING DESIGN FOR WEB IMAGE SEARCH AN ENHANCED ATTRIBUTE RERANKING DESIGN FOR WEB IMAGE SEARCH Sai Tejaswi Dasari #1 and G K Kishore Babu *2 # Student,Cse, CIET, Lam,Guntur, India * Assistant Professort,Cse, CIET, Lam,Guntur, India Abstract-

More information

A Fast and High Throughput SQL Query System for Big Data

A Fast and High Throughput SQL Query System for Big Data A Fast and High Throughput SQL Query System for Big Data Feng Zhu, Jie Liu, and Lijie Xu Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing, China 100190

More information

Empirical Analysis of Single and Multi Document Summarization using Clustering Algorithms

Empirical Analysis of Single and Multi Document Summarization using Clustering Algorithms Engineering, Technology & Applied Science Research Vol. 8, No. 1, 2018, 2562-2567 2562 Empirical Analysis of Single and Multi Document Summarization using Clustering Algorithms Mrunal S. Bewoor Department

More information

Application of Individualized Service System for Scientific and Technical Literature In Colleges and Universities

Application of Individualized Service System for Scientific and Technical Literature In Colleges and Universities Journal of Applied Science and Engineering Innovation, Vol.6, No.1, 2019, pp.26-30 ISSN (Print): 2331-9062 ISSN (Online): 2331-9070 Application of Individualized Service System for Scientific and Technical

More information

Question Answering Approach Using a WordNet-based Answer Type Taxonomy

Question Answering Approach Using a WordNet-based Answer Type Taxonomy Question Answering Approach Using a WordNet-based Answer Type Taxonomy Seung-Hoon Na, In-Su Kang, Sang-Yool Lee, Jong-Hyeok Lee Department of Computer Science and Engineering, Electrical and Computer Engineering

More information

Open Access Research on the Prediction Model of Material Cost Based on Data Mining

Open Access Research on the Prediction Model of Material Cost Based on Data Mining Send Orders for Reprints to reprints@benthamscience.ae 1062 The Open Mechanical Engineering Journal, 2015, 9, 1062-1066 Open Access Research on the Prediction Model of Material Cost Based on Data Mining

More information

Design and Implementation of Agricultural Information Resources Vertical Search Engine Based on Nutch

Design and Implementation of Agricultural Information Resources Vertical Search Engine Based on Nutch 619 A publication of CHEMICAL ENGINEERING TRANSACTIONS VOL. 51, 2016 Guest Editors: Tichun Wang, Hongyang Zhang, Lei Tian Copyright 2016, AIDIC Servizi S.r.l., ISBN 978-88-95608-43-3; ISSN 2283-9216 The

More information

A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2

A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2 A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2 1 Department of Electronics & Comp. Sc, RTMNU, Nagpur, India 2 Department of Computer Science, Hislop College, Nagpur,

More information

Web Information Retrieval using WordNet

Web Information Retrieval using WordNet Web Information Retrieval using WordNet Jyotsna Gharat Asst. Professor, Xavier Institute of Engineering, Mumbai, India Jayant Gadge Asst. Professor, Thadomal Shahani Engineering College Mumbai, India ABSTRACT

More information

Knowledge Retrieval. Franz J. Kurfess. Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A.

Knowledge Retrieval. Franz J. Kurfess. Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A. Knowledge Retrieval Franz J. Kurfess Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A. 1 Acknowledgements This lecture series has been sponsored by the European

More information

News-Oriented Keyword Indexing with Maximum Entropy Principle.

News-Oriented Keyword Indexing with Maximum Entropy Principle. News-Oriented Keyword Indexing with Maximum Entropy Principle. Li Sujian' Wang Houfeng' Yu Shiwen' Xin Chengsheng2 'Institute of Computational Linguistics, Peking University, 100871, Beijing, China Ilisujian,

More information

Domain-specific Concept-based Information Retrieval System

Domain-specific Concept-based Information Retrieval System Domain-specific Concept-based Information Retrieval System L. Shen 1, Y. K. Lim 1, H. T. Loh 2 1 Design Technology Institute Ltd, National University of Singapore, Singapore 2 Department of Mechanical

More information

Semantic Website Clustering

Semantic Website Clustering Semantic Website Clustering I-Hsuan Yang, Yu-tsun Huang, Yen-Ling Huang 1. Abstract We propose a new approach to cluster the web pages. Utilizing an iterative reinforced algorithm, the model extracts semantic

More information

Latest development in image feature representation and extraction

Latest development in image feature representation and extraction International Journal of Advanced Research and Development ISSN: 2455-4030, Impact Factor: RJIF 5.24 www.advancedjournal.com Volume 2; Issue 1; January 2017; Page No. 05-09 Latest development in image

More information

An Intelligent Retrieval Platform for Distributional Agriculture Science and Technology Data

An Intelligent Retrieval Platform for Distributional Agriculture Science and Technology Data An Intelligent Retrieval Platform for Distributional Agriculture Science and Technology Data Xiaorong Yang 1,2, Wensheng Wang 1,2, Qingtian Zeng 3, and Nengfu Xie 1,2 1 Agriculture Information Institute,

More information

Web-page Indexing based on the Prioritize Ontology Terms

Web-page Indexing based on the Prioritize Ontology Terms Web-page Indexing based on the Prioritize Ontology Terms Sukanta Sinha 1, 4, Rana Dattagupta 2, Debajyoti Mukhopadhyay 3, 4 1 Tata Consultancy Services Ltd., Victoria Park Building, Salt Lake, Kolkata

More information

A New Model of Search Engine based on Cloud Computing

A New Model of Search Engine based on Cloud Computing A New Model of Search Engine based on Cloud Computing DING Jian-li 1,2, YANG Bo 1 1. College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China 2. Tianjin Key

More information

Nearest Clustering Algorithm for Satellite Image Classification in Remote Sensing Applications

Nearest Clustering Algorithm for Satellite Image Classification in Remote Sensing Applications Nearest Clustering Algorithm for Satellite Image Classification in Remote Sensing Applications Anil K Goswami 1, Swati Sharma 2, Praveen Kumar 3 1 DRDO, New Delhi, India 2 PDM College of Engineering for

More information

Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India

Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India Abstract - The primary goal of the web site is to provide the

More information

A Novel Image Semantic Understanding and Feature Extraction Algorithm. and Wenzhun Huang

A Novel Image Semantic Understanding and Feature Extraction Algorithm. and Wenzhun Huang A Novel Image Semantic Understanding and Feature Extraction Algorithm Xinxin Xie 1, a 1, b* and Wenzhun Huang 1 School of Information Engineering, Xijing University, Xi an 710123, China a 346148500@qq.com,

More information

An Adaptive Threshold LBP Algorithm for Face Recognition

An Adaptive Threshold LBP Algorithm for Face Recognition An Adaptive Threshold LBP Algorithm for Face Recognition Xiaoping Jiang 1, Chuyu Guo 1,*, Hua Zhang 1, and Chenghua Li 1 1 College of Electronics and Information Engineering, Hubei Key Laboratory of Intelligent

More information

WEIGHTING QUERY TERMS USING WORDNET ONTOLOGY

WEIGHTING QUERY TERMS USING WORDNET ONTOLOGY IJCSNS International Journal of Computer Science and Network Security, VOL.9 No.4, April 2009 349 WEIGHTING QUERY TERMS USING WORDNET ONTOLOGY Mohammed M. Sakre Mohammed M. Kouta Ali M. N. Allam Al Shorouk

More information

A Component Retrieval Tree Matching Algorithm Based on a Faceted Classification Scheme

A Component Retrieval Tree Matching Algorithm Based on a Faceted Classification Scheme BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 15, No 1 Sofia 2015 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.1515/cait-2015-0002 A Component Retrieval Tree Matching

More information

An Efficient Methodology for Image Rich Information Retrieval

An Efficient Methodology for Image Rich Information Retrieval An Efficient Methodology for Image Rich Information Retrieval 56 Ashwini Jaid, 2 Komal Savant, 3 Sonali Varma, 4 Pushpa Jat, 5 Prof. Sushama Shinde,2,3,4 Computer Department, Siddhant College of Engineering,

More information

CHAPTER 6 PROPOSED HYBRID MEDICAL IMAGE RETRIEVAL SYSTEM USING SEMANTIC AND VISUAL FEATURES

CHAPTER 6 PROPOSED HYBRID MEDICAL IMAGE RETRIEVAL SYSTEM USING SEMANTIC AND VISUAL FEATURES 188 CHAPTER 6 PROPOSED HYBRID MEDICAL IMAGE RETRIEVAL SYSTEM USING SEMANTIC AND VISUAL FEATURES 6.1 INTRODUCTION Image representation schemes designed for image retrieval systems are categorized into two

More information

Supervised Web Forum Crawling

Supervised Web Forum Crawling Supervised Web Forum Crawling 1 Priyanka S. Bandagale, 2 Dr. Lata Ragha 1 Student, 2 Professor and HOD 1 Computer Department, 1 Terna college of Engineering, Navi Mumbai, India Abstract - In this paper,

More information

Research on the Application of Bank Transaction Data Stream Storage based on HBase Xiaoguo Wang*, Yuxiang Liu and Lin Zhang

Research on the Application of Bank Transaction Data Stream Storage based on HBase Xiaoguo Wang*, Yuxiang Liu and Lin Zhang International Conference on Engineering Management (Iconf-EM 2016) Research on the Application of Bank Transaction Data Stream Storage based on HBase Xiaoguo Wang*, Yuxiang Liu and Lin Zhang School of

More information

Mining Social Media Users Interest

Mining Social Media Users Interest Mining Social Media Users Interest Presenters: Heng Wang,Man Yuan April, 4 th, 2016 Agenda Introduction to Text Mining Tool & Dataset Data Pre-processing Text Mining on Twitter Summary & Future Improvement

More information

The Research of A multi-language supporting description-oriented Clustering Algorithm on Meta-Search Engine Result Wuling Ren 1, a and Lijuan Liu 2,b

The Research of A multi-language supporting description-oriented Clustering Algorithm on Meta-Search Engine Result Wuling Ren 1, a and Lijuan Liu 2,b Applied Mechanics and Materials Online: 2012-01-24 ISSN: 1662-7482, Vol. 151, pp 549-553 doi:10.4028/www.scientific.net/amm.151.549 2012 Trans Tech Publications, Switzerland The Research of A multi-language

More information

The Design of Distributed File System Based on HDFS Yannan Wang 1, a, Shudong Zhang 2, b, Hui Liu 3, c

The Design of Distributed File System Based on HDFS Yannan Wang 1, a, Shudong Zhang 2, b, Hui Liu 3, c Applied Mechanics and Materials Online: 2013-09-27 ISSN: 1662-7482, Vols. 423-426, pp 2733-2736 doi:10.4028/www.scientific.net/amm.423-426.2733 2013 Trans Tech Publications, Switzerland The Design of Distributed

More information

Enhancing Cluster Quality by Using User Browsing Time

Enhancing Cluster Quality by Using User Browsing Time Enhancing Cluster Quality by Using User Browsing Time Rehab Duwairi Dept. of Computer Information Systems Jordan Univ. of Sc. and Technology Irbid, Jordan rehab@just.edu.jo Khaleifah Al.jada' Dept. of

More information

CHAPTER THREE INFORMATION RETRIEVAL SYSTEM

CHAPTER THREE INFORMATION RETRIEVAL SYSTEM CHAPTER THREE INFORMATION RETRIEVAL SYSTEM 3.1 INTRODUCTION Search engine is one of the most effective and prominent method to find information online. It has become an essential part of life for almost

More information

RETRACTED ARTICLE. Web-Based Data Mining in System Design and Implementation. Open Access. Jianhu Gong 1* and Jianzhi Gong 2

RETRACTED ARTICLE. Web-Based Data Mining in System Design and Implementation. Open Access. Jianhu Gong 1* and Jianzhi Gong 2 Send Orders for Reprints to reprints@benthamscience.ae The Open Automation and Control Systems Journal, 2014, 6, 1907-1911 1907 Web-Based Data Mining in System Design and Implementation Open Access Jianhu

More information

An Improved Document Clustering Approach Using Weighted K-Means Algorithm

An Improved Document Clustering Approach Using Weighted K-Means Algorithm An Improved Document Clustering Approach Using Weighted K-Means Algorithm 1 Megha Mandloi; 2 Abhay Kothari 1 Computer Science, AITR, Indore, M.P. Pin 453771, India 2 Computer Science, AITR, Indore, M.P.

More information

The Design of Model for Tibetan Language Search System

The Design of Model for Tibetan Language Search System International Conference on Chemical, Material and Food Engineering (CMFE-2015) The Design of Model for Tibetan Language Search System Wang Zhong School of Information Science and Engineering Lanzhou University

More information

Web Product Ranking Using Opinion Mining

Web Product Ranking Using Opinion Mining Web Product Ranking Using Opinion Mining Yin-Fu Huang and Heng Lin Department of Computer Science and Information Engineering National Yunlin University of Science and Technology Yunlin, Taiwan {huangyf,

More information

Ontology Molecule Theory-based Information Integrated Service for Agricultural Risk Management

Ontology Molecule Theory-based Information Integrated Service for Agricultural Risk Management 2154 JOURNAL OF SOFTWARE, VOL. 6, NO. 11, NOVEMBER 2011 Ontology Molecule Theory-based Information Integrated Service for Agricultural Risk Management Qin Pan College of Economics Management, Huazhong

More information

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) CONTEXT SENSITIVE TEXT SUMMARIZATION USING HIERARCHICAL CLUSTERING ALGORITHM

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) CONTEXT SENSITIVE TEXT SUMMARIZATION USING HIERARCHICAL CLUSTERING ALGORITHM INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & 6367(Print), ISSN 0976 6375(Online) Volume 3, Issue 1, January- June (2012), TECHNOLOGY (IJCET) IAEME ISSN 0976 6367(Print) ISSN 0976 6375(Online) Volume

More information

2 Ontology evolution algorithm based on web-pages and users behavior logs

2 Ontology evolution algorithm based on web-pages and users behavior logs ISSN 1749-3889 (print), 1749-3897 (online) International Journal of Nonlinear Science Vol.18(2014) No.1,pp.86-91 Ontology Evolution Algorithm for Topic Information Collection Jing Ma 1, Mengyong Sun 1,

More information

Fast and Effective System for Name Entity Recognition on Big Data

Fast and Effective System for Name Entity Recognition on Big Data International Journal of Computer Sciences and Engineering Open Access Research Paper Volume-3, Issue-2 E-ISSN: 2347-2693 Fast and Effective System for Name Entity Recognition on Big Data Jigyasa Nigam

More information

Yunfeng Zhang 1, Huan Wang 2, Jie Zhu 1 1 Computer Science & Engineering Department, North China Institute of Aerospace

Yunfeng Zhang 1, Huan Wang 2, Jie Zhu 1 1 Computer Science & Engineering Department, North China Institute of Aerospace [Type text] [Type text] [Type text] ISSN : 0974-7435 Volume 10 Issue 20 BioTechnology 2014 An Indian Journal FULL PAPER BTAIJ, 10(20), 2014 [12526-12531] Exploration on the data mining system construction

More information

Clustering Analysis based on Data Mining Applications Xuedong Fan

Clustering Analysis based on Data Mining Applications Xuedong Fan Applied Mechanics and Materials Online: 203-02-3 ISSN: 662-7482, Vols. 303-306, pp 026-029 doi:0.4028/www.scientific.net/amm.303-306.026 203 Trans Tech Publications, Switzerland Clustering Analysis based

More information

Information Retrieval System Based on Context-aware in Internet of Things. Ma Junhong 1, a *

Information Retrieval System Based on Context-aware in Internet of Things. Ma Junhong 1, a * Information Retrieval System Based on Context-aware in Internet of Things Ma Junhong 1, a * 1 Xi an International University, Shaanxi, China, 710000 a sufeiya913@qq.com Keywords: Context-aware computing,

More information

70 IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 6, NO. 1, FEBRUARY ClassView: Hierarchical Video Shot Classification, Indexing, and Accessing

70 IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 6, NO. 1, FEBRUARY ClassView: Hierarchical Video Shot Classification, Indexing, and Accessing 70 IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 6, NO. 1, FEBRUARY 2004 ClassView: Hierarchical Video Shot Classification, Indexing, and Accessing Jianping Fan, Ahmed K. Elmagarmid, Senior Member, IEEE, Xingquan

More information

TEXT PREPROCESSING FOR TEXT MINING USING SIDE INFORMATION

TEXT PREPROCESSING FOR TEXT MINING USING SIDE INFORMATION TEXT PREPROCESSING FOR TEXT MINING USING SIDE INFORMATION Ms. Nikita P.Katariya 1, Prof. M. S. Chaudhari 2 1 Dept. of Computer Science & Engg, P.B.C.E., Nagpur, India, nikitakatariya@yahoo.com 2 Dept.

More information

Research Article Mobile Storage and Search Engine of Information Oriented to Food Cloud

Research Article Mobile Storage and Search Engine of Information Oriented to Food Cloud Advance Journal of Food Science and Technology 5(10): 1331-1336, 2013 DOI:10.19026/ajfst.5.3106 ISSN: 2042-4868; e-issn: 2042-4876 2013 Maxwell Scientific Publication Corp. Submitted: May 29, 2013 Accepted:

More information

Implementation of Parallel CASINO Algorithm Based on MapReduce. Li Zhang a, Yijie Shi b

Implementation of Parallel CASINO Algorithm Based on MapReduce. Li Zhang a, Yijie Shi b International Conference on Artificial Intelligence and Engineering Applications (AIEA 2016) Implementation of Parallel CASINO Algorithm Based on MapReduce Li Zhang a, Yijie Shi b State key laboratory

More information

Improved Balanced Parallel FP-Growth with MapReduce Qing YANG 1,a, Fei-Yang DU 2,b, Xi ZHU 1,c, Cheng-Gong JIANG *

Improved Balanced Parallel FP-Growth with MapReduce Qing YANG 1,a, Fei-Yang DU 2,b, Xi ZHU 1,c, Cheng-Gong JIANG * 2016 Joint International Conference on Artificial Intelligence and Computer Engineering (AICE 2016) and International Conference on Network and Communication Security (NCS 2016) ISBN: 978-1-60595-362-5

More information

Research and Application of Unstructured Data Acquisition and Retrieval Technology

Research and Application of Unstructured Data Acquisition and Retrieval Technology 2018 2nd International Conference on Systems, Computing, and Applications (SYSTCA 2018) Research and Application of Unstructured Data Acquisition and Retrieval Technology Zhenjiang Lei1,*, Lin Qiao2, Lina

More information

A Feature Selection Method to Handle Imbalanced Data in Text Classification

A Feature Selection Method to Handle Imbalanced Data in Text Classification A Feature Selection Method to Handle Imbalanced Data in Text Classification Fengxiang Chang 1*, Jun Guo 1, Weiran Xu 1, Kejun Yao 2 1 School of Information and Communication Engineering Beijing University

More information

Top-k Keyword Search Over Graphs Based On Backward Search

Top-k Keyword Search Over Graphs Based On Backward Search Top-k Keyword Search Over Graphs Based On Backward Search Jia-Hui Zeng, Jiu-Ming Huang, Shu-Qiang Yang 1College of Computer National University of Defense Technology, Changsha, China 2College of Computer

More information

Research on the New Image De-Noising Methodology Based on Neural Network and HMM-Hidden Markov Models

Research on the New Image De-Noising Methodology Based on Neural Network and HMM-Hidden Markov Models Research on the New Image De-Noising Methodology Based on Neural Network and HMM-Hidden Markov Models Wenzhun Huang 1, a and Xinxin Xie 1, b 1 School of Information Engineering, Xijing University, Xi an

More information

A BFS-BASED SIMILAR CONFERENCE RETRIEVAL FRAMEWORK

A BFS-BASED SIMILAR CONFERENCE RETRIEVAL FRAMEWORK A BFS-BASED SIMILAR CONFERENCE RETRIEVAL FRAMEWORK Qing Guo 1, 2 1 Nanyang Technological University, Singapore 2 SAP Innovation Center Network,Singapore ABSTRACT Literature review is part of scientific

More information

AN EFFICIENT GRADUAL PRUNING TECHNIQUE FOR UTILITY MINING. Received April 2011; revised October 2011

AN EFFICIENT GRADUAL PRUNING TECHNIQUE FOR UTILITY MINING. Received April 2011; revised October 2011 International Journal of Innovative Computing, Information and Control ICIC International c 2012 ISSN 1349-4198 Volume 8, Number 7(B), July 2012 pp. 5165 5178 AN EFFICIENT GRADUAL PRUNING TECHNIQUE FOR

More information

Encoding Words into String Vectors for Word Categorization

Encoding Words into String Vectors for Word Categorization Int'l Conf. Artificial Intelligence ICAI'16 271 Encoding Words into String Vectors for Word Categorization Taeho Jo Department of Computer and Information Communication Engineering, Hongik University,

More information

Open Access The Three-dimensional Coding Based on the Cone for XML Under Weaving Multi-documents

Open Access The Three-dimensional Coding Based on the Cone for XML Under Weaving Multi-documents Send Orders for Reprints to reprints@benthamscience.ae 676 The Open Automation and Control Systems Journal, 2014, 6, 676-683 Open Access The Three-dimensional Coding Based on the Cone for XML Under Weaving

More information

CLUSTERING BIG DATA USING NORMALIZATION BASED k-means ALGORITHM

CLUSTERING BIG DATA USING NORMALIZATION BASED k-means ALGORITHM Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,

More information

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,

More information