Document Summarization using Semantic Feature based on Cloud


Advanced Science and Technology Letters

Yoo-Kang Ji 1, Yong-Il Kim 2, Sun Park 3 *
1 Dept. of Information & Communication Engineering, Dongshin Univ., Korea
2 Honam University, South Korea
3 Mokpo National University, South Korea
1 neobacje@gmail.com  2 yikim@honam.ac.kr  3 sunpark@mokpo.ac.kr

Abstract. This paper proposes a document summarization method that uses semantic features extracted by non-negative matrix factorization (NMF), computed with distributed parallel processing on the Hadoop cloud framework. The semantic features obtained by NMF represent the inherent structure of documents well. In addition, Hadoop allows the method to summarize big document data. The experimental results demonstrate that the proposed method can summarize big document data that a single computer cannot.

Keywords: document summarization, semantic features, distributed parallel processing, NMF

1 Introduction

With the rapid growth of Internet access, the need for information seeking has increased. However, it is difficult for Internet users to find suitable information in cyberspace. Summary information can help: users save time not only in deciding whether a document is interesting but also in finding information without reading the full text. Document summarization is the process of reducing the size of documents while preserving their basic outline; that is, it should distill the most important information (i.e., the topics) from a document. Summarization methods produce either generic summaries or query-based summaries. A generic summary distills an overall sense of a document's contents, whereas a query-based summary distills only the contents of a document that are relevant to a user's query.
Summarization can also be divided into single-document and multi-document summarization according to the scope of the summary target. The purpose of multi-document summarization is to produce a single summary from a set of related documents, whereas single-document summarization summarizes only one document [1]. Traditional document summarization methods are limited in summarizing suitable information from the exploding volume of cyber data (e.g., SNS posts, messages, and blogs), because research on them has focused on improving summarization precision with various statistical or natural language processing methods running on a single computer or server.

* Corresponding author
ISSN: ASL Copyright 2013 SERSC

To resolve these limitations of traditional document summarization, this paper studies a summarization method that extracts information from big document data. The proposed method uses semantic features of documents, extracted by NMF with distributed parallel processing on the Hadoop cloud framework [2], to summarize big document data in cyberspace. The semantic features obtained by NMF represent the inherent structure of documents well, and Hadoop allows the method to summarize big document data. The experimental results demonstrate that the proposed method can summarize big document data that a single computer cannot.

The rest of the paper is organized as follows. Section 2 describes the NMF algorithm in detail. Section 3 introduces the Hadoop framework. Section 4 explains the proposed method. Finally, Section 5 concludes the paper.

2 Non-negative Matrix Factorization

This section reviews NMF theory. In this paper, we define the matrix notation as follows: let $X_{*j}$ be the j-th column vector of matrix X, $X_{i*}$ the i-th row vector, and $X_{ij}$ the element in the i-th row and j-th column. NMF decomposes a given m × n matrix A into a non-negative semantic feature matrix W and a non-negative semantic variable matrix H, as shown in Equation (1) [3]:

$$A \approx WH \qquad (1)$$

where W is an m × r non-negative matrix and H is an r × n non-negative matrix. Usually r is chosen to be smaller than m and n, and small enough that $(m + n)r < mn$, so that the total size of W and H is smaller than that of the original matrix A. The objective function, proposed by Lee and Seung [10], minimizes the Euclidean distance between each column of A and its approximation $\tilde{A} = WH$.
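The size argument for choosing r can be checked with a short worked example (the matrix sizes below are hypothetical, chosen only for illustration). Storing A takes mn entries and storing W and H together takes (m + n)r, so the factorization is smaller only when r < mn/(m + n); being smaller than m or n is not sufficient on its own, as the last line shows.

```latex
\begin{aligned}
&\text{entries in } A:\; mn, \qquad \text{entries in } W \text{ and } H:\; (m+n)\,r,\\
&(m+n)\,r < mn \iff r < \frac{mn}{m+n}.\\
&m = 1000,\; n = 500:\quad \frac{mn}{m+n} = \frac{500\,000}{1500} \approx 333.3,
 \qquad r = 10 \Rightarrow (m+n)\,r = 15\,000 \ll 500\,000.\\
&m = n = 100,\; r = 99 < \min(m,n):\quad (m+n)\,r = 19\,800 > mn = 10\,000.
\end{aligned}
```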
As the objective function, the squared Frobenius norm is used:

$$\Theta_E(W,H) \equiv \|A - WH\|_F^2 = \sum_{i=1}^{m}\sum_{j=1}^{n}\Big(A_{ij} - \sum_{l=1}^{r} W_{il}H_{lj}\Big)^2 \qquad (2)$$

W and H are updated repeatedly until $\Theta_E(W,H)$ converges below a predefined threshold or the number of iterations exceeds a predefined limit. The update rules are as follows:

$$H_{\alpha\mu} \leftarrow H_{\alpha\mu}\frac{(W^{T}A)_{\alpha\mu}}{(W^{T}WH)_{\alpha\mu}}, \qquad W_{i\alpha} \leftarrow W_{i\alpha}\frac{(AH^{T})_{i\alpha}}{(WHH^{T})_{i\alpha}} \qquad (3)$$

3 Hadoop Framework

The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. It includes the Apache Hadoop software library, a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failures [2].

4 Proposed Document Summarization Method

This paper proposes a document summarization method that uses semantic features extracted by NMF on Hadoop. The proposed method consists of two modules, a summarization module and a distributed parallel processing module, as shown in Figure 1. Each module is explained below.

[Figure 1 diagram: Internet, SNS, User; 1. Summarization module: (a) preprocessing, (b) summary algorithm, results; 2. Distributed parallel processing module: (c) Hadoop]
Fig. 1. Document summarization method using Hadoop and semantic features
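The two modules in Figure 1 can be illustrated with a minimal single-machine sketch in Python with NumPy. It is not the authors' implementation and makes several assumptions: the columns of A are sentences (so each semantic variable value scores a sentence), a small hypothetical stop-word list stands in for Rijsbergen's list, Porter stemming is skipped, and the multiplicative updates of Equation (3) run in memory instead of on Hadoop MapReduce (the comments map each update to the stages of Liu's method in Table 1). How the selection step is repeated is also assumed: features are visited in decreasing order of query similarity, cycling if more sentences than features are requested, and already-extracted sentences are skipped. The names `tokenize`, `term_sentence_matrix`, `nmf`, and `summarize` are hypothetical.

```python
import re

import numpy as np

# Hypothetical stand-in for Rijsbergen's stop-word list; stemming is omitted.
STOP_WORDS = {"a", "an", "the", "of", "and", "or", "is", "are", "to",
              "in", "on", "for", "by", "with", "from", "across"}


def tokenize(text):
    """Lower-case the text, keep alphabetic tokens, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def term_sentence_matrix(sentences):
    """Build the m x n term frequency matrix A (rows: terms, columns: sentences)."""
    tokens = [tokenize(s) for s in sentences]
    vocab = sorted({w for t in tokens for w in t})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))
    for j, toks in enumerate(tokens):
        for w in toks:
            A[index[w], j] += 1.0
    return A, vocab


def nmf(A, r, max_iter=500, tol=1e-6, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates (Eq. 3) that decrease ||A - WH||_F^2 (Eq. 2)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    prev = np.inf
    for _ in range(max_iter):
        # Table 1, H column: X1 = W^T A, Y1 = W^T W H, H <- H * X1 / Y1.
        H *= (W.T @ A) / ((W.T @ W) @ H + eps)
        # Table 1, W column: X2 = A H^T, Y2 = W H H^T, W <- W * X2 / Y2.
        W *= (A @ H.T) / (W @ (H @ H.T) + eps)
        err = np.linalg.norm(A - W @ H, "fro") ** 2
        # Stop once the objective no longer decreases by more than tol.
        if prev - err < tol:
            break
        prev = err
    return W, H


def summarize(sentences, query, r=2, k=2):
    """Return k sentences chosen through the semantic features most similar to the query."""
    A, vocab = term_sentence_matrix(sentences)
    W, H = nmf(A, r)
    index = {w: i for i, w in enumerate(vocab)}
    q = np.zeros(len(vocab))
    for w in tokenize(query):
        if w in index:
            q[index[w]] += 1.0
    # Cosine similarity between the query and each semantic feature (column of W).
    sims = (W.T @ q) / (np.linalg.norm(W, axis=0) * np.linalg.norm(q) + 1e-12)
    order = np.argsort(-sims)
    chosen = []
    k = min(k, len(sentences))
    step = 0
    while len(chosen) < k:
        l = order[step % r]
        step += 1
        # Semantic variable vector of feature l: row l of H, one weight per sentence.
        for j in np.argsort(-H[l]):
            if j not in chosen:
                chosen.append(int(j))
                break
    # Present the extracted sentences in their original document order.
    return [sentences[j] for j in sorted(chosen)]
```

For example, `summarize(sentences, "semantic features", r=2, k=2)` returns two of the input sentences in document order; returning them in document order rather than selection order is only a presentation choice of this sketch. The small `eps` terms guard the element-wise divisions against zeros.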

The summarization module consists of preprocessing and the summary algorithm. In the preprocessing phase (Figure 1(a)), all stop words are removed using Rijsbergen's stop-word list, and words are stemmed using Porter's stemming algorithm [4, 5]. Then the term-document frequency matrix A is constructed from the document set and stored in the Hadoop framework through distributed parallel processing (Figure 1(c)). In the summary algorithm phase (Figure 1(b)), the semantic features of the documents are extracted by Liu's NMF method [6], which performs distributed parallel processing with Hadoop MapReduce programming. Summary sentences are then selected as follows:

1. The cosine similarity between the query and each semantic feature vector is calculated.
2. The semantic feature vector with the largest similarity value is selected.
3. The semantic variable vector corresponding to the selected semantic feature vector is selected.
4. The sentence corresponding to the largest value in that semantic variable vector is extracted.

These steps are repeated until the predefined number of summary sentences is reached. Table 1 shows Liu's NMF method using MapReduce on Hadoop.

Table 1. Liu's NMF method using MapReduce [6] ($\circ$ and $\oslash$ denote element-wise multiplication and division)

Stage | Update of H: $H \leftarrow H \circ (W^{T}A) \oslash (W^{T}WH)$ | Update of W: $W \leftarrow W \circ (AH^{T}) \oslash (WHH^{T})$
1 | $X_1 = W^{T}A$ is computed with MapReduce | $X_2 = AH^{T}$ is computed with MapReduce
2 | $Y_1 = W^{T}WH$ is computed with MapReduce | $Y_2 = WHH^{T}$ is computed with MapReduce
3 | $H = H \circ X_1 \oslash Y_1$ is computed with MapReduce | $W = W \circ X_2 \oslash Y_2$ is computed with MapReduce

5 Conclusion

Traditional document summarization methods are limited in summarizing suitable information from the big document data on the Internet (e.g., SNS posts, messages, and blogs), because research on them has focused on improving summarization precision with various statistical or natural language processing methods on a single computer or server. To resolve these limitations, this paper proposed a document summarization method that summarizes information from big document data.
The proposed method uses semantic features of documents, extracted by NMF with distributed parallel processing on Hadoop MapReduce [2], to summarize big document data on the Internet. The semantic features obtained by non-negative matrix factorization (NMF) represent the inherent structure of documents well. In addition, the method can summarize big document data using Hadoop.

References

1. Mani, Automatic Summarization, John Benjamins Publishing Company
2. The Apache Hadoop project, (2013)
3. D. D. Lee, H. S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature, 401, Oct. (1999)
4. B. Y. Ricardo, R. N. Berthier, Modern Information Retrieval: The Concepts and Technology behind Search, Second edition, ACM Press, (2011)
5. W. B. Frakes, B. Y. Ricardo, Information Retrieval: Data Structures & Algorithms, Prentice-Hall, (1992)
6. C. Liu, H. C. Yang, J. Fan, L. W. He, Y. M. Wang, "Distributed Nonnegative Matrix Factorization for Web-scale Dyadic Data Analysis on MapReduce," in Proceedings of the International World Wide Web Conference Committee, USA, pp. 1-10, (2010)

Keyword Extraction by KNN considering Similarity among Features

Keyword Extraction by KNN considering Similarity among Features 64 Int'l Conf. on Advances in Big Data Analytics ABDA'15 Keyword Extraction by KNN considering Similarity among Features Taeho Jo Department of Computer and Information Engineering, Inha University, Incheon,

More information

String Vector based KNN for Text Categorization

String Vector based KNN for Text Categorization 458 String Vector based KNN for Text Categorization Taeho Jo Department of Computer and Information Communication Engineering Hongik University Sejong, South Korea tjo018@hongik.ac.kr Abstract This research

More information

Encoding Words into String Vectors for Word Categorization

Encoding Words into String Vectors for Word Categorization Int'l Conf. Artificial Intelligence ICAI'16 271 Encoding Words into String Vectors for Word Categorization Taeho Jo Department of Computer and Information Communication Engineering, Hongik University,

More information

Non-negative Matrix Factorization for Multimodal Image Retrieval

Non-negative Matrix Factorization for Multimodal Image Retrieval Non-negative Matrix Factorization for Multimodal Image Retrieval Fabio A. González PhD Machine Learning 2015-II Universidad Nacional de Colombia F. González NMF for MM IR ML 2015-II 1 / 54 Outline 1 The

More information

Concept-Based Document Similarity Based on Suffix Tree Document

Concept-Based Document Similarity Based on Suffix Tree Document Concept-Based Document Similarity Based on Suffix Tree Document *P.Perumal Sri Ramakrishna Engineering College Associate Professor Department of CSE, Coimbatore perumalsrec@gmail.com R. Nedunchezhian Sri

More information

Research on Cloud Resource Scheduling Algorithm based on Ant-cycle Model

Research on Cloud Resource Scheduling Algorithm based on Ant-cycle Model , pp.427-432 http://dx.doi.org/10.14257/astl.2016.139.85 Research on Cloud Resource Scheduling Algorithm based on Ant-cycle Model Yang Zhaofeng, Fan Aiwan Computer School, Pingdingshan University, Pingdingshan,

More information

Big Data Service Combination for Efficient Energy Data Analytics

Big Data Service Combination for Efficient Energy Data Analytics , pp.455-459 http://dx.doi.org/10.14257/astl.2016.139.90 Big Data Service Combination for Efficient Energy Data Analytics Tai-Yeon Ku, Wan-ki Park, Il-Woo Lee Energy IT Technology Research Section Hyper-connected

More information

Lecture Video Indexing and Retrieval Using Topic Keywords

Lecture Video Indexing and Retrieval Using Topic Keywords Lecture Video Indexing and Retrieval Using Topic Keywords B. J. Sandesh, Saurabha Jirgi, S. Vidya, Prakash Eljer, Gowri Srinivasa International Science Index, Computer and Information Engineering waset.org/publication/10007915

More information

Non-negative Matrix Factorization for Multimodal Image Retrieval

Non-negative Matrix Factorization for Multimodal Image Retrieval Non-negative Matrix Factorization for Multimodal Image Retrieval Fabio A. González PhD Bioingenium Research Group Computer Systems and Industrial Engineering Department Universidad Nacional de Colombia

More information

Text Mining With Lucene And Hadoop: Document Clustering With Updated Rules Of NMF Non- Negative Matrix Factorization

Text Mining With Lucene And Hadoop: Document Clustering With Updated Rules Of NMF Non- Negative Matrix Factorization Volume 118 No. 7 2018, 191-198 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu ijpam.eu Text Mining With Lucene And Hadoop: Document Clustering With Updated

More information

Data Imbalance Problem solving for SMOTE Based Oversampling: Study on Fault Detection Prediction Model in Semiconductor Manufacturing Process

Data Imbalance Problem solving for SMOTE Based Oversampling: Study on Fault Detection Prediction Model in Semiconductor Manufacturing Process Vol.133 (Information Technology and Computer Science 2016), pp.79-84 http://dx.doi.org/10.14257/astl.2016. Data Imbalance Problem solving for SMOTE Based Oversampling: Study on Fault Detection Prediction

More information

Abnormal Data Detection with CEP Engine for Smart Factory

Abnormal Data Detection with CEP Engine for Smart Factory , pp.1-5 http://dx.doi.org/10.14257/astl.2017.145.01 Abnormal Data Detection with CEP Engine for Smart Factory Won-chang Lee 1, Jae-Han Cho 2 and LeeSub Lee 3 1,2,3 Kumoh National Institute of Technology

More information

SQL-to-MapReduce Translation for Efficient OLAP Query Processing

SQL-to-MapReduce Translation for Efficient OLAP Query Processing , pp.61-70 http://dx.doi.org/10.14257/ijdta.2017.10.6.05 SQL-to-MapReduce Translation for Efficient OLAP Query Processing with MapReduce Hyeon Gyu Kim Department of Computer Engineering, Sahmyook University,

More information

Robot localization method based on visual features and their geometric relationship

Robot localization method based on visual features and their geometric relationship , pp.46-50 http://dx.doi.org/10.14257/astl.2015.85.11 Robot localization method based on visual features and their geometric relationship Sangyun Lee 1, Changkyung Eem 2, and Hyunki Hong 3 1 Department

More information

Deep Web Content Mining

Deep Web Content Mining Deep Web Content Mining Shohreh Ajoudanian, and Mohammad Davarpanah Jazi Abstract The rapid expansion of the web is causing the constant growth of information, leading to several problems such as increased

More information

Clean Living: Eliminating Near-Duplicates in Lifetime Personal Storage

Clean Living: Eliminating Near-Duplicates in Lifetime Personal Storage Clean Living: Eliminating Near-Duplicates in Lifetime Personal Storage Zhe Wang Princeton University Jim Gemmell Microsoft Research September 2005 Technical Report MSR-TR-2006-30 Microsoft Research Microsoft

More information

Improvement of Matrix Factorization-based Recommender Systems Using Similar User Index

Improvement of Matrix Factorization-based Recommender Systems Using Similar User Index , pp. 71-78 http://dx.doi.org/10.14257/ijseia.2015.9.3.08 Improvement of Matrix Factorization-based Recommender Systems Using Similar User Index Haesung Lee 1 and Joonhee Kwon 2* 1,2 Department of Computer

More information

Tag-based Social Interest Discovery

Tag-based Social Interest Discovery Tag-based Social Interest Discovery Xin Li / Lei Guo / Yihong (Eric) Zhao Yahoo!Inc 2008 Presented by: Tuan Anh Le (aletuan@vub.ac.be) 1 Outline Introduction Data set collection & Pre-processing Architecture

More information

Study on the Distributed Crawling for Processing Massive Data in the Distributed Network Environment

Study on the Distributed Crawling for Processing Massive Data in the Distributed Network Environment , pp.375-384 http://dx.doi.org/10.14257/ijmue.2015.10.10.37 Study on the Distributed Crawling for Processing Massive Data in the Distributed Network Environment Chang-Su Kim PaiChai University, 155-40,

More information

An Efficient Provable Data Possession Scheme based on Counting Bloom Filter for Dynamic Data in the Cloud Storage

An Efficient Provable Data Possession Scheme based on Counting Bloom Filter for Dynamic Data in the Cloud Storage , pp. 9-16 http://dx.doi.org/10.14257/ijmue.2016.11.4.02 An Efficient Provable Data Possession Scheme based on Counting Bloom Filter for Dynamic Data in the Cloud Storage Eunmi Jung 1 and Junho Jeong 2

More information

WEB STRUCTURE MINING USING PAGERANK, IMPROVED PAGERANK AN OVERVIEW

WEB STRUCTURE MINING USING PAGERANK, IMPROVED PAGERANK AN OVERVIEW ISSN: 9 694 (ONLINE) ICTACT JOURNAL ON COMMUNICATION TECHNOLOGY, MARCH, VOL:, ISSUE: WEB STRUCTURE MINING USING PAGERANK, IMPROVED PAGERANK AN OVERVIEW V Lakshmi Praba and T Vasantha Department of Computer

More information

Semantic Video Indexing and Summarization Using Subtitles

Semantic Video Indexing and Summarization Using Subtitles Semantic Video Indexing and Summarization Using Subtitles Haoran Yi, Deepu Rajan, and Liang-Tien Chia Center for Multimedia and Network Technology School of Computer Engineering Nanyang Technological University,

More information

A Practical Camera Calibration System on Mobile Phones

A Practical Camera Calibration System on Mobile Phones Advanced Science and echnolog Letters Vol.7 (Culture and Contents echnolog 0), pp.6-0 http://dx.doi.org/0.57/astl.0.7. A Practical Camera Calibration Sstem on Mobile Phones Lu Bo, aegkeun hangbo Department

More information

IMPROVING INFORMATION RETRIEVAL BASED ON QUERY CLASSIFICATION ALGORITHM

IMPROVING INFORMATION RETRIEVAL BASED ON QUERY CLASSIFICATION ALGORITHM IMPROVING INFORMATION RETRIEVAL BASED ON QUERY CLASSIFICATION ALGORITHM Myomyo Thannaing 1, Ayenandar Hlaing 2 1,2 University of Technology (Yadanarpon Cyber City), near Pyin Oo Lwin, Myanmar ABSTRACT

More information

Byte Index Chunking Approach for Data Compression

Byte Index Chunking Approach for Data Compression Ider Lkhagvasuren 1, Jung Min So 1, Jeong Gun Lee 1, Chuck Yoo 2, Young Woong Ko 1 1 Dept. of Computer Engineering, Hallym University Chuncheon, Korea {Ider555, jso, jeonggun.lee, yuko}@hallym.ac.kr 2

More information

Integrated Framework for Keyword-based Text Data Collection and Analysis

Integrated Framework for Keyword-based Text Data Collection and Analysis Sensors and Materials, Vol. 30, No. 3 (2018) 439 445 MYU Tokyo 439 S & M 1506 Integrated Framework for Keyword-based Text Data Collection and Analysis Minki Cha, 1 Jung-Hyok Kwon, 1 Sol-Bee Lee, 1 Jaehoon

More information

International Journal of Advanced Computer Technology (IJACT) ISSN: CLUSTERING OF WEB QUERY RESULTS USING ENHANCED K-MEANS ALGORITHM

International Journal of Advanced Computer Technology (IJACT) ISSN: CLUSTERING OF WEB QUERY RESULTS USING ENHANCED K-MEANS ALGORITHM CLUSTERING OF WEB QUERY RESULTS USING ENHANCED K-MEANS ALGORITHM M.Manikantan, Assistant Professor (Senior Grade), Department of MCA, Kumaraguru College of Technology, Coimbatore, Tamilnadu. Abstract :

More information

Ontology-Based Web Query Classification for Research Paper Searching

Ontology-Based Web Query Classification for Research Paper Searching Ontology-Based Web Query Classification for Research Paper Searching MyoMyo ThanNaing University of Technology(Yatanarpon Cyber City) Mandalay,Myanmar Abstract- In web search engines, the retrieval of

More information

Implementation of GP-GPU with SIMT Architecture in the Embedded Environment

Implementation of GP-GPU with SIMT Architecture in the Embedded Environment , pp.221-226 http://dx.doi.org/10.14257/ijmue.2014.9.4.23 Implementation of GP-GPU with SIMT Architecture in the Embedded Environment Kwang-yeob Lee and Jae-chang Kwak 1 * Dept. of Computer Engineering,

More information

A Novel Model for Home Media Streaming Service in Cloud Computing Environment

A Novel Model for Home Media Streaming Service in Cloud Computing Environment , pp.265-274 http://dx.doi.org/10.14257/ijsh.2013.7.6.26 A Novel Model for Home Media Streaming Service in Cloud Computing Environment Yun Cui 1, Myoungjin Kim 1 and Hanku Lee1, 2,* 1 Department of Internet

More information

Automatic Pipeline Generation by the Sequential Segmentation and Skelton Construction of Point Cloud

Automatic Pipeline Generation by the Sequential Segmentation and Skelton Construction of Point Cloud , pp.43-47 http://dx.doi.org/10.14257/astl.2014.67.11 Automatic Pipeline Generation by the Sequential Segmentation and Skelton Construction of Point Cloud Ashok Kumar Patil, Seong Sill Park, Pavitra Holi,

More information

A Hierarchical Document Clustering Approach with Frequent Itemsets

A Hierarchical Document Clustering Approach with Frequent Itemsets A Hierarchical Document Clustering Approach with Frequent Itemsets Cheng-Jhe Lee, Chiun-Chieh Hsu, and Da-Ren Chen Abstract In order to effectively retrieve required information from the large amount of

More information

A modified and fast Perceptron learning rule and its use for Tag Recommendations in Social Bookmarking Systems

A modified and fast Perceptron learning rule and its use for Tag Recommendations in Social Bookmarking Systems A modified and fast Perceptron learning rule and its use for Tag Recommendations in Social Bookmarking Systems Anestis Gkanogiannis and Theodore Kalamboukis Department of Informatics Athens University

More information

Analyzing and Improving Load Balancing Algorithm of MooseFS

Analyzing and Improving Load Balancing Algorithm of MooseFS , pp. 169-176 http://dx.doi.org/10.14257/ijgdc.2014.7.4.16 Analyzing and Improving Load Balancing Algorithm of MooseFS Zhang Baojun 1, Pan Ruifang 1 and Ye Fujun 2 1. New Media Institute, Zhejiang University

More information

A Simple Model for Estimating Power Consumption of a Multicore Server System

A Simple Model for Estimating Power Consumption of a Multicore Server System , pp.153-160 http://dx.doi.org/10.14257/ijmue.2014.9.2.15 A Simple Model for Estimating Power Consumption of a Multicore Server System Minjoong Kim, Yoondeok Ju, Jinseok Chae and Moonju Park School of

More information

Minoru SASAKI and Kenji KITA. Department of Information Science & Intelligent Systems. Faculty of Engineering, Tokushima University

Minoru SASAKI and Kenji KITA. Department of Information Science & Intelligent Systems. Faculty of Engineering, Tokushima University Information Retrieval System Using Concept Projection Based on PDDP algorithm Minoru SASAKI and Kenji KITA Department of Information Science & Intelligent Systems Faculty of Engineering, Tokushima University

More information

Cluster Analysis (b) Lijun Zhang

Cluster Analysis (b) Lijun Zhang Cluster Analysis (b) Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Grid-Based and Density-Based Algorithms Graph-Based Algorithms Non-negative Matrix Factorization Cluster Validation Summary

More information

A novel supervised learning algorithm and its use for Spam Detection in Social Bookmarking Systems

A novel supervised learning algorithm and its use for Spam Detection in Social Bookmarking Systems A novel supervised learning algorithm and its use for Spam Detection in Social Bookmarking Systems Anestis Gkanogiannis and Theodore Kalamboukis Department of Informatics Athens University of Economics

More information

Research Article Apriori Association Rule Algorithms using VMware Environment

Research Article Apriori Association Rule Algorithms using VMware Environment Research Journal of Applied Sciences, Engineering and Technology 8(2): 16-166, 214 DOI:1.1926/rjaset.8.955 ISSN: 24-7459; e-issn: 24-7467 214 Maxwell Scientific Publication Corp. Submitted: January 2,

More information

MATRIX BASED INDEXING TECHNIQUE FOR VIDEO DATA

MATRIX BASED INDEXING TECHNIQUE FOR VIDEO DATA Journal of Computer Science, 9 (5): 534-542, 2013 ISSN 1549-3636 2013 doi:10.3844/jcssp.2013.534.542 Published Online 9 (5) 2013 (http://www.thescipub.com/jcs.toc) MATRIX BASED INDEXING TECHNIQUE FOR VIDEO

More information

Balancing Manual and Automatic Indexing for Retrieval of Paper Abstracts

Balancing Manual and Automatic Indexing for Retrieval of Paper Abstracts Balancing Manual and Automatic Indexing for Retrieval of Paper Abstracts Kwangcheol Shin 1, Sang-Yong Han 1, and Alexander Gelbukh 1,2 1 Computer Science and Engineering Department, Chung-Ang University,

More information

Learning Ontology-Based User Profiles: A Semantic Approach to Personalized Web Search

Learning Ontology-Based User Profiles: A Semantic Approach to Personalized Web Search 1 / 33 Learning Ontology-Based User Profiles: A Semantic Approach to Personalized Web Search Bernd Wittefeld Supervisor Markus Löckelt 20. July 2012 2 / 33 Teaser - Google Web History http://www.google.com/history

More information

Study on the Signboard Region Detection in Natural Image

Study on the Signboard Region Detection in Natural Image , pp.179-184 http://dx.doi.org/10.14257/astl.2016.140.34 Study on the Signboard Region Detection in Natural Image Daeyeong Lim 1, Youngbaik Kim 2, Incheol Park 1, Jihoon seung 1, Kilto Chong 1,* 1 1567

More information

Research on Autonomic Control System Connection Goal-model and Fault-tree

Research on Autonomic Control System Connection Goal-model and Fault-tree , pp.47-53 http://dx.doi.org/10.14257/astl.2016.129.10 Research on Autonomic Control System Connection Goal-model and Fault-tree Dongbeom Ko 1, Teayoung Kim 1, Sungjoo Kang 2, Ingeol Chun 2, Jeongmin Park

More information

Automatic New Topic Identification in Search Engine Transaction Log Using Goal Programming

Automatic New Topic Identification in Search Engine Transaction Log Using Goal Programming Proceedings of the 2012 International Conference on Industrial Engineering and Operations Management Istanbul, Turkey, July 3 6, 2012 Automatic New Topic Identification in Search Engine Transaction Log

More information

Text Analytics (Text Mining)

Text Analytics (Text Mining) CSE 6242 / CX 4242 Apr 1, 2014 Text Analytics (Text Mining) Concepts and Algorithms Duen Horng (Polo) Chau Georgia Tech Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey Heer,

More information

Research on Heterogeneous Communication Network for Power Distribution Automation

Research on Heterogeneous Communication Network for Power Distribution Automation 3rd International Conference on Material, Mechanical and Manufacturing Engineering (IC3ME 2015) Research on Heterogeneous Communication Network for Power Distribution Automation Qiang YU 1,a*, Hui HUANG

More information

Sentiment Analysis for Customer Review Sites

Sentiment Analysis for Customer Review Sites Sentiment Analysis for Customer Review Sites Chi-Hwan Choi 1, Jeong-Eun Lee 2, Gyeong-Su Park 2, Jonghwa Na 3, Wan-Sup Cho 4 1 Dept. of Bio-Information Technology 2 Dept. of Business Data Convergence 3

More information

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN:

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN: IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, 20131 Improve Search Engine Relevance with Filter session Addlin Shinney R 1, Saravana Kumar T

More information

A Personal Information Retrieval System in a Web Environment

A Personal Information Retrieval System in a Web Environment Vol.87 (Art, Culture, Game, Graphics, Broadcasting and Digital Contents 2015), pp.42-46 http://dx.doi.org/10.14257/astl.2015.87.10 A Personal Information Retrieval System in a Web Environment YoungDeok

More information

Distributed similarity search algorithm in distributed heterogeneous multimedia databases

Distributed similarity search algorithm in distributed heterogeneous multimedia databases Information Processing Letters 75 (2000) 35 42 Distributed similarity search algorithm in distributed heterogeneous multimedia databases Ju-Hong Lee a,1, Deok-Hwan Kim a,2, Seok-Lyong Lee a,3, Chin-Wan

More information

Facial expression recognition using shape and texture information

Facial expression recognition using shape and texture information 1 Facial expression recognition using shape and texture information I. Kotsia 1 and I. Pitas 1 Aristotle University of Thessaloniki pitas@aiia.csd.auth.gr Department of Informatics Box 451 54124 Thessaloniki,

More information

Lane Detection using Fuzzy C-Means Clustering

Lane Detection using Fuzzy C-Means Clustering Lane Detection using Fuzzy C-Means Clustering Kwang-Baek Kim, Doo Heon Song 2, Jae-Hyun Cho 3 Dept. of Computer Engineering, Silla University, Busan, Korea 2 Dept. of Computer Games, Yong-in SongDam University,

More information

Massive Scalability With InterSystems IRIS Data Platform

Massive Scalability With InterSystems IRIS Data Platform Massive Scalability With InterSystems IRIS Data Platform Introduction Faced with the enormous and ever-growing amounts of data being generated in the world today, software architects need to pay special

More information

A Model for Information Retrieval Agent System Based on Keywords Distribution

A Model for Information Retrieval Agent System Based on Keywords Distribution A Model for Information Retrieval Agent System Based on Keywords Distribution Jae-Woo LEE Dept of Computer Science, Kyungbok College, 3, Sinpyeong-ri, Pocheon-si, 487-77, Gyeonggi-do, Korea It2c@koreaackr

More information

A Load Balancing Scheme for Games in Wireless Sensor Networks

A Load Balancing Scheme for Games in Wireless Sensor Networks , pp.89-94 http://dx.doi.org/10.14257/astl.2013.42.21 A Load Balancing Scheme for Games in Wireless Sensor Networks Hye-Young Kim 1 1 Major in Game Software, School of Games, Hongik University, Chungnam,

More information

Bread Water Content Measurement Based on Hyperspectral Imaging

Bread Water Content Measurement Based on Hyperspectral Imaging 93 Bread Water Content Measurement Based on Hyperspectral Imaging Zhi Liu 1, Flemming Møller 1.2 1 Department of Informatics and Mathematical Modelling, Technical University of Denmark, Kgs. Lyngby, Denmark

More information

Department of Computer Science and Engineering B.E/B.Tech/M.E/M.Tech : B.E. Regulation: 2013 PG Specialisation : _

Department of Computer Science and Engineering B.E/B.Tech/M.E/M.Tech : B.E. Regulation: 2013 PG Specialisation : _ COURSE DELIVERY PLAN - THEORY Page 1 of 6 Department of Computer Science and Engineering B.E/B.Tech/M.E/M.Tech : B.E. Regulation: 2013 PG Specialisation : _ LP: CS6007 Rev. No: 01 Date: 27/06/2017 Sub.

More information

SPARSE COMPONENT ANALYSIS FOR BLIND SOURCE SEPARATION WITH LESS SENSORS THAN SOURCES. Yuanqing Li, Andrzej Cichocki and Shun-ichi Amari

SPARSE COMPONENT ANALYSIS FOR BLIND SOURCE SEPARATION WITH LESS SENSORS THAN SOURCES. Yuanqing Li, Andrzej Cichocki and Shun-ichi Amari SPARSE COMPONENT ANALYSIS FOR BLIND SOURCE SEPARATION WITH LESS SENSORS THAN SOURCES Yuanqing Li, Andrzej Cichocki and Shun-ichi Amari Laboratory for Advanced Brain Signal Processing Laboratory for Mathematical

More information

Vector Space Models: Theory and Applications

Vector Space Models: Theory and Applications Vector Space Models: Theory and Applications Alexander Panchenko Centre de traitement automatique du langage (CENTAL) Université catholique de Louvain FLTR 2620 Introduction au traitement automatique du

More information

Online Version Only. Book made by this file is ILLEGAL. Design and Implementation of Binary File Similarity Evaluation System. 1.

Online Version Only. Book made by this file is ILLEGAL. Design and Implementation of Binary File Similarity Evaluation System. 1. , pp.1-10 http://dx.doi.org/10.14257/ijmue.2014.9.1.01 Design and Implementation of Binary File Similarity Evaluation System Sun-Jung Kim 2, Young Jun Yoo, Jungmin So 1, Jeong Gun Lee 1, Jin Kim 1 and

More information

FSRM Feedback Algorithm based on Learning Theory

FSRM Feedback Algorithm based on Learning Theory Send Orders for Reprints to reprints@benthamscience.ae The Open Cybernetics & Systemics Journal, 2015, 9, 699-703 699 FSRM Feedback Algorithm based on Learning Theory Open Access Zhang Shui-Li *, Dong

More information

E-Training Content Delivery Networking System for Augmented Reality Car Maintenance Training Application

E-Training Content Delivery Networking System for Augmented Reality Car Maintenance Training Application E-Training Content Delivery Networking System for Augmented Reality Car Maintenance Training Application Yu-Doo Kim and Il-Young Moon Korea University of Technology and Education kydman@koreatech.ac.kr

More information

Nonnegative Matrix Factorization with Orthogonality Constraints

Nonnegative Matrix Factorization with Orthogonality Constraints Nonnegative Matrix Factorization with Orthogonality Constraints Jiho Yoo and Seungjin Choi Department of Computer Science Pohang University of Science and Technology San 31 Hyoja-dong, Nam-gu, Pohang 790-784,

More information

Scalable Hybrid Search on Distributed Databases

Scalable Hybrid Search on Distributed Databases Scalable Hybrid Search on Distributed Databases Jungkee Kim 1,2 and Geoffrey Fox 2 1 Department of Computer Science, Florida State University, Tallahassee FL 32306, U.S.A., jungkkim@cs.fsu.edu, 2 Community

More information

networks data threads

networks data threads Informatics Department Aristotle University of Thessaloniki A distributed framework for early trending topics detection on big social networks data threads AT HE NA VA KA L I, N I KOL AOS KI T MER IDIS,

More information

A Study on the IoT Sensor Interaction Transmission System based on BigData

A Study on the IoT Sensor Interaction Transmission System based on BigData Vol.123 (SoftTech 2016), pp.220-224 http://dx.doi.org/10.14257/astl.2016.123.41 A Study on the IoT Sensor Interaction Transmission System based on BigData Jin-Tae Park 1, Gyung-Soo Phyo 1 and Il-Young

More information

A Schedulability-Preserving Transformation Scheme from Boolean- Controlled Dataflow Networks to Petri Nets

A Schedulability-Preserving Transformation Scheme from Boolean- Controlled Dataflow Networks to Petri Nets Schedulability-Preserving ransformation Scheme from oolean- ontrolled Dataflow Networks to Petri Nets ong Liu Edward. Lee University of alifornia at erkeley erkeley,, 94720, US {congliu,eal}@eecs. berkeley.edu

More information

Semantic Estimation for Texts in Software Engineering

Semantic Estimation for Texts in Software Engineering Semantic Estimation for Texts in Software Engineering 汇报人 : Reporter:Xiaochen Li Dalian University of Technology, China 大连理工大学 2016 年 11 月 29 日 Oscar Lab 2 Ph.D. candidate at OSCAR Lab, in Dalian University

More information

A hardware design of optimized ORB algorithm with reduced hardware cost

A hardware design of optimized ORB algorithm with reduced hardware cost , pp.58-62 http://dx.doi.org/10.14257/astl.2013 A hardware design of optimized ORB algorithm with reduced hardware cost Kwang-yeob Lee 1, Kyung-jin Byun 2 1 Dept. of Computer Engineering, Seokyenog University,

More information

Dimensionality Reduction using Relative Attributes

Dimensionality Reduction using Relative Attributes Dimensionality Reduction using Relative Attributes Mohammadreza Babaee 1, Stefanos Tsoukalas 1, Maryam Babaee Gerhard Rigoll 1, and Mihai Datcu 1 Institute for Human-Machine Communication, Technische Universität

More information

BayesTH-MCRDR Algorithm for Automatic Classification of Web Document

BayesTH-MCRDR Algorithm for Automatic Classification of Web Document BayesTH-MCRDR Algorithm for Automatic Classification of Web Document Woo-Chul Cho and Debbie Richards Department of Computing, Macquarie University, Sydney, NSW 2109, Australia {wccho, richards}@ics.mq.edu.au

More information

CLEF-IP 2009: Exploring Standard IR Techniques on Patent Retrieval

CLEF-IP 2009: Exploring Standard IR Techniques on Patent Retrieval DCU @ CLEF-IP 2009: Exploring Standard IR Techniques on Patent Retrieval Walid Magdy, Johannes Leveling, Gareth J.F. Jones Centre for Next Generation Localization School of Computing Dublin City University,

More information

Bruno Martins. 1 st Semester 2012/2013

Bruno Martins. 1 st Semester 2012/2013 Link Analysis Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2012/2013 Slides baseados nos slides oficiais do livro Mining the Web c Soumen Chakrabarti. Outline 1 2 3 4

More information

A Graph-based Interpretation for Finding Solution Strategies of Contradiction Problems in the Butterfly Diagram

A Graph-based Interpretation for Finding Solution Strategies of Contradiction Problems in the Butterfly Diagram , pp.220-224 http://dx.doi.org/10.14257/astl.2016.139.47 A Graph-based Interpretation for Finding Solution Strategies of Contradiction Problems in the Butterfly Diagram Jung Suk Hyun 1 and Chan Jung Park

More information

Proximity Prestige using Incremental Iteration in Page Rank Algorithm

Proximity Prestige using Incremental Iteration in Page Rank Algorithm Indian Journal of Science and Technology, Vol 9(48), DOI: 10.17485/ijst/2016/v9i48/107962, December 2016 ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645 Proximity Prestige using Incremental Iteration

More information

Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data

Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data American Journal of Applied Sciences (): -, ISSN -99 Science Publications Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data Ibrahiem M.M. El Emary and Ja'far

More information

An Efficient Approach for Color Pattern Matching Using Image Mining

An Efficient Approach for Color Pattern Matching Using Image Mining An Efficient Approach for Color Pattern Matching Using Image Mining * Manjot Kaur Navjot Kaur Master of Technology in Computer Science & Engineering, Sri Guru Granth Sahib World University, Fatehgarh Sahib,

More information

MAPREDUCE FOR BIG DATA PROCESSING BASED ON NETWORK TRAFFIC PERFORMANCE Rajeshwari Adrakatti

MAPREDUCE FOR BIG DATA PROCESSING BASED ON NETWORK TRAFFIC PERFORMANCE Rajeshwari Adrakatti International Journal of Computer Engineering and Applications, ICCSTAR-2016, Special Issue, May.16 MAPREDUCE FOR BIG DATA PROCESSING BASED ON NETWORK TRAFFIC PERFORMANCE Rajeshwari Adrakatti 1 Department

More information

Triple Indexing: An Efficient Technique for Fast Phrase Query Evaluation

Triple Indexing: An Efficient Technique for Fast Phrase Query Evaluation Triple Indexing: An Efficient Technique for Fast Phrase Query Evaluation Shashank Gugnani BITS-Pilani, K.K. Birla Goa Campus Goa, India - 403726 Rajendra Kumar Roul BITS-Pilani, K.K. Birla Goa Campus Goa,

More information

A Spatial Point Pattern Analysis to Recognize Fail Bit Patterns in Semiconductor Manufacturing

A Spatial Point Pattern Analysis to Recognize Fail Bit Patterns in Semiconductor Manufacturing A Spatial Point Pattern Analysis to Recognize Fail Bit Patterns in Semiconductor Manufacturing Youngji Yoo, Seung Hwan Park, Daewoong An, Sung-Shick Shick Kim, Jun-Geol Baek Abstract The yield management

More information

Equi-sized, Homogeneous Partitioning

Equi-sized, Homogeneous Partitioning Equi-sized, Homogeneous Partitioning Frank Klawonn and Frank Höppner 2 Department of Computer Science University of Applied Sciences Braunschweig /Wolfenbüttel Salzdahlumer Str 46/48 38302 Wolfenbüttel,

More information

Better Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web

Better Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web Better Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web Thaer Samar 1, Alejandro Bellogín 2, and Arjen P. de Vries 1 1 Centrum Wiskunde & Informatica, {samar,arjen}@cwi.nl

More information

Improvements and Implementation of Hierarchical Clustering based on Hadoop Jun Zhang1, a, Chunxiao Fan1, Yuexin Wu2,b, Ao Xiao1

Improvements and Implementation of Hierarchical Clustering based on Hadoop Jun Zhang1, a, Chunxiao Fan1, Yuexin Wu2,b, Ao Xiao1 3rd International Conference on Machinery, Materials and Information Technology Applications (ICMMITA 2015) Improvements and Implementation of Hierarchical Clustering based on Hadoop Jun Zhang1, a, Chunxiao

More information

Collaborative Filtering Recommender System

Collaborative Filtering Recommender System International Journal of Emerging Trends in Science and Technology Collaborative Filtering Recommender System Authors Anvitha Hegde 1, Savitha K Shetty 2 1 M.Tech, Dept. of ISE, MSRIT, Bangalore 2 Assistant

More information

A Kinect Sensor based Windows Control Interface

A Kinect Sensor based Windows Control Interface , pp.113-124 http://dx.doi.org/10.14257/ijca.2014.7.3.12 A Kinect Sensor based Windows Control Interface Sang-Hyuk Lee 1 and Seung-Hyun Oh 2 Department of Computer Science, Dongguk University, Gyeongju,

More information

DOCUMENT INDEXING USING INDEPENDENT TOPIC EXTRACTION. Yu-Hwan Kim and Byoung-Tak Zhang

DOCUMENT INDEXING USING INDEPENDENT TOPIC EXTRACTION. Yu-Hwan Kim and Byoung-Tak Zhang DOCUMENT INDEXING USING INDEPENDENT TOPIC EXTRACTION Yu-Hwan Kim and Byoung-Tak Zhang School of Computer Science and Engineering Seoul National University Seoul 5-7, Korea yhkim,btzhang bi.snu.ac.kr ABSTRACT

More information

Index Terms:- Document classification, document clustering, similarity measure, accuracy, classifiers, clustering algorithms.

Index Terms:- Document classification, document clustering, similarity measure, accuracy, classifiers, clustering algorithms. International Journal of Scientific & Engineering Research, Volume 5, Issue 10, October-2014 559 DCCR: Document Clustering by Conceptual Relevance as a Factor of Unsupervised Learning Annaluri Sreenivasa

More information

Compression of View Dependent Displacement Maps

Compression of View Dependent Displacement Maps J. Wang and K. J. Dana: Compression of view dependent displacement maps. In Texture 2005: Proceedings of the 4th International Workshop on Texture Analysis and Synthesis, pp. 143 148, 2005. Compression

More information

A Two-phase Distributed Training Algorithm for Linear SVM in WSN

A Two-phase Distributed Training Algorithm for Linear SVM in WSN Proceedings of the World Congress on Electrical Engineering and Computer Systems and Science (EECSS 015) Barcelona, Spain July 13-14, 015 Paper o. 30 A wo-phase Distributed raining Algorithm for Linear

More information

Context Based Web Indexing For Semantic Web

Context Based Web Indexing For Semantic Web IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 12, Issue 4 (Jul. - Aug. 2013), PP 89-93 Anchal Jain 1 Nidhi Tyagi 2 Lecturer(JPIEAS) Asst. Professor(SHOBHIT

More information

Improving Suffix Tree Clustering Algorithm for Web Documents

Improving Suffix Tree Clustering Algorithm for Web Documents International Conference on Logistics Engineering, Management and Computer Science (LEMCS 2015) Improving Suffix Tree Clustering Algorithm for Web Documents Yan Zhuang Computer Center East China Normal

More information

Open Access Self-Growing RBF Neural Network Approach for Semantic Image Retrieval

Open Access Self-Growing RBF Neural Network Approach for Semantic Image Retrieval Send Orders for Reprints to reprints@benthamscience.ae The Open Automation and Control Systems Journal, 2014, 6, 1505-1509 1505 Open Access Self-Growing RBF Neural Networ Approach for Semantic Image Retrieval

More information

An Effective Hardware Architecture for Bump Mapping Using Angular Operation

An Effective Hardware Architecture for Bump Mapping Using Angular Operation An Effective Hardware Architecture for Bump Mapping Using Angular Operation Seung-Gi Lee, Woo-Chan Park, Won-Jong Lee, Tack-Don Han, and Sung-Bong Yang Media System Lab. (National Research Lab.) Dept.

More information

XML Clustering by Bit Vector

XML Clustering by Bit Vector XML Clustering by Bit Vector WOOSAENG KIM Department of Computer Science Kwangwoon University 26 Kwangwoon St. Nowongu, Seoul KOREA kwsrain@kw.ac.kr Abstract: - XML is increasingly important in data exchange

More information

Medical Records Clustering Based on the Text Fetched from Records

Medical Records Clustering Based on the Text Fetched from Records Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,

More information

Inverted Index for Fast Nearest Neighbour

Inverted Index for Fast Nearest Neighbour Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,

More information

Web Information Retrieval using WordNet

Web Information Retrieval using WordNet Web Information Retrieval using WordNet Jyotsna Gharat Asst. Professor, Xavier Institute of Engineering, Mumbai, India Jayant Gadge Asst. Professor, Thadomal Shahani Engineering College Mumbai, India ABSTRACT

More information

CHAPTER 3 ASSOCIATON RULE BASED CLUSTERING

CHAPTER 3 ASSOCIATON RULE BASED CLUSTERING 41 CHAPTER 3 ASSOCIATON RULE BASED CLUSTERING 3.1 INTRODUCTION This chapter describes the clustering process based on association rule mining. As discussed in the introduction, clustering algorithms have

More information

LRLW-LSI: An Improved Latent Semantic Indexing (LSI) Text Classifier

LRLW-LSI: An Improved Latent Semantic Indexing (LSI) Text Classifier LRLW-LSI: An Improved Latent Semantic Indexing (LSI) Text Classifier Wang Ding, Songnian Yu, Shanqing Yu, Wei Wei, and Qianfeng Wang School of Computer Engineering and Science, Shanghai University, 200072

More information