ijade Reporter An Intelligent Multi-agent Based Context Aware News Reporting System


Eddie C.L. Chan and Raymond S.T. Lee

The Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong

Abstract. In this paper, an Intelligent Context Aware Reporting System called ijade Reporter is presented. The system applies context mining techniques to news reporting under a multi-agent architecture and categorizes news content with an information retrieval algorithm. This paper also investigates how to improve the similarity measurement between documents by using an ontology built on the WordNet graph structure of words. For web querying, a common information retrieval algorithm, Term Frequency with Inverse Document Frequency (TFIDF), is used to cluster news content. The proposed system provides simple, fast and efficient querying of the WWW, and it uses multi-agent technology to increase its scalability and efficiency. By combining the TFIDF algorithm with multi-agent based techniques, an online updating news reporter is developed for popular websites such as the BBC and CNN websites.

1 Introduction

Retrieving, categorizing and reporting useful news from the web is one of the most challenging problems in machine learning. The common interest among researchers working in diverse fields is motivated by our remarkable innate ability to study and to report news in daily life. Current news search engines, such as those provided by Google or Yahoo!, do not categorize results logically, which makes the results difficult to read. In this paper, an intelligent multi-agent based context aware news reporting agent system called ijade Reporter is presented. For system implementation, ijade [8] (intelligent Java Agent Development Environment) is adopted to provide an intelligent agent-based platform for the implementation of various AI functionalities.
2 Web Context Mining (WCM): An Overview

In web content mining, popular search engines (such as Lycos, WebCrawler, Infoseek, and Alta Vista) provide some basic web searching functionalities. However, they fail to provide concrete and structural information [10]. In recent years, interest has focused on how to provide a higher level (semantic level) organization for semi-structured or even unstructured information on the Web using AI-based Web mining techniques.

Agent-based systems such as Harvest [2], FAQ-Finder [4], Information Manifold [11], OCCAM [9], and Parasite [12] rely either on pre-specified domain specific information, or on hard-coded information sources to retrieve and interpret documents. For instance, the Harvest system [2] relies on semi-structured documents to improve its ability to extract information. Although it knows how to find author and title information in LaTeX documents and how to strip position information from PostScript files, it fails to discover new documents or to learn new document structures. Similarly, FAQ-Finder [6] extracts answers to frequently asked questions (FAQs) from FAQ files available on the web using a priori knowledge.

Web page ontology [3] can be defined in different ways depending on the objective of the ontology. Most web sources carry their own semantic meaning. Techniques for Ontology Generation, Ontology Mediation, Ontology Population and Reasoning from the Semantic Web have all been major areas of focus. Most web documents are organized in a content hierarchy, with more general nodes placed closer to the root of the hierarchy. Each node is labeled by a set of keywords describing the content of the documents placed in that node. Each document is described by a one-sentence summary including a hyperlink that points to the actual Web document located somewhere on the Web.

R. Khosla et al. (Eds.): KES 2005, LNAI 3681, Springer-Verlag Berlin Heidelberg 2005

3 ijade Reporter: A System Overview

The ijade Reporter proposed in this paper consists of 4 types of ijade agents (Fig. 1):
1) ijade Search Agent
2) ijade Categorize Agent
3) ijade Update Agent
4) ijade Report Agent

Fig. 1. System Overview of ijade Reporter (diagram labels: BBC Online Search Engine; WordNet Database; ijade Search Agent and final search result; ijade Categorize Agent with categorized and uncategorized links; ijade Update Agent, which reads online websites, extracts useful links and stores content; Database; ijade Reporter Network; on user request, a servlet invokes the Report Agent)

3.1 ijade Search Agent

A mobile ijade agent that searches for news on popular news websites such as the BBC [1] and CNN [3] news websites. It connects to several popular news search engines, combines their results into search lists, and integrates all of the news search engines using the WordNet [14] dictionary to reconstruct and interpret the news query.

3.2 ijade Categorize Agent

A stationary ijade agent that categorizes and clusters news into different regions. Categorization is based on calculating the similarity between web documents using TFIDF (Term Frequency with Inverse Document Frequency). TFIDF is a simple but powerful algorithm that lets a machine learning system capture the semantics of a document, because it reflects the characteristic word frequencies of that document. The Vector Space Model (VSM) is used for document representation.

Term Frequency with Inverse Document Frequency (TFIDF) Algorithm

TFIDF [13] is an information retrieval algorithm that assigns a value to the semantic relationship between words and documents. The Vector Space Model (VSM) is adopted to represent Web documents, and the documents together constitute the whole vector space. TFIDF is used as the weight of a term in a document. If a term $t_i$ occurs in document $d$,

$$ w_{di} = tf_{di} \times \log\left( N / idf_{di} \right) \tag{1} $$

where $t_i$ is a word (or a term) in the document collection, $w_{di}$ is the weight of $t_i$, $tf_{di}$ is the term frequency of $t_i$ (the count of that word in the document), $N$ is the total number of documents in the collection and $idf_{di}$ is the number of documents in which $t_i$ appears.

Similarity Between Two Documents

Each document $d$ is represented by a vector

$$ V_d = ( t_1, w_{d1};\ \ldots;\ t_i, w_{di};\ \ldots;\ t_n, w_{dn} ) $$

where $t_i$ is a word (or a term) in the document collection and $w_{di}$ is the TFIDF value of $t_i$ in $d$.
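The weighting in Eq. (1) can be illustrated with a short Python sketch. This is not the authors' implementation: the function name `tfidf_vectors`, the toy documents and the choice of the natural logarithm are assumptions made only for illustration (the paper does not state the log base), and documents are assumed to be already tokenized.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document, following Eq. (1).

    Hypothetical helper: weight = tf * log(N / df), with the natural log
    assumed because the paper does not specify a base.
    """
    n_docs = len(docs)
    # Number of documents in which each term appears (idf_di in the paper's notation).
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw count of each term in this document
        vectors.append({t: tf[t] * math.log(n_docs / doc_freq[t]) for t in tf})
    return vectors

# Toy collection of three already-tokenized news snippets (illustrative data).
docs = [
    ["storm", "damage", "city"],
    ["storm", "storm", "warning"],
    ["market", "city", "market"],
]
vectors = tfidf_vectors(docs)
```

Each resulting dictionary plays the role of the vector $V_d$: terms that occur in every document receive weight 0, while terms concentrated in a few documents receive larger weights.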
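By calculating the Euclidean distance between the vectors of two documents, their similarity can be computed:

$$ Sim(d_1, d_2) = \frac{d_1 \cdot d_2}{\lVert d_1 \rVert \, \lVert d_2 \rVert} \tag{2} $$

where $d_1 \cdot d_2$ is the inner product of the two TFIDF vectors and $\lVert d \rVert$ is the Euclidean length of a vector.

Modified TFIDF by Ontology

In this paper, we propose an ontology-based term frequency (otf) to construct the web ontology for news categorization and retrieval. Assuming that each word is related to other words, a relationship graph can be constructed as shown in Fig. 3.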
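As a hedged illustration of the document similarity step, the sketch below compares two sparse TFIDF vectors. It assumes the normalized inner-product reading of Eq. (2); because the text also speaks of the Euclidean distance between vectors, a distance helper is included as well. The function names and the toy vectors are hypothetical.

```python
import math

def cosine_similarity(v1, v2):
    """Normalized inner product of two sparse {term: weight} vectors."""
    dot = sum(w * v2.get(t, 0.0) for t, w in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # an all-zero vector is treated as dissimilar to everything
    return dot / (norm1 * norm2)

def euclidean_distance(v1, v2):
    """Euclidean distance between two sparse vectors over the union of their terms."""
    terms = set(v1) | set(v2)
    return math.sqrt(sum((v1.get(t, 0.0) - v2.get(t, 0.0)) ** 2 for t in terms))

# Illustrative vectors: a and b point in the same direction, c shares no terms with a.
a = {"storm": 1.0, "damage": 2.0}
b = {"storm": 1.0, "damage": 2.0, "city": 0.0}
c = {"market": 3.0}
```

With the similarity form, larger values mean more similar documents; with the distance form, smaller values do, so a threshold test has to be oriented accordingly.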

Fig. 2. Euclidean distance of two vectors

Fig. 3. The word graph example of "destroy" and "damage"

In Fig. 3, "destroy" and "damage" are similar from the ontology point of view, as they are at the same level of the hierarchical structure and are related to each other. The similarity of two words can be measured by their distance in the tree structure. By comparing the meanings of two terms, the ontology-based term frequency (otf) can be obtained:

$$ otf_1 = tf_1 \times \left( 1 + \frac{1}{D(t_1, t_2)} \right)^{tf_2} \tag{3} $$

where $t_1$ and $t_2$ are different terms; $otf_1$ is the ontology-based term frequency of $t_1$; $tf_1$ and $tf_2$ are the term frequencies of $t_1$ and $t_2$ respectively; and $D(t_1, t_2)$ is the depth between $t_1$ and $t_2$, which can be calculated by using WordNet.

For example, assume the term frequencies of "destroy" and "damage" are 3 and 2 respectively and the depth between "destroy" and "damage" is 3. The ontology-based term frequency of "destroy" will be $otf_1 = 3 \times (1 + 1/3)^{2}$, and the ontology-based term frequency of "damage" will be $otf_2 = 2 \times (1 + 1/3)^{3}$. After this adjustment, the term frequency values of both terms increase, so the two terms become more significant once TFIDF is computed.
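The worked example for Eq. (3) can be checked with a few lines of Python. This is a sketch under one stated assumption: the frequency of the related term, $tf_2$, enters as an exponent. The function name `ontology_term_frequency` is hypothetical, and the depth is passed in directly rather than looked up in WordNet.

```python
def ontology_term_frequency(tf_own, tf_related, depth):
    """Ontology-based term frequency under the assumed reading of Eq. (3).

    tf_own     -- term frequency of the term being adjusted
    tf_related -- term frequency of the related term
    depth      -- D(t1, t2), the depth between the two terms (must be positive)
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    # Assumption: the related term's frequency is an exponent, not a factor.
    return tf_own * (1 + 1 / depth) ** tf_related

# Values from the example: tf("destroy") = 3, tf("damage") = 2, depth = 3.
otf_destroy = ontology_term_frequency(3, 2, 3)
otf_damage = ontology_term_frequency(2, 3, 3)
```

Under this reading the two adjusted frequencies come to 16/3 (about 5.33) for "destroy" and 128/27 (about 4.74) for "damage", both larger than the raw counts of 3 and 2.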

Clustering Technique

In this system, we adopt a hierarchical clustering [7] technique to assign each uncategorized news item to the category or region whose news items are at the shortest distance from it. In hierarchical clustering, there is a set of documents W = {w_1, w_2, ..., w_i}, and every document w_i in W is initially considered to be a cluster c_i, such that C = {c_1, c_2, ..., c_i}. Two clusters c_i and c_j are randomly chosen and their similarity sim(c_i, c_j) is calculated. They are merged if the similarity value is greater than the threshold value; otherwise they remain separate. This step repeats until the termination condition is reached.

Fig. 4. Cluster Process

3.3 ijade Update Agent

A mobile ijade agent that updates and collects news from popular news websites. When the user clicks the update button for a category, news from the different news websites is obtained, re-categorized and stored in the user's local storage. In addition, this agent calculates a preliminary analysis result based on the semantic relationship between words in a document. It extracts HTML tags by using web structural mining techniques. The metadata (keywords and title of the news) of the HTML documents is captured and used to obtain related news links. Once a list of links is available, the news is explored further to capture related picture links and contents, and the term frequency of the news is then calculated. In each update process, a region or a category is chosen for the information update. Even when the client side goes offline, the agent continues to perform its job until the job is finished.

3.4 ijade Report Agent

A stationary ijade agent that handles news reporting. This agent provides a vector list of news with headers, a short introduction and content with highlighted keywords. Graphics and sound are added to make the news report more attractive.
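The threshold-based merging described under "Clustering Technique" can be sketched as follows. This is a simplified stand-in, not the system's code: it replaces the random choice of cluster pairs with an exhaustive scan for the most similar pair so that the result is deterministic, it assumes average pairwise similarity between clusters, and it stops when no pair exceeds the threshold. All names and the toy data are hypothetical.

```python
from itertools import combinations

def threshold_clustering(items, sim, threshold):
    """Agglomerative clustering that merges clusters while similarity > threshold."""
    clusters = [[item] for item in items]  # every item starts as its own cluster

    def cluster_sim(c1, c2):
        # Assumption: average pairwise similarity between members (average linkage).
        return sum(sim(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best_pair, best_score = None, threshold
        for i, j in combinations(range(len(clusters)), 2):
            score = cluster_sim(clusters[i], clusters[j])
            if score > best_score:
                best_pair, best_score = (i, j), score
        if best_pair is None:
            break  # termination: no pair is more similar than the threshold
        i, j = best_pair  # i < j, so deleting index j leaves index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Toy example on numbers, with a similarity that decays with distance.
points = [1.0, 1.1, 5.0, 5.2]
result = threshold_clustering(points, lambda a, b: 1 / (1 + abs(a - b)), 0.5)
```

For news documents, `sim` would be a document similarity such as the one defined in Eq. (2), applied to TFIDF vectors, and the threshold would be tuned on held-out data.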
4 Experiments

In this section, the precision rate for news categorization is tested on a news database consisting of over 5000 records. The news articles are subdivided into six categories: Business, Health, Education, Science, Technology and Entertainment. A test set containing over 100 uncategorized news items is used to evaluate the performance of the proposed model. Human judgment is used to determine whether each categorization is correct.

Fig. 5. Reporting Result

Tables 1 and 2 show that ijade Reporter achieves an average precision rate of over 95% with an average clustering time of about 1185 seconds. Compared with other methods, such as an FFBP neural network and TFIDF with the hierarchical clustering technique, ijade Reporter gives a better precision rate for categorizing news with a reasonable clustering time.

Table 1. Comparison of the Precision Rate

Class/category    FFBP NN (3-Layer)    TFIDF + hierarchical clustering    ijade Reporter
Business          68.40%               90.30%                             95.13%
Health            58.45%               93.37%                             96.72%
Education         73.43%               93.88%                             95.14%
Science/Nature    67.00%               94.33%                             97.44%
Technology        70.22%               93.78%                             94.38%
Entertainment     50.23%               91.67%                             95.24%
Average           64.62%               93.72%                             95.68%

Table 2. Comparison of time taken for clustering 3 different sets of 100 news items in total

Set        FFBP NN (3-Layer)    TFIDF + hierarchical clustering    ijade Reporter
Set 1      [illegible]          1114 seconds                       1175 seconds
Set 2      [illegible]          1123 seconds                       1186 seconds
Set 3      [illegible]          1176 seconds                       1195 seconds
Average    5295 seconds         1138 seconds                       1185 seconds

5 Conclusion

In this paper, an intelligent agent-based context aware news reporting system, ijade Reporter, is proposed. Experiments show that ijade Reporter provides an effective and efficient solution for news categorization. By integrating with different popular news websites (e.g. BBC, CNN news) to collect, categorize and analyze news, it provides a convenient, semantic-based news retrieval and reporting solution.

Acknowledgment

This work was partially supported by the ijade projects B-Q569, A-PF74 and Cogito ijade project PG50 of the Hong Kong Polytechnic University.

References

1. BBC Website.
2. C. M. Brown and B. B. Danzig, The harvest information discovery and access system, In Proc. 2nd International World Wide Web Conference.
3. CNN Website.
4. R. B. Doorenbos, O. Etzioni and D. S. Weld, A scalable comparison shopping agent for the world wide web, Technical Report TR, University of Minnesota.
5. O. Etzioni, D. S. Weld and R. B. Doorenbos, A Scalable Comparison-Shopping Agent for the World Wide Web, Univ. Washington, Dept. Comput. Sci., Seattle, Tech. Rep. TR, P17.
6. J. O. Everett, D. G. Bobrow, R. Stolle, R. S. Crouch, V. Paiva, C. Condoravdi, M. Berg and L. Polanyi, Making ontologies work for resolving redundancies across documents, Communications of the ACM 45 (2): 55-60.
7. J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers.
8. ijade official site.
9. C. Kwok and D. Weld, Planning to gather information, In Proc. 14th Nat. Conf. AI.
10. H. V. Leighton and J. Srivastava, Precision among WWW Search Services (Search Engines): Alta Vista, Excite, Hotbot, Infoseek, Lycos.
11. A. Y. Levy, T. Kirk and Y. Sagiv, The information manifold, AAAI Spring Symposium on Information Gathering From Heterogeneous Distributed Environments.
12. E. Spertus, Parasite: Mining structural information on the web, In Proc. 6th WWW Conf.
13. C. W. Wen, H. Liu, W. X. Wen and J. Zheng, A Distributed Hierarchical Clustering System for Web Mining, WAIM2002, LNCS2118, Springer-Verlag Berlin Heidelberg.
14. WordNet.


More information

Your Website as a Marketing Tool. Randy L. Martin R. L. Martin and Associates

Your Website as a Marketing Tool. Randy L. Martin R. L. Martin and Associates Your Website as a Marketing Tool Randy L. Martin R. L. Martin and Associates Getting Started Register Your Domain Name Pick something that people can associate with your company Pick something easy to

More information

SYSTEMS FOR NON STRUCTURED INFORMATION MANAGEMENT

SYSTEMS FOR NON STRUCTURED INFORMATION MANAGEMENT SYSTEMS FOR NON STRUCTURED INFORMATION MANAGEMENT Prof. Dipartimento di Elettronica e Informazione Politecnico di Milano INFORMATION SEARCH AND RETRIEVAL Inf. retrieval 1 PRESENTATION SCHEMA GOALS AND

More information

PROVIDING PRIVACY AND PERSONALIZATION IN SEARCH

PROVIDING PRIVACY AND PERSONALIZATION IN SEARCH PROVIDING PRIVACY AND PERSONALIZATION IN SEARCH T. Mercy Priya and R. M. Suresh Department of Computer Science and Engineering, Sri Muthukumaran Institute of Technology, Chikkarayapuram, Chennai, Tamil

More information

Multi-Dimensional Text Classification

Multi-Dimensional Text Classification Multi-Dimensional Text Classification Thanaruk THEERAMUNKONG IT Program, SIIT, Thammasat University P.O. Box 22 Thammasat Rangsit Post Office, Pathumthani, Thailand, 12121 ping@siit.tu.ac.th Verayuth LERTNATTEE

More information

A New Technique to Optimize User s Browsing Session using Data Mining

A New Technique to Optimize User s Browsing Session using Data Mining Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,

More information

Ontology based Web Page Topic Identification

Ontology based Web Page Topic Identification Ontology based Web Page Topic Identification Abhishek Singh Rathore Department of Computer Science & Engineering Maulana Azad National Institute of Technology Bhopal, India Devshri Roy Department of Computer

More information

Domain Specific Search Engine for Students

Domain Specific Search Engine for Students Domain Specific Search Engine for Students Domain Specific Search Engine for Students Wai Yuen Tang The Department of Computer Science City University of Hong Kong, Hong Kong wytang@cs.cityu.edu.hk Lam

More information

Web Mining. Data Mining and Text Mining (UIC Politecnico di Milano) Daniele Loiacono

Web Mining. Data Mining and Text Mining (UIC Politecnico di Milano) Daniele Loiacono Web Mining Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann Series in Data Management

More information

Sentiment Analysis for Customer Review Sites

Sentiment Analysis for Customer Review Sites Sentiment Analysis for Customer Review Sites Chi-Hwan Choi 1, Jeong-Eun Lee 2, Gyeong-Su Park 2, Jonghwa Na 3, Wan-Sup Cho 4 1 Dept. of Bio-Information Technology 2 Dept. of Business Data Convergence 3

More information

Cluster-based Instance Consolidation For Subsequent Matching

Cluster-based Instance Consolidation For Subsequent Matching Jennifer Sleeman and Tim Finin, Cluster-based Instance Consolidation For Subsequent Matching, First International Workshop on Knowledge Extraction and Consolidation from Social Media, November 2012, Boston.

More information

Reading group on Ontologies and NLP:

Reading group on Ontologies and NLP: Reading group on Ontologies and NLP: Machine Learning27th infebruary Automated 2014 1 / 25 Te Reading group on Ontologies and NLP: Machine Learning in Automated Text Categorization, by Fabrizio Sebastianini.

More information

Text Analytics (Text Mining)

Text Analytics (Text Mining) CSE 6242 / CX 4242 Text Analytics (Text Mining) Concepts and Algorithms Duen Horng (Polo) Chau Georgia Tech Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko,

More information

Web Service Matchmaking Using Web Search Engine and Machine Learning

Web Service Matchmaking Using Web Search Engine and Machine Learning International Journal of Web Engineering 2012, 1(1): 1-5 DOI: 10.5923/j.web.20120101.01 Web Service Matchmaking Using Web Search Engine and Machine Learning Incheon Paik *, Eigo Fujikawa School of Computer

More information

Approaches to Mining the Web

Approaches to Mining the Web Approaches to Mining the Web Olfa Nasraoui University of Louisville Web Mining: Mining Web Data (3 Types) Structure Mining: extracting info from topology of the Web (links among pages) Hubs: pages pointing

More information

Document Clustering for Mediated Information Access The WebCluster Project

Document Clustering for Mediated Information Access The WebCluster Project Document Clustering for Mediated Information Access The WebCluster Project School of Communication, Information and Library Sciences Rutgers University The original WebCluster project was conducted at

More information

Similarity Joins of Text with Incomplete Information Formats

Similarity Joins of Text with Incomplete Information Formats Similarity Joins of Text with Incomplete Information Formats Shaoxu Song and Lei Chen Department of Computer Science Hong Kong University of Science and Technology {sshaoxu,leichen}@cs.ust.hk Abstract.

More information

Utilization of UML diagrams in designing an events extraction system

Utilization of UML diagrams in designing an events extraction system DESIGN STUDIES Utilization of UML diagrams in designing an events extraction system MIHAI AVORNICULUI Babes-Bolyai University, Department of Computer Science, Cluj-Napoca, Romania mavornicului@yahoo.com

More information

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISSN : 2456-3307 Some Issues in Application of NLP to Intelligent

More information

Merging Data Mining Techniques for Web Page Access Prediction: Integrating Markov Model with Clustering

Merging Data Mining Techniques for Web Page Access Prediction: Integrating Markov Model with Clustering www.ijcsi.org 188 Merging Data Mining Techniques for Web Page Access Prediction: Integrating Markov Model with Clustering Trilok Nath Pandey 1, Ranjita Kumari Dash 2, Alaka Nanda Tripathy 3,Barnali Sahu

More information

Introduction to Text Mining. Hongning Wang

Introduction to Text Mining. Hongning Wang Introduction to Text Mining Hongning Wang CS@UVa Who Am I? Hongning Wang Assistant professor in CS@UVa since August 2014 Research areas Information retrieval Data mining Machine learning CS@UVa CS6501:

More information

Tag-based Social Interest Discovery

Tag-based Social Interest Discovery Tag-based Social Interest Discovery Xin Li / Lei Guo / Yihong (Eric) Zhao Yahoo!Inc 2008 Presented by: Tuan Anh Le (aletuan@vub.ac.be) 1 Outline Introduction Data set collection & Pre-processing Architecture

More information

Term-Frequency Inverse-Document Frequency Definition Semantic (TIDS) Based Focused Web Crawler

Term-Frequency Inverse-Document Frequency Definition Semantic (TIDS) Based Focused Web Crawler Term-Frequency Inverse-Document Frequency Definition Semantic (TIDS) Based Focused Web Crawler Mukesh Kumar and Renu Vig University Institute of Engineering and Technology, Panjab University, Chandigarh,

More information

Automatically Determining Semantics for World Wide Web Multimedia Information Retrieval

Automatically Determining Semantics for World Wide Web Multimedia Information Retrieval Journal of Visual Languages and Computing (1999) 10, 585}606 Article No. jvlc.1999.0147, available online at http://www.idealibrary.com on Automatically Determining Semantics for World Wide Web Multimedia

More information

Cluster-based Similarity Aggregation for Ontology Matching

Cluster-based Similarity Aggregation for Ontology Matching Cluster-based Similarity Aggregation for Ontology Matching Quang-Vinh Tran 1, Ryutaro Ichise 2, and Bao-Quoc Ho 1 1 Faculty of Information Technology, Ho Chi Minh University of Science, Vietnam {tqvinh,hbquoc}@fit.hcmus.edu.vn

More information

An Improvement of Centroid-Based Classification Algorithm for Text Classification

An Improvement of Centroid-Based Classification Algorithm for Text Classification An Improvement of Centroid-Based Classification Algorithm for Text Classification Zehra Cataltepe, Eser Aygun Istanbul Technical Un. Computer Engineering Dept. Ayazaga, Sariyer, Istanbul, Turkey cataltepe@itu.edu.tr,

More information

An Automatic Extraction of Educational Digital Objects and Metadata from institutional Websites

An Automatic Extraction of Educational Digital Objects and Metadata from institutional Websites An Automatic Extraction of Educational Digital Objects and Metadata from institutional Websites Kajal K. Nandeshwar 1, Praful B. Sambhare 2 1M.E. IInd year, Dept. of Computer Science, P. R. Pote College

More information

Development of an Ontology-Based Portal for Digital Archive Services

Development of an Ontology-Based Portal for Digital Archive Services Development of an Ontology-Based Portal for Digital Archive Services Ching-Long Yeh Department of Computer Science and Engineering Tatung University 40 Chungshan N. Rd. 3rd Sec. Taipei, 104, Taiwan chingyeh@cse.ttu.edu.tw

More information

Searching the Deep Web

Searching the Deep Web Searching the Deep Web 1 What is Deep Web? Information accessed only through HTML form pages database queries results embedded in HTML pages Also can included other information on Web can t directly index

More information

Self-Organizing Maps for cyclic and unbounded graphs

Self-Organizing Maps for cyclic and unbounded graphs Self-Organizing Maps for cyclic and unbounded graphs M. Hagenbuchner 1, A. Sperduti 2, A.C. Tsoi 3 1- University of Wollongong, Wollongong, Australia. 2- University of Padova, Padova, Italy. 3- Hong Kong

More information

Using Text Learning to help Web browsing

Using Text Learning to help Web browsing Using Text Learning to help Web browsing Dunja Mladenić J.Stefan Institute, Ljubljana, Slovenia Carnegie Mellon University, Pittsburgh, PA, USA Dunja.Mladenic@{ijs.si, cs.cmu.edu} Abstract Web browsing

More information

Searching the Deep Web

Searching the Deep Web Searching the Deep Web 1 What is Deep Web? Information accessed only through HTML form pages database queries results embedded in HTML pages Also can included other information on Web can t directly index

More information

Towards Breaking the Quality Curse. AWebQuerying Web-Querying Approach to Web People Search.

Towards Breaking the Quality Curse. AWebQuerying Web-Querying Approach to Web People Search. Towards Breaking the Quality Curse. AWebQuerying Web-Querying Approach to Web People Search. Dmitri V. Kalashnikov Rabia Nuray-Turan Sharad Mehrotra Dept of Computer Science University of California, Irvine

More information

Web site Image database. Web site Video database. Web server. Meta-server Meta-search Agent. Meta-DB. Video query. Text query. Web client.

Web site Image database. Web site Video database. Web server. Meta-server Meta-search Agent. Meta-DB. Video query. Text query. Web client. (Published in WebNet 97: World Conference of the WWW, Internet and Intranet, Toronto, Canada, Octobor, 1997) WebView: A Multimedia Database Resource Integration and Search System over Web Deepak Murthy

More information

UNIT-V WEB MINING. 3/18/2012 Prof. Asha Ambhaikar, RCET Bhilai.

UNIT-V WEB MINING. 3/18/2012 Prof. Asha Ambhaikar, RCET Bhilai. UNIT-V WEB MINING 1 Mining the World-Wide Web 2 What is Web Mining? Discovering useful information from the World-Wide Web and its usage patterns. 3 Web search engines Index-based: search the Web, index

More information

CADIAL Search Engine at INEX

CADIAL Search Engine at INEX CADIAL Search Engine at INEX Jure Mijić 1, Marie-Francine Moens 2, and Bojana Dalbelo Bašić 1 1 Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, 10000 Zagreb, Croatia {jure.mijic,bojana.dalbelo}@fer.hr

More information

How are XML-based Marc21 and Dublin Core Records Indexed and ranked by General Search Engines in Dynamic Online Environments?

How are XML-based Marc21 and Dublin Core Records Indexed and ranked by General Search Engines in Dynamic Online Environments? How are XML-based Marc21 and Dublin Core Records Indexed and ranked by General Search Engines in Dynamic Online Environments? A. Hossein Farajpahlou Professor, Dept. Lib. and Info. Sci., Shahid Chamran

More information

Semantic Web Search Model for Information Retrieval of the Semantic Data *

Semantic Web Search Model for Information Retrieval of the Semantic Data * Semantic Web Search Model for Information Retrieval of the Semantic Data * Okkyung Choi 1, SeokHyun Yoon 1, Myeongeun Oh 1, and Sangyong Han 2 Department of Computer Science & Engineering Chungang University

More information

2 Approaches to worldwide web information retrieval

2 Approaches to worldwide web information retrieval The WEBFIND tool for finding scientific papers over the worldwide web. Alvaro E. Monge and Charles P. Elkan Department of Computer Science and Engineering University of California, San Diego La Jolla,

More information

A hybrid method to categorize HTML documents

A hybrid method to categorize HTML documents Data Mining VI 331 A hybrid method to categorize HTML documents M. Khordad, M. Shamsfard & F. Kazemeyni Electrical & Computer Engineering Department, Shahid Beheshti University, Iran Abstract In this paper

More information

A COMPARATIVE STUDY OF BYG SEARCH ENGINES

A COMPARATIVE STUDY OF BYG SEARCH ENGINES American Journal of Engineering Research (AJER) e-issn: 2320-0847 p-issn : 2320-0936 Volume-2, Issue-4, pp-39-43 www.ajer.us Research Paper Open Access A COMPARATIVE STUDY OF BYG SEARCH ENGINES Kailash

More information

A Novel PAT-Tree Approach to Chinese Document Clustering

A Novel PAT-Tree Approach to Chinese Document Clustering A Novel PAT-Tree Approach to Chinese Document Clustering Kenny Kwok, Michael R. Lyu, Irwin King Department of Computer Science and Engineering The Chinese University of Hong Kong Shatin, N.T., Hong Kong

More information

VisoLink: A User-Centric Social Relationship Mining

VisoLink: A User-Centric Social Relationship Mining VisoLink: A User-Centric Social Relationship Mining Lisa Fan and Botang Li Department of Computer Science, University of Regina Regina, Saskatchewan S4S 0A2 Canada {fan, li269}@cs.uregina.ca Abstract.

More information

Web Mining Evolution & Comparative Study with Data Mining

Web Mining Evolution & Comparative Study with Data Mining Web Mining Evolution & Comparative Study with Data Mining Anu, Assistant Professor (Resource Person) University Institute of Engineering and Technology Mahrishi Dayanand University Rohtak-124001, India

More information

Keywords APSE: Advanced Preferred Search Engine, Google Android Platform, Search Engine, Click-through data, Location and Content Concepts.

Keywords APSE: Advanced Preferred Search Engine, Google Android Platform, Search Engine, Click-through data, Location and Content Concepts. Volume 5, Issue 3, March 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Advanced Preferred

More information