A Framework for Delivery of Thai Content through Mobile Devices


Chuleerat Jaruskulchai, Atichart Khanthong, and Wanlapa Tantiprasongchai
Intelligent Information Retrieval and Database, Department of Computer Science, Faculty of Science, Kasetsart University, Bangkok, Thailand.

Abstract

With the increasing number of mobile devices, providing text information to mobile clients poses challenges. Mobile devices have limited display and navigation capabilities, and their tiny keypads make it difficult to enter keywords or other information. These problems are more challenging when working with Thai text. This paper introduces a framework for delivery of Thai content through mobile devices. It explores one particular aspect: the automated construction of a personalized focus, or user's attention. Documents are disseminated based on the personalized focus and routed to a mobile device. Instead of delivering every document, the documents are clustered and a topic is extracted for each cluster. Additionally, the content of each document is summarized. A basic naïve Bayes algorithm is deployed to filter documents according to the user's attention, and topic extraction is based on term frequency and inverse document frequency. Important sentences are extracted for summarization. Object-oriented technology is used to develop the demo system.

Keywords: Mobile device application, Document summarization, Processing of Thai text.

1. Introduction

It is expected that the handheld computer market will grow larger than the computer industry, and mobile clients have become a new target for business. The growing popularity and capability of Personal Digital Assistants (PDAs) and mobile devices will increase their usability. Wireless technology provides an opportunity to access information at any time and in any place.
Thus, numerous information services are offered to mobile clients, such as travel guides, entertainment advice, news, flight schedules, and driving directions. However, these services are task-specific, and mobile clients know where to locate the information. For web browsing and searching applications, mobile devices have limited display, graphics capabilities, navigation capabilities, and processing speed. Furthermore, the tiny keypad makes it difficult to enter keywords or other information. This poses a number of issues in designing user interfaces for PDAs. Working with Thai text is a particular challenge because of the large Thai alphabet. Most Thai PDA applications on offer are installed and run on the PDA itself.

This paper presents a framework to facilitate web navigation, searching, and browsing of Thai texts on small devices. Documents are clustered, and a topic is automatically created to describe the content of each cluster. To cope with the limited display, text summarization techniques are proposed. To save the user time in locating information, a naïve Bayes classifier is employed to classify documents according to the user's preferences and to notify the user by e-mail.

2. Related Works

Several aspects of mobile clients have attracted researchers, and they can be categorized into three groups. The first, basic aspect is effective browsing on client devices. The work in [Matt J. et al. 1999] reports that a small display size can reduce user effectiveness on the original tasks by up to 50%. Effectiveness is measured by the amount of scrolling needed to view information. However, Dillon et al. reported that the comprehension rate on a PDA is the same as on a desktop display (cited in [Matt J. et al. 1999]). The second aspect is the dynamic transformation of web page formats for small devices [Watters C. and Zhang R., 2003]. Another approach in this aspect is called the Web Clipping Application.
This approach uses a proprietary language to request portions of web pages, which are then reformatted for display. The third aspect is the design of web searching for PDAs that is effective from an information retrieval standpoint. This aspect is therefore more concerned with navigation capabilities and with how users interact with a PDA when searching for information. Many successful information retrieval methodologies have been deployed, such as text summarization, clustering of documents using concept hierarchies, and term extraction [Chan D.L. et al. 2002]. Most current PDA applications are available for download from the Internet and run on the local device. Web browsing on Thai PDAs relies on web clipping technology.

190 Jaruskulchai, C.; Khanthong, A. and Tantiprasongchai, W.

In the processing of Thai text, there are several issues, such as word boundaries and sentence extraction. A report from the National Electronics and Computer Technology Center (NECTEC) [Sornlertlamvanich V. et al. 2000] states that a number of problems in processing Thai text have not been fully resolved. Because the Thai writing system has no end-of-word marker, word segmentation is still an active research topic. The effectiveness of word segmentation is around 80-95% in precision and 80% in recall. However, many researchers have moved on to sentence extraction, and natural language processing is employed to improve word segmentation.

3. A Framework for Delivery of Thai Content

The framework for delivery of Thai content extends our previous research [Jaruskulchai J., Kiewsuwansuk S., and Kantasena J., 2001] to provide facilities for mobile clients. The main focus of our previous research was to investigate clustering algorithms and topic extraction for Thai text. That research shows the potential of working with Thai text with little concern for word or sentence boundaries. Moving this research forward to serve mobile clients is a challenge. Our framework offers several utilities to manage personal information. Not only does it provide a navigation function for users to browse information, but the system also monitors new information and informs the user through e-mail. Additionally, two types of information are offered to mobile clients: full-text documents and summarized documents.

There are many issues in designing and developing mobile client applications. Albers and Kim [Michael J. Albers and Loel Kim, 2000] laid out a theoretical framework for understanding the differences between handheld and full-sized web environments and their intended uses. They reported that two types of functions can be provided to mobile clients: simple look-up and information manipulation.
Simple look-up describes the skills and activities involved in locating and recognizing a desired chunk of information, such as checking a stock price, looking up a phone number, or reading an e-mail. Information manipulation is a more complex task; generally, it means the user needs to interact with different pieces of information, such as when comparing airline fares. In our framework, we apply many mechanisms from the information retrieval area, such as simple look-up, information filtering according to the user's preferences, document clustering, and text summarization.

The design of the framework consists of two main modules. The server module is responsible for collecting, indexing, filtering, and summarizing documents. The server also mails the information matching the user's preferences to the user. The client module allows users to manage (edit, review, delete) their profiles through a web browser or a PDA. Access to personal information is controlled by an e-mail address and a password. Figure 1 shows our framework. Details of information filtering and summarization are described in the next section.

Figure 1. A Framework for Delivery of Thai Content

3.1 An Automated Construction of Personalized Information

Personalized information and information filtering can be used interchangeably, and both are closely related to text classification. Thus, to classify the user's preferences, the probabilistic naïve Bayes classifier is employed. Naïve Bayes is a very efficient algorithm and has been employed in many research fields, such as text classification, classification of e-mail, and automated mail filtering. To estimate the naïve Bayes parameters, the user needs to provide a set of training documents or preference documents. The user's interests or preferences are expressed as keywords of interest, with term weights assigned to the most important keywords. In theory, the keywords and the weights of the most important terms are represented as a term weight vector, following the vector space model. In most implementations of naïve Bayes, the initial information or training data set is obtained from the user. In a real-world application, asking the user to rate relevant documents at registration time may not be workable. Thus, the system precomputes the posterior probability for each user from the user's keywords and term weights, and these probabilities are updated when the user retrieves documents. The system presumes that the retrieved documents are relevant and updates the probability parameters accordingly. When new documents arrive in the system, each document is classified, or filtered, by the probabilistic naïve Bayes classifier and stored in the matching user's preference file to await delivery to the user.

3.2 Document Clustering

To make better use of the limited capability of the mobile device, documents are not delivered individually; instead, they are clustered according to their contents, and a topic is extracted to describe the content of each cluster. Complete linkage is employed in this framework to cluster documents of similar concept before delivery to users.
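The complete-linkage step can be illustrated with a minimal sketch. The paper does not state the similarity measure or the stopping criterion, so the cosine similarity over raw term counts, the `threshold` value, and the function names below are all assumptions made for illustration; documents are assumed to be already segmented into words.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def complete_linkage(docs, threshold=0.3):
    """Agglomerative clustering; returns clusters as lists of document indices.

    docs: list of token lists (assumed already word-segmented).
    threshold: hypothetical stopping similarity, not a value from the paper.
    """
    vectors = [Counter(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]

    def linkage(c1, c2):
        # Complete linkage: similarity of the LEAST similar cross-cluster pair.
        return min(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sim = linkage(clusters[x], clusters[y])
                if sim > best_sim:
                    best_sim, best_pair = sim, (x, y)
        # Stop when even the closest pair of clusters is too dissimilar.
        if best_sim < threshold:
            break
        x, y = best_pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters
```

Because complete linkage scores a candidate merge by its least similar pair of documents, a cluster only grows when every member resembles every other member, which tends to give compact clusters that a single topic can describe.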
To extract a topic from the full text, the high-frequency words are extracted and grouped to describe the cluster. Full details of our document clustering technique can be found in [Jaruskulchai J., Kiewsuwansuk S., and Kantasena J., 2001].

3.3 Extracting Summary Sentences

The main focus of this research is extracting summary sentences to represent the content compactly. Summarization is the process of abstracting key content from one or more information sources. A variety of methods have been investigated. With respect to the target reader or function of use, a summary falls into one of three categories: indicative, informative, and critical [Hahn Udo and Mani Inderjeet, 2000]. An indicative summary provides compact content to alert the user not to miss the information. An informative summary provides essential information that can substitute for the original source. Lastly, a critical summary not only abstracts the information but also provides some opinion on the content.

However, the most difficult task in summarizing Thai text is finding sentence boundaries, even if the results of the word boundary algorithm are acceptable. Current research on sentence boundaries can be found in [Charoenpornsawat P. and Sornlertlamvanich V. 2001]; the approaches deployed are probabilistic part-of-speech trigram, grammatical rule-based, and feature-based. The feature-based approach was evaluated on the ORCHID corpus [Information Research and Development Division, 2003], a part-of-speech tagged corpus. The definition of a Thai sentence has not been fully settled. Figure 2 excerpts some text from the ORCHID corpus, each item of which is claimed to be a sentence. Thus, correct sentence boundaries are not our concern, and the summary is aimed at being an indicative summary.

พยายามวางแผน และประสานงานอย่างใกล้ชิด (Try to plan and to cooperate closely)
รัฐมนตรีว่าการกระทรวงวิทยาศาสตร์

Figure 2. Excerpt of sentences from the ORCHID corpus

Thus, phrases are the major component used in the process of Thai text summarization.
The simple algorithm presented by H.P. Luhn [Luhn H.P. 1958] is used to measure the importance of sentences and will be referred to as the within-sentence clustering technique. This method has been studied in [Buyukkokten O., Garcia-Molina H. and Paepcke A. 2000] with a different data set and an inverse document frequency weighting scheme. The summarization process starts by filtering out Thai stop words. Then, in each document, the high-frequency words (TF, total frequency) are computed to represent the significant words of the document. Frequent words that occur in more than 10 percent of the document are eliminated. Rare words and specific words are not removed. Each sentence is then divided into clusters according to the distance between significant words, measured in non-significant words. A cluster is a sequence of consecutive words that starts and ends with a significant word and in which no more than D_n non-significant words separate any two neighboring significant words. If a sentence contains more than one cluster, the highest-scoring one is selected. The sentence is then ranked by the square of the number of significant words in the cluster divided by the total number of words in the cluster. Figure 3 shows the details of the computation of sentence ranking.
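The ranking rule described above can be sketched as follows. This is an illustrative reading of the description, not the authors' Java implementation: the function names, the `max_gap` parameter (standing in for D_n), and the choice to return the top `k` sentences in document order are assumptions, and the input is assumed to be already word-segmented with stop words removed.

```python
def luhn_score(tokens, significant, max_gap):
    """Score one sentence by its best cluster of significant words.

    tokens: the sentence as a list of words.
    significant: set of significant words for the document.
    max_gap: largest number of non-significant words allowed between two
             neighboring significant words in a cluster (stands in for D_n).
    """
    positions = [i for i, t in enumerate(tokens) if t in significant]
    if not positions:
        return 0.0

    best = 0.0
    start = prev = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - prev - 1 <= max_gap:
            count += 1  # still inside the current cluster
        else:
            # Close the current cluster and score it: significant^2 / length.
            best = max(best, count * count / (prev - start + 1))
            start, count = pos, 1
        prev = pos
    # Score the last open cluster.
    return max(best, count * count / (prev - start + 1))


def summarize(sentences, significant, max_gap, k):
    """Return the k highest-scoring sentences, kept in document order."""
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: luhn_score(sentences[i], significant, max_gap),
        reverse=True,
    )
    return [sentences[i] for i in sorted(ranked[:k])]
```

For instance, a cluster that spans seven words and contains four significant words scores 4 × 4 / 7, roughly 2.29. Squaring the count rewards dense clusters, while dividing by the cluster length penalizes clusters padded with non-significant words.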

Figure 3. Computation of clusters and sentence ranking

If D_n > 2, the sentence ranking in this example is 2.3. The number of non-significant words (D_n) should be greater than 2. The system limits the number of sentences in the summary of each document. Thus, to reduce the loss of content, the system allows the user to view the original document. At the time of this report, the effectiveness of summarization is evaluated by user satisfaction. Thirty documents were randomly selected and evaluated by second-year undergraduate students. The evaluation criteria are satisfactory, fair, and unacceptable. The results show that more than 53% of the summaries were rated fair, around 11% were rated satisfactory, and the rest were rated unacceptable.

4. System Implementation

The framework is written in Java. It is built on a client-server model, in which the text-based and client modules are handled on the server side. Indexing and retrieval are developed using RMI, a distributed processing technology. The summary content delivered to the client is marked up with XML tags, so it is easy to reformat and display on any device. To display information on a PDA, the kXML parser is employed. A general browser uses the default IE parser. The mobile client needs at least 16 MB and Palm OS version 4 or higher. There is no standard for displaying Thai on PDAs. To display Thai, the current version requires a Thai routine from Thaihack to be installed. For text summarization, the test data set was collected from the Thai newspaper Daily News. The data set consists of daily events, such as economic, foreign affairs, political, and social news in Thailand. Figure 4 shows some displays of the framework.

6. Conclusion and Future Works

This framework is our experiment in exploring new applications on PDA devices and extends our previous research to mobile clients.
This framework serves a particular search task. We present compact content by applying summarization and clustering techniques. Furthermore, information is filtered according to the user's preferences and mailed back to the user. The system has not been evaluated with the theoretical measures of information retrieval science, since the purpose is to present a possible model for delivering Thai text to mobile users.

For future research on the summarization process, a number of approaches may be investigated and experimented with on Thai text. The approaches to text summarization have been grouped into four categories: a summary consisting of a list of terms or concept terms, a single passage extracted from the text, a sequence of sentences extracted from the text, and a summary generated using natural language understanding. The most important need for Thai text summarization is a standard test data set, since there are a number of parameters to explore. Additionally, to improve the summarization results, natural language processing techniques may be needed to extract proper nouns.

Figure 4. Original Document, Summarization Results, and Clustering Results

Acknowledgments

This research was partially supported by the Office of the National Research Council of Thailand.

References

[1] Buyukkokten O., Garcia-Molina H. and Paepcke A., Seeing the Whole in Parts: Text Summarization for Web Browsing on Handheld Devices, Proceedings of the Tenth International World-Wide Web Conference, 2000
[2] Marsden G., Cheery R., and Haefele A., Small Screen Access to Digital Libraries, Computer Network 31 (1999)
[3] Michael J. Albers, Loel Kim, Implications of the wireless web for technical communicators: User web browsing characteristics using palm handhelds for information retrieval, Proceedings of IEEE professional communication society international professional communication conference and Proceedings of the 18th annual ACM international conference on Computer documentation: technology & teamwork, September 2000
[4] Watters C. and Zhang R., PDA Access to Internet Content: Focus on Forms, HICSS 36 Hawaii, Jan
[5] Matt J., Gary M., Norliza M., and Boone K., Improving Web Interaction on Small Displays, Computer Network 31 (1999)
[6] Sornlertlamvanich V., Potipiti T., Wutiwiwatchai C., and Mittrapiyanuruk P., The State of the Art in Thai Language Processing, Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL2000), Hong Kong, pp , October
[7] Hahn Udo and Mani Inderjeet, The Challenges of Automatic Summarization, IEEE Computer, Nov. 2000, (Vol 33, No. 11)
[8] Luhn H.P., The Automatic Creation of Literature Abstracts, Advances in Automatic Text Summarization, edited by Inderjeet Mani and Mark T. Maybury, The MIT Press, Cambridge, Massachusetts, London, England,
[9] Charoenpornsawat P. and Sornlertlamvanich V., Automatic Sentence Break Disambiguation for Thai, Proceedings of ICCPOL2001, Korea, pp , May
[10] Chan D.L., Luk R.W.P., Mark W.K., Leon H.V., Ho E.K.S. and Lu Q., Multiple Related Document Summary and Navigation using Concept Hierarchies for Mobile Clients, Proceedings of the 2002 ACM Symposium on Applied Computing (SAC), March 10-14, 2002, Madrid, Spain. ACM 2002
[11] Jaruskulchai J., Kiewsuwansuk S., and Kantasena J., Thai Text Document Clustering, The Fifth National Computer Science and Engineering Conference, 7-9 Nov 2001, Chiang Mai, Thailand,
[12] Information Research and Development Division, Orchid Corpus, National Electronics and Computer Technology Centers, /itech/download.html, Jan


More information

Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data

Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data American Journal of Applied Sciences (): -, ISSN -99 Science Publications Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data Ibrahiem M.M. El Emary and Ja'far

More information

Advanced Smart Mobile Monitoring Solution for Managing Efficiently Gas Facilities of Korea

Advanced Smart Mobile Monitoring Solution for Managing Efficiently Gas Facilities of Korea Advanced Smart Mobile Monitoring Solution for Managing Efficiently Gas Facilities of Korea Jeong Seok Oh 1, Hyo Jung Bang 1, Green Bang 2 and Il-ju Ko 2, 1 Institute of Gas Safety R&D, Korea Gas Safety

More information

Chapter 27 Introduction to Information Retrieval and Web Search

Chapter 27 Introduction to Information Retrieval and Web Search Chapter 27 Introduction to Information Retrieval and Web Search Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 27 Outline Information Retrieval (IR) Concepts Retrieval

More information

International ejournals

International ejournals Available online at www.internationalejournals.com International ejournals ISSN 0976 1411 International ejournal of Mathematics and Engineering 112 (2011) 1023-1029 ANALYZING THE REQUIREMENTS FOR TEXT

More information

Archives in a Networked Information Society: The Problem of Sustainability in the Digital Information Environment

Archives in a Networked Information Society: The Problem of Sustainability in the Digital Information Environment Archives in a Networked Information Society: The Problem of Sustainability in the Digital Information Environment Shigeo Sugimoto Research Center for Knowledge Communities Graduate School of Library, Information

More information

Information Gathering Support Interface by the Overview Presentation of Web Search Results

Information Gathering Support Interface by the Overview Presentation of Web Search Results Information Gathering Support Interface by the Overview Presentation of Web Search Results Takumi Kobayashi Kazuo Misue Buntarou Shizuki Jiro Tanaka Graduate School of Systems and Information Engineering

More information

A hybrid method to categorize HTML documents

A hybrid method to categorize HTML documents Data Mining VI 331 A hybrid method to categorize HTML documents M. Khordad, M. Shamsfard & F. Kazemeyni Electrical & Computer Engineering Department, Shahid Beheshti University, Iran Abstract In this paper

More information

Learning and Development. UWE Staff Profiles (USP) User Guide

Learning and Development. UWE Staff Profiles (USP) User Guide Learning and Development UWE Staff Profiles (USP) User Guide About this training manual This manual is yours to keep and is intended as a guide to be used during the training course and as a reference

More information

Document Summarization on Handheld Device:

Document Summarization on Handheld Device: Document Summarization on Handheld Device: An Information Visualization Tool for Mobile Commerce Christopher C. Yang Dept. of Systems Eng. and Eng. Management The Chinese University of Hong Kong Hong Kong

More information

Semantic Extensions to Syntactic Analysis of Queries Ben Handy, Rohini Rajaraman

Semantic Extensions to Syntactic Analysis of Queries Ben Handy, Rohini Rajaraman Semantic Extensions to Syntactic Analysis of Queries Ben Handy, Rohini Rajaraman Abstract We intend to show that leveraging semantic features can improve precision and recall of query results in information

More information

Scientific databases

Scientific databases SCID 305 : Generic Skills in Science Research Scientific databases Suang Udomvaraphunt Academic IT Stang Monkolsuk library and Information Division Faculty of Science Stang Mongkolsuk Library http://stang.sc.mahidol.ac.th

More information

Automatic Text Summarization System Using Extraction Based Technique

Automatic Text Summarization System Using Extraction Based Technique Automatic Text Summarization System Using Extraction Based Technique 1 Priyanka Gonnade, 2 Disha Gupta 1,2 Assistant Professor 1 Department of Computer Science and Engineering, 2 Department of Computer

More information

Domain Specific Search Engine for Students

Domain Specific Search Engine for Students Domain Specific Search Engine for Students Domain Specific Search Engine for Students Wai Yuen Tang The Department of Computer Science City University of Hong Kong, Hong Kong wytang@cs.cityu.edu.hk Lam

More information

Next-Generation Standards Management with IHS Engineering Workbench

Next-Generation Standards Management with IHS Engineering Workbench ENGINEERING & PRODUCT DESIGN Next-Generation Standards Management with IHS Engineering Workbench The addition of standards management capabilities in IHS Engineering Workbench provides IHS Standards Expert

More information

News-Oriented Keyword Indexing with Maximum Entropy Principle.

News-Oriented Keyword Indexing with Maximum Entropy Principle. News-Oriented Keyword Indexing with Maximum Entropy Principle. Li Sujian' Wang Houfeng' Yu Shiwen' Xin Chengsheng2 'Institute of Computational Linguistics, Peking University, 100871, Beijing, China Ilisujian,

More information

Extracting Summary from Documents Using K-Mean Clustering Algorithm

Extracting Summary from Documents Using K-Mean Clustering Algorithm Extracting Summary from Documents Using K-Mean Clustering Algorithm Manjula.K.S 1, Sarvar Begum 2, D. Venkata Swetha Ramana 3 Student, CSE, RYMEC, Bellary, India 1 Student, CSE, RYMEC, Bellary, India 2

More information

January- March,2016 ISSN NO

January- March,2016 ISSN NO USER INTERFACES FOR INFORMATION RETRIEVAL ON THE WWW: A PERSPECTIVE OF INDIAN WOMEN. Sunil Kumar Research Scholar Bhagwant University,Ajmer sunilvats1981@gmail.com Dr. S.B.L. Tripathy Abstract Information

More information

Map-based Access to Multiple Educational On-Line Resources from Mobile Wireless Devices

Map-based Access to Multiple Educational On-Line Resources from Mobile Wireless Devices Map-based Access to Multiple Educational On-Line Resources from Mobile Wireless Devices P. Brusilovsky 1 and R.Rizzo 2 1 School of Information Sciences, University of Pittsburgh, Pittsburgh PA 15260, USA

More information

Keywords Data alignment, Data annotation, Web database, Search Result Record

Keywords Data alignment, Data annotation, Web database, Search Result Record Volume 5, Issue 8, August 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Annotating Web

More information

INFOED. identify and. target

INFOED. identify and. target CREATING A RESEARCH INTEREST PROFILE WITH INFOED Research interest profiles are used by the Office of Research and Sponsored Programs (ORSP) to find funding opportunities. By having a list of interests

More information

Microsoft SharePoint Server 2013 for the Site Owner/Power User

Microsoft SharePoint Server 2013 for the Site Owner/Power User Course 55035B: Microsoft SharePoint Server 2013 for the Site Owner/Power User Page 1 of 6 Microsoft SharePoint Server 2013 for the Site Owner/Power User Course 55035B: 2 days; Instructor-Led Introduction

More information

Diploma Of Computing

Diploma Of Computing Diploma Of Computing Course Outline Campus Intake CRICOS Course Duration Teaching Methods Assessment Course Structure Units Melbourne Burwood Campus / Jakarta Campus, Indonesia March, June, October 022638B

More information

SIRS Issues Researcher

SIRS Issues Researcher From the main screen of SIRS, click on the SIRS Issues Researcher link. 1 This tutorial will provide an overview of the following features available through SIRS Issues Researcher: 2. Search Tabs 3. Reference

More information

FOUNDATIONS OF INFORMATION SYSTEMS MIS 2749 COURSE SYLLABUS Fall, Course Title and Description

FOUNDATIONS OF INFORMATION SYSTEMS MIS 2749 COURSE SYLLABUS Fall, Course Title and Description FOUNDATIONS OF INFORMATION SYSTEMS MIS 2749 COURSE SYLLABUS Fall, 2013 Instructor s Name: Vicki Robertson E-mail: vrobrtsn@memphis.edu Course Title and Description Foundations of Information Systems. (3

More information

VIDEO SEARCHING AND BROWSING USING VIEWFINDER

VIDEO SEARCHING AND BROWSING USING VIEWFINDER VIDEO SEARCHING AND BROWSING USING VIEWFINDER By Dan E. Albertson Dr. Javed Mostafa John Fieber Ph. D. Student Associate Professor Ph. D. Candidate Information Science Information Science Information Science

More information

Information Retrieval. (M&S Ch 15)

Information Retrieval. (M&S Ch 15) Information Retrieval (M&S Ch 15) 1 Retrieval Models A retrieval model specifies the details of: Document representation Query representation Retrieval function Determines a notion of relevance. Notion

More information

Effect of log-based Query Term Expansion on Retrieval Effectiveness in Patent Searching

Effect of log-based Query Term Expansion on Retrieval Effectiveness in Patent Searching Effect of log-based Query Term Expansion on Retrieval Effectiveness in Patent Searching Wolfgang Tannebaum, Parvaz Madabi and Andreas Rauber Institute of Software Technology and Interactive Systems, Vienna

More information

Quoogle: A Query Expander for Google

Quoogle: A Query Expander for Google Quoogle: A Query Expander for Google Michael Smit Faculty of Computer Science Dalhousie University 6050 University Avenue Halifax, NS B3H 1W5 smit@cs.dal.ca ABSTRACT The query is the fundamental way through

More information

Assessment Plan. Academic Cycle

Assessment Plan. Academic Cycle College of Business and Technology Division or Department: School of Business (Business Administration, BS) Prepared by: Marcia Hardy Date: June 21, 2017 Approved by: Margaret Kilcoyne Date: June 21, 2017

More information

Lecture Video Indexing and Retrieval Using Topic Keywords

Lecture Video Indexing and Retrieval Using Topic Keywords Lecture Video Indexing and Retrieval Using Topic Keywords B. J. Sandesh, Saurabha Jirgi, S. Vidya, Prakash Eljer, Gowri Srinivasa International Science Index, Computer and Information Engineering waset.org/publication/10007915

More information

PELLISSIPPI STATE TECHNICAL COMMUNITY COLLEGE MASTER SYLLABUS CIW JAVASCRIPT FUNDAMENTALS CERTIFICATION WEB 2391

PELLISSIPPI STATE TECHNICAL COMMUNITY COLLEGE MASTER SYLLABUS CIW JAVASCRIPT FUNDAMENTALS CERTIFICATION WEB 2391 PELLISSIPPI STATE TECHNICAL COMMUNITY COLLEGE MASTER SYLLABUS CIW JAVASCRIPT FUNDAMENTALS CERTIFICATION WEB 2391 Class Hours: 1.0 Credit Hours: 1.0 Laboratory Hours: 0.0 Revised: Fall 06 Note: This course

More information

CS54701: Information Retrieval

CS54701: Information Retrieval CS54701: Information Retrieval Basic Concepts 19 January 2016 Prof. Chris Clifton 1 Text Representation: Process of Indexing Remove Stopword, Stemming, Phrase Extraction etc Document Parser Extract useful

More information

Keyword Extraction by KNN considering Similarity among Features

Keyword Extraction by KNN considering Similarity among Features 64 Int'l Conf. on Advances in Big Data Analytics ABDA'15 Keyword Extraction by KNN considering Similarity among Features Taeho Jo Department of Computer and Information Engineering, Inha University, Incheon,

More information

I-Pats: An Intelligent Search System for Thai Patents

I-Pats: An Intelligent Search System for Thai Patents I-Pats: An Intelligent Search System for Thai Patents Marut Buranarach 1 Choochart Haruechaiyasak 2 Alisa Kongthon 3 1,2,3 Human Language Technology Laboratory, National Electronics and Computer Technology

More information

Automatic Domain Partitioning for Multi-Domain Learning

Automatic Domain Partitioning for Multi-Domain Learning Automatic Domain Partitioning for Multi-Domain Learning Di Wang diwang@cs.cmu.edu Chenyan Xiong cx@cs.cmu.edu William Yang Wang ww@cmu.edu Abstract Multi-Domain learning (MDL) assumes that the domain labels

More information

User-Centered Guidelines for Design of Mobile Applications

User-Centered Guidelines for Design of Mobile Applications The Fourth International Conference on Electronic Business (ICEB2004) / Beijing 853 User-Centered Guidelines for Design of Mobile Applications Xiaowen Fang, Susy Chan, Jacek Brzezinski, Shuang Xu, Jean

More information

Overview of the INEX 2009 Link the Wiki Track

Overview of the INEX 2009 Link the Wiki Track Overview of the INEX 2009 Link the Wiki Track Wei Che (Darren) Huang 1, Shlomo Geva 2 and Andrew Trotman 3 Faculty of Science and Technology, Queensland University of Technology, Brisbane, Australia 1,

More information

A Linear Regression Model for Assessing the Ranking of Web Sites Based on Number of Visits

A Linear Regression Model for Assessing the Ranking of Web Sites Based on Number of Visits A Linear Regression Model for Assessing the Ranking of Web Sites Based on Number of Visits Dowming Yeh, Pei-Chen Sun, and Jia-Wen Lee National Kaoshiung Normal University Kaoshiung, Taiwan 802, Republic

More information

Enhancing Cluster Quality by Using User Browsing Time

Enhancing Cluster Quality by Using User Browsing Time Enhancing Cluster Quality by Using User Browsing Time Rehab M. Duwairi* and Khaleifah Al.jada'** * Department of Computer Information Systems, Jordan University of Science and Technology, Irbid 22110,

More information

Enhancing Web Page Skimmability

Enhancing Web Page Skimmability Enhancing Web Page Skimmability Chen-Hsiang Yu MIT CSAIL 32 Vassar St Cambridge, MA 02139 chyu@mit.edu Robert C. Miller MIT CSAIL 32 Vassar St Cambridge, MA 02139 rcm@mit.edu Abstract Information overload

More information

Efficient Web Browsing on Handheld Devices Using Page and Form Summarization

Efficient Web Browsing on Handheld Devices Using Page and Form Summarization Efficient Web Browsing on Handheld Devices Using Page and Form Summarization ORKUT BUYUKKOKTEN, OLIVER KALJUVEE, HECTOR GARCIA-MOLINA, ANDREAS PAEPCKE and TERRY WINOGRAD Stanford University We present

More information

Wrapper: An Application for Evaluating Exploratory Searching Outside of the Lab

Wrapper: An Application for Evaluating Exploratory Searching Outside of the Lab Wrapper: An Application for Evaluating Exploratory Searching Outside of the Lab Bernard J Jansen College of Information Sciences and Technology The Pennsylvania State University University Park PA 16802

More information

A Study of Thai Succession Law Ontology on Supreme Court Sentences Retrieval

A Study of Thai Succession Law Ontology on Supreme Court Sentences Retrieval A Study of Thai Succession Law Ontology on Supreme Court Sentences Retrieval Tanapon Tantisripreecha and Nuanwan Soonthornphisaj Abstract This paper presents an improvement of our approach called SCRO_II

More information

Efficient Web Browsing on Handheld Devices Using Page and Form Summarization

Efficient Web Browsing on Handheld Devices Using Page and Form Summarization Efficient Web Browsing on Handheld Devices Using Page and Form Summarization Orkut Buyukkokten, Oliver Kaljuvee, Hector Garcia-Molina, Andreas Paepcke, Terry Winograd Digital Libraries Lab(InfoLab), Stanford

More information

Bachelor of Arts Program in Information Science

Bachelor of Arts Program in Information Science Bachelor of Arts Program in Information Science Philosophy Creativity Service-minded Information Specialist Degree Bachelor of Arts (Information Science) B.A. (Information Science) Now in the process of

More information

Developing Focused Crawlers for Genre Specific Search Engines

Developing Focused Crawlers for Genre Specific Search Engines Developing Focused Crawlers for Genre Specific Search Engines Nikhil Priyatam Thesis Advisor: Prof. Vasudeva Varma IIIT Hyderabad July 7, 2014 Examples of Genre Specific Search Engines MedlinePlus Naukri.com

More information

Improving Quality of Products in Hard Drive Manufacturing by Decision Tree Technique

Improving Quality of Products in Hard Drive Manufacturing by Decision Tree Technique Improving Quality of Products in Hard Drive Manufacturing by Decision Tree Technique Anotai Siltepavet 1, Sukree Sinthupinyo 2 and Prabhas Chongstitvatana 3 1 Computer Engineering, Chulalongkorn University,

More information

Participant Training Guide

Participant Training Guide http://secnet.cch.com March, 2010 Table of Contents Introduction...2 Objectives...2 Accessing...3 Home Page...4 Filings...5 Viewing Search Results...7 Viewing Documents...8 Record Keeping...9 Today s Filings...10

More information

Information Management (IM)

Information Management (IM) 1 2 3 4 5 6 7 8 9 Information Management (IM) Information Management (IM) is primarily concerned with the capture, digitization, representation, organization, transformation, and presentation of information;

More information

Enhancing Cluster Quality by Using User Browsing Time

Enhancing Cluster Quality by Using User Browsing Time Enhancing Cluster Quality by Using User Browsing Time Rehab Duwairi Dept. of Computer Information Systems Jordan Univ. of Sc. and Technology Irbid, Jordan rehab@just.edu.jo Khaleifah Al.jada' Dept. of

More information

I. General regulations

I. General regulations Degree and examination regulations for the consecutive international master's program in Architecture Typology at Faculty VI of the Technische Universität Berlin, October 2, 206 On October 2, 206, the

More information

Text Documents clustering using K Means Algorithm

Text Documents clustering using K Means Algorithm Text Documents clustering using K Means Algorithm Mrs Sanjivani Tushar Deokar Assistant professor sanjivanideokar@gmail.com Abstract: With the advancement of technology and reduced storage costs, individuals

More information

Investigating the Effects of User Age on Readability

Investigating the Effects of User Age on Readability Investigating the Effects of User Age on Readability Kyung Hoon Hyun, Ji-Hyun Lee, and Hwon Ihm Korea Advanced Institute of Science and Technology, Korea {hellohoon,jihyunl87,raccoon}@kaist.ac.kr Abstract.

More information

Integration of Handwriting Recognition in Butterfly Net

Integration of Handwriting Recognition in Butterfly Net Integration of Handwriting Recognition in Butterfly Net Sye-Min Christina Chan Department of Computer Science Stanford University Stanford, CA 94305 USA sychan@stanford.edu Abstract ButterflyNet allows

More information

WEBMASTER OVERVIEW PURPOSE ELIGIBILITY TIME LIMITS

WEBMASTER OVERVIEW PURPOSE ELIGIBILITY TIME LIMITS WEBMASTER OVERVIEW Participants are required to design, build and launch a World Wide Web site that features the school s career and technology education program, the TSA chapter, and the chapter s ability

More information

CSI5387: Data Mining Project

CSI5387: Data Mining Project CSI5387: Data Mining Project Terri Oda April 14, 2008 1 Introduction Web pages have become more like applications that documents. Not only do they provide dynamic content, they also allow users to play

More information

MURDOCH RESEARCH REPOSITORY

MURDOCH RESEARCH REPOSITORY MURDOCH RESEARCH REPOSITORY http://researchrepository.murdoch.edu.au/ This is the author s final version of the work, as accepted for publication following peer review but without the publisher s layout

More information