Keyword Extraction from Multiple Words for Report Recommendations in Media Wiki

IOP Conference Series: Materials Science and Engineering
PAPER, OPEN ACCESS
To cite this article: K Elakiya and Arun Sahayadhas 2017 IOP Conf. Ser.: Mater. Sci. Eng. 183 012029

Elakiya K, Arun Sahayadhas*
Department of CSE, Vels University, Chennai, India
*arun.se@velsuniv.ac.in

Abstract: This paper addresses the problem of multiple-word search, with the goal of using multiple-word queries to retrieve relevant wiki pages that are recommended to the end user. The existing system provides a link to a wiki page only for a single keyword that is available in Wikipedia, so it is difficult to get the correct result when the search input has multiple keywords or a sentence. We introduce a FastStringSearch technique that provides efficient search with multiple keywords and makes it easier for the end user to reach the expected content.

Key Terms: mediawiki, Fast String Search, Cirrus Search

1. INTRODUCTION

In many information retrieval applications it is necessary to quickly locate some or all occurrences of a user-specified word or phrase in one or several arbitrary text strings [1]. Specifically, the retrieval of unformatted data takes time. A secondary index containing keyword fragments has therefore been introduced [2]. Searching the index with user-specified keywords yields a superset of the pages required. With larger alphabets, the Boyer-Moore algorithm takes longer to search for all occurrences of a keyword in the document, and its performance gets even worse as the search phrase grows longer. To overcome this problem, we implement the FastStringSearch (FSS) technique [5] [3]. This technique compares the text pattern from left to right with multiple keywords; its main idea is to improve search performance. FSS compares the pattern against a portion of the text where a possible match occurs and lists all the pages that contain any one of the keywords specified in the input.
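The multi-keyword lookup described above, listing every page that contains any one of the input keywords, can be sketched as follows. This is only a minimal illustration using the Boyer-Moore-Horspool algorithm, a well-known simplification of Boyer-Moore; it is not the authors' FSS implementation, and the function names and the in-memory page dictionary are hypothetical.

```python
from typing import Dict, List


def horspool_find(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Shift table: distance from the last occurrence of each character
    # (ignoring the final pattern position) to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        # Compare the current window of the text with the pattern.
        if text[pos:pos + m] == pattern:
            return pos
        # Skip ahead using the text character aligned with the pattern's end.
        pos += shift.get(text[pos + m - 1], m)
    return -1


def pages_matching_any(pages: Dict[str, str], keywords: List[str]) -> List[str]:
    """List titles of pages whose text contains at least one keyword.

    Matching is case-insensitive; `pages` maps a title to its body text
    (a hypothetical stand-in for the wiki's stored pages).
    """
    lowered = [k.lower() for k in keywords if k]
    return [
        title
        for title, body in pages.items()
        if any(horspool_find(body.lower(), k) != -1 for k in lowered)
    ]
```

For a query such as "mobile computing", the input would be split into the keywords "mobile" and "computing" before calling `pages_matching_any`; how the authors tokenize the query is not stated in the paper, so that step is left to the caller.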
Multi-keyword search lets the user specify a set of keywords instead of a single keyword. Specifying multiple keywords gives better results, which is not possible with the existing system. The performance of the search is measured using Squid caching [4] and Varnish caching [5]. Squid stores copies of web server pages, and when the same page is requested again, Squid serves the page from its storage. Because of Squid, the web server need not generate the page again and again, which boosts web server performance. Varnish caching works in the same way: Varnish stores pages from the Apache web server and returns them when the same page is requested again. This in turn boosts the performance of MediaWiki [6].

Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI. Published under licence by Ltd

2. Methodology:
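The page caching that the methodology relies on (Squid or Varnish answering repeated requests so the web server does not regenerate the page) follows a simple pattern, sketched below. This is an assumption-laden toy model: the `PageCache` class and its `render` callback are hypothetical, and real caching proxies also handle expiry, HTTP headers and invalidation, none of which is modeled here.

```python
from typing import Callable, Dict


class PageCache:
    """Tiny in-memory stand-in for a caching proxy in front of a web server."""

    def __init__(self, render: Callable[[str], str]) -> None:
        self._render = render            # expensive page generation at the origin
        self._store: Dict[str, str] = {}  # cached copies keyed by URL
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> str:
        """Serve the page from the cache, rendering it only on a miss."""
        if url in self._store:
            self.hits += 1
            return self._store[url]
        self.misses += 1
        page = self._render(url)
        self._store[url] = page
        return page
```

Requesting the same URL twice through `get` calls `render` once; the second request is answered from the stored copy, which is the effect the introduction attributes to Squid and Varnish.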

2.1. Acquiring the text:

The input is acquired from the user dynamically, either as text or as an audio file; an audio file is converted into text and stored as text in the database. This enables relevant documents to be fetched directly through a single-word or multi-word search in MediaWiki, and the results are returned simultaneously. Speech is captured by a microphone, which acts as the input for the text conversion. An online speech converter is invoked to convert the speech to digital text. The converted text is treated as a keyword and stored in the database. When the user searches with the corresponding keyword, a match occurs and the results are fetched. Fig 1 illustrates how speech recognition and text conversion happen.

Fig 1: Speech recognition and text conversion [7]

2.2. Keyword extraction:

The keyword is extracted from the search engine using CirrusSearch [8]. The same Cirrus features that users can enter in the search box can also be used in API queries. For example, a special prefix can be used to find related pages. Fig 2 shows how text is forwarded to message formulation. The code converts the text into code form, which is a discrete input at a rate of 200 bps. The neuro-muscular controls drive the articulatory motions at a constant rate of 200 bps, and those motions are transferred to the vocal tract system at a constant rate of 64-700 Kbps. The transmission channel carries the acoustic waveform from one form to another. Basilar membrane motion analyzes the spectrum and sends it to neural transduction for feature extraction, forming a continuous output. This output is sent to language translation with proper sentences, words and phonemes. Thus the given audio file is converted into a text message.

Fig 2: Flow chart for converting audio file into text message

2.3. Extracting relevant Documents:

Through the fast string search and CirrusSearch techniques we can fetch any type of relevant document (text, PDF, video, audio, etc.), with performance tuned by the Squid caching [11] and Varnish caching [5] methods.

2.4. Displaying the Document.

The multiple-word semantic search fragments the input into individual words and retrieves pertinent documents that can be suggested to the end user. The result is displayed in the manner of the Automatic Content Linking Device (ACLD) [12], as a list of the relevant documents.

3. Results:

3.1. Acquiring the audio & conversion to text

The GUI for acquiring speech from the end user is shown in Fig 3, and the conversion of speech to text is shown in Fig 4.

Fig 3: Audio listener from user end

The user is given the option to choose a preferred language. In Fig 4, English has been chosen as the preferred language. Once the user chooses a preferred language and clicks the Voice button, the listener is invoked and the user provides speech input via the microphone.

Fig 4: Conversion of message from audio to text

The converter converts the recognized speech to text and fills the text box as shown in Fig 4. The user can also save the input text to the database. When the user clicks the Talk It! button, the system converts the text into audio and reads it back to the user.

3.2. Keyword & relevant document extraction

As shown in Fig 5 and Fig 6, MediaWiki takes either a single-word or a multiple-word search.

Fig 5: The MediaWiki search takes a keyword and fetches the result.

Once speech is converted into either a single keyword or multiple keywords, these are given as input to MediaWiki. The system extracts the relevant documents based on either the single keyword or the multiple keywords.
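Passing the converted text to MediaWiki as a search input can be illustrated with the standard MediaWiki search API (`action=query`, `list=search`). The sketch below only builds the request URL and does not contact any server. The endpoint `wiki.example.org` and the helper name `build_search_url` are hypothetical, and the paper does not say whether the authors call the API or use the search box.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the api.php URL of the target wiki.
API_ENDPOINT = "https://wiki.example.org/w/api.php"


def build_search_url(query: str, limit: int = 10) -> str:
    """Build a MediaWiki full-text search request for a one- or multi-word query."""
    params = {
        "action": "query",   # standard MediaWiki query module
        "list": "search",    # full-text search submodule
        "srsearch": query,   # the user's keywords, single or multiple
        "srlimit": limit,    # maximum number of results to return
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)
```

A multi-word input such as "mobile computing" is sent as a single `srsearch` value, with the space encoded by `urlencode`, so single-word and multi-word searches take the same path.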

Fig 6: MediaWiki takes multiple words as search input and retrieves the document

3.3. Displaying the documents to the user

The system displays the extracted relevant documents to the user based on the search input, for both multiple keywords and a single keyword.

3.4. Performance comparison of proposed system with existing system

The proposed system offers efficient search with a sentence instead of a single word, which gives better results. The search is much faster than in the existing system. Users can look up information directly in MediaWiki, and in Wikidata by reading Help:NavigatingWikidata. Search is done on whole words, which may be separated by spaces or punctuation marks. So if the search contains the word 'book', pages containing only the word 'books' or 'booklet' are not displayed. Fig 7 to Fig 9 show the audio conversion loading time, performance and audio tuning timeline, respectively, for the proposed system.

Fig 7: Audio conversion loading time for proposed system
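The whole-word rule stated in Section 3.4 ('book' does not match 'books' or 'booklet') can be illustrated with a short sketch. It uses a regular-expression word boundary to mimic the rule as described; it is an assumption for illustration and does not reproduce the tokenizer of the actual search backend. The helper name is hypothetical.

```python
import re


def contains_whole_word(text: str, word: str) -> bool:
    """Return True if `word` occurs in `text` as a complete word."""
    # \b matches at a boundary between a word character and a non-word
    # character, so spaces and punctuation both delimit words.
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None
```

Under this rule, a page reading "I read a book." matches the query 'book', while pages that only contain "books" or "booklet" do not.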

Fig 8: Audio conversion performance for proposed system

Fig 9: Audio tuning timeline for proposed system

4. Discussions:

The existing system takes a single word as input and retrieves the result through an Automatic Content Linking Device (ACLD) [13]. Fig 10 shows the time taken to convert an audio file into a text message in the existing system, and Fig 11 depicts the frequency timeline for the existing system. Some of the limitations of the existing system are:

1. It does not search efficiently; i.e., searching with a sentence produces incorrect results.
2. The input can only be a single word; multiple words cannot be processed.

For example, if the term "mobile computing" is given as input in the search box, the system shows the relevant document in 60.0 milliseconds. However, if the sentence "What is mobile computing?" is given as input, it throws the error "Incorrect sentence".

Fig 10: Converting an audio file into a text message at a speed of 59 ms-61 ms for existing system

Fig 11: Audio to text conversion frequency timeline for existing system

We have overcome this problem by using FastStringSearch (FSS) [3]. Using FSS, we can search for either a single word or multiple words and retrieve multiple relevant documents in less time (40.0 milliseconds) than the existing system (60 ms). Our system takes speech (audio) as input and then retrieves the relevant document in text format from the database.

5. CONCLUSION

The FastStringSearch technique improves the search time for the user, and its performance is better than that of the existing systems. In the proposed system, single-word or multiple-word search retrieves the relevant document in less time than the current system, which retrieves only through single-word search. In future, real-time index updates can be added, and the overall stability and performance of search can be improved by upgrading to Elasticsearch-based search 2.x.

REFERENCES

[1] C. Charras, T. Lecroq, and J. D. Pehoushek, A very fast string matching algorithm for small alphabets and long patterns, Comb. Pattern Matching, pp. 55-64, 1998.
[2] D. Brodie, A. Gupta, and W. Shi, Keyword-based fragment detection for dynamic Web content delivery, Thirteen. Int. World Wide Web Conf. Proceedings, WWW2004, pp. 1030-1031, 2004.

[3] A. Hume and D. Sunday, Fast string searching, Softw. Pract. Exp., vol. 21, no. 11, pp. 1221-1248, 1991.
[4] B. J. Cooper, High Performance Web Caching With Squid Table of Contents, 2002.
[5] K. Lyngstøl, Varnish Cache: What it is and why it's cool. Quick Glance, 2013.
[6] D. Filipiak and A. Ławrynowicz, Generating Semantic Media Wiki Content from Domain Ontologies, Third Int. Work. Semant. Web Collab. Spaces, pp. 49-58, 2014.
[7] B. Venkataramani, SOPC-Based Speech-to-Text Conversion, pp. 83-108, 2006.
[8] N. Everett, S. Engineer, C. Horohoe, S. Engineer, D. Garry, and P. Manager, CirrusSearch.
[9] K. Kluge, Elasticsearch in 10 seconds, 2014.
[10] D. Bijl and H. Hyde-Thomson, Speech to text conversion, 2001.
[11] K. Saini, Squid Proxy Server 3.1 Beginner's Guide About the Author, no. 3.
[12] M. Habibi and A. Popescu-Belis, Diverse keyword extraction from conversations, Proc. ACL 2013 (51st), no. 2010, pp. 651-657, 2013.
[13] A. Popescu-Belis, M. Yazdani, A. Nanchen, and P. N. Garner, A Speech-based Just-in-Time Retrieval System using Semantic Search, Acl2011, no. June, pp. 80-85, 2011.