Natural Language to Database Interface

Similar documents
SQL Generation and PL/SQL Execution from Natural Language Processing

ACCESSING DATABASE USING NLP

NLP - Based Expert System for Database Design and Development

Pattern Based Approach for Natural Language Interface to Database

Design and Implementation of Search Engine Using Vector Space Model for Personalized Search

V. Thulasinath M.Tech, CSE Department, JNTU College of Engineering Anantapur, Andhra Pradesh, India

Ian Kenny. November 28, 2017

SEARCH TECHNIQUES: BASIC AND ADVANCED

Unit Assessment Guide

IMPROVING THE RELEVANCY OF DOCUMENT SEARCH USING THE MULTI-TERM ADJACENCY KEYWORD-ORDER MODEL

An Increasing Efficiency of Pre-processing using APOST Stemmer Algorithm for Information Retrieval

Domain-specific Concept-based Information Retrieval System

Text Mining. Representation of Text Documents

EECS 647: Introduction to Database Systems

Results and Discussions on Transaction Splitting Technique for Mining Differential Private Frequent Itemsets

A NOVEL APPROACH FOR INFORMATION RETRIEVAL TECHNIQUE FOR WEB USING NLP

Implementation of Smart Question Answering System using IoT and Cognitive Computing

ISSN: [Sugumar * et al., 7(4): April, 2018] Impact Factor: 5.164

Obtaining Rough Set Approximation using MapReduce Technique in Data Mining

Oracle Database 10g: Introduction to SQL

A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2

Your Language Catalyst...

Horizontal Aggregations in SQL to Prepare Data Sets Using PIVOT Operator

CE4031 and CZ4031 Database System Principles

Extracting Algorithms by Indexing and Mining Large Data Sets

Online Programming Assessment and Evaluation Platform. In Education System

Ontology for Exploring Knowledge in C++ Language

5 The Control Structure Diagram (CSD)

INFORMATION RETRIEVAL SYSTEM: CONCEPT AND SCOPE

COMPUTER SCIENCE AND ENGINEERING (CSEG)

Empirical Analysis of Single and Multi Document Summarization using Clustering Algorithms

Clustering of Data with Mixed Attributes based on Unified Similarity Metric

A Novel Categorized Search Strategy using Distributional Clustering Neenu Joseph. M 1, Sudheep Elayidom 2

Refinement of Web Search using Word Sense Disambiguation and Intent Mining

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) CONTEXT SENSITIVE TEXT SUMMARIZATION USING HIERARCHICAL CLUSTERING ALGORITHM

Prediction-Based NLP System by Boyer-Moore Algorithm for Requirements Elicitation

Shortcut Method for Implementation of BDRT

MPLEMENTATION OF DIGITAL LIBRARY USING HDFS AND SOLR

Inferring User Search for Feedback Sessions

Keywords Hadoop, Map Reduce, K-Means, Data Analysis, Storage, Clusters.

From Scratch to the Web: Terminological Theses at the University of Innsbruck

CE4031 and CZ4031 Database System Principles

1. Data Definition Language.

AIDAS: Incremental Logical Structure Discovery in PDF Documents

Human Language Query Processing in Temporal Database using Semantic Grammar

Online Editor for Compiling and Executing Different Languages Source Code

Building and Annotating Corpora of Collaborative Authoring in Wikipedia

Research Article. August 2017

International Journal of Modern Trends in Engineering and Research e-issn No.: , Date: 2-4 July, 2015

Project Management Professional (PMP ) Certification

Mining Distributed Frequent Itemset with Hadoop

FREQUENT ITEMSET MINING USING PFP-GROWTH VIA SMART SPLITTING

Semantic Estimation for Texts in Software Engineering

Implementation of Data Mining for Vehicle Theft Detection using Android Application

1. Data Model, Categories, Schemas and Instances. Outline

Master & Doctor of Philosophy Programs in Computer Science

ASSIGNMENT NO Computer System with Open Source Operating System. 2. Mysql

database, Database Management system, Semantic matching, Tokenizer.

Fault Identification from Web Log Files by Pattern Discovery

INTRODUCTION. Chapter GENERAL

INTRODUCTION... 2 FEATURES OF DARWIN... 4 SPECIAL FEATURES OF DARWIN LATEST FEATURES OF DARWIN STRENGTHS & LIMITATIONS OF DARWIN...

Natural Language Processing on Hospitals: Sentimental Analysis and Feature Extraction #1 Atul Kamat, #2 Snehal Chavan, #3 Neil Bamb, #4 Hiral Athwani,

Encryption. ,Pune411027,Maharashtra, India. ,Pune411027,Maharashtra, India. ,Pune411027,Maharashtra, India. ,Pune411027,Maharashtra, India

An approach to the Optimization of menu-based Natural Language Interfaces to Databases

In the recent past, the World Wide Web has been witnessing an. explosive growth. All the leading web search engines, namely, Google,

FUZZY SQL for Linguistic Queries Poonam Rathee Department of Computer Science Aim &Act, Banasthali Vidyapeeth Rajasthan India

Oracle and Toad Course Descriptions Instructor: Dan Hotka

Keywords Code cloning, Clone detection, Software metrics, Potential clones, Clone pairs, Clone classes. Fig. 1 Code with clones

An Ameliorated Methodology to Eliminate Redundancy in Databases Using SQL

A Frequent Max Substring Technique for. Thai Text Indexing. School of Information Technology. Todsanai Chumwatana

Certification Exam Preparation Seminar: Oracle Database SQL

Java Archives Search Engine Using Byte Code as Information Source

IEEE Transactions on Fuzzy Systems (TFS) is published bi-monthly (February, April, June, August, October and December).

Ubiquitous Computing and Communication Journal (ISSN )

Information Technology Department, PCCOE-Pimpri Chinchwad, College of Engineering, Pune, Maharashtra, India 2

Towards Generating Domain-Specific Model Editors with Complex Editing Commands

MATRIX BASED INDEXING TECHNIQUE FOR VIDEO DATA

Web Scraping Framework based on Combining Tag and Value Similarity

INCORPORATING ADVANCED PROGRAMMING TECHNIQUES IN THE COMPUTER INFORMATION SYSTEMS CURRICULUM

TEXT CHAPTER 5. W. Bruce Croft BACKGROUND

City University of Hong Kong Course Syllabus. offered by Department of Computer Science with effect from Semester A 2017/18

TERMINOLOGY MANAGEMENT DURING TRANSLATION PROJECTS: PROFESSIONAL TESTIMONY

Automation of Semantic Web based Digital Library using Unified Modeling Language Minal Bhise 1 1

Compilers. Prerequisites

An Algebra for Protein Structure Data

OFF-SITE LEARNING SCHEDULE

MANAGING DATA(BASES) USING SQL (NON-PROCEDURAL SQL, X401.9)

Survey Paper on Traditional Hadoop and Pipelined Map Reduce

An Overview of various methodologies used in Data set Preparation for Data mining Analysis

KDD, SEMMA AND CRISP-DM: A PARALLEL OVERVIEW. Ana Azevedo and M.F. Santos

Development of E-Institute Management System Based on Integrated SSH Framework

Google technology for teachers

Information Push Service of University Library in Network and Information Age

CS 6320 Natural Language Processing

Scannerless Boolean Parsing

Web Data Extraction Using Tree Structure Algorithms A Comparison

ENHANCING DATA MODELS WITH TUNING TRANSFORMATIONS

COMPUTER SCIENCE (CSCI)

School of Computing, Engineering and Information Sciences University of Northumbria. Set Operations

General Architecture of HIVE/WARE

Transcription:

Natural Language to Database Interface Aarti Sawant 1, Pooja Lambate 2, A. S. Zore 1 Information Technology, University of Pune, Marathwada Mitra Mandal Institute Of Technology. Pune, Maharashtra, India Abstract- Information is playing an essential role and one of the foremost sources of information is databases. Almost all IT application are storing and retrieve or extracting information from databases. To retrieve the information one has to be use the Structured Query Language (SQL) for relational database systems. The idea of using Natural Language instead of SQL has prompted the development of new type of processing called Natural language Interface to Database. The natural language interface takes natural language questions as input and gives textual response in the form of data present in the relational database so that informal or non-technical user are not essential to have knowledge of SQL language structures and format hence any user can interact with the databases. Keywords: Natural language processing, Question-Answering in NLP, Natural Language to Database Interface, NLDBI Architecture. 1. INTRODUCTION Information is playing a very important role in our lives today and one of the major sources of information is databases. Databases are gaining major importance in a massive variety of application areas both private and public information systems. Databases are build with the objective of facilitating the behavior of data storage, processing, and retrieval etc. every part of which are coupled with data management in information systems. In relational databases, to retrieve information from a database, one wishes to originate a query in such way that the computer will recognize and fabricate the preferred output. Retrieval of information from the databases needs the knowledge of databases language like the Structured Query Language (SQL).The SQL rule are based on a Boolean interpretation of the queries. But not entire users are known to the Query language and not all user s seeking information from the database can be answered explicitly by the querying system. It is due to the fact that not all the requirement s characteristics can be expressed by regular query languages. To simplify the complexity of SQL and to manipulate of data in databases for common people (not SQL professionals), many researches have turned out to use natural language instead of SQL. The idea of using natural language instead of SQL has led to the development of new type of processing method called Natural Language Interface to Database systems (NLIDB). The NLIDB system is actually a branch of more comprehensive method called Natural Language Processing (NLP). In general, the main objective of NLP research is to create an easy and friendly environment to interact with computers in the sense that computer usage does not require any programming language skills to access the data; only natural language (i.e. English) is required. 2. NATURAL LANGUAGE PROCESSING Natural Language Processing (NLP) is interdisciplinary by nature. Linguistics and artificial intelligence can be combined to develop computer programs. It helps to coordinate activities of understanding or producing texts in a natural language. Database Natural Language Processing is an important success in NLP. Asking questions in natural language to get answers from databases is a very convenient and easy method of data access. 2.1 Qa System In Nlp In QA domain, many QA systems, such as START, Ask have been presented to sustain those users searching the precise information about some topics. With these systems, START may be measured as the finest system that can return the superior answers for users. However, START is only capable to answer questions about concept, but it could not answer the questions concerning causes and methods. Additionally, we had determined that an open source QA system, that is Open-Ephyra; it is an open framework for question answering. It extracts answers to natural language questions from the complementary sources. It was developed on Java framework. This system has: a glossary, a set of questions, method to find out the straightforward answers for questions. Once accepting a question, the system organize question into one of definite categories of question to discover and come apart keywords based on the dictionary. Later than, OpenEphyra uses these keywords to search in dataset paragraphs that contain them. The result will be projected adequate degree and the excellent result will be display to user. Once taking into consideration these QA systems, we presume that recent QA systems perform the question processing based on this theory: 1) Match the query to existing queries form. 2) Generate a set of keywords or a set of queries in knowledge base. 3) Determine the result that fit to the question and generate the answer/result. In nearly all of the typical NLDBi systems the natural language statement is transformed into an internal representation based on the syntactic and semantic information of the natural language. This representation is 1365

then converted into queries using an illustration converter. Prior to a natural language query is translated to a corresponding query in technical language like SQL it has to be moved out through a lot of steps. 1) Preprocessing 2) Grammar Checking 3) Query Generation 4) Data Collection 3. NLDBI ARCHITECTURE Tokenization: This module splits the string of question into number of tokens. Each token is identifying by unique identifier and order number is assign to each token. It is very important to tokenize a question so that the text can undergo further processing. Removing excessive words: This module removes the excessive words from the question which does not play important role in the application. A, an, are, the, is, was, were etc. these are the escape words need to be removed. Mapping rules: After removing excessive words some rules have to be mapped and roles are defined by the system administrator who works on system database. Rules are mapped with user input question statement. SQL element identifier: This module identifies each and every SQL element present in the input question entered by user. Elements are identified using rules and SQL template string. Mapping SQL template: All SQL elements are stored in particular array. SQL template is constructing using available SQL elements. Constructing SQL query: SQL query is formed by constructing SQL template and differentiating SQL elements. Algorithm to generate SQL query from SQL template Begin Store the SQL template string to the variable U Fig 1: Brief Architecture Of Nldbi System User question: Input from the user is given in the form of questions (Like what, who, where).all the values in the input natural language statement have to be in double quotes which yield to identify the values from the user input statement. The user will enter his question in natural language. E.g. English is considering as most preferable natural language. The question to be asked is decided by user by collecting the query data; this question is input to the system on which further processing will carried out. Checking words: The question in string words of asked by user are checked against the words present in the data dictionary. The question contains string of words which should be present in the data dictionary. If word in the question does not match with data dictionary the user has to change the words in the question. Do End If End If (the value of variable U equals tocken1 ) then Store the name of the identified tables in array V Store the value of variable V to T Get the default attribute for the table name T and store it to the variable A Construct a SQL query as SELECT DISTINCT A FROM T and store it to variable W While for each table in array V Data Collection: This module collects the output of the SQL statement and displayed it in the user interface as a outcome of SQL query. 1366

3.1 Algorithm used in NLWIDB System The following stages illustrate the algorithm for the NLDBI system: 1.Check for a user question value and if value is exist Then Remove the value from the question string. Check for all words in question string which occurs in data dictionary. 2. Tokenization (scanning) Fragment the question string into the tokens. Assign index number to each token identified. 3. Removing extreme words in question sequence or string. Mapping rule: Salary of employee is mapped with: emp_salary table Employee id is mapped with: emp_id Identifying SQL elements: Attribute: emp_salary Attribute: emp_id Value: 101 Mapping SQL template: The string code for finding SQL template [emp_salary, emp_id] Found SQL template. Constructing SQL query: Select emp_salary from employee where emp_id = 101 ; 4. Mapping rules by eliminating preceding words from question string. If matches with a rule, remove the matched string from the question string and make the remaining as the question string. Map the rule till you have question string becomes null. 5. Store the identified SQL elements in arrays with its index. 6. Build an SQL pattern string by considering available number of SQL elements. 3.2 Following example shows how the query is formulated from the natural language statement: User will enter input in natural language statement: What is the salary of employee where employee Id = 101? Checking the words: What is the salary of employee where employee Id = 101? All words are present in data dictionary. Running SQL query result will be: Emp_salary 25000 3.3 ADVANTAGES 1. No use of artificial language In this system the users input is entirely in natural language which is most easy for interaction with the database this principle makes the system more flexible and scalable. 2. Fault tolerance Most of NLDBI systems offer some tolerances to negligible grammatical errors, while in a computer system; most of the time, the lexicon should be precisely the same as defined, the syntax should properly follow definite rules, and any errors will cause the input automatically be rejected by the system. Tokenization: Token [0]: What Token [1]: is Token [2]: the Token [3]: salary Token [4]: of Token [5]: employee Token [6]: where Token [7]: employee Token [8]: id Token [9]: = Token [10:] 101 Removing excessive words: The question string after removing excessive words: Salary of employee where employee id = 101 3. Easy access for Multiple Database Tables Queries that include multiple database tables like Show the list of employee where employee salary is greater than 20,000 and dept= marketing such queries are also provides the accurate results. 3.4 Limitations NLDBI system are domain specific: Almost all NLDBI systems are domain specific so that it is difficult to interface with the other databases, in such case it is necessary to regularly update database dictionaries consistently. 1367

4. CONCLUSIONS Natural Language Processing can bring dominant enhancements to almost any database interface. The Natural Language to Database Interface (NLBDI) is no exclusion because we presented the NLDBI system which converts an extensive range of text queries (English questions) into prescribed ones that can then be executed against a database by employing strong language processing techniques and methods. This study illustrates that NLDBI provides a convenient as well as consistent means of querying access, hence, a genuine potential for associating the gap between computer and the normal end users. ACKNOWLEDGMENTS This is a small review of my B.E. thesis work that we are going to implement. I specially thank to my guide for his assistance and guidance. REFERENCES 1. Natural language Interface for Database: A Brief review IJCSI International Journal of Computer Science Issues, Vol. 8, Issue 2, March 2011www.IJCSI.org 2. Natural Language Interface for Database Proceedings of the Third International Symposium, SEUSL: 6-7 July 2013, Olivia, Sri Lanka 3. Incremental Information Extraction Using Relational Databases[IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 24, NO. 1, JANUARY 2012] Luis Tari, Student Member, IEEE Computer Society, Phan Huy Tu,Jo rg Hakenberg, Yi Chen, Member, IEEE Computer Society,Tran Cao Son, Graciela Gonzalez, and Chitta Baral 4. GenerIE: Information Extraction Using Database Queries.Luis Tari1, Phan Huy Tu1, Jorge Hakenberg1 Department of Computer Science and Engineering, Arizona State UniversityTempe, AZ 85287, USA 5. Efficient Information Extraction over Evolving Text Data Jun Yang, Raghu Ramakrishnan Duke University, Yahoo! Research Biographies: Prof. A. S. Zore is currently working as a Lecturer in Marathwada Mitra Mandal Institute of Technology, Pune, India. His research interests are Data Mining, etc. 1368