Rapid Information Discovery System (RAID)


Int'l Conf. Artificial Intelligence ICAI'17

B. Gopal, P. Benjamin, and K. Madanagopal
Knowledge Based Systems, Inc. (KBSI), College Station, TX, USA

Summary - This paper describes the motivations, solution concepts, and architecture of a framework for a Rapid Information Discovery System (RAID) that supports semantic enterprise search and knowledge discovery from large volumes of multi-source text data. First, the overall solution concept is summarized and an ontology-driven approach to natural language processing (NLP) is described. Then the RAID architecture for semantic indexing and semantic search is presented, followed by the RAID human machine interface and knowledge extraction methods. Dynamic learning and user feedback-driven adaptation mechanisms are then outlined. Finally, conclusions and opportunities for further R&D are given. The concepts described in this paper provide a new approach and solution architecture for iterative and adaptive discovery of information content associated with imprecisely specified descriptions of end user information needs.

Key Words: Semantic Search, Natural Language Processing, Knowledge Discovery, Ontologies, Semantic Technologies

1. Background and Motivations

Information analysts are slowly drowning in a flood of human and computer generated information. The constantly increasing volumes and velocities of data make it increasingly difficult to identify and utilize key information, which makes timely and accurate information extraction a very challenging problem. Improvements in semantic modeling, natural language processing, and collaboration technology may provide significant leverage to address this problem.
If information analysts were able to express their information needs in plain terms, understood by a search engine as guidance or examples, documents and other information artifacts might be brought to light that simply guessing at appropriate keywords would never elicit. If a search engine could use examples of sought-after text to find similar documents, analysts could focus their attention on interpreting and utilizing the information rather than formulating search criteria.

Because information analysts seldom work in isolation, a shared understanding of analysis goals and subsequent sharing of knowledge and effort can significantly improve analytic outcomes. The same technology that would make effective search by example a reality could be used to derive an understanding of analysts' goals directly from their search activities. Such an understanding could be used to connect analysts working on similar problems or to alert analysts to progress made by others in related areas.

Information Retrieval (IR) refers to the activity of obtaining relevant information resources from a collection. Information Extraction (IE) refers to the activity of extracting structured information from an unstructured or semi-structured information resource. In this era of information overload, there is a need to combine the best of both IR and IE to support the timely and accurate discovery of relevant information buried in large volumes of data, regardless of the domain of interest. Mechanisms are also needed that rapidly learn the context and intent of the agents' tasks and progressively enhance the quality of information discovered to address their evolving information needs.

IR tools typically do not require any customization when applied to new domains, and they provide broad coverage by enabling a user to cast a wide net during the search process.
Although such systems make it easier to uncover new information, precise information extraction fails because entities, events, and their relationships are not identified. IE tools, on the other hand, excel at generating structured data from unstructured text and hence at identifying relevant and precise concepts. Unfortunately, IE tools tend to be overly specific to a domain and invariably require much customization to support other domains. According to Etzioni et al. [1], Information Extraction (IE) has traditionally relied on extensive human involvement in the form of hand-crafted extraction rules or hand-tagged training examples. Because IE tools have a narrower coverage, it is harder to stumble upon new information.

There is a need to combine the best of both the IR and IE spectrums to support the discovery of information in large corpora in any domain. Adequate semantic tagging and analysis methods are needed that intelligently find useful nuggets of information in text corpora through natural language-based semantic analysis of informal end user descriptions or queries.

Other research initiatives in semantic search and ontology-based querying address some aspects of the problem targeted by the RAID framework. Representative examples of related research include Simple HTML Ontology Extension (SHOE) [2], TAP [3], Intelligent Semantic Web Retrieval Agent (ISRA) [4], Semantic Content Organization and Retrieval Engine (SCORE) [5], Unsupervised Learning of Semantic Relationships [6], and Ontology-based Information Extraction [7][8]. The main limitations of these approaches include (i) inability to adequately capture end users' search context, (ii) inability to address the dynamic nature of the ontologies used for search, and (iii) limited learning and adaptation abilities.

The RAID framework described in this paper provides a comprehensive solution to address these semantic search challenges and research gaps. RAID supports the discovery of relevant information across large volumes of data. Moreover, the RAID approach provides mechanisms to facilitate the modeling of user information needs while using learning mechanisms to progressively improve the search results based on the users' interaction with the system.

2. The RAID Solution Overview

RAID provides a web-based enterprise search capability that enables focused and high precision semantic search over disparate data sources using various types of user inputs and/or user-defined ontologies. The RAID approach applies ontology-driven text analytics and NLP methods in order to extract and discover knowledge from collections of structured and unstructured data sources.
The RAID solution differs from other search technologies in two important ways:

(1) Support for both structured and unstructured data: Typically, enterprises have access to both unstructured data on a file system and structured data in databases. RAID can process both types of data and therefore supports searching across various data sources. The application can index and search against the textual contents of SQL Server and Oracle databases in addition to documents on a file system.

(2) Support for a variety of rich input types: RAID accepts a number of different user input types beyond basic keywords. The more input the user provides, the more accurate the query formulation process becomes, and the more semantic content the query builder can extract. Consequently, the search results will match the user's search tasks more accurately. Specifically, the different types of input supported include the following:

Keywords: Similar to many search engines, RAID allows the user to directly input the individual terms s/he is looking for. The user also has the option of assigning a weight to each term and of indicating whether the term must, should, or must not occur in the target data. The user can also specify exact phrase matching.
Example Text: The user may also provide example text that discusses, in natural language, the concepts for which the user is searching. This text may be as short as a single sentence or as long as an entire document or an entire directory of documents. The example text is analyzed by the RAID query builder, which generates a weighted list of important tokens to search for.
Ontologies: An ontology captures important concepts and relationships relevant to the domain of the user's search task. In RAID, a user can enhance the search process through the use of a specific ontology, which identifies terms of particular interest within a domain. Ontology models often provide information that associates context with a specific search and can be used to disambiguate terms and provide background knowledge that might help in interpreting content. For example, the term "launch" would have a different meaning to a music executive, a rocket scientist, and a web entrepreneur. Using an ontology model to assist the search affects both the content and the weighting of the term list generated by the query builder. Users can specify an ontology through a Controlled Natural Language (CNL) interface, eliminating the need to understand advanced ontology modeling concepts.
Acronym Lists and Glossaries: Finally, the user can supply a set of acronym lists and/or glossaries to augment the search. Both of these inputs provide additional domain information about specific terms that may appear in the search inputs or in the target data. The query builder uses this information to augment the weights and content of the term list it generates.

3. The Ontology-Driven Text Analytics Solution Concept

A distinguishing aspect of RAID is the use of an ontology-driven approach to text processing. First, we outline the upper level ontology used by RAID. Figure 1 provides an overview of the scope of information contained in the core Ontological Semantics Resource (OSR) upper ontology. The OSR was designed as a resource for deep natural language understanding. As such, it contains approximately 27,000 lexical items (to process the information as displayed) and 5,000 concepts (to model the information conveyed).

The data model below shows that each lexeme connects to at least one word use sense. Each word use sense may have both syntactic and semantic constraints that govern its correct occurrence (e.g., a joke can bomb and a house can be bombed, but a house can't bomb). Each word sense has one concept label that represents a concept within the OSR ontology. The Concept component of the OSR is the meaning model (ontology) of the OSR. Concepts from the OSR ontology can be event, object, or property types. When a lexeme maps to multiple word senses, the syntactic (SynStruct) and semantic (SemStruct) constraints are used to disambiguate and select the proper word sense assignment.

Figure 1. OSR Ontology Overview

4. RAID NLP

RAID uses KBSI's Natural Language Processing Pipeline (KNLP), an ontology-driven semantic information extraction module that is designed to process unstructured data. We use the term "unstructured" to refer to text that is in a natural language form that conforms closely to the rules of English grammar (Figure 2).

Figure 2. The RAID KNLP Pipeline

As shown in Figure 2, the KNLP comprises eight stages. Each block in the pipeline is labeled according to the set of tags that are added to the input text after the input text has passed through the block. The following list describes the functional blocks.

Sentence Boundary Detection (SBD): This module splits the input text into sentences.
Tokenization (TKN): This module splits each sentence into a set of tokens.
Named Entity Recognition Level 1 (NER1): This module consists of two algorithms. The first algorithm classifies sets of adjacent tokens as PEOPLE, ORGANIZATIONS, and LOCATIONS when appropriate. The second algorithm uses a set of regular expressions to recognize a wide array of different entity types such as part numbers, system identifiers, MGRS coordinates, and addresses, to name a few. The set of entity types for this second algorithm is expandable based on the requirements of the input text. Named entity recognition is the first step in transforming input text into a semantic representation.
Part of Speech Tagging (POS): This module labels the individual tokens with their corresponding part of speech tags.
Phrase Chunking (CNK): This module groups adjacent tokens into phrases such as noun and verb phrases.
Named Entity Recognition Level 2 (NER2): This module is an additional stage of named entity recognition that makes use of the phrase boundaries calculated at the CNK level.
Subject-Verb-Object, Clause Identification (SVO): This module identifies the subject, verb, and object within a clause. It also identifies the boundaries between clauses.
Semantic Analyzer (SEM): This module maps noun phrases and verb phrases into the concepts within the OSR ontology. The SEM component provides concept labels for each identified noun and verb phrase within a sentence using different semantic analysis techniques. The SEM module depends on the output from all previous stages to perform its processing. It also depends heavily on the OSR ontology to provide the syntactic and semantic constraints used in the semantic analysis calculations. The main function of the SEM component is to perform semantic disambiguation at varying levels of fidelity.

5. Semantic Indexing and Search

The RAID semantic indexing is shown in Figure 3 and the search architecture is shown in Figure 4.

Figure 3. RAID Semantic Indexing

The main goal of the semantic indexing component is to generate a Lucene index and a triple store from the corpus of interest. The RAID index is generated by the Term Indexer component, which indexes every term from the text extracted from the documents by the Text Extractor. The RAID triple store is the central location where RDF triples are stored and is generated by the Triple Indexer. The Triple Indexer processes the output of the KNLP Pipeline and extracts semantic content from the document corpus in the form of RDF. This triple store stores and maintains knowledge from three different sources: (i) knowledge extracted from the document corpus during indexing, (ii) knowledge supplied by the analysts in the form of ontology input and the OSR, and (iii) knowledge inferred by the RAID system based on user feedback.

Figure 4. RAID Search Architecture

The RAID search architecture is summarized as follows. During a search process, a user may provide the following inputs: keywords, example text, ontologies, acronyms, and glossaries. After these inputs are processed through the KNLP Pipeline, the output is fed to two different pipelines. The first pipeline generates a weighted search query through the Query Builder, and the query is used to search against the RAID index. This results in one set of search results. The second pipeline involves a SPARQL Builder and a SPARQL Processor. The SPARQL Builder component analyzes the different types of user inputs (example text, keywords, ontologies, acronym lists, and glossaries) and uses the results to generate a SPARQL query that asks for the existence of certain concepts and relationships.
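To make the idea of a query that "asks for the existence of certain concepts and relationships" concrete, the following is a minimal sketch that uses the weapon-ownership scenario discussed in this section. It is not the RAID implementation: the in-memory triple set, the predicate names `is_a` and `owned_by`, the `ex:` prefix, and the helper functions are all assumptions made for illustration, and the SPARQL text is shown only as a string and is not executed.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Illustrative SPARQL 1.1 text only (never executed here). The prefix and
# predicate names are hypothetical, not taken from RAID.
ILLUSTRATIVE_SPARQL = """
PREFIX ex: <http://example.org/raid#>
SELECT ?x WHERE {
  ?x ex:isA+ ex:Weapon .
  ?x ex:ownedBy ex:NorthKorea .
}
"""


def is_a_closure(triples: Set[Triple], entity: str) -> Set[str]:
    """Return every concept reachable from entity by following is_a edges."""
    seen: Set[str] = set()
    frontier = [entity]
    while frontier:
        node = frontier.pop()
        for subj, pred, obj in triples:
            if subj == node and pred == "is_a" and obj not in seen:
                seen.add(obj)
                frontier.append(obj)
    return seen


def instances_owned_by(triples: Set[Triple], concept: str, owner: str) -> List[str]:
    """Find entities owned by owner that are (transitively) a kind of concept."""
    owned = {s for s, p, o in triples if p == "owned_by" and o == owner}
    return sorted(e for e in owned if concept in is_a_closure(triples, e))


TRIPLES: Set[Triple] = {
    ("No-dong", "is_a", "missile"),          # extracted during indexing
    ("No-dong", "owned_by", "North Korea"),  # extracted during indexing
    ("missile", "is_a", "weapon"),           # from a user-supplied ontology
}

RESULT = instances_owned_by(TRIPLES, "weapon", "North Korea")
print(RESULT)
```

The point of the sketch is the inference step: "No-dong" is never stated to be a weapon, but it is returned because the ontology triple links missile to weapon. A production system would delegate this to a SPARQL processor over an RDF store rather than scan a Python set.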
The SPARQL Processor component serves as the interface to the triple store, processing the SPARQL queries, making inferences from the domain knowledge coded in the triple store, and returning results. There are a number of third party SPARQL processors available, such as JENA [9]. This results in another set of search results being generated. The two sets of search results generated by the two separate pipelines are then merged and re-ranked by the Search Results Ranker. After viewing the search results, the user may provide feedback about each result, which is then persisted in the triple store.

With SPARQL, the RAID system asks for the existence of ontological concepts that match specific criteria. The result of the query is a list of the concepts in the triple store that match the criteria. These specific concepts can then be used as search criteria in the index, and the documents containing those concepts are returned as search results. For example, a SPARQL query may essentially ask the question "Are there any instances of weapons owned by North Korea in the RDF store?" During indexing, the semantic analyzer may have extracted the following knowledge: (i) No-dong is a type of missile and (ii) No-dong is owned by North Korea. Additionally, through user-supplied ontologies, the RDF triple store may also contain the following item of knowledge: a missile is a type of weapon. With this knowledge present in the RDF triple store, the SPARQL query described above would result in No-dong being returned. This term could then be supplied to the RAID semantic query builder and searched for in the index, resulting in a list of documents containing the concept No-dong. In addition to individual concepts, this approach would also work for searching for relationships (e.g., "Find documents that assert that X is owned by Y, where X is a weapon").

6. Dynamic Human Machine Interface Mechanisms

The RAID framework provides intuitive Human Machine Interface (HMI) mechanisms to capture the user's queries as accurately and interactively as possible. The user interfaces are intended to augment and complement the ontology-driven text analytics algorithms. The solution architecture provides intelligent, intuitive, and interactive user interface mechanisms that assist the end user by supporting multiple steps in the knowledge discovery process. The idea is to let the user influence the query expansion and term weighting steps before these tasks are executed.

Figure 5 shows an example of a user interface that allows data exploration in an intuitive manner. The user, who has no idea of what the dataset contains, types the query "crimes in Mexico" in the RAID application. The semantic tagging capability generates the simple ontology shown below the query. Crime gets tagged as CRIMINAL_ACTIVITY and is connected to Mexico by the HAS_LOCATION relationship. RAID allows the user to change the semantic tags in case of incorrect tagging. At this point, the user can expand, collapse, delete, or even manually add a concept. By selecting the concept CRIMINAL_ACTIVITY and choosing the option "subclasses", the user is presented with the subclasses. The user narrows down the search to one specific type of CRIMINAL_ACTIVITY, for example DRUG_TRAFFICKING, by deleting the other subclasses. Further expansion of the DRUG_TRAFFICKING concept reveals the ontology structure around that concept.
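The expand and delete interaction described above can be sketched in a few lines. This is only an illustration of the idea, not the RAID interface code: the `QueryGraph` class and its methods are hypothetical, and the ontology fragment is invented apart from CRIMINAL_ACTIVITY, DRUG_TRAFFICKING, and HAS_LOCATION, which come from the example (the sibling subclasses and the children of DRUG_TRAFFICKING are placeholders).

```python
from typing import Dict, List, Tuple

# Hypothetical ontology fragment. Only CRIMINAL_ACTIVITY and DRUG_TRAFFICKING
# are named in the text; every other concept here is a placeholder.
SUBCLASSES: Dict[str, List[str]] = {
    "CRIMINAL_ACTIVITY": ["DRUG_TRAFFICKING", "KIDNAPPING", "SMUGGLING"],
    "DRUG_TRAFFICKING": ["MARIJUANA", "HEROIN", "COCAINE"],
}


class QueryGraph:
    """Tracks the concepts a user has kept while exploring the ontology."""

    def __init__(self, root: str, location: str) -> None:
        self.concepts: List[str] = [root]
        self.relations: List[Tuple[str, str, str]] = [(root, "HAS_LOCATION", location)]

    def expand(self, concept: str) -> None:
        """Add the direct subclasses of concept to the query graph."""
        for child in SUBCLASSES.get(concept, []):
            if child not in self.concepts:
                self.concepts.append(child)

    def delete(self, concept: str) -> None:
        """Remove a concept the user is not interested in."""
        if concept in self.concepts:
            self.concepts.remove(concept)

    def search_terms(self) -> List[str]:
        """Turn the remaining concept labels into lowercase search terms."""
        return [c.replace("_", " ").lower() for c in self.concepts]


# Replay the exploration steps from the "crimes in Mexico" example.
graph = QueryGraph("CRIMINAL_ACTIVITY", "Mexico")
graph.expand("CRIMINAL_ACTIVITY")
graph.delete("KIDNAPPING")
graph.delete("SMUGGLING")
graph.expand("DRUG_TRAFFICKING")
TERMS = graph.search_terms()
print(TERMS)
```

Keeping the user's expand and delete steps as explicit operations on a small graph also makes them easy to log, which matters because the interaction history is later used for feedback and learning.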
At this point, if the user executes a search, he will get results as shown on the far right of Figure 5. Notice how, after this exploratory phase, results that talk about marijuana, heroin, and cocaine in Mexico are now returned.

Figure 5. Dynamic Query Building in RAID

All this interaction is stored and used later by the feedback and learning mechanism. RAID provides the ability to incrementally save user interaction information, resulting in more accurate user modeling. Information saved includes user interaction history, the current query, past queries in the same search session, and past queries in the entire search history. The saved results are later used by RAID to inform the learning and adaptation mechanisms.

7. RAID Knowledge Extraction

RAID provides the ability to extract entities, relationships, and events from text data. The extracted entities and events are mapped to concepts in the OSR ontology. The mapping of unstructured text into the ontology of the OSR is performed by the KNLP Pipeline's Semantic Analysis component. The RAID approach to unsupervised relationship extraction builds upon Bollegala's [6] and Hasegawa's [10] approaches. The architecture of a Relationship Extraction pipeline is shown in Figure 6.

Figure 6. Relationship Extraction Pipeline

Instance Generator: This module is responsible for extracting training instances from the supplied data. It does not require any input from the user to function. The module performs a single pass over the data and extracts instances of the form {NP1} <text> {NP2} from the text. Here NP1 and NP2 are entity pairs that can be noun phrases or named entities. NP1 and NP2 can also be extracted based on syntactic guidelines, such as requiring their head nouns to begin with an uppercase letter. <text> represents the text between the two noun phrases and is called the context. For the example input "The market closed high because of the news of Google hiring Dr. William", the instance generator would output {Google} <hiring> {Dr. William}. Our approach uses a combination of several shallow linguistic features with a set of deep semantic features extracted by the KNLP Pipeline. The KNLP Pipeline extracts named entities along with their respective types and OSR tags. These extra features aid the entity-pair clustering at a later stage of the text processing pipeline. The output of the Instance Generator is a context matrix. Each row in the matrix represents an entity pair, and the columns represent the contexts. For a row i and column j, a cell in the matrix holds the number of times entity pair i occurred with context j.
Clustering: Each row in the instance matrix represents a context vector for an entity pair. This module performs clustering on all the row vectors. Entity pairs that occur in similar types of relationships are clustered together. These clusters are used to train a machine learning classifier.
Classifier: This module uses the generated clusters to train a machine learning classifier. Each entity pair in a cluster is assigned a label y_k, where y_k is the id of the cluster to which the entity pair belongs. The goal of the classifier is to learn P(y_k | e_i), where e_i is the context vector of an entity pair.
Runtime tagging: Once the model is learned, new entity pairs are extracted and classified without any input from the user.
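The context matrix and the clustering step can be sketched as follows. This is a simplified illustration under stated assumptions rather than the RAID pipeline: instances are taken as already extracted (np1, context, np2) tuples, the matrix is stored sparsely as one `Counter` per entity pair, and a greedy cosine-similarity grouping with an arbitrary threshold stands in for whatever clustering algorithm is actually used. Apart from the Google example, the instances are invented.

```python
from collections import Counter, defaultdict
from math import sqrt
from typing import Dict, List, Tuple

Instance = Tuple[str, str, str]  # (NP1, context text, NP2)
Pair = Tuple[str, str]


def build_context_matrix(instances: List[Instance]) -> Dict[Pair, Counter]:
    """Rows are entity pairs; each cell counts co-occurrences with a context."""
    matrix: Dict[Pair, Counter] = defaultdict(Counter)
    for np1, context, np2 in instances:
        matrix[(np1, np2)][context] += 1
    return dict(matrix)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse context vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_pairs(matrix: Dict[Pair, Counter], threshold: float = 0.5) -> List[List[Pair]]:
    """Greedy grouping (an assumption): join the first cluster whose seed is similar."""
    clusters: List[List[Pair]] = []
    for pair, vector in matrix.items():
        for cluster in clusters:
            if cosine(vector, matrix[cluster[0]]) >= threshold:
                cluster.append(pair)
                break
        else:
            clusters.append([pair])
    return clusters


# Only the first instance comes from the text; the others are made up.
INSTANCES: List[Instance] = [
    ("Google", "hiring", "Dr. William"),
    ("Acme", "hiring", "J. Smith"),
    ("Acme", "acquired", "Widget Co"),
    ("Google", "hiring", "Dr. William"),
]

MATRIX = build_context_matrix(INSTANCES)
CLUSTERS = cluster_pairs(MATRIX)
print(CLUSTERS)
```

Each resulting cluster id would play the role of the label y_k when training the classifier. A fuller version would add the named entity types and OSR tags as extra features alongside the raw context counts.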
At runtime, all the noun phrases and their context vectors are extracted as noted in the Instance Generator description. The clustering module is bypassed and the instances in the instance matrix are fed directly into the classifier.

8. RAID Dynamic Improvement from User Feedback

Analysts have the ability to provide feedback to the RAID system, both implicitly and explicitly, starting from the first step of query building all the way to the review of search results. This feedback is processed by RAID in order to enhance the quality of queries and improve results from subsequent searches. Additionally, the system automatically detects similarities between different analysts' search tasks and uses the feedback and results found by one analyst to improve the results of another analyst.

Implicit feedback

All queries that are issued in the tool are indexed, using a technique very similar to the way RAID indexes its corpus for semantic search. Subsequently, when a new query is generated, the new query is tokenized and weighted; this weighted term vector is then used to search the index of all previous queries. The search results are similar previously run queries from both the same user and other collaborators. This capability can support (i) query completion, by providing assistance while a user enters a query, and (ii) query suggestion, by suggesting useful related queries from the same user or collaborators.

Explicit feedback

In IR systems it is important to make use of the user's judgment on previous searches to continuously learn and enhance the performance of the IR system. Currently, the RAID tool captures user feedback through unobtrusive links in the user interface. When a search is performed, each result shows three options beside it that the user can click:

Thumbs Up. Clicking this link marks the search result as relevant to the current search task.
Thumbs Down. Clicking this link marks the search result as irrelevant to the current search task.
Save. Clicking this saves the search result to the RAID database, associated with the user's current search task. This enables the user to revisit search results at another time and to share results with other analysts. Any search result that is saved is also marked as relevant to the current search task.

Figure 7. Process of Refining the Query through Relevance Feedback
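The relevance marks collected through these links can drive a query update. The following is a minimal sketch of the Rocchio-style update used in this section, assuming queries and documents are represented as sparse term-weight dictionaries: the query is scaled by a, the vectors of relevant results are added with weight b, and the vectors of non-relevant results are subtracted with weight c. The function name, the default weights, and the choice to drop terms whose weight falls to zero or below are assumptions for illustration. Textbook presentations also commonly average over the relevant and non-relevant sets rather than summing; the sketch follows the unnormalized form.

```python
from typing import Dict, List

Vector = Dict[str, float]  # term -> weight


def rocchio(
    query: Vector,
    relevant: List[Vector],
    non_relevant: List[Vector],
    a: float = 1.0,
    b: float = 0.75,
    c: float = 0.15,
) -> Vector:
    """Return a * query + b * sum(relevant) - c * sum(non_relevant)."""
    updated: Vector = {term: a * weight for term, weight in query.items()}
    for doc in relevant:
        for term, weight in doc.items():
            updated[term] = updated.get(term, 0.0) + b * weight
    for doc in non_relevant:
        for term, weight in doc.items():
            updated[term] = updated.get(term, 0.0) - c * weight
    # Assumption: terms pushed to zero or below are dropped from the query.
    return {term: weight for term, weight in updated.items() if weight > 0.0}


# Hypothetical feedback: one result marked thumbs up, one marked thumbs down.
ORIGINAL = {"crime": 1.0, "mexico": 1.0}
THUMBS_UP = [{"cocaine": 1.0, "mexico": 1.0}]
THUMBS_DOWN = [{"tourism": 1.0, "mexico": 1.0}]

NEW_QUERY = rocchio(ORIGINAL, THUMBS_UP, THUMBS_DOWN, a=1.0, b=0.5, c=0.25)
print(NEW_QUERY)
```

In this example the term "cocaine" enters the query because it appears in a relevant result, "mexico" is reinforced, and "tourism" is discarded because it only appears in a non-relevant result.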

As shown in Figure 7, the main idea behind relevance feedback is to perform an initial query, incorporate feedback from previous searches regarding which documents are relevant and which are irrelevant, and then augment the initial query by adding, removing, and re-weighting terms. Using the Rocchio Vector Space Relevance Feedback algorithm, the initial query is modified after the relevance feedback as shown below:

$$Q' = aQ + b \sum_{r \in R} r - c \sum_{s \in S} s$$

Q: original query vector
R: set of relevant document vectors
S: set of non-relevant document vectors
a, b, c: constants (Rocchio weights)
Q': new query vector

The effect of modifying the initial query is to bias the query towards more relevant documents in the document vector space.

9. Conclusions and Opportunities for Further R&D

This paper described a solution architecture for iterative and adaptive discovery of information content associated with imprecisely specified descriptions of end user information needs. The RAID solution provides several benefits, including (i) significant and measurable gains in the precision and the recall of searches performed by information analysts, resulting in a significant reduction in the time required to discover relevant information; (ii) a significant increase in information sharing among analysts, reducing redundant search efforts and increasing the overall quality of information discovered by collaborating analysts; and (iii) significant increases in the precision and recall of analysts' searches over time as the RAID system better learns the search tasks of users through both explicit and implicit user feedback.
Areas that would benefit from further R&D include (i) enhanced scalability of semantic search through the use of big data technologies; (ii) enhanced abilities to dynamically update and revise the ontology models that drive the semantic information extraction engine; and (iii) enhanced capabilities to better use ontologies and automated reasoning techniques to improve semantic search and knowledge discovery.

10. References

[1] Etzioni, O., Banko, M., Soderland, S., & Weld, D. Open Information Extraction from the Web. Communications of the ACM, Vol. 51, No. 12, December
[2] Heflin, J. and Hendler, J. Searching the web with SHOE, AAAI Workshop, WS-00-01, AAAI Press, Menlo Park, CA, pp. 35-40,
[3] Guha, R., McCool, R. and Miller, E. Semantic search, WWW '03: Proc. of the Twelfth Int. Conf. on WWW, May, Budapest, Hungary,
[4] Burton-Jones, A., Storey, V.C., Sugumaran, V. and Purao, S. A heuristic-based methodology for semantic augmentation of user queries on the web, 22nd Int. Conf. on Conceptual Modeling, Chicago, IL, USA, Oct. 13-16, Proceedings, pp ,
[5] Sheth, A., Bertram, C., Avant, D., et al. Managing semantic content for the web, IEEE Internet Computing, Vol. 6, No. 4, pp. 80-87,
[6] Bollegala, D., Matsuo, Y., & Ishizuka, M. Relational Duality: Unsupervised Extraction of Semantic Relations between Entities on the Web. WWW 2010, Raleigh, NC, April 26-30,
[7] Dou, D., Wang, H., & Liu, H. Semantic data mining: A survey of ontology-based approaches, Semantic Computing (ICSC), 2015 IEEE International Conference on, IEEE (2015), pp
[8] Shah, R. and Jain, S. Ontology-based Information Extraction: An Overview and a Study of different Approaches, International Journal of Computer Applications 87(4):6-8, February
[9]
[10] Hasegawa, T., Sekine, S., & Grishman, R. Discovering relations among named entities from large corpora. ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics


Overview of Web Mining Techniques and its Application towards Web Overview of Web Mining Techniques and its Application towards Web *Prof.Pooja Mehta Abstract The World Wide Web (WWW) acts as an interactive and popular way to transfer information. Due to the enormous

More information

CHAPTER 5 SEARCH ENGINE USING SEMANTIC CONCEPTS

CHAPTER 5 SEARCH ENGINE USING SEMANTIC CONCEPTS 82 CHAPTER 5 SEARCH ENGINE USING SEMANTIC CONCEPTS In recent years, everybody is in thirst of getting information from the internet. Search engines are used to fulfill the need of them. Even though the

More information

Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language

Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language Dong Han and Kilian Stoffel Information Management Institute, University of Neuchâtel Pierre-à-Mazel 7, CH-2000 Neuchâtel,

More information

Powering Knowledge Discovery. Insights from big data with Linguamatics I2E

Powering Knowledge Discovery. Insights from big data with Linguamatics I2E Powering Knowledge Discovery Insights from big data with Linguamatics I2E Gain actionable insights from unstructured data The world now generates an overwhelming amount of data, most of it written in natural

More information

Precise Medication Extraction using Agile Text Mining

Precise Medication Extraction using Agile Text Mining Precise Medication Extraction using Agile Text Mining Chaitanya Shivade *, James Cormack, David Milward * The Ohio State University, Columbus, Ohio, USA Linguamatics Ltd, Cambridge, UK shivade@cse.ohio-state.edu,

More information

IJCSC Volume 5 Number 1 March-Sep 2014 pp ISSN

IJCSC Volume 5 Number 1 March-Sep 2014 pp ISSN Movie Related Information Retrieval Using Ontology Based Semantic Search Tarjni Vyas, Hetali Tank, Kinjal Shah Nirma University, Ahmedabad tarjni.vyas@nirmauni.ac.in, tank92@gmail.com, shahkinjal92@gmail.com

More information

XETA: extensible metadata System

XETA: extensible metadata System XETA: extensible metadata System Abstract: This paper presents an extensible metadata system (XETA System) which makes it possible for the user to organize and extend the structure of metadata. We discuss

More information

Information Retrieval

Information Retrieval Multimedia Computing: Algorithms, Systems, and Applications: Information Retrieval and Search Engine By Dr. Yu Cao Department of Computer Science The University of Massachusetts Lowell Lowell, MA 01854,

More information

Making Sense Out of the Web

Making Sense Out of the Web Making Sense Out of the Web Rada Mihalcea University of North Texas Department of Computer Science rada@cs.unt.edu Abstract. In the past few years, we have witnessed a tremendous growth of the World Wide

More information

Text Mining. Representation of Text Documents

Text Mining. Representation of Text Documents Data Mining is typically concerned with the detection of patterns in numeric data, but very often important (e.g., critical to business) information is stored in the form of text. Unlike numeric data,

More information

Empirical Analysis of Single and Multi Document Summarization using Clustering Algorithms

Empirical Analysis of Single and Multi Document Summarization using Clustering Algorithms Engineering, Technology & Applied Science Research Vol. 8, No. 1, 2018, 2562-2567 2562 Empirical Analysis of Single and Multi Document Summarization using Clustering Algorithms Mrunal S. Bewoor Department

More information

Semantic Web Company. PoolParty - Server. PoolParty - Technical White Paper.

Semantic Web Company. PoolParty - Server. PoolParty - Technical White Paper. Semantic Web Company PoolParty - Server PoolParty - Technical White Paper http://www.poolparty.biz Table of Contents Introduction... 3 PoolParty Technical Overview... 3 PoolParty Components Overview...

More information

Semantic Web Mining and its application in Human Resource Management

Semantic Web Mining and its application in Human Resource Management International Journal of Computer Science & Management Studies, Vol. 11, Issue 02, August 2011 60 Semantic Web Mining and its application in Human Resource Management Ridhika Malik 1, Kunjana Vasudev 2

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK REVIEW PAPER ON IMPLEMENTATION OF DOCUMENT ANNOTATION USING CONTENT AND QUERYING

More information

CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets

CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets Arjumand Younus 1,2, Colm O Riordan 1, and Gabriella Pasi 2 1 Computational Intelligence Research Group,

More information

Enterprise Multimedia Integration and Search

Enterprise Multimedia Integration and Search Enterprise Multimedia Integration and Search José-Manuel López-Cobo 1 and Katharina Siorpaes 1,2 1 playence, Austria, 2 STI Innsbruck, University of Innsbruck, Austria {ozelin.lopez, katharina.siorpaes}@playence.com

More information

Parmenides. Semi-automatic. Ontology. construction and maintenance. Ontology. Document convertor/basic processing. Linguistic. Background knowledge

Parmenides. Semi-automatic. Ontology. construction and maintenance. Ontology. Document convertor/basic processing. Linguistic. Background knowledge Discover hidden information from your texts! Information overload is a well known issue in the knowledge industry. At the same time most of this information becomes available in natural language which

More information

Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information

Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Satoshi Sekine Computer Science Department New York University sekine@cs.nyu.edu Kapil Dalwani Computer Science Department

More information

Theme Identification in RDF Graphs

Theme Identification in RDF Graphs Theme Identification in RDF Graphs Hanane Ouksili PRiSM, Univ. Versailles St Quentin, UMR CNRS 8144, Versailles France hanane.ouksili@prism.uvsq.fr Abstract. An increasing number of RDF datasets is published

More information

Ontology Matching with CIDER: Evaluation Report for the OAEI 2008

Ontology Matching with CIDER: Evaluation Report for the OAEI 2008 Ontology Matching with CIDER: Evaluation Report for the OAEI 2008 Jorge Gracia, Eduardo Mena IIS Department, University of Zaragoza, Spain {jogracia,emena}@unizar.es Abstract. Ontology matching, the task

More information

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) CONTEXT SENSITIVE TEXT SUMMARIZATION USING HIERARCHICAL CLUSTERING ALGORITHM

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) CONTEXT SENSITIVE TEXT SUMMARIZATION USING HIERARCHICAL CLUSTERING ALGORITHM INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & 6367(Print), ISSN 0976 6375(Online) Volume 3, Issue 1, January- June (2012), TECHNOLOGY (IJCET) IAEME ISSN 0976 6367(Print) ISSN 0976 6375(Online) Volume

More information

Annotating Spatio-Temporal Information in Documents

Annotating Spatio-Temporal Information in Documents Annotating Spatio-Temporal Information in Documents Jannik Strötgen University of Heidelberg Institute of Computer Science Database Systems Research Group http://dbs.ifi.uni-heidelberg.de stroetgen@uni-hd.de

More information

RPI INSIDE DEEPQA INTRODUCTION QUESTION ANALYSIS 11/26/2013. Watson is. IBM Watson. Inside Watson RPI WATSON RPI WATSON ??? ??? ???

RPI INSIDE DEEPQA INTRODUCTION QUESTION ANALYSIS 11/26/2013. Watson is. IBM Watson. Inside Watson RPI WATSON RPI WATSON ??? ??? ??? @ INSIDE DEEPQA Managing complex unstructured data with UIMA Simon Ellis INTRODUCTION 22 nd November, 2013 WAT SON TECHNOLOGIES AND OPEN ARCHIT ECT URE QUEST ION ANSWERING PROFESSOR JIM HENDLER S IMON

More information

A Linguistic Approach for Semantic Web Service Discovery

A Linguistic Approach for Semantic Web Service Discovery A Linguistic Approach for Semantic Web Service Discovery Jordy Sangers 307370js jordysangers@hotmail.com Bachelor Thesis Economics and Informatics Erasmus School of Economics Erasmus University Rotterdam

More information

A Hybrid Unsupervised Web Data Extraction using Trinity and NLP

A Hybrid Unsupervised Web Data Extraction using Trinity and NLP IJIRST International Journal for Innovative Research in Science & Technology Volume 2 Issue 02 July 2015 ISSN (online): 2349-6010 A Hybrid Unsupervised Web Data Extraction using Trinity and NLP Anju R

More information

Markov Chains for Robust Graph-based Commonsense Information Extraction

Markov Chains for Robust Graph-based Commonsense Information Extraction Markov Chains for Robust Graph-based Commonsense Information Extraction N iket Tandon 1,4 Dheera j Ra jagopal 2,4 Gerard de M elo 3 (1) Max Planck Institute for Informatics, Germany (2) NUS, Singapore

More information

Knowledge Retrieval. Franz J. Kurfess. Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A.

Knowledge Retrieval. Franz J. Kurfess. Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A. Knowledge Retrieval Franz J. Kurfess Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A. 1 Acknowledgements This lecture series has been sponsored by the European

More information

Discovering Semantic Similarity between Words Using Web Document and Context Aware Semantic Association Ranking

Discovering Semantic Similarity between Words Using Web Document and Context Aware Semantic Association Ranking Discovering Semantic Similarity between Words Using Web Document and Context Aware Semantic Association Ranking P.Ilakiya Abstract The growth of information in the web is too large, so search engine come

More information

Ontology Extraction from Heterogeneous Documents

Ontology Extraction from Heterogeneous Documents Vol.3, Issue.2, March-April. 2013 pp-985-989 ISSN: 2249-6645 Ontology Extraction from Heterogeneous Documents Kirankumar Kataraki, 1 Sumana M 2 1 IV sem M.Tech/ Department of Information Science & Engg

More information

Keywords Data alignment, Data annotation, Web database, Search Result Record

Keywords Data alignment, Data annotation, Web database, Search Result Record Volume 5, Issue 8, August 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Annotating Web

More information

OPEN INFORMATION EXTRACTION FROM THE WEB. Michele Banko, Michael J Cafarella, Stephen Soderland, Matt Broadhead and Oren Etzioni

OPEN INFORMATION EXTRACTION FROM THE WEB. Michele Banko, Michael J Cafarella, Stephen Soderland, Matt Broadhead and Oren Etzioni OPEN INFORMATION EXTRACTION FROM THE WEB Michele Banko, Michael J Cafarella, Stephen Soderland, Matt Broadhead and Oren Etzioni Call for a Shake Up in Search! Question Answering rather than indexed key

More information

TEXT CHAPTER 5. W. Bruce Croft BACKGROUND

TEXT CHAPTER 5. W. Bruce Croft BACKGROUND 41 CHAPTER 5 TEXT W. Bruce Croft BACKGROUND Much of the information in digital library or digital information organization applications is in the form of text. Even when the application focuses on multimedia

More information

PROJECT PERIODIC REPORT

PROJECT PERIODIC REPORT PROJECT PERIODIC REPORT Grant Agreement number: 257403 Project acronym: CUBIST Project title: Combining and Uniting Business Intelligence and Semantic Technologies Funding Scheme: STREP Date of latest

More information

Information Retrieval CS Lecture 01. Razvan C. Bunescu School of Electrical Engineering and Computer Science

Information Retrieval CS Lecture 01. Razvan C. Bunescu School of Electrical Engineering and Computer Science Information Retrieval CS 6900 Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Information Retrieval Information Retrieval (IR) is finding material of an unstructured

More information

An Efficient Technique for Tag Extraction and Content Retrieval from Web Pages

An Efficient Technique for Tag Extraction and Content Retrieval from Web Pages An Efficient Technique for Tag Extraction and Content Retrieval from Web Pages S.Sathya M.Sc 1, Dr. B.Srinivasan M.C.A., M.Phil, M.B.A., Ph.D., 2 1 Mphil Scholar, Department of Computer Science, Gobi Arts

More information

WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS

WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS 1 WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS BRUCE CROFT NSF Center for Intelligent Information Retrieval, Computer Science Department, University of Massachusetts,

More information

Information Extraction Techniques in Terrorism Surveillance

Information Extraction Techniques in Terrorism Surveillance Information Extraction Techniques in Terrorism Surveillance Roman Tekhov Abstract. The article gives a brief overview of what information extraction is and how it might be used for the purposes of counter-terrorism

More information

MIRACLE at ImageCLEFmed 2008: Evaluating Strategies for Automatic Topic Expansion

MIRACLE at ImageCLEFmed 2008: Evaluating Strategies for Automatic Topic Expansion MIRACLE at ImageCLEFmed 2008: Evaluating Strategies for Automatic Topic Expansion Sara Lana-Serrano 1,3, Julio Villena-Román 2,3, José C. González-Cristóbal 1,3 1 Universidad Politécnica de Madrid 2 Universidad

More information

A Method for Semi-Automatic Ontology Acquisition from a Corporate Intranet

A Method for Semi-Automatic Ontology Acquisition from a Corporate Intranet A Method for Semi-Automatic Ontology Acquisition from a Corporate Intranet Joerg-Uwe Kietz, Alexander Maedche, Raphael Volz Swisslife Information Systems Research Lab, Zuerich, Switzerland fkietz, volzg@swisslife.ch

More information

The Goal of this Document. Where to Start?

The Goal of this Document. Where to Start? A QUICK INTRODUCTION TO THE SEMILAR APPLICATION Mihai Lintean, Rajendra Banjade, and Vasile Rus vrus@memphis.edu linteam@gmail.com rbanjade@memphis.edu The Goal of this Document This document introduce

More information

What is this Song About?: Identification of Keywords in Bollywood Lyrics

What is this Song About?: Identification of Keywords in Bollywood Lyrics What is this Song About?: Identification of Keywords in Bollywood Lyrics by Drushti Apoorva G, Kritik Mathur, Priyansh Agrawal, Radhika Mamidi in 19th International Conference on Computational Linguistics

More information

Metadata and the Semantic Web and CREAM 0

Metadata and the Semantic Web and CREAM 0 Metadata and the Semantic Web and CREAM 0 1 Siegfried Handschuh, 1;2 Steffen Staab, 1;3 Alexander Maedche 1 Institute AIFB, University of Karlsruhe, D-76128 Karlsruhe, Germany http://www.aifb.uni-karlsruhe.de/wbs

More information

Iterative Learning of Relation Patterns for Market Analysis with UIMA

Iterative Learning of Relation Patterns for Market Analysis with UIMA UIMA Workshop, GLDV, Tübingen, 09.04.2007 Iterative Learning of Relation Patterns for Market Analysis with UIMA Sebastian Blohm, Jürgen Umbrich, Philipp Cimiano, York Sure Universität Karlsruhe (TH), Institut

More information

Development of an Ontology-Based Portal for Digital Archive Services

Development of an Ontology-Based Portal for Digital Archive Services Development of an Ontology-Based Portal for Digital Archive Services Ching-Long Yeh Department of Computer Science and Engineering Tatung University 40 Chungshan N. Rd. 3rd Sec. Taipei, 104, Taiwan chingyeh@cse.ttu.edu.tw

More information

A Lightweight Approach to Semantic Tagging

A Lightweight Approach to Semantic Tagging A Lightweight Approach to Semantic Tagging Nadzeya Kiyavitskaya, Nicola Zeni, Luisa Mich, John Mylopoulus Department of Information and Communication Technologies, University of Trento Via Sommarive 14,

More information

Text mining tools for semantically enriching the scientific literature

Text mining tools for semantically enriching the scientific literature Text mining tools for semantically enriching the scientific literature Sophia Ananiadou Director National Centre for Text Mining School of Computer Science University of Manchester Need for enriching the

More information

Cluster-based Instance Consolidation For Subsequent Matching

Cluster-based Instance Consolidation For Subsequent Matching Jennifer Sleeman and Tim Finin, Cluster-based Instance Consolidation For Subsequent Matching, First International Workshop on Knowledge Extraction and Consolidation from Social Media, November 2012, Boston.

More information

TSS: A Hybrid Web Searches

TSS: A Hybrid Web Searches 410 TSS: A Hybrid Web Searches Li-Xin Han 1,2,3, Gui-Hai Chen 3, and Li Xie 3 1 Department of Mathematics, Nanjing University, Nanjing 210093, P.R. China 2 Department of Computer Science and Engineering,

More information

Customisable Curation Workflows in Argo

Customisable Curation Workflows in Argo Customisable Curation Workflows in Argo Rafal Rak*, Riza Batista-Navarro, Andrew Rowley, Jacob Carter and Sophia Ananiadou National Centre for Text Mining, University of Manchester, UK *Corresponding author:

More information

Query-Time JOIN for Active Intelligence Engine (AIE)

Query-Time JOIN for Active Intelligence Engine (AIE) Query-Time JOIN for Active Intelligence Engine (AIE) Ad hoc JOINing of Structured Data and Unstructured Content: An Attivio-Patented Breakthrough in Information- Centered Business Agility An Attivio Technology

More information

Chapter 27 Introduction to Information Retrieval and Web Search

Chapter 27 Introduction to Information Retrieval and Web Search Chapter 27 Introduction to Information Retrieval and Web Search Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 27 Outline Information Retrieval (IR) Concepts Retrieval

More information

End-User Evaluations of Semantic Web Technologies

End-User Evaluations of Semantic Web Technologies End-User Evaluations of Semantic Web Technologies Rob McCool 1, Andrew J. Cowell 2 and David A. Thurman 3 1 Knowledge Systems Lab, Stanford University robm@ksl.stanford.edu 2 Rich Interaction Environments,

More information

Juggling the Jigsaw Towards Automated Problem Inference from Network Trouble Tickets

Juggling the Jigsaw Towards Automated Problem Inference from Network Trouble Tickets Juggling the Jigsaw Towards Automated Problem Inference from Network Trouble Tickets Rahul Potharaju (Purdue University) Navendu Jain (Microsoft Research) Cristina Nita-Rotaru (Purdue University) April

More information

Text Mining for Software Engineering

Text Mining for Software Engineering Text Mining for Software Engineering Faculty of Informatics Institute for Program Structures and Data Organization (IPD) Universität Karlsruhe (TH), Germany Department of Computer Science and Software

More information

Available online at ScienceDirect. Procedia Computer Science 52 (2015 )

Available online at  ScienceDirect. Procedia Computer Science 52 (2015 ) Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 52 (2015 ) 1071 1076 The 5 th International Symposium on Frontiers in Ambient and Mobile Systems (FAMS-2015) Health, Food

More information

Improving the Efficiency of Fast Using Semantic Similarity Algorithm

Improving the Efficiency of Fast Using Semantic Similarity Algorithm International Journal of Scientific and Research Publications, Volume 4, Issue 1, January 2014 1 Improving the Efficiency of Fast Using Semantic Similarity Algorithm D.KARTHIKA 1, S. DIVAKAR 2 Final year

More information

Question Answering Approach Using a WordNet-based Answer Type Taxonomy

Question Answering Approach Using a WordNet-based Answer Type Taxonomy Question Answering Approach Using a WordNet-based Answer Type Taxonomy Seung-Hoon Na, In-Su Kang, Sang-Yool Lee, Jong-Hyeok Lee Department of Computer Science and Engineering, Electrical and Computer Engineering

More information

Domain Specific Search Engine for Students

Domain Specific Search Engine for Students Domain Specific Search Engine for Students Domain Specific Search Engine for Students Wai Yuen Tang The Department of Computer Science City University of Hong Kong, Hong Kong wytang@cs.cityu.edu.hk Lam

More information

Deep Web Content Mining

Deep Web Content Mining Deep Web Content Mining Shohreh Ajoudanian, and Mohammad Davarpanah Jazi Abstract The rapid expansion of the web is causing the constant growth of information, leading to several problems such as increased

More information

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN:

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN: IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, 20131 Improve Search Engine Relevance with Filter session Addlin Shinney R 1, Saravana Kumar T

More information

Web Information Retrieval using WordNet

Web Information Retrieval using WordNet Web Information Retrieval using WordNet Jyotsna Gharat Asst. Professor, Xavier Institute of Engineering, Mumbai, India Jayant Gadge Asst. Professor, Thadomal Shahani Engineering College Mumbai, India ABSTRACT

More information

SEMANTIC INFORMATION RETRIEVAL USING ONTOLOGY IN UNIVERSITY DOMAIN

SEMANTIC INFORMATION RETRIEVAL USING ONTOLOGY IN UNIVERSITY DOMAIN SEMANTIC INFORMATION RETRIEVAL USING ONTOLOGY IN UNIVERSITY DOMAIN Swathi Rajasurya, Tamizhamudhu Muralidharan, Sandhiya Devi, Prof.Dr.S.Swamynathan Department of Information and Technology,College of

More information

Developing InfoSleuth Agents Using Rosette: An Actor Based Language

Developing InfoSleuth Agents Using Rosette: An Actor Based Language Developing InfoSleuth Agents Using Rosette: An Actor Based Language Darrell Woelk Microeclectronics and Computer Technology Corporation (MCC) 3500 Balcones Center Dr. Austin, Texas 78759 InfoSleuth Architecture

More information

A BFS-BASED SIMILAR CONFERENCE RETRIEVAL FRAMEWORK

A BFS-BASED SIMILAR CONFERENCE RETRIEVAL FRAMEWORK A BFS-BASED SIMILAR CONFERENCE RETRIEVAL FRAMEWORK Qing Guo 1, 2 1 Nanyang Technological University, Singapore 2 SAP Innovation Center Network,Singapore ABSTRACT Literature review is part of scientific

More information

Maximizing the Value of STM Content through Semantic Enrichment. Frank Stumpf December 1, 2009

Maximizing the Value of STM Content through Semantic Enrichment. Frank Stumpf December 1, 2009 Maximizing the Value of STM Content through Semantic Enrichment Frank Stumpf December 1, 2009 What is Semantics and Semantic Processing? Content Knowledge Framework Technology Framework Search Text Images

More information

NUS-I2R: Learning a Combined System for Entity Linking

NUS-I2R: Learning a Combined System for Entity Linking NUS-I2R: Learning a Combined System for Entity Linking Wei Zhang Yan Chuan Sim Jian Su Chew Lim Tan School of Computing National University of Singapore {z-wei, tancl} @comp.nus.edu.sg Institute for Infocomm

More information

Text Mining. Munawar, PhD. Text Mining - Munawar, PhD

Text Mining. Munawar, PhD. Text Mining - Munawar, PhD 10 Text Mining Munawar, PhD Definition Text mining also is known as Text Data Mining (TDM) and Knowledge Discovery in Textual Database (KDT).[1] A process of identifying novel information from a collection

More information

Actionable User Intentions for Real-Time Mobile Assistant Applications

Actionable User Intentions for Real-Time Mobile Assistant Applications Actionable User Intentions for Real-Time Mobile Assistant Applications Thimios Panagos, Shoshana Loeb, Ben Falchuk Applied Research, Telcordia Technologies One Telcordia Drive, Piscataway, New Jersey,

More information

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University CS473: CS-473 Course Review Luo Si Department of Computer Science Purdue University Basic Concepts of IR: Outline Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR Terminologies and

More information

Revealing the Modern History of Japanese Philosophy Using Digitization, Natural Language Processing, and Visualization

Revealing the Modern History of Japanese Philosophy Using Digitization, Natural Language Processing, and Visualization Revealing the Modern History of Japanese Philosophy Using Digitization, Natural Language Katsuya Masuda *, Makoto Tanji **, and Hideki Mima *** Abstract This study proposes a framework to access to the

More information

Domain-specific Concept-based Information Retrieval System

Domain-specific Concept-based Information Retrieval System Domain-specific Concept-based Information Retrieval System L. Shen 1, Y. K. Lim 1, H. T. Loh 2 1 Design Technology Institute Ltd, National University of Singapore, Singapore 2 Department of Mechanical

More information

Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda

Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda 1 Observe novel applicability of DL techniques in Big Data Analytics. Applications of DL techniques for common Big Data Analytics problems. Semantic indexing

More information

An Approach for Accessing Linked Open Data for Data Mining Purposes

An Approach for Accessing Linked Open Data for Data Mining Purposes An Approach for Accessing Linked Open Data for Data Mining Purposes Andreas Nolle, German Nemirovski Albstadt-Sigmaringen University nolle, nemirovskij@hs-albsig.de Abstract In the recent time the amount

More information

EXTRACTION INFORMATION ADAPTIVE WEB. The Amorphic system works to extract Web information for use in business intelligence applications.

EXTRACTION INFORMATION ADAPTIVE WEB. The Amorphic system works to extract Web information for use in business intelligence applications. By Dawn G. Gregg and Steven Walczak ADAPTIVE WEB INFORMATION EXTRACTION The Amorphic system works to extract Web information for use in business intelligence applications. Web mining has the potential

More information

Ontology based Web Page Topic Identification

Ontology based Web Page Topic Identification Ontology based Web Page Topic Identification Abhishek Singh Rathore Department of Computer Science & Engineering Maulana Azad National Institute of Technology Bhopal, India Devshri Roy Department of Computer

More information

Text Document Clustering Using DPM with Concept and Feature Analysis

Text Document Clustering Using DPM with Concept and Feature Analysis Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 2, Issue. 10, October 2013,

More information

Semantic Web Technology Evaluation Ontology (SWETO): A Test Bed for Evaluating Tools and Benchmarking Applications

Semantic Web Technology Evaluation Ontology (SWETO): A Test Bed for Evaluating Tools and Benchmarking Applications Wright State University CORE Scholar Kno.e.sis Publications The Ohio Center of Excellence in Knowledge- Enabled Computing (Kno.e.sis) 5-22-2004 Semantic Web Technology Evaluation Ontology (SWETO): A Test

More information

SEMANTIC WEB POWERED PORTAL INFRASTRUCTURE

SEMANTIC WEB POWERED PORTAL INFRASTRUCTURE SEMANTIC WEB POWERED PORTAL INFRASTRUCTURE YING DING 1 Digital Enterprise Research Institute Leopold-Franzens Universität Innsbruck Austria DIETER FENSEL Digital Enterprise Research Institute National

More information

Advances in Natural and Applied Sciences. Information Retrieval Using Collaborative Filtering and Item Based Recommendation

Advances in Natural and Applied Sciences. Information Retrieval Using Collaborative Filtering and Item Based Recommendation AENSI Journals Advances in Natural and Applied Sciences ISSN:1995-0772 EISSN: 1998-1090 Journal home page: www.aensiweb.com/anas Information Retrieval Using Collaborative Filtering and Item Based Recommendation

More information

Letter Pair Similarity Classification and URL Ranking Based on Feedback Approach

Letter Pair Similarity Classification and URL Ranking Based on Feedback Approach Letter Pair Similarity Classification and URL Ranking Based on Feedback Approach P.T.Shijili 1 P.G Student, Department of CSE, Dr.Nallini Institute of Engineering & Technology, Dharapuram, Tamilnadu, India

More information

A Review on Identifying the Main Content From Web Pages

A Review on Identifying the Main Content From Web Pages A Review on Identifying the Main Content From Web Pages Madhura R. Kaddu 1, Dr. R. B. Kulkarni 2 1, 2 Department of Computer Scienece and Engineering, Walchand Institute of Technology, Solapur University,

More information

Social Behavior Prediction Through Reality Mining

Social Behavior Prediction Through Reality Mining Social Behavior Prediction Through Reality Mining Charlie Dagli, William Campbell, Clifford Weinstein Human Language Technology Group MIT Lincoln Laboratory This work was sponsored by the DDR&E / RRTO

More information

TISA Methodology Threat Intelligence Scoring and Analysis

TISA Methodology Threat Intelligence Scoring and Analysis TISA Methodology Threat Intelligence Scoring and Analysis Contents Introduction 2 Defining the Problem 2 The Use of Machine Learning for Intelligence Analysis 3 TISA Text Analysis and Feature Extraction

More information