Advances in Data Networks, Communications, Computers and Materials
|
|
- Jasper Anthony
- 6 years ago
- Views:
Transcription
1 Multi-dimensional Data Model of Textual Information PAVEL ČECH Department of Information Technologies University of Hradec Kralove Rokitanského 62, Hradec Králové, CZECH REPUBLIC Abstract: - The key to understanding the meaning is in the proper representation of the textual information. The tree like representation of a text is not very convenient for large scale processing since the information about types of phrases, part of speech tags and actual terms are scattered in various tree nodes. The aim of the article is to propose a multidimensional model of textual data that would contain all the lexical information in a structure that enables further semantic processing. The approach in this article is based on the data modeling principles used in data warehousing. Key-Words: - Multi-dimensional data model, data warehousing, star schema, text representation, lexical parsing. 1 Introduction Information retrieval and text mining have already attracted attention of many researchers. The major challenge in these areas is the understanding of meaning reposed in the text. The key to understanding the meaning is in the proper representation of the textual information. Today, it is known that the traditional term based representation of documents disregards the semantic differences contained in connections between the terms [29]. The novel concept based approaches utilize lexical parsing to extract semantics. Lexical parsing usually results in a tree like representation of a text (such as sentence). Such structure is however not very convenient for further large scale processing. The information about types of phrases, part of speech tags and actual terms are scattered in various levels of the tree depending on the text complexity. The aim of the article is to propose a multidimensional model of textual data that would contain all the lexical information in a structure that enables further semantic processing. The approach in this article is based on the data modeling principles used in data warehousing [9, 10, 12]. The supposed major advantage of the proposed approach is in a relational calculus and standardized query language for fast and flexible retrieving of lexical information. Hence, suggested approach would enable easy exploration of the text using the various information retrieval and text mining techniques such as term proximity calculation, n-gram determination, Markov model application and Formal Concept Analysis. The approach also addresses a problem stated by Castells et al that once the replacement of documents by bag of knowledge inevitably implies a loss of information [5]. Using this approach the documents can be fully reconstructed and thus new perspectives of the text could be reflected. The paper presents the description of the proposed multidimensional model and a query possibilities as well as the verification of the approach on a sample set of approx. 200 documents. 2 Related work There are various data models being used in the field of information retrieval and text mining. Basically, the two broad categories can be identified. The first category is term-based in which the vector-space model is being used [6, 17, 19, 20, 25]. The second category is concept or semantic based that utilizes ontology and a graph or tree like structures [5, 7, 11, 14, 16, 21]. There are also approaches that entail dimensionality or multidimensionality [27, 28, 29]. For example, Tseng proposed a multidimensional indexing structure, called D-tree [28], and corresponding query expressions MD2X (Multi- Dimensional Document Expressions) for document warehouses [27]. Tseng characterizes documents as a set of keywords and defines dimensions as a tree structure representing the hierarchical relations ISBN:
2 between keywords. The query expression MD 2 X is an extension to MDX (Multi-Dimensional expressions) that is designed to include complete set of SQL-like constructs so that to enable the traditional OLAP (Online Analytical Processing) operation such as drill up and drill down on the above defined dimensions. Similarly, Urbain et al designed a dimensional information retrieval model for genomics literature search. The model combines concept-based semantics and term statistics within multiple levels of document context in order to identify concise, variable length passages of text that answer a user query [29]. Their approach is based on a parallel dimensional indexing model in which the term index is used for identification of terms related to biological concepts and a concept index is used to search for resolved concepts. Furtehr, de Mazière et al [6] analyzed documents and organized them using special visualization techniques. In order to properly categorize the documents, they employed lexical parsing, n-gram finding and subsequent vectorization to use TF-IDF technique and clustering. There also works that refer directly to data/web warehousing concepts. Thus, the issues of warehousing web data have been discussed in [1, 2, 3, 4] in which the concept of a Web Information Coupling and the WHOWEDA (Warehouse of Web Data) project have been introduced. The elaborated schema was a graph structure consisting of nodes representing web pages and links representing hyperlinks. They provided formal description of the schema and algebra with manipulation operators to query/navigate among interconnected pages. Further, Fuller et al constructed an entity retrieval model for semi-structured information found in distributed and heterogeneous data sources such as world wide web or data of social networks [7]. In their approach they regard documents (HTML pages embedding RDF) to contain a set of statements describing one or more entities. The model use consists of entities and entity descriptions that form a graph-like structure in three levels. It has to be noted, that none but one of the reviewed articles discuss the multidimensional model as it is being declared in the data warehousing concepts and principles. The multidimensionality refers rather to a vector space or the warehouse represents just the collection of data ignoring the subject orientation and other characteristics. Predominant data model is a graph like or tree like structure that are close to how ontologies are organized. The proper application of data warehousing principles is found only in Urbain et al [29]. Urbain, however, does not refer to parts of the speech and phrases as dimensions and also the links between words are not considered. 3 Approach The approach in this paper is based on the adoption of the data warehousing principles widely implemented in the so called business intelligence [15, 30]. Originally, data warehouse referred to a large volume data repository that enabled informed decision making of high level business managers. The term data warehouse was coined and promoted by Bill Inmon, who is also the author of commonly accepted definition of a data warehouse as a subject oriented, nonvolatile, integrated, time variant collection of data in support of management's decisions [10]. Correspondingly, the data warehousing can be regarded as a collection of methods and principles of how to build and operate data warehouse. There are various approaches of how to realize the data warehouse using a relational database [10, 12, 13, 18, 24]. However, the core concept is a multidimensional data model or schema that is, in its simple form, often referred to as a star schema [9, 12]. The dimensions represent entities or perspectives through which the data should be viewed or analyzed. Dimensions are organized around particular measures (usually numerical data such as sales). In relational databases the dimensions and measures are represented as tables (dimension tables and fact tables respectively). Unlike in traditional operational databases the data in data warehouse might not be normalized [9, 12]. The variant of the star schema, in which the dimensions are normalized, requires the data to be separated into several interrelated dimension that form a structure similar to a snowflake. This is used in cases in which data in dimensions can form a hierarchy (e.g. time dimension can be split into days, months, years or other time periods or location can be split into city, country, etc.). However, querying data stored in the snowflake schema is computationally more expensive since it entails more table joins. In the application of a multidimensional model to textual data, the dimensions can be formed from the various lexical elements and structures (such as ISBN:
3 words, sentences, documents and also parts of speech tags, etc. The measures contain occurrences of entities in dimensions. Thus the occurrence can refer to a particular occurrence of a certain word in a given sentence from a given document. Populating the model would require some lexical parsing that might be achieved using the various tools and libraries such as Apache OpenNLP [26], Stanford NLP [23] or GATE [22]. 3.1 Proposed multi-dimensional data model The proposed model consists of the following dimensions (see fig. 1): Fig.1: multi-dimensional data model Words individual words; serves as a unique collection of all words indentified; POS parts of the speech tags of words (such as noun, adjective, etc.); for example the Penn Treebank [8] tagset is suggested to be used; Phrases phrases that the words are parts of (such as noun phrase or prepositional phrase); optional if no information about the phrase would be stored in that case the phrase would be stored in the fact table; Sentences sentences as containing the words and are parts of the paragraphs and documents; optional if no information about the sentence would be stored in that case the sentence would be captured as order in document or paragraph in the fact table; Paragraphs paragraphs as consisting of sentences; the paragraphs would serve for providing context and priority (e.g. paragraph with title might have greater weight than normal text). This dimension is optional if in some documents paragraphs might not be identified. Documents documents such as pages are characterized by the title and location i.e. URL in case of pages or path in case of files in a document repository. Word_Links links between consecutive words; the link is realized between two word occurrences; this dimension might not be necessary since the orders of the various parts are stored, thus when word occurrences are properly ordered the next word occurrence represents the next word. However, linking the words would allow easier manipulation using a relation calculus in which there are no operators for next or previous records. The Documents, Paragraphs, Sentences dimensions allow construction of a hierarchy, however, that would be more computationally demanding. Hence, the star schema is followed linking all dimensions directly to the fact table. Still, the dimensions are suggested to be normalized it means there will be no redundancy. In case the frequency of sentences and phrases are to be calculated a new fact tables for sentence or phrase occurrence might be created. However, the same information might be stored in the word occurrences table. If Word_Links are normalized it would be characterized by link frequency that would be increased whenever the same consecutive pair of words is encountered. This enables pattern identification based e.g. on n-gram computation. In a simple form the model is suggested to contain one fact table called Words_Occurrences that would connect the data from the above mentioned dimensions. In relational databases the connection is realized using the foreign keys concept. Apart from the foreign keys the fact table would contain the word_order that is the order of a given word in a sentence. 3.2 Querying proposed data model The proposed data model can be queried using the standardized SQL (Structured Query Language) or ISBN:
4 MDX (Multi-Dimensional expressions). This would allow for easy manipulation with data since various views can be defined and also the off-the-shelf OLAP tools might be used to analyze the data. Thus, for example an OLAP cube might be constructed above the data to compute aggregates and to allow drill down and drill down operations. The following SQL statement might be used to reconstruct the whole text enriched with part of speech tags and phrase types: SELECT w.word, pos.tag, phr.type wo.document_id, wo.word_order, wo.sentence_order FROM dbo.tf_word_occurrences AS wo INNER JOIN dbo.td_words AS w ON wo.word_id = w.id INNER JOIN dbo.td_pos AS pos ON wo.pos_id = pos.id INNER JOIN dbo.td_phrases AS phr ON wo.phrase_id = phr.id ORDER BY wo.document_id, wo.sentence_order, wo.word_order There is also a possibility to identify word pairs. The following SQL statement would identify all adjective noun pairs: SELECT v1.word AS adjective, v2.word AS noun, FROM dbo.vx_wo_words_pos AS v1 INNER JOIN dbo.vx_wo_words_pos AS v2 ON v1.wo_after_id = v2.id WHERE (v1.tag = 'JJ') AND (v2.tag = 'NN') The expression v1.wo_after_id = v2.id has been realized using the Word_Links dimension. The frequency of the above identified adjective noun pairs in documents can be computed as follows: SELECT document_title, adjective, noun, count(*) counts FROM vl_adjectives_nouns an INNER JOIN td_documents d ON d.id = an.document_id GROUP BY title, adjective, noun HAVING count(*) > 1 ORDER BY title, counts DESC The result set from the above SQL statements is limited only to adjective noun pairs that appear in a given document at least twice. The model enables for even more complex patterns to be analyzed. For instance, in the formation of concept hierarchies the is a pattern provides a good clue to identify sub and super concepts. The search expression between word w1 that is a noun and a word w2 that is also noun and a phrase p that is is a can be formulated as: w1.tag='nn' and p.text='is a' and w2.tag='nn' The pattern might be extended to include also adjectives that would further specify the concepts. 4 Verification The verification of suggested approach has been conducted on the corpus of 180 documents or pages from Wikipedia. In that corpus over 40 thousands unique words were identified with more than 400 thousand word occurrences in 19 thousand sentences. In order to analyze the documents, the suggested multidimensional model based on a star schema was created. Lexical parsing have been provided using OpenNLP framework [26]. During the lexical parsing the paragraphs were not identified and thus the Paragraphs dimension was omitted. Similarly, the sentence order in the document and word links between successive words were stored as a part of the word occurrence fact table. The model enables the easy calculation of various aggregations. The Table 1 presents a sample of noun counts for selected documents. Document title Counts Evolutionary programming 68 Candidate solution 76 Genetic representation 84 Chromosome (genetic algorithm) 89 Business intelligence tools 91 Social behavior 93 Table 1: Count of nouns in selected documents ISBN:
5 More complicated aggregations are also possible. Using the above presented SQL statements the table 2 shows the frequency of keywords consisting of an adjective and noun in a document titled Computational science. Keywords Counts computational science 12 scientific computing 5 numerical analysis 3 computational engineering 2 scientific computation 2 Table 2: Frequency of adjective noun pairs Computing the frequency of keywords can serve as a basis for the TF-IDF calculations and for other analysis. 5 Conclusion The multi-dimensional data model appeared to provide fast and flexible way of retrieving lexical information. The query mechanism is based on the standardized query language and provides for further semantic analysis of the text. The major disadvantage of the approach is the extent of the data. The possibility to fully reconstruct the analyzed text is obviously at the expense of the disk space. However, in space saving alternative mainly the word occurrence fact table would grow proportionally with the length of the text analyzed given that there is relatively small number of unique words that are used in a language. References: [1] S. Bhowmick, S. Madria, W. Ng, and E. Lim, "Web Warehousing: Design and Issues Advances in Database Technologies." vol. 1552, Y. Kambayashi, D.-L. Lee, E.-p. Lim, M. Mohania, and Y. Masunaga, Eds., ed: Springer Berlin / Heidelberg, 1999, pp [2] S. Bhowmick, W.-K. Ng, and E.-P. Lim, "Information Coupling in Web Databases Conceptual Modeling ER 98." vol. 1507, T.- W. Ling, S. Ram, and M. Li Lee, Eds., ed: Springer Berlin / Heidelberg, 1998, pp [3] Y. Cao, E.-P. Lim, and W.-K. Ng, "On Warehousing Historical Web Information Conceptual Modeling ER 2000." vol. 1920, A. Laender, S. Liddle, and V. Storey, Eds., ed: Springer Berlin / Heidelberg, 2000, pp [4] Y. Cao, E.-P. Lim, and W.-K. Ng, "Data model for warehousing historical Web information," Information and Software Technology, vol. 45, pp , [5] P. Castells, M. Fernandez, and D. Vallet, "An Adaptation of the Vector-Space Model for Ontology-Based Information Retrieval," IEEE Trans. on Knowl. and Data Eng., vol. 19, pp , [6] P. A. De Mazière and M. M. Van Hulle, "A clustering study of a 7000 EU document inventory using MDS and SOM," Expert Systems with Applications, vol. 38, pp , [7] C. M. Fuller, D. P. Biros, and D. Delen, "An investigation of data and text mining methods for real world deception detection," Expert Systems with Applications, vol. 38, pp , [8] J. Hajič, "Treebanks and Tagsets," in Encyclopedia of Language & Linguistics (Second Edition), B. Editor-in-Chief: Keith, Ed., ed Oxford: Elsevier, 2006, pp [9] J. Han, M. Kamber, and J. Pei, "Data Warehousing and Online Analytical Processing," in Data Mining (Third Edition), ed Boston: Morgan Kaufmann, 2012, pp [10] W. H. Inmon, Building the Data Warehouse.: Wiley and Sons, [11] B.-Y. Kang, D.-W. Kim, and S.-J. Lee, "Exploiting concept clusters for content-based information retrieval," Information Sciences, vol. 170, pp , [12] R. Kimball and R. Margy, The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, 2nd ed.: Wiley, [13] S. Lightstone, T. Teorey, and T. Nadeau, "Physical design for decision support, warehousing, and OLAP," in Physical Database Design, ed Burlington: Morgan Kaufmann, 2007, pp [14] H.-T. Lin, N.-W. Chi, and S.-H. Hsieh, "A concept-based information retrieval approach for engineering domain-specific technical documents," Advanced Engineering Informatics, vol. 26, pp , [15] D. Loshin, Business Intelligence. San Francisco: Morgan Kaufmann, [16] Q. Luo, E. Chen, and H. Xiong, "A semantic term weighting scheme for text categorization," Expert Systems with Applications, vol. 38, pp , ISBN:
6 [17] W. Mao and W. W. Chu, "The phrase-based vector space model for automatic retrieval of free-text medical documents," Data & Knowledge Engineering, vol. 61, pp , [18] S. T. March and A. R. Hevner, "Integrated decision support systems: A data warehousing perspective," Decision Support Systems, vol. 43, pp , [19] S. Robertson, "Understanding inverse document frequency: on theoretical arguments for IDF," Journal of Documentation, vol. 60, pp , [20] G. Salton, "Term-weighting approaches in automatic text retrieval," Information Processing & Management, vol. 24 pp , [21] R. Setchi, Q. Tang, and I. Stankov, "Semanticbased information retrieval in support of concept design," Advanced Engineering Informatics, vol. 25, pp , [22] T. U. o. Sheffield. (2012, 7 July, 2012). GATE [23] T. Stanford. (2012, 7 July, 2012). Stanford NLP Group. Available: [24] A. Subramanian, L. Douglas Smith, A. C. Nelson, J. F. Campbell, and D. A. Bird, "Strategic planning for data warehousing," Information & Management, vol. 33, pp , [25] X. Tai, F. Ren, and K. Kita, "An information retrieval model based on vector space method by supervised learning," Information Processing & Management, vol. 38, pp , [26] The Apache Software Foundation. (2012, 7 July, 2012). Apache OpenNLP. Available: [27] F. S. C. Tseng, "Design of a multi-dimensional query expression for document warehouses," Information Sciences, vol. 174, pp , [28] F. S. C. Tseng and W.-P. Lin, "D-Tree: a multidimensional indexing structure for constructing document warehouses," Journal of Information Science and Engineering, vol. 22, pp , [29] J. Urbain, N. Goharian, and O. Frieder, "A dimensional retrieval model for integrating semantics and statistical evidence in context for genomics literature search," Computers in Biology and Medicine, vol. 39, pp , [30] S. Williams and N. Williams, The Profit Impact of Business Intelligence. San Francisco: Morgan Kaufmann, ISBN:
A Novel Approach of Data Warehouse OLTP and OLAP Technology for Supporting Management prospective
A Novel Approach of Data Warehouse OLTP and OLAP Technology for Supporting Management prospective B.Manivannan Research Scholar, Dept. Computer Science, Dravidian University, Kuppam, Andhra Pradesh, India
More informationNovel Materialized View Selection in a Multidimensional Database
Graphic Era University From the SelectedWorks of vijay singh Winter February 10, 2009 Novel Materialized View Selection in a Multidimensional Database vijay singh Available at: https://works.bepress.com/vijaysingh/5/
More informationData warehouse architecture consists of the following interconnected layers:
Architecture, in the Data warehousing world, is the concept and design of the data base and technologies that are used to load the data. A good architecture will enable scalability, high performance and
More information1. Inroduction to Data Mininig
1. Inroduction to Data Mininig 1.1 Introduction Universe of Data Information Technology has grown in various directions in the recent years. One natural evolutionary path has been the development of the
More informationShrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India
International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISSN : 2456-3307 Some Issues in Application of NLP to Intelligent
More informationFast and Effective System for Name Entity Recognition on Big Data
International Journal of Computer Sciences and Engineering Open Access Research Paper Volume-3, Issue-2 E-ISSN: 2347-2693 Fast and Effective System for Name Entity Recognition on Big Data Jigyasa Nigam
More informationFig 1.2: Relationship between DW, ODS and OLTP Systems
1.4 DATA WAREHOUSES Data warehousing is a process for assembling and managing data from various sources for the purpose of gaining a single detailed view of an enterprise. Although there are several definitions
More informationData Mining. Data warehousing. Hamid Beigy. Sharif University of Technology. Fall 1394
Data Mining Data warehousing Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1394 1 / 22 Table of contents 1 Introduction 2 Data warehousing
More informationDatabase design View Access patterns Need for separate data warehouse:- A multidimensional data model:-
UNIT III: Data Warehouse and OLAP Technology: An Overview : What Is a Data Warehouse? A Multidimensional Data Model, Data Warehouse Architecture, Data Warehouse Implementation, From Data Warehousing to
More informationData Warehousing and OLAP Technologies for Decision-Making Process
Data Warehousing and OLAP Technologies for Decision-Making Process Hiren H Darji Asst. Prof in Anand Institute of Information Science,Anand Abstract Data warehousing and on-line analytical processing (OLAP)
More informationChapter 3. The Multidimensional Model: Basic Concepts. Introduction. The multidimensional model. The multidimensional model
Chapter 3 The Multidimensional Model: Basic Concepts Introduction Multidimensional Model Multidimensional concepts Star Schema Representation Conceptual modeling using ER, UML Conceptual modeling using
More informationINTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) CONTEXT SENSITIVE TEXT SUMMARIZATION USING HIERARCHICAL CLUSTERING ALGORITHM
INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & 6367(Print), ISSN 0976 6375(Online) Volume 3, Issue 1, January- June (2012), TECHNOLOGY (IJCET) IAEME ISSN 0976 6367(Print) ISSN 0976 6375(Online) Volume
More informationThis tutorial will help computer science graduates to understand the basic-to-advanced concepts related to data warehousing.
About the Tutorial A data warehouse is constructed by integrating data from multiple heterogeneous sources. It supports analytical reporting, structured and/or ad hoc queries and decision making. This
More informationEnabling Off-Line Business Process Analysis: A Transformation-Based Approach
Enabling Off-Line Business Process Analysis: A Transformation-Based Approach Arnon Sturm Department of Information Systems Engineering Ben-Gurion University of the Negev, Beer Sheva 84105, Israel sturm@bgu.ac.il
More informationijade Reporter An Intelligent Multi-agent Based Context Aware News Reporting System
ijade Reporter An Intelligent Multi-agent Based Context Aware Reporting System Eddie C.L. Chan and Raymond S.T. Lee The Department of Computing, The Hong Kong Polytechnic University, Hung Hong, Kowloon,
More informationDHANALAKSHMI COLLEGE OF ENGINEERING, CHENNAI
DHANALAKSHMI COLLEGE OF ENGINEERING, CHENNAI Department of Information Technology IT6702 Data Warehousing & Data Mining Anna University 2 & 16 Mark Questions & Answers Year / Semester: IV / VII Regulation:
More informationData Mining Concepts & Techniques
Data Mining Concepts & Techniques Lecture No. 01 Databases, Data warehouse Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro
More informationHierarchies in a multidimensional model: From conceptual modeling to logical representation
Data & Knowledge Engineering 59 (2006) 348 377 www.elsevier.com/locate/datak Hierarchies in a multidimensional model: From conceptual modeling to logical representation E. Malinowski *, E. Zimányi Department
More informationRocky Mountain Technology Ventures
Rocky Mountain Technology Ventures Comparing and Contrasting Online Analytical Processing (OLAP) and Online Transactional Processing (OLTP) Architectures 3/19/2006 Introduction One of the most important
More informationSTRATEGIC INFORMATION SYSTEMS IV STV401T / B BTIP05 / BTIX05 - BTECH DEPARTMENT OF INFORMATICS. By: Dr. Tendani J. Lavhengwa
STRATEGIC INFORMATION SYSTEMS IV STV401T / B BTIP05 / BTIX05 - BTECH DEPARTMENT OF INFORMATICS LECTURE: 05 (A) DATA WAREHOUSING (DW) By: Dr. Tendani J. Lavhengwa lavhengwatj@tut.ac.za 1 My personal quote:
More informationData Warehouse and Data Mining
Data Warehouse and Data Mining Lecture No. 04-06 Data Warehouse Architecture Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology
More informationA Hybrid Unsupervised Web Data Extraction using Trinity and NLP
IJIRST International Journal for Innovative Research in Science & Technology Volume 2 Issue 02 July 2015 ISSN (online): 2349-6010 A Hybrid Unsupervised Web Data Extraction using Trinity and NLP Anju R
More informationData-Driven Driven Business Intelligence Systems: Parts I. Lecture Outline. Learning Objectives
Data-Driven Driven Business Intelligence Systems: Parts I Week 5 Dr. Jocelyn San Pedro School of Information Management & Systems Monash University IMS3001 BUSINESS INTELLIGENCE SYSTEMS SEM 1, 2004 Lecture
More informationThe Near Greedy Algorithm for Views Selection in Data Warehouses and Its Performance Guarantees
The Near Greedy Algorithm for Views Selection in Data Warehouses and Its Performance Guarantees Omar H. Karam Faculty of Informatics and Computer Science, The British University in Egypt and Faculty of
More informationInformation Modeling and Relational Databases
Information Modeling and Relational Databases Second Edition Terry Halpin Neumont University Tony Morgan Neumont University AMSTERDAM» BOSTON. HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO
More informationA Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2
A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2 1 Department of Electronics & Comp. Sc, RTMNU, Nagpur, India 2 Department of Computer Science, Hislop College, Nagpur,
More informationGUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV
GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV Subject Name: Elective I Data Warehousing & Data Mining (DWDM) Subject Code: 2640005 Learning Objectives: To understand
More informationData Mining. Data warehousing. Hamid Beigy. Sharif University of Technology. Fall 1396
Data Mining Data warehousing Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1396 1 / 31 Table of contents 1 Introduction 2 Data warehousing
More informationQuestion Bank. 4) It is the source of information later delivered to data marts.
Question Bank Year: 2016-2017 Subject Dept: CS Semester: First Subject Name: Data Mining. Q1) What is data warehouse? ANS. A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile
More informationDATA WAREHOUSING IN LIBRARIES FOR MANAGING DATABASE
DATA WAREHOUSING IN LIBRARIES FOR MANAGING DATABASE Dr. Kirti Singh, Librarian, SSD Women s Institute of Technology, Bathinda Abstract: Major libraries have large collections and circulation. Managing
More informationWKU-MIS-B10 Data Management: Warehousing, Analyzing, Mining, and Visualization. Management Information Systems
Management Information Systems Management Information Systems B10. Data Management: Warehousing, Analyzing, Mining, and Visualization Code: 166137-01+02 Course: Management Information Systems Period: Spring
More informationCS614 - Data Warehousing - Midterm Papers Solved MCQ(S) (1 TO 22 Lectures)
CS614- Data Warehousing Solved MCQ(S) From Midterm Papers (1 TO 22 Lectures) BY Arslan Arshad Nov 21,2016 BS110401050 BS110401050@vu.edu.pk Arslan.arshad01@gmail.com AKMP01 CS614 - Data Warehousing - Midterm
More informationETL and OLAP Systems
ETL and OLAP Systems Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, first semester
More informationBig Trend in Business Intelligence: Data Mining over Big Data Web Transaction Data. Fall 2012
Big Trend in Business Intelligence: Data Mining over Big Data Web Transaction Data Fall 2012 Data Warehousing and OLAP Introduction Decision Support Technology On Line Analytical Processing Star Schema
More informationChapter 27 Introduction to Information Retrieval and Web Search
Chapter 27 Introduction to Information Retrieval and Web Search Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 27 Outline Information Retrieval (IR) Concepts Retrieval
More information: How does DSS data differ from operational data?
by Daniel J Power Editor, DSSResources.com Decision support data used for analytics and data-driven DSS is related to past actions and intentions. The data is a historical record and the scale of data
More informationExtracting knowledge from Ontology using Jena for Semantic Web
Extracting knowledge from Ontology using Jena for Semantic Web Ayesha Ameen I.T Department Deccan College of Engineering and Technology Hyderabad A.P, India ameenayesha@gmail.com Khaleel Ur Rahman Khan
More informationData Mining. Data warehousing. Hamid Beigy. Sharif University of Technology. Fall 1394
Data Mining Data warehousing Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1394 1 / 31 Table of contents 1 Introduction 2 Data warehousing
More informationEvolution of Database Systems
Evolution of Database Systems Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Intelligent Decision Support Systems Master studies, second
More informationText Mining: A Burgeoning technology for knowledge extraction
Text Mining: A Burgeoning technology for knowledge extraction 1 Anshika Singh, 2 Dr. Udayan Ghosh 1 HCL Technologies Ltd., Noida, 2 University School of Information &Communication Technology, Dwarka, Delhi.
More informationOLAP Introduction and Overview
1 CHAPTER 1 OLAP Introduction and Overview What Is OLAP? 1 Data Storage and Access 1 Benefits of OLAP 2 What Is a Cube? 2 Understanding the Cube Structure 3 What Is SAS OLAP Server? 3 About Cube Metadata
More informationData Warehouse Testing. By: Rakesh Kumar Sharma
Data Warehouse Testing By: Rakesh Kumar Sharma Index...2 Introduction...3 About Data Warehouse...3 Data Warehouse definition...3 Testing Process for Data warehouse:...3 Requirements Testing :...3 Unit
More informationRETRACTED ARTICLE. Web-Based Data Mining in System Design and Implementation. Open Access. Jianhu Gong 1* and Jianzhi Gong 2
Send Orders for Reprints to reprints@benthamscience.ae The Open Automation and Control Systems Journal, 2014, 6, 1907-1911 1907 Web-Based Data Mining in System Design and Implementation Open Access Jianhu
More informationAdvantages of UML for Multidimensional Modeling
Advantages of UML for Multidimensional Modeling Sergio Luján-Mora (slujan@dlsi.ua.es) Juan Trujillo (jtrujillo@dlsi.ua.es) Department of Software and Computing Systems University of Alicante (Spain) Panos
More informationImplementing and Maintaining Microsoft SQL Server 2008 Analysis Services
Course 6234A: Implementing and Maintaining Microsoft SQL Server 2008 Analysis Services Course Details Course Outline Module 1: Introduction to Microsoft SQL Server Analysis Services This module introduces
More informationBasics of Dimensional Modeling
Basics of Dimensional Modeling Data warehouse and OLAP tools are based on a dimensional data model. A dimensional model is based on dimensions, facts, cubes, and schemas such as star and snowflake. Dimension
More informationMining User - Aware Rare Sequential Topic Pattern in Document Streams
Mining User - Aware Rare Sequential Topic Pattern in Document Streams A.Mary Assistant Professor, Department of Computer Science And Engineering Alpha College Of Engineering, Thirumazhisai, Tamil Nadu,
More informationMOLAP Data Warehouse of a Software Products Servicing Call Center
MOLAP Data Warehouse of a Software Products Servicing Call Center Z. Kazi, B. Radulovic, D. Radovanovic and Lj. Kazi Technical faculty "Mihajlo Pupin" University of Novi Sad Complete Address: Technical
More informationNews Filtering and Summarization System Architecture for Recognition and Summarization of News Pages
Bonfring International Journal of Data Mining, Vol. 7, No. 2, May 2017 11 News Filtering and Summarization System Architecture for Recognition and Summarization of News Pages Bamber and Micah Jason Abstract---
More informationKnowledge Engineering with Semantic Web Technologies
This file is licensed under the Creative Commons Attribution-NonCommercial 3.0 (CC BY-NC 3.0) Knowledge Engineering with Semantic Web Technologies Lecture 5: Ontological Engineering 5.3 Ontology Learning
More informationAn Overview of Data Warehousing and OLAP Technology
An Overview of Data Warehousing and OLAP Technology CMPT 843 Karanjit Singh Tiwana 1 Intro and Architecture 2 What is Data Warehouse? Subject-oriented, integrated, time varying, non-volatile collection
More informationEIAW: Towards a Business-Friendly Data Warehouse Using Semantic Web Technologies
EIAW: Towards a Business-Friendly Data Warehouse Using Semantic Web Technologies Guotong Xie 1, Yang Yang 1, Shengping Liu 1, Zhaoming Qiu 1, Yue Pan 1, and Xiongzhi Zhou 2 1 IBM China Research Laboratory
More informationDatabase Modeling And Design The Fundamental Principles The Morgan Kaufmann Series In Data Management Systems
Database Modeling And Design The Fundamental Principles The Morgan Kaufmann Series In Data Management We have made it easy for you to find a PDF Ebooks without any digging. And by having access to our
More informationOntology based Model and Procedure Creation for Topic Analysis in Chinese Language
Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language Dong Han and Kilian Stoffel Information Management Institute, University of Neuchâtel Pierre-à-Mazel 7, CH-2000 Neuchâtel,
More informationCHAPTER 8 DECISION SUPPORT V2 ADVANCED DATABASE SYSTEMS. Assist. Prof. Dr. Volkan TUNALI
CHAPTER 8 DECISION SUPPORT V2 ADVANCED DATABASE SYSTEMS Assist. Prof. Dr. Volkan TUNALI Topics 2 Business Intelligence (BI) Decision Support System (DSS) Data Warehouse Online Analytical Processing (OLAP)
More informationDYNAMIC FOAF MANAGEMENT METHOD FOR SOCIAL NETWORKS IN THE SOCIAL WEB ENVIRONMENT
DYNAMIC FOAF MANAGEMENT METHOD FOR SOCIAL NETWORKS IN THE SOCIAL WEB ENVIRONMENT Jong-Soo Sohn and In-Jeong Chung Department of Computer and Information Science Korea University Republic of Korea Abstract
More informationCS377: Database Systems Data Warehouse and Data Mining. Li Xiong Department of Mathematics and Computer Science Emory University
CS377: Database Systems Data Warehouse and Data Mining Li Xiong Department of Mathematics and Computer Science Emory University 1 1960s: Evolution of Database Technology Data collection, database creation,
More informationDomain-specific Concept-based Information Retrieval System
Domain-specific Concept-based Information Retrieval System L. Shen 1, Y. K. Lim 1, H. T. Loh 2 1 Design Technology Institute Ltd, National University of Singapore, Singapore 2 Department of Mechanical
More informationA BFS-BASED SIMILAR CONFERENCE RETRIEVAL FRAMEWORK
A BFS-BASED SIMILAR CONFERENCE RETRIEVAL FRAMEWORK Qing Guo 1, 2 1 Nanyang Technological University, Singapore 2 SAP Innovation Center Network,Singapore ABSTRACT Literature review is part of scientific
More informationA Data Warehouse Implementation Using the Star Schema. For an outpatient hospital information system
A Data Warehouse Implementation Using the Star Schema For an outpatient hospital information system GurvinderKaurJosan Master of Computer Application,YMT College of Management Kharghar, Navi Mumbai ---------------------------------------------------------------------***----------------------------------------------------------------
More informationDATA MINING AND WAREHOUSING
DATA MINING AND WAREHOUSING Qno Question Answer 1 Define data warehouse? Data warehouse is a subject oriented, integrated, time-variant, and nonvolatile collection of data that supports management's decision-making
More informationData Mining and Warehousing
Data Mining and Warehousing Sangeetha K V I st MCA Adhiyamaan College of Engineering, Hosur-635109. E-mail:veerasangee1989@gmail.com Rajeshwari P I st MCA Adhiyamaan College of Engineering, Hosur-635109.
More informationLectures for the course: Data Warehousing and Data Mining (IT 60107)
Lectures for the course: Data Warehousing and Data Mining (IT 60107) Week 1 Lecture 1 21/07/2011 Introduction to the course Pre-requisite Expectations Evaluation Guideline Term Paper and Term Project Guideline
More informationData Warehousing and Decision Support. Introduction. Three Complementary Trends. [R&G] Chapter 23, Part A
Data Warehousing and Decision Support [R&G] Chapter 23, Part A CS 432 1 Introduction Increasingly, organizations are analyzing current and historical data to identify useful patterns and support business
More informationParmenides. Semi-automatic. Ontology. construction and maintenance. Ontology. Document convertor/basic processing. Linguistic. Background knowledge
Discover hidden information from your texts! Information overload is a well known issue in the knowledge industry. At the same time most of this information becomes available in natural language which
More informationTDWI Data Modeling. Data Analysis and Design for BI and Data Warehousing Systems
Data Analysis and Design for BI and Data Warehousing Systems Previews of TDWI course books offer an opportunity to see the quality of our material and help you to select the courses that best fit your
More informationInference in Hierarchical Multidimensional Space
Proc. International Conference on Data Technologies and Applications (DATA 2012), Rome, Italy, 25-27 July 2012, 70-76 Related papers: http://conceptoriented.org/ Inference in Hierarchical Multidimensional
More informationSemantic Web Mining and its application in Human Resource Management
International Journal of Computer Science & Management Studies, Vol. 11, Issue 02, August 2011 60 Semantic Web Mining and its application in Human Resource Management Ridhika Malik 1, Kunjana Vasudev 2
More information20466C - Version: 1. Implementing Data Models and Reports with Microsoft SQL Server
20466C - Version: 1 Implementing Data Models and Reports with Microsoft SQL Server Implementing Data Models and Reports with Microsoft SQL Server 20466C - Version: 1 5 days Course Description: The focus
More informationAdnan YAZICI Computer Engineering Department
Data Warehouse Adnan YAZICI Computer Engineering Department Middle East Technical University, A.Yazici, 2010 Definition A data warehouse is a subject-oriented integrated time-variant nonvolatile collection
More informationIJSRD - International Journal for Scientific Research & Development Vol. 4, Issue 06, 2016 ISSN (online):
IJSRD - International Journal for Scientific Research & Development Vol. 4, Issue 06, 2016 ISSN (online): 2321-0613 Tanzeela Khanam 1 Pravin S.Metkewar 2 1 Student 2 Associate Professor 1,2 SICSR, affiliated
More informationInternational Journal of Advance Engineering and Research Development. Survey of Web Usage Mining Techniques for Web-based Recommendations
Scientific Journal of Impact Factor (SJIF): 5.71 International Journal of Advance Engineering and Research Development Volume 5, Issue 02, February -2018 e-issn (O): 2348-4470 p-issn (P): 2348-6406 Survey
More informationInformation Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 05(b) : 23/10/2012 Data Mining: Concepts and Techniques (3 rd ed.) Chapter
More informationData Warehousing. Overview
Data Warehousing Overview Basic Definitions Normalization Entity Relationship Diagrams (ERDs) Normal Forms Many to Many relationships Warehouse Considerations Dimension Tables Fact Tables Star Schema Snowflake
More informationData Modeling and Databases Ch 7: Schemas. Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich
Data Modeling and Databases Ch 7: Schemas Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich Database schema A Database Schema captures: The concepts represented Their attributes
More informationSK International Journal of Multidisciplinary Research Hub Research Article / Survey Paper / Case Study Published By: SK Publisher
ISSN: 2394 3122 (Online) Volume 2, Issue 1, January 2015 Research Article / Survey Paper / Case Study Published By: SK Publisher P. Elamathi 1 M.Phil. Full Time Research Scholar Vivekanandha College of
More informationPRIS at TAC2012 KBP Track
PRIS at TAC2012 KBP Track Yan Li, Sijia Chen, Zhihua Zhou, Jie Yin, Hao Luo, Liyin Hong, Weiran Xu, Guang Chen, Jun Guo School of Information and Communication Engineering Beijing University of Posts and
More informationCHAPTER 3 Implementation of Data warehouse in Data Mining
CHAPTER 3 Implementation of Data warehouse in Data Mining 3.1 Introduction to Data Warehousing A data warehouse is storage of convenient, consistent, complete and consolidated data, which is collected
More informationA MAS Based ETL Approach for Complex Data
A MAS Based ETL Approach for Complex Data O. Boussaid, F. Bentayeb, J. Darmont Abstract : In a data warehousing process, the phase of data integration is crucial. Many methods for data integration have
More informationA Star Schema Has One To Many Relationship Between A Dimension And Fact Table
A Star Schema Has One To Many Relationship Between A Dimension And Fact Table Many organizations implement star and snowflake schema data warehouse The fact table has foreign key relationships to one or
More informationData transfer, storage and analysis for data mart enlargement
Data transfer, storage and analysis for data mart enlargement PROKOPOVA ZDENKA, SILHAVY PETR, SILHAVY RADEK Department of Computer and Communication Systems Faculty of Applied Informatics Tomas Bata University
More informationDATA WAREHOUSE MANAGEMENT SYSTEM A CASE STUDY
DATA WAREHOUSE MANAGEMENT SYSTEM A CASE STUDY DARKO KRULJ Trizon Group, Belgrade, Serbia and Montenegro. MILUTIN CUPIC MILAN MARTIC MILIJA SUKNOVIC Faculty of Organizational Science, University of Belgrade,
More informationComplete. The. Reference. Christopher Adamson. Mc Grauu. LlLIJBB. New York Chicago. San Francisco Lisbon London Madrid Mexico City
The Complete Reference Christopher Adamson Mc Grauu LlLIJBB New York Chicago San Francisco Lisbon London Madrid Mexico City Milan New Delhi San Juan Seoul Singapore Sydney Toronto Contents Acknowledgments
More informationTEXT PREPROCESSING FOR TEXT MINING USING SIDE INFORMATION
TEXT PREPROCESSING FOR TEXT MINING USING SIDE INFORMATION Ms. Nikita P.Katariya 1, Prof. M. S. Chaudhari 2 1 Dept. of Computer Science & Engg, P.B.C.E., Nagpur, India, nikitakatariya@yahoo.com 2 Dept.
More informationInformation Retrieval
Information Retrieval CSC 375, Fall 2016 An information retrieval system will tend not to be used whenever it is more painful and troublesome for a customer to have information than for him not to have
More informationIT DATA WAREHOUSING AND DATA MINING UNIT-2 BUSINESS ANALYSIS
PART A 1. What are production reporting tools? Give examples. (May/June 2013) Production reporting tools will let companies generate regular operational reports or support high-volume batch jobs. Such
More informationMetadata. Data Warehouse
A DECISION SUPPORT PROTOTYPE FOR THE KANSAS STATE UNIVERSITY LIBRARIES Maria Zamr Bleyberg, Raghunath Mysore, Dongsheng Zhu, Radhika Bodapatla Computing and Information Sciences Department Karen Cole,
More informationData Warehousing and Decision Support
Data Warehousing and Decision Support Chapter 23, Part A Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke 1 Introduction Increasingly, organizations are analyzing current and historical
More informationSemantic Annotation of Web Resources Using IdentityRank and Wikipedia
Semantic Annotation of Web Resources Using IdentityRank and Wikipedia Norberto Fernández, José M.Blázquez, Luis Sánchez, and Vicente Luque Telematic Engineering Department. Carlos III University of Madrid
More informationData Modeling Online Training
Data Modeling Online Training IQ Online training facility offers Data Modeling online training by trainers who have expert knowledge in the Data Modeling and proven record of training hundreds of students.
More informationFull file at
Chapter 2 Data Warehousing True-False Questions 1. A real-time, enterprise-level data warehouse combined with a strategy for its use in decision support can leverage data to provide massive financial benefits
More informationData Warehouse and Data Mining
Data Warehouse and Data Mining Lecture No. 03 Architecture of DW Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro Basic
More informationData Warehousing. Data Warehousing and Mining. Lecture 8. by Hossen Asiful Mustafa
Data Warehousing Data Warehousing and Mining Lecture 8 by Hossen Asiful Mustafa Databases Databases are developed on the IDEA that DATA is one of the critical materials of the Information Age Information,
More informationA Semantic Model for Concept Based Clustering
A Semantic Model for Concept Based Clustering S.Saranya 1, S.Logeswari 2 PG Scholar, Dept. of CSE, Bannari Amman Institute of Technology, Sathyamangalam, Tamilnadu, India 1 Associate Professor, Dept. of
More informationData Warehouse Design Using Row and Column Data Distribution
Int'l Conf. Information and Knowledge Engineering IKE'15 55 Data Warehouse Design Using Row and Column Data Distribution Behrooz Seyed-Abbassi and Vivekanand Madesi School of Computing, University of North
More informationABD - Database Administration
Coordinating unit: 270 - FIB - Barcelona School of Informatics Teaching unit: 747 - ESSI - Department of Service and Information System Engineering Academic year: Degree: 2017 BACHELOR'S DEGREE IN INFORMATICS
More informationThe Study on Data Warehouse and Data Mining for Birth Registration System of the Surat City
The Study on Data Warehouse and Data Mining for Birth Registration System of the Surat City Pushpal Desai M.Sc.(I.T.) Programme Veer Narmad South Gujarat University Surat, India. Desai Apurva Department
More informationData Warehousing and Decision Support
Data Warehousing and Decision Support [R&G] Chapter 23, Part A CS 4320 1 Introduction Increasingly, organizations are analyzing current and historical data to identify useful patterns and support business
More informationData Warehouses Chapter 12. Class 10: Data Warehouses 1
Data Warehouses Chapter 12 Class 10: Data Warehouses 1 OLTP vs OLAP Operational Database: a database designed to support the day today transactions of an organization Data Warehouse: historical data is
More informationData Warehousing and Decision Support (mostly using Relational Databases) CS634 Class 20
Data Warehousing and Decision Support (mostly using Relational Databases) CS634 Class 20 Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke, Chapter 25 Introduction Increasingly,
More information