Automatic Metadata Analysis for Environmental Information Systems


Jens Hartmann¹ and Heiner Stuckenschmidt²

¹ Center for Computing Technologies, University of Bremen, jhart@tzi.de
² AI Department, Vrije Universiteit Amsterdam, heiner@cs.vu.nl

Abstract

Metadata plays an important role in web-based environmental information systems (EIS). It structures existing information and provides background information about technical issues as well as the context in which information has been generated or should be interpreted. Further, many systems such as BUISY (Bremer Umweltinformationssystem) use metadata to provide content-based search facilities. Such methods, however, depend on the correctness and completeness of the metadata. In this paper, we discuss an approach for automatically analyzing the metadata of web-based information systems that is based on machine learning techniques, and its application to different environmental systems. We analyze three web-based EIS and discuss our results.

Introduction

The importance of metadata in the context of managing and accessing environmental information has been widely recognized and is witnessed by a number of publications at previous conferences on computer science in environmental protection. Attention has been drawn to metadata standards, the creation of metadata, metadata-based access to information, and metadata analysis and maintenance. In connection with the introduction of web-based information systems, the prominent role of metadata was recognized very early (Crossley 1994). In modern web-based information systems, metadata is no longer just an addition to the actual information; it plays an active role in the functionality of the system. An example is the BUISY system, which uses metadata annotations to provide a content-based search facility (Voegele et al. 2000). This new role of metadata as part of the system's functionality makes ensuring the correctness and completeness of metadata even more vital.

At the same time, metadata validation becomes more difficult in a web-based context, because metadata usually appears as annotations on individual web pages. This rules out previous validation approaches that relied on centralized access to metadata in a database (Voigt et al. 1999). We argue that there is a need for supporting the analysis of metadata that is indirectly contained on web pages as annotations in a markup language such as HTML, XML or RDF.

Currently, we focus on a special form of content-related metadata, known as web page categorization. Here, the task is to assign the pages of a web site to a set of predefined object classes as they are used in well-known metadata repositories such as the UDK (Swoboda et al. 2000). Unlike in the case of the UDK, our object classes do not refer to the type of the information source, but to the subject area of a page, such as air pollution or water protection.

Based on a representation of HTML and XML documents as a Document Object Model (DOM), we use a formal representation to describe the logical structure of a document. We then identify structural patterns that carry content-related information, in our case metadata structures that occur on web pages as HTML meta tags. The structural descriptions of web pages are divided into a training set and a test set. Using the inductive logic programming system Progol (Muggleton 1995), we then generate classification rules that relate structures on web pages to different topic areas in an environmental system. These classifiers consist of logical rules describing what all pages of a category have in common (Stuckenschmidt et al. 2002). For example, we discovered the following rule for the environmental information system of Bavaria, which classifies pages from the area of waste management:

    document(a) :- relation(a,b), metatag(b,keywords,abfall).

According to this rule, a page belongs to this topic if it links to another page whose keyword list contains the word abfall (German for "waste"). In the same way, we can also analyze other properties of a web page that may contain information about its content.
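To make the reading of such a rule concrete, the following minimal Python sketch evaluates the waste-management rule over a small fact base. The page identifiers and facts are invented for illustration and only mirror the relation/2 and metatag/3 predicates that appear in the rule; the system described in this paper evaluates its rules in Prolog, not in Python.

```python
# Hypothetical facts mirroring relation/2 and metatag/3
# (invented for illustration, not taken from any of the analyzed systems).
relation = {("page1", "page2"), ("page3", "page4")}
metatag = {
    ("page2", "keywords", "abfall"),
    ("page4", "keywords", "luft"),
}


def is_waste_document(page):
    """document(a) :- relation(a,b), metatag(b,keywords,abfall)."""
    return any(
        source == page and (target, "keywords", "abfall") in metatag
        for source, target in relation
    )


# All pages with outgoing links that satisfy the rule body.
waste_pages = sorted(
    {source for source, _ in relation if is_waste_document(source)}
)
print(waste_pages)
```

Only page1 satisfies the rule in this toy fact base, because it is the only page that links to a page carrying the keyword abfall.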
The result of this rule generation process provides us with a means for assessing the quality of the metadata in the system, because well-designed metadata should clearly identify the subject of a web page, at least at an abstract level. This could be done by assigning keywords to pages, as in the example above, or by directly linking a page to a subject area. In the presence of such meta information, our learning approach should be able to detect the corresponding pattern in the page structure with an accuracy of 100 percent. Any learning result with a lower accuracy is an indicator of missing, incorrect or badly designed metadata.

In the following, we first describe our validation approach in more detail. We then summarize the results of applying the approach to three web-based environmental information systems in Germany and Austria. We interpret the results of the experiments by speculating about the origin of misclassifications.

The Approach

The analysis of metadata used in environmental information systems (EIS) relies on structural rules that describe the metadata common to the pages of a given category, e.g. the category of waste management. The process of generating such structural rules is organized as a knowledge discovery process, which can be separated into the following five steps (based on Chang et al. 2001):

1. Data Cleaning
2. Data Transformation
3. Data Reduction
4. Data Mining
5. Knowledge Representation

The first step of our approach cleans noisy or inconsistent data in the documents, since we use documents from different EIS that are available on the World Wide Web (WWW). The next step transforms the possibly different document formats into one defined format for further processing. For this purpose, we represent HTML and XML documents as a Document Object Model (DOM) and use a formal representation to describe the logical structure of a document. A small set of predicates has been defined to express structural elements. This representation can be applied to known and even unknown documents.

For data reduction, information is represented in a more general way than it occurs in the original data. The generalization process primarily covers text and single words. All words are represented as lowercase alphanumeric strings, and each word is represented by one predicate. To illustrate, a document title Welcome to MY Homepage!! would be translated into a set of four predicates {welcome, to, my, homepage}.

The syntactical transformation comprises several pre-processing steps that verify the syntactic structure of the documents after conversion to XHTML. In addition, we defined a general transformation based on the generalization of the text contained in these documents. The transformation process operates on the DOM representation, which is traversed by the developed software. Declared document structures are extracted and represented in this formal way, for which we adopt Prolog syntax.

For the identification of general regularities that are usable as a classifier (data mining), the generated sets of formal clauses are used as input to the Inductive Logic Programming (ILP) system Progol. Given any available background knowledge (BK) and a set of positive and negative examples, Progol generates a hypothesis which, together with the BK, explains the positive examples. This rule is then applicable as a general classifier for the document class. The generated rules are stored in a separate file and are used as background knowledge for further classification tasks.
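The word generalization used for data reduction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the document identifier d1 and the restriction to ASCII letters and digits are hypothetical choices made here, and the doctitle/2 facts merely imitate the Prolog-style output described above.

```python
import re


def generalize(text):
    """Reduce a text to its lowercase alphanumeric words.

    Assumption: only ASCII letters and digits count as word characters;
    duplicates are dropped while the order of first occurrence is kept.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    return list(dict.fromkeys(words))


def title_facts(node_id, title):
    """Emit one Prolog-style doctitle/2 fact per generalized word."""
    return [f"doctitle({node_id},{word})." for word in generalize(title)]


print(generalize("Welcome to MY Homepage!!"))
print(title_facts("d1", "Welcome to MY Homepage!!"))
```

For the title used as an example in the text, the first call yields the four words welcome, to, my and homepage, and the second call yields one fact per word.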

Learning Metadata Classifiers

Metadata offers an expressive framework for analyzing documents of web-based information systems, and it covers different aspects of information. It comprises information about technical issues, such as access methods or processing instructions, as well as information about the document content, such as intended uses or author information. The study by Yang et al. (2002) shows in detail the importance of well-structured metadata for classification tasks.

Generally, metadata can be expressed syntactically by so-called meta tags. We represent metadata by means of the metatag/3 predicate, which is defined as follows:

$$\mathit{metatag}(I, N, C) \leftarrow \mathit{descendant}(D, I) \wedge \mathit{structure}(I, \mathit{meta}) \wedge \mathit{attribute}(I, Q) \wedge \mathit{attribute}(I, W) \wedge \mathit{structure}(Q, x) \wedge \mathit{value}(Q, N) \wedge \mathit{structure}(W, y) \wedge \mathit{value}(W, C)$$

where $x \in \{\text{http-equiv}, \text{name}\}$ and $y \in \{\text{content}\}$. The values of the attributes N and C are restricted by the weak generalization, so every single word is associated with one predicate. The number of metatag predicates for a document is consequently

$$\sum_{i=1}^{n} p \cdot v_i$$

where n is the number of meta tags in a document, p is the number of elements of {http-equiv, name} that are used (typically p = 1) and $v_i$ is the cardinality of {content} for the i-th meta tag.

For the generation of structural metadata classifiers we use Inductive Logic Programming (ILP), which can be characterized as the intersection of machine learning and logic programming (Muggleton 1999). In general, ILP is concerned with generating a hypothesis H that describes a set of examples E (partitioned into a set of positive examples $E^{+}$ and a set of negative examples $E^{-}$), given background knowledge B. Specifically, the normal semantics of ILP can be formalized as follows:

1. $B \wedge E^{-} \not\models \square$
2. $B \not\models E^{+}$
3. $B \wedge H \wedge E^{-} \not\models \square$
4. $B \wedge H \models E^{+}$

The normal semantics of ILP demands that (1) the BK is consistent with the negative examples (prior satisfiability) and that (2) the BK does not already explain the positive examples, so that learning is actually necessary (prior necessity). Furthermore, the learned hypothesis must (3) be consistent with the negative examples (posterior satisfiability) and (4) explain all positive examples (posterior sufficiency).
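As an illustration of the metatag/3 representation defined in this section, the following Python sketch extracts one fact per content word from the meta tags of an HTML page, using the standard library html.parser module. The class name, the document identifier d1 and the sample page are hypothetical. The sketch also simplifies the definition: it reads the attributes directly instead of building the intermediate descendant, structure, attribute and value predicates, and it does not quote attribute names that would need quoting in Prolog.

```python
import re
from html.parser import HTMLParser


class MetaTagFacts(HTMLParser):
    """Collect Prolog-style metatag(Doc, Name, Word) facts from <meta> tags."""

    def __init__(self, doc_id):
        super().__init__()
        self.doc_id = doc_id
        self.facts = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # The naming attribute is either name or http-equiv;
        # the described value is taken from the content attribute.
        name = attributes.get("name") or attributes.get("http-equiv")
        content = attributes.get("content")
        if not name or not content:
            return
        # Weak generalization: one fact per lowercase alphanumeric word.
        for word in re.findall(r"[a-z0-9]+", content.lower()):
            self.facts.append(
                f"metatag({self.doc_id},{name.lower()},{word})."
            )


# Hypothetical sample page, not taken from any of the analyzed systems.
html = """<html><head>
<meta name="keywords" content="Abfall, Recycling">
<meta name="author" content="TZI">
</head><body></body></html>"""

parser = MetaTagFacts("d1")
parser.feed(html)
print("\n".join(parser.facts))
```

The sample page has two meta tags with two and one content words respectively, so with p = 1 the sketch produces three facts, in line with the predicate count given above.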

Using ILP for data mining makes it possible to discover relational regularities in the training sets that are undetectable with classical attribute-value learners. To illustrate, we present the following rule:

    document(a) :- relation(a,b), relation(b,c), doctitle(c,abfall).

This rule classifies all documents of the data set correctly. Without such relational descriptions, an accuracy of only 80.30% is attained. In summary, learning relations and relational regularities among documents increases the accuracy of the learned classifiers (Hartmann 2002).

Experiments

To evaluate the developed approach we used three web-based environmental information systems. The systems have similar data structures and a comparable number of documents. All documents were downloaded automatically with the web downloader wget. We applied our approach to validate the metadata of the EIS of Bremen, Vienna and Bavaria.

We analyzed the topic areas that normally structure information within these systems (waste management, soil protection, nature conservation, air and water pollution) and sorted pages into these topic areas based on their metadata. In principle, this classification should be perfectly accurate for an EIS that maintains content-related metadata. The evaluation is based on classifier accuracy, calculated as follows:

$$\mathit{Acc} = \frac{P(A) + P(\neg A)}{P(A) + \overline{P}(A) + P(\neg A) + \overline{P}(\neg A)}$$

where $P(A)$ is the number of correctly classified positive examples and $\overline{P}(A)$ the number of positive examples classified as negatives. Analogously, $P(\neg A)$ is the number of correctly classified negatives and $\overline{P}(\neg A)$ the number of negatives classified as positives.

The experimental results in Table 1 reveal a well-designed metadata infrastructure in all analyzed systems, with the exception of parts of the Bavarian system. We were able to classify the majority of pages from the meta information provided by these systems.
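The accuracy measure defined above, correctly classified positives and negatives divided by all classified examples, can be computed as in the following Python sketch. The function and parameter names are chosen here for illustration, and the counts in the usage line are invented; they are not values from Table 1.

```python
def accuracy(correct_pos, missed_pos, correct_neg, false_pos):
    """Return the classifier accuracy in percent.

    correct_pos: positive examples classified as positive
    missed_pos:  positive examples classified as negative
    correct_neg: negative examples classified as negative
    false_pos:   negative examples classified as positive
    """
    total = correct_pos + missed_pos + correct_neg + false_pos
    if total == 0:
        raise ValueError("accuracy is undefined without examples")
    return 100.0 * (correct_pos + correct_neg) / total


# Hypothetical counts for a single category (not taken from Table 1).
print(accuracy(correct_pos=45, missed_pos=5, correct_neg=48, false_pos=2))
```

With these invented counts, 93 of 100 examples are classified correctly, so the function returns 93.0.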

Category                     $P(A)$   $\overline{P}(A)$   $P(\neg A)$   $\overline{P}(\neg A)$   Acc.
BUISY Abfall (waste)
BUISY Boden (soil)
BUISY Luft (air)
BUISY Natur (nature)                                                                             ,41
BUISY Wasser (water)                                                                             ,25
BUISY Gesamt (overall)                                                                           98.53
Ubavie Abfall (waste)
Ubavie Boden (soil)
Ubavie Luft (air)
Ubavie Natur (nature)
Ubavie Wasser (water)
Ubavie Gesamt (overall)                                                                          100
Bayern Abfall (waste)
Bayern Boden (soil)
Bayern Luft (air)                                                                                ,33
Bayern Natur (nature)
Bayern Wasser (water)
Bayern Gesamt (overall)                                                                          58.67
Total                                                                                            85.73

Table 1: Classification Results

The learning process helps users to identify relevant classification criteria that are not obvious. For example, our approach generated the following rules³:

    document(a) :- metatag(a,author,'zdl30-13').
    document(a) :- metatag(a,bereich,naturschutz).
    document(a) :- relation(a,b), metatag(b,keywords,abfall).
    document(a) :- relation(a,b), metatag(b,keywords,bodenschutz).

³ The rules presented here are a subset of those generated in the experiments.

Interpretation of Results

The results of the analysis process give us some insight into the status of the metadata annotations in the different systems and even allow us to speculate about the systems themselves and the way they use metadata. In the case of the EIS of the City of Vienna, for example, it is quite obvious that the metadata annotations are generated automatically, as we get an accuracy of 100% for all subject classes.

In the case of the BUISY system, the accuracy is very close to 100%, but we also found some mismatches. This observation can be explained by the development process of the system, which was originally designed in a research project and was then handed over to the federal administration. In the research project, metadata annotations were added automatically using a special software tool. After the system was transferred to the administration, new pages were evidently added, some of which do not contain proper metadata, leading to a sub-optimal classification result.

For the EIS of the federal state of Bavaria, the situation is more complex. As Table 1 shows, there are some subjects where a classifier could be generated with an accuracy of 100%; for other subject areas, however, no classification rule could be found at all. This observation can be explained by the fact that, at the time of our experiments, the system was in the process of being redesigned. Parts of the pages were already properly annotated with metadata, while other parts were not. This process has since been completed and all pages are annotated. If we repeated the experiments now, we would expect a result very close to 100% accuracy for all classes in the system.

Discussion

In this paper we presented an approach for automatic metadata analysis in environmental information systems. The approach can be applied to web-based environmental systems with content-related metadata. We presented the generation of classifiers as a knowledge discovery process with several pre-processing steps, namely cleaning, transformation and reduction of the data. We argued that the use of ILP is necessary to discover relational regularities among a set of documents. The process helps users to identify relevant classification criteria that are not obvious. We presented non-obvious results of learned classifiers for three environmental systems.
These results apply broadly to the administration and management of web-based information systems. When processing information from web-based information systems, we have to deal with noisy and potentially erroneous data; at present, this error handling is performed during pre-processing. However, this is still an open problem that requires additional work. An extension of this approach that analyzes additional structures of web documents is described in (Hartmann 2002).

Bibliography

Chang, G., Healey, M.J., McHugh, J., Wang, J. (2001): Mining the World Wide Web - An Information Search Approach. Kluwer Academic Publishers.

Crossley, D. (1994): WAIS through the Web - Discovering Environmental Information. Presented at the Second International WWW Conference (WWW Fall 94) Mosaic and the Web, Chicago, USA (17-20 October, 1994).

Hartmann, J. (2002): Lernen struktureller Regeln zur Klassifikation von Web-Dokumenten. Diplomarbeit, Universität Bremen, TZI.

Muggleton, S. (1995): Inverse Entailment and Progol. In New Generation Computing, Special issue on Inductive Logic Programming, vol. 13, p , Ohmsha.

Muggleton, S. (1999): Inductive Logic Programming. The MIT Encyclopedia of the Cognitive Sciences (MITECS), MIT Press.

Stuckenschmidt, H., Hartmann, J., Harmelen, F. van (2002): Learning Structural Classification Rules for Web-Page Categorization. Accepted for Special Track on the Semantic Web at Flairs 2002, Pensacola, Florida.

Swoboda, W., Kruse, F., Legat, R., Nikolai, R. and Behrens, S. (2000): Harmonisierter Zugang zu Umweltinformationen für Öffentlichkeit, Politik und Planung: Der Umweltdatenkatalog UDK im Einsatz. In Armin B. Cremers, Klaus Greve (eds.) Computer Science for Environmental Protection '00 - Environmental Information for Planning, Politics and the Public, Metropolis, Marburg.

Voegele, T., Stuckenschmidt, H., Visser, U. (2000): BUISY - Using Brokered Data Objects in Environmental Information Systems. In Wolf-Fritz Riekert, Klaus Tochtermann (eds.) Hypermedia im Umweltschutz, 3. Workshop, Ulm 2000, Metropolis, Marburg.

Voigt, K., Welzl, G., Rediske, G. (1999): Datenanalyse von umweltrelevanten Metadatenbanken. In Claus Rautenstrauch, Michael Schenk (eds.) Umweltinformatik 99 - Umweltinformatik zwischen Theorie und Industrieanwendung, 13. Internationales Symposium "Informatik für den Umweltschutz", Metropolis Verlag, Marburg.

Yang, Y., Slattery, S., Ghani, R. (2002): A study of approaches to hypertext categorization. In Journal of Intelligent Information Systems. Kluwer Academic Press.


Semantic Web Lecture Part 1. Prof. Do van Thanh Semantic Web Lecture Part 1 Prof. Do van Thanh Overview of the lecture Part 1 Why Semantic Web? Part 2 Semantic Web components: XML - XML Schema Part 3 - Semantic Web components: RDF RDF Schema Part 4

More information

Semantic Web Domain Knowledge Representation Using Software Engineering Modeling Technique

Semantic Web Domain Knowledge Representation Using Software Engineering Modeling Technique Semantic Web Domain Knowledge Representation Using Software Engineering Modeling Technique Minal Bhise DAIICT, Gandhinagar, Gujarat, India 382007 minal_bhise@daiict.ac.in Abstract. The semantic web offers

More information

Information System on Literature in the Field of ICT for Environmental Sustainability

Information System on Literature in the Field of ICT for Environmental Sustainability International Environmental Modelling and Software Society (iemss) 2010 International Congress on Environmental Modelling and Software Modelling for Environment s Sake, Fifth Biennial Meeting, Ottawa,

More information

INCONSISTENT DATABASES

INCONSISTENT DATABASES INCONSISTENT DATABASES Leopoldo Bertossi Carleton University, http://www.scs.carleton.ca/ bertossi SYNONYMS None DEFINITION An inconsistent database is a database instance that does not satisfy those integrity

More information

Part I Logic programming paradigm

Part I Logic programming paradigm Part I Logic programming paradigm 1 Logic programming and pure Prolog 1.1 Introduction 3 1.2 Syntax 4 1.3 The meaning of a program 7 1.4 Computing with equations 9 1.5 Prolog: the first steps 15 1.6 Two

More information

A Comparative Study of Data Mining Process Models (KDD, CRISP-DM and SEMMA)

A Comparative Study of Data Mining Process Models (KDD, CRISP-DM and SEMMA) International Journal of Innovation and Scientific Research ISSN 2351-8014 Vol. 12 No. 1 Nov. 2014, pp. 217-222 2014 Innovative Space of Scientific Research Journals http://www.ijisr.issr-journals.org/

More information

SDMX self-learning package No. 5 Student book. Metadata Structure Definition

SDMX self-learning package No. 5 Student book. Metadata Structure Definition No. 5 Student book Metadata Structure Definition Produced by Eurostat, Directorate B: Statistical Methodologies and Tools Unit B-5: Statistical Information Technologies Last update of content December

More information

The Semantic Web & Ontologies

The Semantic Web & Ontologies The Semantic Web & Ontologies Kwenton Bellette The semantic web is an extension of the current web that will allow users to find, share and combine information more easily (Berners-Lee, 2001, p.34) This

More information

Classification of Code Annotations and Discussion of Compiler-Support for Worst-Case Execution Time Analysis

Classification of Code Annotations and Discussion of Compiler-Support for Worst-Case Execution Time Analysis Proceedings of the 5th Intl Workshop on Worst-Case Execution Time (WCET) Analysis Page 41 of 49 Classification of Code Annotations and Discussion of Compiler-Support for Worst-Case Execution Time Analysis

More information

SISE Semantics Interpretation Concept

SISE Semantics Interpretation Concept SISE Semantics Interpretation Concept Karel Kisza 1 and Jiří Hřebíček 2 1 Masaryk University, Faculty of Infromatics, Botanická 68a Brno, Czech Republic kkisza@mail.muni.cz 2 Masaryk University, Faculty

More information

News Filtering and Summarization System Architecture for Recognition and Summarization of News Pages

News Filtering and Summarization System Architecture for Recognition and Summarization of News Pages Bonfring International Journal of Data Mining, Vol. 7, No. 2, May 2017 11 News Filtering and Summarization System Architecture for Recognition and Summarization of News Pages Bamber and Micah Jason Abstract---

More information

Decision Tree CE-717 : Machine Learning Sharif University of Technology

Decision Tree CE-717 : Machine Learning Sharif University of Technology Decision Tree CE-717 : Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Some slides have been adapted from: Prof. Tom Mitchell Decision tree Approximating functions of usually discrete

More information

Pre-Requisites: CS2510. NU Core Designations: AD

Pre-Requisites: CS2510. NU Core Designations: AD DS4100: Data Collection, Integration and Analysis Teaches how to collect data from multiple sources and integrate them into consistent data sets. Explains how to use semi-automated and automated classification

More information

MURDOCH RESEARCH REPOSITORY

MURDOCH RESEARCH REPOSITORY MURDOCH RESEARCH REPOSITORY http://researchrepository.murdoch.edu.au/ This is the author s final version of the work, as accepted for publication following peer review but without the publisher s layout

More information

How are XML-based Marc21 and Dublin Core Records Indexed and ranked by General Search Engines in Dynamic Online Environments?

How are XML-based Marc21 and Dublin Core Records Indexed and ranked by General Search Engines in Dynamic Online Environments? How are XML-based Marc21 and Dublin Core Records Indexed and ranked by General Search Engines in Dynamic Online Environments? A. Hossein Farajpahlou Professor, Dept. Lib. and Info. Sci., Shahid Chamran

More information

TIC: A Topic-based Intelligent Crawler

TIC: A Topic-based Intelligent Crawler 2011 International Conference on Information and Intelligent Computing IPCSIT vol.18 (2011) (2011) IACSIT Press, Singapore TIC: A Topic-based Intelligent Crawler Hossein Shahsavand Baghdadi and Bali Ranaivo-Malançon

More information

A Model of Machine Learning Based on User Preference of Attributes

A Model of Machine Learning Based on User Preference of Attributes 1 A Model of Machine Learning Based on User Preference of Attributes Yiyu Yao 1, Yan Zhao 1, Jue Wang 2 and Suqing Han 2 1 Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada

More information

KDD, SEMMA AND CRISP-DM: A PARALLEL OVERVIEW. Ana Azevedo and M.F. Santos

KDD, SEMMA AND CRISP-DM: A PARALLEL OVERVIEW. Ana Azevedo and M.F. Santos KDD, SEMMA AND CRISP-DM: A PARALLEL OVERVIEW Ana Azevedo and M.F. Santos ABSTRACT In the last years there has been a huge growth and consolidation of the Data Mining field. Some efforts are being done

More information

Authoring and Maintaining of Educational Applications on the Web

Authoring and Maintaining of Educational Applications on the Web Authoring and Maintaining of Educational Applications on the Web Denis Helic Institute for Information Processing and Computer Supported New Media ( IICM ), Graz University of Technology Graz, Austria

More information

Device Independent Principles for Adapted Content Delivery

Device Independent Principles for Adapted Content Delivery Device Independent Principles for Adapted Content Delivery Tayeb Lemlouma 1 and Nabil Layaïda 2 OPERA Project Zirst 655 Avenue de l Europe - 38330 Montbonnot, Saint Martin, France Tel: +33 4 7661 5281

More information

Metadata and Encoding Standards for Digital Initiatives: An Introduction

Metadata and Encoding Standards for Digital Initiatives: An Introduction Metadata and Encoding Standards for Digital Initiatives: An Introduction Maureen P. Walsh, The Ohio State University Libraries KSU-SLIS Organization of Information 60002-004 October 29, 2007 Part One Non-MARC

More information

Combining Different Business Rules Technologies:A Rationalization

Combining Different Business Rules Technologies:A Rationalization A research and education initiative at the MIT Sloan School of Management Combining Different Business Rules Technologies:A Rationalization Paper 116 Benjamin Grosof Isabelle Rouvellou Lou Degenaro Hoi

More information

Automatic Generation of Meta Tags for Intra-Semantic- Web

Automatic Generation of Meta Tags for Intra-Semantic- Web Automatic Generation of Meta Tags for Intra-Semantic- Web Dr. Damir avar and Dr. Uta Störl * Dresdner Bank AG IS-STA Software-Technologie und Architektur für Allianz-Gruppe Deutschland Research and Innovations

More information

Semantic Clickstream Mining

Semantic Clickstream Mining Semantic Clickstream Mining Mehrdad Jalali 1, and Norwati Mustapha 2 1 Department of Software Engineering, Mashhad Branch, Islamic Azad University, Mashhad, Iran 2 Department of Computer Science, Universiti

More information

Wither OWL in a knowledgegraphed, Linked-Data World?

Wither OWL in a knowledgegraphed, Linked-Data World? Wither OWL in a knowledgegraphed, Linked-Data World? Jim Hendler @jahendler Tetherless World Professor of Computer, Web and Cognitive Science Director, Rensselaer Institute for Data Exploration and Applications

More information

Viewpoint Review & Analytics

Viewpoint Review & Analytics The Viewpoint all-in-one e-discovery platform enables law firms, corporations and service providers to manage every phase of the e-discovery lifecycle with the power of a single product. The Viewpoint

More information

A SMIL Editor and Rendering Tool for Multimedia Synchronization and Integration

A SMIL Editor and Rendering Tool for Multimedia Synchronization and Integration A SMIL Editor and Rendering Tool for Multimedia Synchronization and Integration Stephen J.H. Yang 1, Norman W.Y. Shao 2, Kevin C.Y. Kuo 3 National Central University 1 National Kaohsiung First University

More information

Inductively Generated Pointcuts to Support Refactoring to Aspects

Inductively Generated Pointcuts to Support Refactoring to Aspects Inductively Generated Pointcuts to Support Refactoring to Aspects Tom Tourwé Centrum voor Wiskunde en Informatica P.O. Box 94079, NL-1090 GB Amsterdam The Netherlands Email: tom.tourwe@cwi.nl Andy Kellens

More information

International Journal for Management Science And Technology (IJMST)

International Journal for Management Science And Technology (IJMST) Volume 4; Issue 03 Manuscript- 1 ISSN: 2320-8848 (Online) ISSN: 2321-0362 (Print) International Journal for Management Science And Technology (IJMST) GENERATION OF SOURCE CODE SUMMARY BY AUTOMATIC IDENTIFICATION

More information

Inductive Programming

Inductive Programming Inductive Programming A Unifying Framework for Analysis and Evaluation of Inductive Programming Systems Hofmann, Kitzelmann, Schmid Cognitive Systems Group University of Bamberg AGI 2009 CogSys Group (Univ.

More information

A B2B Search Engine. Abstract. Motivation. Challenges. Technical Report

A B2B Search Engine. Abstract. Motivation. Challenges. Technical Report Technical Report A B2B Search Engine Abstract In this report, we describe a business-to-business search engine that allows searching for potential customers with highly-specific queries. Currently over

More information

OCL Support in MOF Repositories

OCL Support in MOF Repositories OCL Support in MOF Repositories Joachim Hoessler, Michael Soden Department of Computer Science Technical University Berlin hoessler@cs.tu-berlin.de, soden@cs.tu-berlin.de Abstract From metamodels that

More information

Re-designing Online Terminology Resources for German Grammar

Re-designing Online Terminology Resources for German Grammar Re-designing Online Terminology Resources for German Grammar Project Report Karolina Suchowolec, Christian Lang, and Roman Schneider Institut für Deutsche Sprache (IDS), Mannheim, Germany {suchowolec,

More information

Extracting the Range of cps from Affine Typing

Extracting the Range of cps from Affine Typing Extracting the Range of cps from Affine Typing Extended Abstract Josh Berdine, Peter W. O Hearn Queen Mary, University of London {berdine, ohearn}@dcs.qmul.ac.uk Hayo Thielecke The University of Birmingham

More information

The Rough Set Engine GROBIAN

The Rough Set Engine GROBIAN The Rough Set Engine GROBIAN Ivo Düntsch School of Information and Software Engineering University of Ulster Newtownabbey, BT 37 0QB, N.Ireland I.Duentsch@ulst.ac.uk Günther Gediga FB Psychologie / Methodenlehre

More information

Probabilistic Information Integration and Retrieval in the Semantic Web

Probabilistic Information Integration and Retrieval in the Semantic Web Probabilistic Information Integration and Retrieval in the Semantic Web Livia Predoiu Institute of Computer Science, University of Mannheim, A5,6, 68159 Mannheim, Germany livia@informatik.uni-mannheim.de

More information

6. Relational Algebra (Part II)

6. Relational Algebra (Part II) 6. Relational Algebra (Part II) 6.1. Introduction In the previous chapter, we introduced relational algebra as a fundamental model of relational database manipulation. In particular, we defined and discussed

More information

Building Web Annotation Stickies based on Bidirectional Links

Building Web Annotation Stickies based on Bidirectional Links Building Web Annotation Stickies based on Bidirectional Links Hiroyuki Sano, Taiki Ito, Tadachika Ozono and Toramatsu Shintani Dept. of Computer Science and Engineering Graduate School of Engineering,

More information

Intrusion Detection Using Data Mining Technique (Classification)

Intrusion Detection Using Data Mining Technique (Classification) Intrusion Detection Using Data Mining Technique (Classification) Dr.D.Aruna Kumari Phd 1 N.Tejeswani 2 G.Sravani 3 R.Phani Krishna 4 1 Associative professor, K L University,Guntur(dt), 2 B.Tech(1V/1V),ECM,

More information

On Reduct Construction Algorithms

On Reduct Construction Algorithms 1 On Reduct Construction Algorithms Yiyu Yao 1, Yan Zhao 1 and Jue Wang 2 1 Department of Computer Science, University of Regina Regina, Saskatchewan, Canada S4S 0A2 {yyao, yanzhao}@cs.uregina.ca 2 Laboratory

More information

A graphical user interface for service adaptation

A graphical user interface for service adaptation A graphical user interface for service adaptation Christian Gierds 1 and Niels Lohmann 2 1 Humboldt-Universität zu Berlin, Institut für Informatik, Unter den Linden 6, 10099 Berlin, Germany gierds@informatik.hu-berlin.de

More information

On the Reduction of Dublin Core Metadata Application Profiles to Description Logics and OWL

On the Reduction of Dublin Core Metadata Application Profiles to Description Logics and OWL On the Reduction of Dublin Core Metadata Application Profiles to Description Logics and OWL Dimitrios A. Koutsomitropoulos High Performance Information Systems Lab, Computer Engineering and Informatics

More information

A Simplified Correctness Proof for a Well-Known Algorithm Computing Strongly Connected Components

A Simplified Correctness Proof for a Well-Known Algorithm Computing Strongly Connected Components A Simplified Correctness Proof for a Well-Known Algorithm Computing Strongly Connected Components Ingo Wegener FB Informatik, LS2, Univ. Dortmund, 44221 Dortmund, Germany wegener@ls2.cs.uni-dortmund.de

More information

Integrating SysML and OWL

Integrating SysML and OWL Integrating SysML and OWL Henson Graves Lockheed Martin Aeronautics Company Fort Worth Texas, USA henson.graves@lmco.com Abstract. To use OWL2 for modeling a system design one must be able to construct

More information

Dynamic Data in terms of Data Mining Streams

Dynamic Data in terms of Data Mining Streams International Journal of Computer Science and Software Engineering Volume 1, Number 1 (2015), pp. 25-31 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining

More information

SK International Journal of Multidisciplinary Research Hub Research Article / Survey Paper / Case Study Published By: SK Publisher

SK International Journal of Multidisciplinary Research Hub Research Article / Survey Paper / Case Study Published By: SK Publisher ISSN: 2394 3122 (Online) Volume 2, Issue 1, January 2015 Research Article / Survey Paper / Case Study Published By: SK Publisher P. Elamathi 1 M.Phil. Full Time Research Scholar Vivekanandha College of

More information

The UDK Approach: the 4th Generation of an Environmental Data. Catalogue Introduced in Austria and Germany

The UDK Approach: the 4th Generation of an Environmental Data. Catalogue Introduced in Austria and Germany The UDK Approach: the 4th Generation of an Environmental Data Catalogue Introduced in Austria and Germany Walter Swoboda Fred Kruse Ministry of Environment of Lower Saxony UDK coordination center Archivstrasse

More information

Collaborative Framework for Testing Web Application Vulnerabilities Using STOWS

Collaborative Framework for Testing Web Application Vulnerabilities Using STOWS Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,

More information

XML FOR FLEXIBILITY AND EXTENSIBILITY OF DESIGN INFORMATION MODELS

XML FOR FLEXIBILITY AND EXTENSIBILITY OF DESIGN INFORMATION MODELS XML FOR FLEXIBILITY AND EXTENSIBILITY OF DESIGN INFORMATION MODELS JOS P. VAN LEEUWEN AND A.J. JESSURUN Eindhoven University of Technology, The Netherlands Faculty of Building and Architecture, Design

More information

Keywords Data alignment, Data annotation, Web database, Search Result Record

Keywords Data alignment, Data annotation, Web database, Search Result Record Volume 5, Issue 8, August 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Annotating Web

More information

EMF Metrics: Specification and Calculation of Model Metrics within the Eclipse Modeling Framework

EMF Metrics: Specification and Calculation of Model Metrics within the Eclipse Modeling Framework EMF Metrics: Specification and Calculation of Model Metrics within the Eclipse Modeling Framework Thorsten Arendt a, Pawel Stepien a, Gabriele Taentzer a a Philipps-Universität Marburg, FB12 - Mathematics

More information

Transformations. WFLP-2013 September 13, 2013

Transformations. WFLP-2013 September 13, 2013 Over WFLP-2013 September 13, 2013 Over Algorithmic Running for AD FB Informatik und Informationswissenschaft Universität Konstanz Email: claus.zinn@uni-konstanz.de WWW: http://www.inf.uni-konstanz.de/~zinn

More information

Forgetting and Compacting data in Concept Learning

Forgetting and Compacting data in Concept Learning Forgetting and Compacting data in Concept Learning Gunther Sablon and Luc De Raedt Department of Computer Science, Katholieke Universiteit Leuven Celestijnenlaan 200A, B-3001 Heverlee, Belgium Email: {Gunther.Sablon,Luc.DeRaedt}@cs.kuleuven.ac.be

More information

Semantic Web Search Model for Information Retrieval of the Semantic Data *

Semantic Web Search Model for Information Retrieval of the Semantic Data * Semantic Web Search Model for Information Retrieval of the Semantic Data * Okkyung Choi 1, SeokHyun Yoon 1, Myeongeun Oh 1, and Sangyong Han 2 Department of Computer Science & Engineering Chungang University

More information

XETA: extensible metadata System

XETA: extensible metadata System XETA: extensible metadata System Abstract: This paper presents an extensible metadata system (XETA System) which makes it possible for the user to organize and extend the structure of metadata. We discuss

More information

XML Information Set. Working Draft of May 17, 1999

XML Information Set. Working Draft of May 17, 1999 XML Information Set Working Draft of May 17, 1999 This version: http://www.w3.org/tr/1999/wd-xml-infoset-19990517 Latest version: http://www.w3.org/tr/xml-infoset Editors: John Cowan David Megginson Copyright

More information