Hypermedia for Information Retrieval. University of Padua - Italy. University of Glasgow - Scotland


Automatic Authoring and Construction of Hypermedia for Information Retrieval

Maristella Agosti, Massimo Melucci
Department of Electronics and Informatics, University of Padua - Italy

Fabio Crestani
Department of Computing Science, University of Glasgow - Scotland

Address to which correspondence should be sent: Maristella Agosti, Department of Electronics and Informatics, University of Padua, Via Gradenigo, 6/a, Padova, Italy. Voice: ext Fax: agosti@ipdunivx.unipd.it.

Abstract

The paper describes a complete process and a tool for the automatic construction of a multimedia hypertext starting from a large multimedia document collection. Through the use of an authoring methodology, the document collection is automatically authored and the result is a multimedia hypertext, also called hypermedia, written in HTML, almost a standard among hypermedia mark-up languages. The resulting hypermedia can be browsed and queried using Mosaic, an interface developed in the framework of the World Wide Web project. In particular, the paper describes the set of methods and techniques used for the automatic construction of the hypermedia and highlights their relevance in the context of Multimedia Information Retrieval.

Keywords: Information Storage and Retrieval, Content Analysis and Indexing, Content Based Retrieval, Hypertext/Hypermedia, Automatic Authoring of Hypermedia.

1 Introduction

In Information Retrieval (IR) systems, the user starts to search for the documents pertinent to his information requirements by entering a query. The system replies by retrieving from a large collection the documents matching the user's query. This querying strategy might be considered as a batch process, since it seems that the user cannot adequately interact with the system. On the contrary, in hypermedia systems, user-system interaction by browsing is the main feature. People often think that only hypermedia can provide browsing. This is untrue: the ability to move between related topics or documents can also be provided by IR systems supporting relevance feedback [van Rijsbergen, 1979]. Unlike hypermedia, which generally has links statically fixed by an expert user, relevance feedback allows the user to dynamically create links at run time by searching for documents similar to others marked as relevant. However, users will only use browsing if it is easy to do.
Browsing by means of relevance feedback is a very complex process, and most of the existing IR systems supporting relevance feedback do not have a good user interface for browsing. Moreover, though early work on browsing text collections in IR dates back to the seventies [Oddy, 1975], only very few experimental IR systems allow browsing [Thompson, 1989, Frisse, 1988], and only fairly recently has there been a new impulse in this research direction (see for example [Agosti et al., 1989, Dunlop, 1991]).

Systems providing both browsing and querying search strategies allow users to access a hypermedia by browsing after a query has been issued. Users are thus also given access to documents that have not matched the query. In particular, given a retrieved document, the user can now pick up the neighbours of that document even if they have not matched the query. This mixed way of access is especially useful if the collection also includes multimedia documents. Indeed, indexing multimedia documents is rather difficult because of the number of media types and their different nature and representation. For these reasons, multimedia document indexing still requires more methodological and experimental work, whereas textual document indexing has been deeply studied through experiments carried out in several contexts. Some approaches have been proposed to index multimedia document collections: we adopt the approach proposed in [Dunlop, 1991], because it makes the indexing of non-textual documents possible using their neighbour documents. In fact, cluster-based techniques are used to relate a multimedia document to the indexed documents that are its neighbours. In the same way, multimedia portions of a complete document can be indexed and interconnected to construct a hypermedia document. For example, a figure of a document is indexed using the descriptors of its caption, and these descriptors are related to the descriptors of the neighbouring clustered text portions of the complete document.

The approach presented in this paper aims at enabling users of large multimedia document collections to browse the document base in a natural way, navigating through connections representing statistical or semantic relationships between multimedia IR (MIR) objects. A MIR object can be a text, a figure, a term, a picture, a concept, etc. The approach is content based because it uses various IR techniques of content representation in a coherent way for linking MIR objects. It should be noted that these techniques were developed separately and are now connected together in a complete approach for the automatic authoring of a multimedia collection, in order to construct and make available an IR hypermedia.
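The caption-based indexing of a figure mentioned above can be illustrated with a small sketch. It is not the procedure of [Dunlop, 1991]: the function name `index_non_textual`, the `min_support` threshold and the rule for keeping neighbour descriptors are all assumptions made for illustration only.

```python
def index_non_textual(caption_terms, neighbour_term_sets, min_support=2):
    """Return descriptors for a non-textual object such as a figure.

    caption_terms: descriptors extracted from the caption.
    neighbour_term_sets: one set of descriptors per neighbouring text portion.
    A neighbour descriptor is kept only if it occurs in at least
    `min_support` neighbours (an illustrative rule, not the paper's).
    """
    counts = {}
    for terms in neighbour_term_sets:
        for term in terms:
            counts[term] = counts.get(term, 0) + 1
    shared = {term for term, n in counts.items() if n >= min_support}
    # The figure is described by its caption plus the descriptors
    # that its textual neighbours have in common.
    return set(caption_terms) | shared


figure_descriptors = index_non_textual(
    {"schema", "hypermedia"},
    [{"link", "node", "hypermedia"}, {"link", "concept"}],
)
```

Here the figure inherits the descriptor "link" because both neighbouring text portions share it, while descriptors found in only one neighbour are ignored.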
The model presented in [Agosti and Crestani, 1993] provides a conceptual reference for the network structure of the IR hypermedia to be built. An IR hypermedia is a multimedia document base which allows access to multimedia documents mainly by browsing, but which has been authored using IR techniques of content representation and linking. An IR hypermedia is composed of nodes, which are stores of information, and links, which are connections between nodes. The user browses by navigating from node to node using links. The series of navigational choices leads, hopefully, through the document base to the desired information.

The automatic authoring of multimedia documents is made easier by the chosen indexing approach presented in [Dunlop, 1991]. With this approach, authoring a multimedia document amounts to marking up the neighbouring textual documents. This means that a multimedia node is inserted into the IR hypermedia if one or more of its neighbours are nodes as well. In other words, the descriptors representing a textual document are used for representing content-based links between non-textual documents. It follows that the core of the work is the automatic authoring and construction of an IR hypermedia starting from the collection of textual documents. Therefore, the paper concentrates on the presentation of the research results that permit the building-up of the IR hypermedia.

Manual authoring is feasible only if the collection to be authored is not as large as the ones typically managed by an IR system, because it is a time-consuming process that must remain manageable for an expert user. Moreover, authoring "by hand" depends on who is marking up the documents and hence on his subjective criteria. On the contrary, automatic authoring makes it possible to construct a hypermedia from a large collection of multimedia documents without suffering from either the limitations of time or the subjectivity of the expert user. In fact, the methodology we propose is based on well known and sound IR techniques, and it allows the user to construct a hypermedia that is the result of an unbiased process, because links are fixed according to statistical measures. The presence of dictionaries and thesauri helps the user during query formulation and browsing while he is looking for documents relevant to his information requirements.

Automatic authoring is becoming more and more important as a task within the electronic publishing, information dissemination and retrieval processes, because a lot of information is contained in journal issues and, in general, in written form. If novel hypermedia techniques such as automatic authoring are used, the user can overcome the traditional linear reading of documents previously available only in textual format on paper. For example, the ACM document collection could be automatically authored; the querying and browsing processes could be easier for the user, who would also have the possibility of using the ACM classification scheme as a content-based representation tool. We feel that, while maintaining the same scientific value, such a document collection could be accessed and browsed from remote sites using non-traditional tools that make retrieval, reading and understanding of the documents' content easier.

2 The Approach for Automatic Authoring and Construction

The starting point of the approach is the usual set of IR raw data: a large flat document collection. Documents are available as individual unrelated objects.
The approach has the following aims. Each aim concerns the setting up of:
- a homogeneous collection of terms, namely the index terms collection;
- the concept collection;
- the network of links within each collection: documents (D-D links), terms (T-T links), and concepts (C-C links);
- the network of links between a pair of collections: documents - terms (D-T links), terms - concepts (T-C links).
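The collections and link types listed above can be pictured as a typed network. The following is a minimal sketch under stated assumptions: the class name `Network`, its methods and the node identifiers are hypothetical, and the sketch only enforces that links are of the five types named in the list.

```python
from collections import defaultdict

# Link types named in the list above, keyed by the kinds of the two end nodes.
ALLOWED = {
    ("D", "D"): "D-D", ("T", "T"): "T-T", ("C", "C"): "C-C",
    ("D", "T"): "D-T", ("T", "C"): "T-C",
}


class Network:
    """Toy network of documents (D), index terms (T) and concepts (C)."""

    def __init__(self):
        self.kind = {}                  # node id -> "D", "T" or "C"
        self.links = defaultdict(set)   # node id -> {(link type, neighbour id)}

    def add_node(self, node_id, kind):
        if kind not in ("D", "T", "C"):
            raise ValueError(f"unknown node kind: {kind}")
        self.kind[node_id] = kind

    def link_type(self, a, b):
        ka, kb = self.kind[a], self.kind[b]
        return ALLOWED.get((ka, kb)) or ALLOWED.get((kb, ka))

    def add_link(self, a, b):
        link = self.link_type(a, b)
        if link is None:
            # For example, documents are not linked directly to concepts.
            raise ValueError(f"no link type between {a} and {b}")
        self.links[a].add((link, b))
        self.links[b].add((link, a))
        return link

    def neighbours(self, node_id, link_type=None):
        return sorted(n for lt, n in self.links[node_id]
                      if link_type is None or lt == link_type)


net = Network()
net.add_node("d1", "D")
net.add_node("t1", "T")
net.add_node("c1", "C")
net.add_link("d1", "t1")   # D-T link
net.add_link("c1", "t1")   # T-C link
```

In this sketch a document reaches a concept only through an index term, which mirrors the fact that the list names D-T and T-C links but no direct link between documents and concepts.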

Each aim can be reached in one or more steps. With the exception of the first aim, which must be reached before any others, there is no strict or unique order for reaching the remaining aims. Some aims can also be reached in parallel for a faster construction of the hypertext. The order we use in the presentation follows a simple idea: first determine the different objects, then build up links between homogeneous objects, and at last between objects of different collections. For the presentation of specific methodological details related to this approach, the reader is referred to [Agosti and Crestani, 1993].

1. Construction of the Collection of Index Terms

During this step the index terms are created and connected to documents (D-T links). The collection of index terms is created by extracting terms from documents using an automatic process known as automatic indexing. It is by means of this process that individual terms or groups of terms found in documents become index terms, assuming a representational power that places them on a higher level of abstraction than the documents. Indexing is a very complex process which has been studied for a long time in IR. It constitutes the core of IR research because it is the technique by means of which the informative content of a document is represented and content-based retrieval is made possible. In fact, the description obtained for the informative content of a document can then be used to find an answer to the user's information need by matching it against the user's query, which is represented by means of the same indexing process. There are many ways to perform indexing on a set of documents. The most complete way can be divided into: term extraction, stop terms removal, conflation, and weighting. We adopt this complete set of techniques; they are well described in classical IR textbooks [van Rijsbergen, 1979, Salton and McGill, 1983].

2. Links between Concepts using Semantic Relationships

We assume that we are able to identify a set of concepts of the application domain. From an IR point of view, there is no operative advantage in having a set of application domain concepts if they are not connected to each other according to their semantics. It is by looking at the relationships a concept has with other concepts that we can understand the "meaning" of the concept in the context of the application domain. When this meaning has been fully understood, it is also possible to understand the "usage" of the index terms connected to the concept. In fact, they just represent the way the concept has been addressed in the documents belonging to the collection. The way a concept has been addressed by the authors of documents in the collection could differ from the way the user of the IR system is addressing it. Using a very precise term in addressing a concept increases the precision of the retrieval. However, the user could be interested in considering a concept in a loose way. This can be done using index terms expressing concepts semantically related to the concept which is central to the user's information need. In this way it is also possible to increase the recall of the retrieval.

The utility of having a tool which provides, for each concept, a set of semantically related concepts has long been recognised in IR. A thesaurus is a tool which provides, for each term in a specific application domain, a set of terms related to it by some well defined semantic relationships. By its nature, the structure of associations represented in a thesaurus can be directly mapped into a network structure: concepts are mapped to nodes (C nodes) and concept relationships to links (C-C links). Sometimes a thesaurus on the specific application domain is not available. In this case it becomes necessary to build up the network of concepts manually. The first essential step is to identify a set of concepts and their relationships. The fundamental types of semantic relationships commonly expressed in a thesaurus are: scope, equivalence, hierarchical and associative relationships (see for example [Srinivasan, 1992]). They provide a useful frame of reference for the kind of relationships to be taken into consideration in a manual construction of a network of concepts.

3. Association between Index Terms and Concepts

The semantic association between index terms and concepts can be built using different formal approaches. The approach described in [Agosti and Marchetti, 1992] and named "semantic association" permits the automatic construction of links between index terms and concepts (T-C links). The cited paper reports a complete description of this technique.

4. Statistically Determined Relationships between Index Terms

There are many techniques for identifying relationships between index terms; for example, using the concept network it is possible to relate index terms by means of objects on a higher level of abstraction. In this work we use a technique for finding relationships between index terms using only information present on the same level of abstraction. This technique does not involve the semantics of index terms but only information provided by statistical analysis of index term occurrence in documents. See [Agosti and Crestani, 1993] for a detailed description of the technique used for the construction of T-T links.
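The indexing steps of item 1 and the statistical T-T links of item 4 can be illustrated together with a short sketch. Several choices here are assumptions rather than the technique of [Agosti and Crestani, 1993]: the stop list is a toy one, `conflate` is a crude suffix stripper standing in for a real stemmer, the weight is a plain inverse document frequency, and T-T association is measured with the Dice coefficient over the sets of documents in which two terms occur.

```python
import math
import re
from itertools import combinations

STOP_WORDS = {"a", "an", "and", "of", "the", "for", "in", "to", "is"}  # toy list


def conflate(word):
    """Crude suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_document(text):
    """Term extraction, stop term removal and conflation for one document."""
    words = re.findall(r"[a-z]+", text.lower())
    return {conflate(w) for w in words if w not in STOP_WORDS}


def build_postings(docs):
    """Map each index term to the set of documents it occurs in (D-T links)."""
    postings = {}
    for doc_id, text in docs.items():
        for term in index_document(text):
            postings.setdefault(term, set()).add(doc_id)
    return postings


def idf(term, postings, n_docs):
    """Inverse document frequency, one common weighting choice."""
    return math.log(n_docs / len(postings[term]))


def tt_links(postings, threshold=0.5):
    """Link two terms when the Dice coefficient of their document sets
    reaches the threshold (purely statistical T-T links)."""
    links = {}
    for a, b in combinations(sorted(postings), 2):
        da, db = postings[a], postings[b]
        dice = 2 * len(da & db) / (len(da) + len(db))
        if dice >= threshold:
            links[(a, b)] = dice
    return links


docs = {
    "d1": "Indexing of documents",
    "d2": "Browsing the documents",
    "d3": "Indexing and retrieval",
}
postings = build_postings(docs)
links = tt_links(postings)
```

With these three toy documents, the terms "document" and "index" co-occur in one of the three documents that contain either of them and are therefore linked, whereas "brows" and "index" never co-occur and are not.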

5. Automatic Determination of Relationships between Documents

For an automatic set up of links between documents (D-D links) it is possible to use statistical techniques very similar to those employed for the construction of links between index terms. Other techniques for setting up a network of related documents make use of bibliographic citations. Bibliographic citations can be used to build up a network by implicitly assuming that the documents cited by a document must be somehow related to it.

Most operational IR systems use only the D-T links in the retrieval process. These are represented in the inverted file structure, which is the most common storage structure in IR. Only very few operational IR systems enable the user to take advantage of relationships like those established by C-C and T-C links, and they are used only as an aid to query formulation. Relationships like those represented by T-T and D-D links are used only in a few experimental IR systems. A schema of the IR hypermedia produced by the authoring process is depicted in Fig. 1.

Figure 1: A conceptual schema of the IR hypermedia (labels: concept level, index term level, auxiliary data, document level; Concept (C), Index term (T), Document (D); collection of documents)

3 The Automatic Authoring of an IR Hypermedia

A library of MIR object classes has been developed using C++. The library implements the basic IR structures, and its abstract interfaces allow the user to use IR functionalities. It is important to note that the class library has been developed to be as independent of any specific application as possible, so that a designer using it can find and re-use basic IR structures and functionalities without having to re-implement them. According to this independence requirement, the class library includes classes that are independent of a specific application, so it can be used as a generic IR framework.

Using this class library, the automatic authoring of an IR hypermedia is done from a collection of documents. The automatic authoring process produces a hypertext in which each MIR object is connected to other ones by means of links. Links connecting MIR objects are set up on the basis of different criteria, such as similarity among documents (D-D links), synonymy and contiguity among index terms (T-T links), pertinence between documents and index terms (D-T links), and semantics between index terms and concepts (T-C links).

An instance of the document class refers to the set of index terms extracted from the document and describing its informative content. D-D links are placed on the basis of the measures of similarity among documents. The reference of a document towards another document is an attribute encapsulated by the former. We can represent a collection by defining an ad-hoc sub-class of the document class.

An instance of the auxiliary data class represents an item of auxiliary data and is used to represent the semantic content of a set of documents. The abstract interface of the auxiliary data class enables the application designer to access a generic item of auxiliary data without considering the specific criterion by which it has been constructed and associated with the pertinent documents. This means that the auxiliary data class provides an "umbrella" for managing any auxiliary data sub-class specialised from it, which gives the designer ad-hoc tools to manage specific types of auxiliary data. Our approach provides two types of auxiliary data: index terms and concepts. Because of their distinctive characteristics, we think it is useful to distinguish the sub-classes concept and index term by specialising them from the auxiliary data class, to emphasise the specific features of concepts and index terms with respect to generic auxiliary data.
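The class structure described above can be sketched as follows. The paper's library is written in C++ and its actual interface is not given in the text, so this Python sketch is only an illustration: all class, method and attribute names are assumptions.

```python
from abc import ABC, abstractmethod


class Document:
    """A document with its encapsulated references (links)."""

    def __init__(self, doc_id, text):
        self.doc_id = doc_id
        self.text = text
        self.index_terms = set()      # D-T links: extracted index terms
        self.similar_documents = {}   # D-D links: doc id -> similarity


class AuxiliaryData(ABC):
    """Abstract "umbrella" class for data describing document content."""

    def __init__(self, label):
        self.label = label
        self.documents = set()        # documents this item is associated with

    @abstractmethod
    def kind(self):
        """Name of the specific type of auxiliary data."""


class IndexTerm(AuxiliaryData):
    def __init__(self, label):
        super().__init__(label)
        self.frequency = 0            # occurrences within the collection

    def kind(self):
        return "index term"


class Concept(AuxiliaryData):
    def kind(self):
        return "concept"


def describe(aux):
    """Works on any auxiliary data, whatever its specific sub-class."""
    return f"{aux.kind()}: {aux.label}"


term = IndexTerm("retrieval")
concept = Concept("Information Retrieval")
```

A function such as `describe` depends only on the abstract interface, which is the point of the "umbrella" class: new kinds of auxiliary data can be added without changing code written against `AuxiliaryData`.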
Index terms are auxiliary data which have been automatically extracted from documents through an indexing process. An index term is associated with its frequency of occurrence within the collection; it is also associated with the set of documents from which it has been extracted. T-T and D-T links are placed on the basis of information provided by statistical analysis of the occurrence of index terms and documents, respectively. The references of a document towards its extracted index terms, and the references of an index term towards another index term and towards the pertinent documents, are attributes encapsulated by the objects representing documents and index terms.

Concepts are represented through instances of the class concept, which implements the third level of the conceptual architecture. C-C links are set up on the basis of the semantic relationships among concepts. A relationship between two concepts is an entity holding the semantics that has to be represented. A semantic relationship between two concepts is represented through an instance of the relationship class. Since different types of relationship can exist between concepts, it is useful to break the relationship class up into different sub-classes. Thus, the relationship class is specialised into sub-classes describing the fundamental types of semantic relationships commonly expressed in a semantic structure; these sub-classes are: scope, hierarchical, synonymy and associative. The sub-class hierarchical has been further specialised into the class specialisation, to represent the relationship between a concept and the more specific ones.

It has been previously stressed that it is almost always necessary to model the semantic relationships between concepts "by hand". It is the user himself or a team of domain experts who has to build up the semantic structure, which represents important and useful application domain knowledge. However, if this semantic structure is represented and stored in a machine readable form, the prototype is able to build up the network of concepts automatically. This means that the tool can recognise concepts and relationships among concepts that are coded in a machine readable form, so that it is no longer necessary to build up the network of concepts manually, and it is also possible to set up the T-C links automatically. We have previously described the semantics of these links: a concept can be linked to several index terms, and an index term can be linked to different concepts; for example, the concept "Information Retrieval" is linked to the index terms "Information" and "Retrieval", but the index term "Information" is also linked to the concept "Information Processing". In general, an automatic indexing algorithm considers index terms made of one word, such as "Information" or "Retrieval". The indexing algorithm we developed treats index terms in that way too. However, our class library provides functions to split concepts up into one or more terms; if a split term is an index term, the connection with the concept is set up. In an analogous way, index terms can be concatenated to construct a multi-word term; if the latter is a concept, the T-C link is completed.
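The splitting and concatenation operations just described can be sketched as follows. The function names are hypothetical and the sketch makes simplifying assumptions: terms are compared in lower case without conflation, and concatenation only tries adjacent terms up to a fixed window length.

```python
def split_concept(concept):
    """Split a concept label into lower-case single-word terms."""
    return [word.lower() for word in concept.split()]


def tc_links_by_splitting(concepts, index_terms):
    """Link a concept to every index term obtained by splitting its label."""
    links = set()
    for concept in concepts:
        for term in split_concept(concept):
            if term in index_terms:
                links.add((term, concept))
    return links


def tc_links_by_concatenation(term_sequence, concepts, max_len=3):
    """Concatenate adjacent index terms; if the result is a concept label,
    link each of the concatenated terms to that concept."""
    by_key = {" ".join(split_concept(c)): c for c in concepts}
    links = set()
    for start in range(len(term_sequence)):
        for length in range(2, max_len + 1):
            window = term_sequence[start:start + length]
            if len(window) < length:
                break
            key = " ".join(window)
            if key in by_key:
                links.update((term, by_key[key]) for term in window)
    return links


concepts = ["Information Retrieval", "Information Processing"]
index_terms = {"information", "retrieval"}

split_links = tc_links_by_splitting(concepts, index_terms)
concat_links = tc_links_by_concatenation(
    ["automatic", "information", "retrieval"], concepts
)
```

On this input, splitting reproduces the example in the text: "information" is linked to both concepts, while "retrieval" is linked only to "Information Retrieval".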
Therefore, the passage from the index term level to the concept level, and vice-versa, that is, the T-C linking mechanism, is possible through the operations of splitting a concept up into terms and concatenating terms to build a concept.

4 The Automatic Authoring Process

The automatic authoring process makes use of the class library presented in the previous Section. The process is depicted in Fig. 2: the process input is a flat document collection and the output is an IR hypermedia that is written in HTML (HyperText Mark-up Language). The IR hypermedia can be browsed and queried using Mosaic (for machines with a graphical interface) or Lynx (for machines able to deal only with text). It is important to highlight that our approach is general and applicable to several types of collection, as long as it is possible to have them in some standard machine readable form. At present we can handle plain ASCII, LaTeX, BibTeX, BIDS (the standard used by the Institute of Scientific Information Data Service at Bath), and INSPEC. We are currently adding capabilities for translating documents written in other standard formats into HTML. In addition, we have automatically authored the ACM classification scheme to provide an IR hypermedia with a widespread and wide-ranging concept collection for the computing and computer science domain. This means that the ACM document collections could be automatically authored and queried using the ACM classification scheme itself.

Figure 2: The automatic authoring process (labels: flat document collection, documents representation, stop words removal, conflation, indexing, dictionary, weighting, concept collection, concept network construction, HTML document base, automatic authoring, MOSAIC)

Automatic authoring becomes more important if one takes into account the way documents in machine readable form and on-line bibliographies are spreading over the Internet. Documents are indeed physically stored on different Internet sites, but they are interrelated through citation links. If this approach were used, it would make available all those documents connected by means of content-based links.

The authoring process is divided into the following sequence of steps:

1. Collection loading and document representation. The collection is analysed to produce document representations in terms of objects of the class document.

2. Indexing process. The aim of this task is to populate the class index term, which makes up the dictionary of the collection. As words are extracted from documents, they are removed if they are stop words; otherwise, they are conflated. Porter's stemming algorithm [Porter, 1980] is used to conflate words to index terms. Index terms extracted from each document item are merged into a unique list and associated with the document.

3. Semantic structure loading and concepts representation. This task corresponds to the second phase of the design process. Like the first task, it depends on the particular semantic structure. We have previously outlined how the tool is able to automatically read, represent and manage a semantic structure if the latter is stored in some standard format and includes the information for setting up the relationships among concepts. Therefore the C-C links are set up during this task.

4. Automatic authoring. This step can be considered the core of the entire process because it makes available to the user an automatically constructed IR hypermedia. It is this task that implements the last three phases of the design process. The computation of the similarity measures and the automatic setting of the D-D, D-T, T-T and T-C links are performed during this step. The operations performed during this task store items of the three levels of the conceptual schema into a collection of HTML documents. The HTML documents are linked among themselves using the mechanisms made available by the hypertext mark-up language. An HTML document is linked to another or to a part of itself by means of a pair of tags, say, "link" and "anchor" tags; Mosaic is provided with the functionality of retrieving and displaying the anchored document after the user has clicked on the link tag. A document node is authored with link tags on the index terms extracted from its text. Special tags give access to the dictionary and to the classification system. An index term node is authored by linking all the documents from which it is extracted, all the index terms similar to it, and all the related concepts of the classification system.

5 Browsing and Querying an IR Hypermedia

Using Mosaic the user can access any MIR object of the MIR hypertext by means of two different procedures: browsing and querying. In [Agosti and Crestani, 1993] we have stressed and justified the importance of using browsing and querying together to access IR documents. On the network structure of the IR hypermedia it is possible to browse among concepts, index terms, and documents, exploring the large document and auxiliary data space. It is also possible to query the IR hypermedia using the keyword search procedure available through Mosaic. A more complex technique for query processing is under development. In fact, using an IR hypermedia, the process of querying can be enhanced through spreading activation techniques (see, for example, [Salton and Buckley, 1988]).
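Spreading activation over a network with typed links can be illustrated with a minimal sketch. Since the paper states that its technique is still under development, everything here is an assumption for illustration: the function name, the decay factor, the per-link-type weights and the fixed number of propagation steps.

```python
def spread_activation(graph, seeds, link_weights, decay=0.5, steps=2):
    """Propagate activation from seed nodes along typed links.

    graph: node -> list of (link type, neighbour) pairs.
    seeds: node -> initial activation (the nodes chosen by the user).
    link_weights: link type -> weight, so that different kinds of
    links can pass on different amounts of activation.
    """
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(steps):
        next_frontier = {}
        for node, value in frontier.items():
            for link_type, neighbour in graph.get(node, []):
                passed = value * decay * link_weights.get(link_type, 0.0)
                if passed > 0:
                    next_frontier[neighbour] = (
                        next_frontier.get(neighbour, 0.0) + passed
                    )
        # Accumulate what each node received in this step.
        for node, value in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + value
        frontier = next_frontier
    return activation


graph = {
    "t:retrieval": [("T-C", "c:IR"), ("D-T", "d1")],
    "c:IR": [("T-C", "t:retrieval"), ("C-C", "c:IP")],
    "d1": [("D-T", "t:retrieval")],
}
weights = {"T-C": 1.0, "C-C": 1.0, "D-T": 0.5}
result = spread_activation(graph, {"t:retrieval": 1.0}, weights)
```

Starting from a single index term, the concept "c:IP" is reached in two steps through the concept "c:IR", although it has no direct link to the seed. In this simple version activation can also flow back to the seed; a fuller treatment would constrain that.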
Once the user has entered the network structure of the IR hypermedia using a concept, an index term, or a document, he can go on building up a query by browsing over other concepts, index terms, or documents and including in the query those that he thinks are relevant to his information need. After the user has built up a query by browsing, an automatic procedure can be activated. This procedure makes use of the different semantics associated with link and node types to spread activation over the network and use concepts, index terms or documents that are closely related to those indicated by the user in the query. The user can provide some feedback to the system by marking the nodes that he considers relevant in the retrieved list. In this way the user assesses whether or not the spreading has been successful in adding new MIR objects to his query. This process is similar to the relevance feedback technique used in advanced IR systems. In those systems, relevance feedback is used to modify the query terms according to the suggestions the user gives back to the system after he has marked the relevant documents. New query terms are determined by the system on the basis of the weights of the previous query terms. Our approach does not only provide query modification based on statistical analysis (D-T and T-T links), but also on semantic relationships (T-C and C-C links). After a new query has been formulated, the user can start a new spreading activation process and continue his search in an iterative and interactive process controlled (constrained) by the system.

6 Initial Experimental Results of Constructing and Browsing of an IR Hypermedia

We have developed a tool for the automatic construction of an IR hypermedia which makes use of the class library presented in Section 3. We have called it TACHIR, which stands for: Tool for the Automatic Construction of Hypermedia for Information Retrieval. TACHIR can be activated from inside a Mosaic session by clicking on the dedicated button of the home page. In Fig. 3 we can see the "Automatic Construction of an IR hypermedia" button which activates this function. The user is asked to indicate the location of a collection of documents and of a collection of concepts, such as a thesaurus or the ACM classification scheme, if one exists. TACHIR automatically builds up the corresponding IR hypermedia, which can be browsed and queried using Mosaic. If one or more IR hypermedia are already available, the user can instead pick the "Browsing and Querying of an IR Hypermedia" button to browse or query the chosen IR hypermedia with the functionalities illustrated in Section 5.

Figure 3: TACHIR home page

At present, only document collections in the BIDS, LaTeX, BibTeX and plain ASCII formats have been used to automatically generate an IR hypermedia. At the current stage, we have tested the prototype on a quite large BibTeX bibliographic reference collection. It is important to highlight that our approach is general and applicable to several collections, as long as it is possible to have them in a machine readable form. We are implementing new TACHIR functionalities to translate other types of collections into an IR hypermedia. Some collections are made up of documents that include references to figures and bibliographic references in addition to their usual structured full text: for example, LaTeX documents. Nowadays, tools for translating LaTeX documents into HTML documents are available, but they lack automatic authoring.

In the following, a sort of guided tour of the construction of the IR hypermedia of a BibTeX collection is presented, to explain the complete approach and the construction of an IR hypermedia with a real case study. Of course, this tour cannot be exhaustive, but it is representative of the possibilities available to the users of TACHIR. We have considered a BibTeX collection as input raw data, since a BibTeX collection can include abstracts, which are full-text documents. A BibTeX item is a bibliographic record including different kinds of entry: keywords and abstract, in addition to the usual data fields such as title, authors, affiliation, and so forth. In the following, a BibTeX collection of 18,000 entries on object-orientation is used. As we have previously pointed out, we have also chosen the ACM classification scheme to be the semantic structure placed at the third level of the architecture, since it is one of the most widespread semantic structures in the computing and computer science domain. The ACM classification scheme entries are hierarchically organised, and each entry is a concept that can hold one or more narrower concepts and has a broader concept.
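Because each entry of the classification scheme has a broader concept and possibly several narrower ones, a concept node could be authored as an HTML page that links to them. The sketch below is an illustration only, not TACHIR's output: the dotted codes, the rule for deriving the broader concept from a code, the file naming (`<code>.html`) and the page layout are all assumptions.

```python
from html import escape


def parent_code(code):
    """Code of the broader concept, assuming dotted codes such as 'H.3.3'."""
    return code.rsplit(".", 1)[0] if "." in code else None


def concept_page(code, scheme):
    """Return an HTML page for one concept; scheme maps code -> label."""

    def link(c):
        # Hypothetical convention: one HTML file per concept code.
        return f'<A HREF="{escape(c)}.html">{escape(c)} {escape(scheme[c])}</A>'

    lines = [
        f"<TITLE>{escape(scheme[code])}</TITLE>",
        f"<H1>{escape(code)} {escape(scheme[code])}</H1>",
    ]
    broader = parent_code(code)
    if broader in scheme:
        lines.append("Broader concept: " + link(broader))
    narrower = sorted(c for c in scheme if parent_code(c) == code)
    if narrower:
        lines.append("Narrower concepts:")
        lines.append("<UL>")
        lines.extend(f"<LI>{link(c)}" for c in narrower)
        lines.append("</UL>")
    return "\n".join(lines)


# A few illustrative entries in the style of the ACM classification scheme.
scheme = {
    "H": "Information Systems",
    "H.3": "Information Storage and Retrieval",
    "H.3.1": "Content Analysis and Indexing",
    "H.3.3": "Information Search and Retrieval",
}
page = concept_page("H.3", scheme)
```

The generated page for "H.3" carries one link upwards to "H" and two links downwards, so that following links in the browser corresponds to moving along the C-C hierarchical relationships.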
Each entry of the ACM classification that the user can select is underlined by the prototype. The user can select a whole entry or a part of it; for example, given a document title, clicking on the whole title retrieves the entire document, whereas clicking on a single title term retrieves the information associated with that term. It must be noted that these characteristics are typical of other collections as well.

Once the user has selected the document collection of interest, the BibTeX collection in this guided tour, he chooses the starting point of the browsing among the three levels of the architecture depicted in Fig. 1:

1. the collection of documents,
2. the set of index terms that are automatically extracted from the documents,
3. the set of concepts that take part in the ACM classification scheme.

Let us suppose that the user has chosen the second level, namely, the dictionary term level. After the user has chosen the index term object, a page containing links to the related information is displayed. The information related to an index term is represented by three buttons linking similar terms, pertinent documents and related concepts.

Figure 4: An index term and its related information

Each of these sets of information represents a "direction" along which an index term spreads its semantics: two directions are vertical, towards the higher and lower levels, and the other is horizontal, within the level where the term is placed. Clicking one of these "directions" lets the user reach the concepts explaining the index term semantics or the documents whose semantics is explained by the index term. It should be noted that this way of starting is only one of the possible ones: another strategy is based on querying, but we are mainly interested in browsing here, since we are addressing content-based hypermedia functionalities.

When the user is looking at an index term he can pick up an entry among different ones; for example, the user might be interested in looking up the documents that are pertinent to the term object. Index terms have been associated with a triplet of sets: the sets of similar terms, pertinent documents and related concepts. Let us suppose that the user clicks on the anchor of the pertinent documents. In Fig. 5, the list of documents pertinent to object is presented. The user picks a specific document from the list: the document identified by Abiteboul84 is picked because he infers that it could be of interest to him. After being selected, it is presented to the user (Fig. 6). Only after the selected document has been read through can the user decide whether it is actually relevant to his informative requirements.

Figure 5: The list of documents pertinent to object

Figure 6: A document pertinent to object

Sometimes a user is not able to find the relevant documents after the first selection. In that case, he has to reformulate the query by clicking the button of terms extracted from the selected document, by going back to the term level, or by reading through the list of similar documents. After having chosen the list of extracted terms, the user selects the term database, trying a kind of query reformulation. By picking the term database among the extracted terms appearing in Fig. 7, the user can collect the documents pertinent to it; from this list it is possible to choose another document and display it in a page.

Figure 7: Terms extracted from a document

However, the term database is rather general and not semantically meaningful to the user. The user may then wish to access the third level by looking for the concepts related to that term. The concept database management is useful to clarify to the user the possible contexts in which the term database is used. Fig. 8 displays this concept; the underlined terms are those output by the indexing process.

ACM classification scheme entries are alphanumeric strings representing concepts. These concepts are organised in a hierarchical manner according to a narrower-broader relationship. Such an entry is made of one or more terms; some of these terms can be index terms and therefore belong to the second level of the architecture. Accessing concepts through an index term allows the user to see the concepts related to it. The index term-concept (T-C) association rule is used during the automatic authoring process: each index term is connected to the concepts containing it, and each concept is connected to its component index terms. This rule is based on a quite straightforward mechanism: when the user is browsing the IR hypermedia at the second level, he can retrieve the concepts having a component word equal to the selected index term, together with the available narrower and broader concepts; symmetrically, the user can ask the system to retrieve the index terms forming the concept he is considering while browsing the third level.
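The T-C association rule can be illustrated with a minimal Python sketch. It is not the TACHIR implementation: the function name `build_tc_links`, the whitespace tokenisation and the sample data are assumptions made for the example, and any stemming or normalisation the real indexing process applies is left out.

```python
from collections import defaultdict


def build_tc_links(concepts, index_terms):
    """Build both directions of the term-concept association.

    A term is linked to a concept when one of the concept's
    component words equals the term (exact, case-insensitive match).
    """
    vocabulary = {term.lower() for term in index_terms}
    term_to_concepts = defaultdict(set)
    concept_to_terms = {}
    for concept in concepts:
        words = set(concept.lower().split())
        component_terms = words & vocabulary  # index terms forming the concept
        concept_to_terms[concept] = component_terms
        for term in component_terms:
            term_to_concepts[term].add(concept)
    return dict(term_to_concepts), concept_to_terms


# Hypothetical sample data.
concepts = ["database management", "database applications", "information storage"]
index_terms = ["database", "object", "information"]
term_to_concepts, concept_to_terms = build_tc_links(concepts, index_terms)
```

With this sample data the term "database" is linked to two concepts, which is exactly the situation in which a purely lexical rule cannot tell which context the user has in mind.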
We are going to address the difficulties in relating terms to concepts in a more "intelligent" way since, for example, more than one concept could be related to a term. However, we are aware of the complexity of this task and of similar ones: automatic construction of thesauri and passage retrieval, to name but a few.

Figure 8: A concept related to database

If the user is not able to find useful information in the concept database management, he has to go down to the second level and try another strategy, such as retrieving the documents pertinent to database, as in Fig. 5, from which a relevant document is retrieved (Fig. 9).

Conclusions

We have presented a complete content-based approach for the automatic construction of an IR hypermedia and an effective tool based on it. We have developed this tool to enable the user to automatically produce a hypertext structure written in HTML from a collection of documents. This hypertext, which we call IR hypermedia because it can be enriched by multimedia documents as well, can be browsed and queried using any of the World Wide Web graphical interfaces supporting HTML, for example Mosaic, running on various platforms ranging from UNIX to Macintosh. The availability of such a tool can make large collections of documents available for browsing and querying to a large number of users throughout the Internet.

At present we are addressing the problems connected with the enhancement of the querying capabilities. In particular, we are developing a querying tool that uses a form of constrained spreading activation over the IR hypermedia to produce and present to the user a ranking of the documents, since spreading activation could engage the user in an iterative and interactive browsing/querying process.
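The querying tool is described only as work in progress, so the following Python sketch shows a generic form of constrained spreading activation rather than the authors' design. The graph encoding, the function names, and the three constraints used here (a decay factor, a maximum number of hops and a minimum pulse threshold) are all assumptions made for illustration.

```python
def spread_activation(links, seeds, decay=0.5, max_hops=2, threshold=0.1):
    """Propagate activation from seed nodes along weighted links.

    links: {node: {neighbour: weight}}; seeds: {node: initial activation}.
    Constraints: pulses are damped by `decay`, stop after `max_hops`,
    and are dropped when weaker than `threshold`.
    """
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(max_hops):
        next_frontier = {}
        for node, value in frontier.items():
            for neighbour, weight in links.get(node, {}).items():
                pulse = value * weight * decay
                if pulse < threshold:
                    continue  # too weak to propagate further
                next_frontier[neighbour] = next_frontier.get(neighbour, 0.0) + pulse
        for node, pulse in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + pulse
        frontier = next_frontier
    return activation


def rank_documents(activation, documents):
    """Order the activated documents by decreasing activation."""
    reached = [doc for doc in documents if doc in activation]
    return sorted(reached, key=lambda doc: activation[doc], reverse=True)


# Hypothetical fragment of a hypermedia: one term linked to two documents.
links = {
    "term:database": {"doc:A": 1.0, "doc:B": 0.4},
    "doc:A": {"term:object": 1.0},
}
activation = spread_activation(links, {"term:database": 1.0})
ranking = rank_documents(activation, ["doc:A", "doc:B", "doc:C"])
```

In this fragment the activation starting from one index term reaches both linked documents and, through one of them, a further term; documents that are never reached do not appear in the ranking.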

Figure 9: A document relevant to database

Acknowledgements

This work was partially funded by a 1993 MURST grant of the Italian Ministry of University. The work of Massimo Melucci has been supported in part by a grant of IDOMENEUS, the ESPRIT Network of Excellence No on Information and Data on Open Media for Networks of Users, for a visiting period at the Department of Computing Science of the University of Glasgow (Scotland).

References

[Agosti and Crestani, 1993] M. Agosti and F. Crestani. A methodology for the automatic construction of a Hypertext for Information Retrieval. In Proceedings of the ACM Symposium on Applied Computing, pages 745-753, Indianapolis, USA, February.

[Agosti and Marchetti, 1992] M. Agosti and P.G. Marchetti. User navigation in the IRS conceptual structure through a semantic association function. The Computer Journal, 35(3).

[Agosti et al., 1989] M. Agosti, G. Gradenigo, and P. Mattiello. The Hypertext as an Effective Information Retrieval Tool for the Final User. In Antonio A. Martino, editor, Pre-proceedings of the 3rd International Conference on Logics, Informatics and Law, pages 1-19, Florence (Italy).

[Dunlop, 1991] M. Dunlop. Multimedia Information Retrieval. PhD thesis, Department of Computing Science, University of Glasgow, Glasgow, UK, October.

[Frisse, 1988] M.E. Frisse. Searching for information in a medical handbook. Communications of the ACM, 31(7):880-886.

[Oddy, 1975] R.N. Oddy. Reference retrieval based on user induced dynamic clustering. PhD thesis, University of Newcastle upon Tyne, UK, Computing Science Department.

[Porter, 1980] M.F. Porter. An algorithm for suffix stripping. Program, 14(3):130-137.

[Salton and Buckley, 1988] G. Salton and C. Buckley. On the use of spreading activation methods in automatic Information Retrieval. In Yves Chiaramella, editor, Proceedings of ACM SIGIR, Grenoble, France, June. Laboratoire IMAG Genie Informatique.

[Salton and McGill, 1983] G. Salton and M.J. McGill. Introduction to modern Information Retrieval. McGraw-Hill, New York.

[Srinivasdan, 1992] P. Srinivasdan. Thesaurus construction. In W.B. Frakes and R. Baeza-Yates, editors, Information Retrieval: data structures and algorithms, chapter 9. Prentice Hall, Englewood Cliffs, New Jersey, USA.

[Thompson, 1989] R.H. Thompson. The design and implementation of an intelligent interface for Information Retrieval. Technical report, Computer and Information Science Department, University of Massachusetts.

[van Rijsbergen, 1979] C.J. van Rijsbergen. Information Retrieval. Butterworths, London, second edition.

second_language research_teaching sla vivian_cook language_department idl

second_language research_teaching sla vivian_cook language_department idl Using Implicit Relevance Feedback in a Web Search Assistant Maria Fasli and Udo Kruschwitz Department of Computer Science, University of Essex, Wivenhoe Park, Colchester, CO4 3SQ, United Kingdom fmfasli

More information

Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data

Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data American Journal of Applied Sciences (): -, ISSN -99 Science Publications Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data Ibrahiem M.M. El Emary and Ja'far

More information

An Evaluation of Information Retrieval Accuracy. with Simulated OCR Output. K. Taghva z, and J. Borsack z. University of Massachusetts, Amherst

An Evaluation of Information Retrieval Accuracy. with Simulated OCR Output. K. Taghva z, and J. Borsack z. University of Massachusetts, Amherst An Evaluation of Information Retrieval Accuracy with Simulated OCR Output W.B. Croft y, S.M. Harding y, K. Taghva z, and J. Borsack z y Computer Science Department University of Massachusetts, Amherst

More information

Balancing Manual and Automatic Indexing for Retrieval of Paper Abstracts

Balancing Manual and Automatic Indexing for Retrieval of Paper Abstracts Balancing Manual and Automatic Indexing for Retrieval of Paper Abstracts Kwangcheol Shin 1, Sang-Yong Han 1, and Alexander Gelbukh 1,2 1 Computer Science and Engineering Department, Chung-Ang University,

More information

A Model and a Visual Query Language for Structured Text. handle structure. language. These indices have been studied in literature and their

A Model and a Visual Query Language for Structured Text. handle structure. language. These indices have been studied in literature and their A Model and a Visual Query Language for Structured Text Ricardo Baeza-Yates Gonzalo Navarro Depto. de Ciencias de la Computacion, Universidad de Chile frbaeza,gnavarrog@dcc.uchile.cl Jesus Vegas Pablo

More information

Automatic Construction of News Hypertext. Theodore Dalamagas

Automatic Construction of News Hypertext. Theodore Dalamagas Automatic Construction of News Hypertext Theodore Dalamagas 15 Sep 1997 Abstract Hypertext information retrieval systems combine hypertext and information retrieval capabilities by providing retrieval

More information

CHAPTER 8 Multimedia Information Retrieval

CHAPTER 8 Multimedia Information Retrieval CHAPTER 8 Multimedia Information Retrieval Introduction Text has been the predominant medium for the communication of information. With the availability of better computing capabilities such as availability

More information

b A HYPERTEXT FOR AN INTERACTIVE

b A HYPERTEXT FOR AN INTERACTIVE b A HYPERTEXT FOR AN INTERACTIVE VISIT TO A SCIENCE AND TECHNOLOGY MUSEUM 0. Signore, S. Malasoma, R. Tarchi, L. Tunno and G. Fresta CNUCE Institute of CNR Pisa (Italy) According to Nielsen (1990), "hypertext

More information

2. PRELIMINARIES MANICURE is specically designed to prepare text collections from printed materials for information retrieval applications. In this ca

2. PRELIMINARIES MANICURE is specically designed to prepare text collections from printed materials for information retrieval applications. In this ca The MANICURE Document Processing System Kazem Taghva, Allen Condit, Julie Borsack, John Kilburg, Changshi Wu, and Je Gilbreth Information Science Research Institute University of Nevada, Las Vegas ABSTRACT

More information

Adaptable and Adaptive Web Information Systems. Lecture 1: Introduction

Adaptable and Adaptive Web Information Systems. Lecture 1: Introduction Adaptable and Adaptive Web Information Systems School of Computer Science and Information Systems Birkbeck College University of London Lecture 1: Introduction George Magoulas gmagoulas@dcs.bbk.ac.uk October

More information

INFORMATION RETRIEVAL SYSTEM: CONCEPT AND SCOPE

INFORMATION RETRIEVAL SYSTEM: CONCEPT AND SCOPE 15 : CONCEPT AND SCOPE 15.1 INTRODUCTION Information is communicated or received knowledge concerning a particular fact or circumstance. Retrieval refers to searching through stored information to find

More information

2 Data Reduction Techniques The granularity of reducible information is one of the main criteria for classifying the reduction techniques. While the t

2 Data Reduction Techniques The granularity of reducible information is one of the main criteria for classifying the reduction techniques. While the t Data Reduction - an Adaptation Technique for Mobile Environments A. Heuer, A. Lubinski Computer Science Dept., University of Rostock, Germany Keywords. Reduction. Mobile Database Systems, Data Abstract.

More information

Transactions on Information and Communications Technologies vol 7, 1994 WIT Press, ISSN

Transactions on Information and Communications Technologies vol 7, 1994 WIT Press,   ISSN Teaching object oriented techniques with IR_Framework: a class library for information retrieval S. Wade, P. Braekevelt School of Computing and Mathematics, University of Huddersfield, UK Abstract This

More information

A Top-Down Visual Approach to GUI development

A Top-Down Visual Approach to GUI development A Top-Down Visual Approach to GUI development ROSANNA CASSINO, GENNY TORTORA, MAURIZIO TUCCI, GIULIANA VITIELLO Dipartimento di Matematica e Informatica Università di Salerno Via Ponte don Melillo 84084

More information

An Adaptive Agent for Web Exploration Based on Concept Hierarchies

An Adaptive Agent for Web Exploration Based on Concept Hierarchies An Adaptive Agent for Web Exploration Based on Concept Hierarchies Scott Parent, Bamshad Mobasher, Steve Lytinen School of Computer Science, Telecommunication and Information Systems DePaul University

More information

A Model for Information Retrieval Agent System Based on Keywords Distribution

A Model for Information Retrieval Agent System Based on Keywords Distribution A Model for Information Retrieval Agent System Based on Keywords Distribution Jae-Woo LEE Dept of Computer Science, Kyungbok College, 3, Sinpyeong-ri, Pocheon-si, 487-77, Gyeonggi-do, Korea It2c@koreaackr

More information

Making Retrieval Faster Through Document Clustering

Making Retrieval Faster Through Document Clustering R E S E A R C H R E P O R T I D I A P Making Retrieval Faster Through Document Clustering David Grangier 1 Alessandro Vinciarelli 2 IDIAP RR 04-02 January 23, 2004 D a l l e M o l l e I n s t i t u t e

More information

Video Representation. Video Analysis

Video Representation. Video Analysis BROWSING AND RETRIEVING VIDEO CONTENT IN A UNIFIED FRAMEWORK Yong Rui, Thomas S. Huang and Sharad Mehrotra Beckman Institute for Advanced Science and Technology University of Illinois at Urbana-Champaign

More information

Generalized Document Data Model for Integrating Autonomous Applications

Generalized Document Data Model for Integrating Autonomous Applications 6 th International Conference on Applied Informatics Eger, Hungary, January 27 31, 2004. Generalized Document Data Model for Integrating Autonomous Applications Zsolt Hernáth, Zoltán Vincellér Abstract

More information

A RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH

A RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH A RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH A thesis Submitted to the faculty of the graduate school of the University of Minnesota by Vamshi Krishna Thotempudi In partial fulfillment of the requirements

More information

Improving Adaptive Hypermedia by Adding Semantics

Improving Adaptive Hypermedia by Adding Semantics Improving Adaptive Hypermedia by Adding Semantics Anton ANDREJKO Slovak University of Technology Faculty of Informatics and Information Technologies Ilkovičova 3, 842 16 Bratislava, Slovak republic andrejko@fiit.stuba.sk

More information

Using XML Logical Structure to Retrieve (Multimedia) Objects

Using XML Logical Structure to Retrieve (Multimedia) Objects Using XML Logical Structure to Retrieve (Multimedia) Objects Zhigang Kong and Mounia Lalmas Queen Mary, University of London {cskzg,mounia}@dcs.qmul.ac.uk Abstract. This paper investigates the use of the

More information

An Architecture to Share Metadata among Geographically Distributed Archives

An Architecture to Share Metadata among Geographically Distributed Archives An Architecture to Share Metadata among Geographically Distributed Archives Maristella Agosti, Nicola Ferro, and Gianmaria Silvello Department of Information Engineering, University of Padua, Italy {agosti,

More information

A Semi-automatic Support to Adapt E-Documents in an Accessible and Usable Format for Vision Impaired Users

A Semi-automatic Support to Adapt E-Documents in an Accessible and Usable Format for Vision Impaired Users A Semi-automatic Support to Adapt E-Documents in an Accessible and Usable Format for Vision Impaired Users Elia Contini, Barbara Leporini, and Fabio Paternò ISTI-CNR, Pisa, Italy {elia.contini,barbara.leporini,fabio.paterno}@isti.cnr.it

More information

Universita degli Studi di Roma Tre. Dipartimento di Informatica e Automazione. Design and Maintenance of. Data-Intensive Web Sites

Universita degli Studi di Roma Tre. Dipartimento di Informatica e Automazione. Design and Maintenance of. Data-Intensive Web Sites Universita degli Studi di Roma Tre Dipartimento di Informatica e Automazione Via della Vasca Navale, 84 { 00146 Roma, Italy. Design and Maintenance of Data-Intensive Web Sites Paolo Atzeni y, Giansalvatore

More information

In the recent past, the World Wide Web has been witnessing an. explosive growth. All the leading web search engines, namely, Google,

In the recent past, the World Wide Web has been witnessing an. explosive growth. All the leading web search engines, namely, Google, 1 1.1 Introduction In the recent past, the World Wide Web has been witnessing an explosive growth. All the leading web search engines, namely, Google, Yahoo, Askjeeves, etc. are vying with each other to

More information

ARCHITECTURE AND IMPLEMENTATION OF A NEW USER INTERFACE FOR INTERNET SEARCH ENGINES

ARCHITECTURE AND IMPLEMENTATION OF A NEW USER INTERFACE FOR INTERNET SEARCH ENGINES ARCHITECTURE AND IMPLEMENTATION OF A NEW USER INTERFACE FOR INTERNET SEARCH ENGINES Fidel Cacheda, Alberto Pan, Lucía Ardao, Angel Viña Department of Tecnoloxías da Información e as Comunicacións, Facultad

More information

VIDEO SEARCHING AND BROWSING USING VIEWFINDER

VIDEO SEARCHING AND BROWSING USING VIEWFINDER VIDEO SEARCHING AND BROWSING USING VIEWFINDER By Dan E. Albertson Dr. Javed Mostafa John Fieber Ph. D. Student Associate Professor Ph. D. Candidate Information Science Information Science Information Science

More information

Semantic-Based Information Retrieval for Java Learning Management System

Semantic-Based Information Retrieval for Java Learning Management System AENSI Journals Australian Journal of Basic and Applied Sciences Journal home page: www.ajbasweb.com Semantic-Based Information Retrieval for Java Learning Management System Nurul Shahida Tukiman and Amirah

More information

Chapter 27 Introduction to Information Retrieval and Web Search

Chapter 27 Introduction to Information Retrieval and Web Search Chapter 27 Introduction to Information Retrieval and Web Search Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 27 Outline Information Retrieval (IR) Concepts Retrieval

More information

ATLAS.ti 6 Distinguishing features and functions

ATLAS.ti 6 Distinguishing features and functions ATLAS.ti 6 Distinguishing features and functions This document is intended to be read in conjunction with the Choosing a CAQDAS Package Working Paper which provides a more general commentary of common

More information

MULTIMEDIA TECHNOLOGIES FOR THE USE OF INTERPRETERS AND TRANSLATORS. By Angela Carabelli SSLMIT, Trieste

MULTIMEDIA TECHNOLOGIES FOR THE USE OF INTERPRETERS AND TRANSLATORS. By Angela Carabelli SSLMIT, Trieste MULTIMEDIA TECHNOLOGIES FOR THE USE OF INTERPRETERS AND TRANSLATORS By SSLMIT, Trieste The availability of teaching materials for training interpreters and translators has always been an issue of unquestionable

More information

Theme Identification in RDF Graphs

Theme Identification in RDF Graphs Theme Identification in RDF Graphs Hanane Ouksili PRiSM, Univ. Versailles St Quentin, UMR CNRS 8144, Versailles France hanane.ouksili@prism.uvsq.fr Abstract. An increasing number of RDF datasets is published

More information

International Journal of Advance Foundation and Research in Science & Engineering (IJAFRSE) Volume 1, Issue 2, July 2014.

International Journal of Advance Foundation and Research in Science & Engineering (IJAFRSE) Volume 1, Issue 2, July 2014. A B S T R A C T International Journal of Advance Foundation and Research in Science & Engineering (IJAFRSE) Information Retrieval Models and Searching Methodologies: Survey Balwinder Saini*,Vikram Singh,Satish

More information

International ejournals

International ejournals Available online at www.internationalejournals.com International ejournals ISSN 0976 1411 International ejournal of Mathematics and Engineering 112 (2011) 1023-1029 ANALYZING THE REQUIREMENTS FOR TEXT

More information

Text Mining. Representation of Text Documents

Text Mining. Representation of Text Documents Data Mining is typically concerned with the detection of patterns in numeric data, but very often important (e.g., critical to business) information is stored in the form of text. Unlike numeric data,

More information

BUILDING A CONCEPTUAL MODEL OF THE WORLD WIDE WEB FOR VISUALLY IMPAIRED USERS

BUILDING A CONCEPTUAL MODEL OF THE WORLD WIDE WEB FOR VISUALLY IMPAIRED USERS 1 of 7 17/01/2007 10:39 BUILDING A CONCEPTUAL MODEL OF THE WORLD WIDE WEB FOR VISUALLY IMPAIRED USERS Mary Zajicek and Chris Powell School of Computing and Mathematical Sciences Oxford Brookes University,

More information

Graph-based Automatic Suggestion of Relationships among Images of Illuminated Manuscripts

Graph-based Automatic Suggestion of Relationships among Images of Illuminated Manuscripts Graph-based Automatic Suggestion of Relationships among Images of Illuminated Manuscripts ABSTRACT Maristella Agosti agosti@dei.unipd.it Nicola Ferro ferro@dei.unipd.it Department of Information Engineering,

More information

highest cosine coecient [5] are returned. Notice that a query can hit documents without having common terms because the k indexing dimensions indicate

highest cosine coecient [5] are returned. Notice that a query can hit documents without having common terms because the k indexing dimensions indicate Searching Information Servers Based on Customized Proles Technical Report USC-CS-96-636 Shih-Hao Li and Peter B. Danzig Computer Science Department University of Southern California Los Angeles, California

More information

THE FACT-SHEET: A NEW LOOK FOR SLEUTH S SEARCH ENGINE. Colleen DeJong CS851--Information Retrieval December 13, 1996

THE FACT-SHEET: A NEW LOOK FOR SLEUTH S SEARCH ENGINE. Colleen DeJong CS851--Information Retrieval December 13, 1996 THE FACT-SHEET: A NEW LOOK FOR SLEUTH S SEARCH ENGINE Colleen DeJong CS851--Information Retrieval December 13, 1996 Table of Contents 1 Introduction.........................................................

More information

A Tagging Approach to Ontology Mapping

A Tagging Approach to Ontology Mapping A Tagging Approach to Ontology Mapping Colm Conroy 1, Declan O'Sullivan 1, Dave Lewis 1 1 Knowledge and Data Engineering Group, Trinity College Dublin {coconroy,declan.osullivan,dave.lewis}@cs.tcd.ie Abstract.

More information

Designing a System Engineering Environment in a structured way

Designing a System Engineering Environment in a structured way Designing a System Engineering Environment in a structured way Anna Todino Ivo Viglietti Bruno Tranchero Leonardo-Finmeccanica Aircraft Division Torino, Italy Copyright held by the authors. Rubén de Juan

More information

In both systems the knowledge of certain server addresses is required for browsing. In WWW Hyperlinks as the only structuring tool (Robert Cailliau: \

In both systems the knowledge of certain server addresses is required for browsing. In WWW Hyperlinks as the only structuring tool (Robert Cailliau: \ The Hyper-G Information System Klaus Schmaranz (Institute for Information Processing and Computer Supported New Media (IICM), Graz University of Technology, Austria kschmar@iicm.tu-graz.ac.at) June 2,

More information

I&R SYSTEMS ON THE INTERNET/INTRANET CITES AS THE TOOL FOR DISTANCE LEARNING. Andrii Donchenko

I&R SYSTEMS ON THE INTERNET/INTRANET CITES AS THE TOOL FOR DISTANCE LEARNING. Andrii Donchenko International Journal "Information Technologies and Knowledge" Vol.1 / 2007 293 I&R SYSTEMS ON THE INTERNET/INTRANET CITES AS THE TOOL FOR DISTANCE LEARNING Andrii Donchenko Abstract: This article considers

More information

Fausto Giunchiglia and Mattia Fumagalli

Fausto Giunchiglia and Mattia Fumagalli DISI - Via Sommarive 5-38123 Povo - Trento (Italy) http://disi.unitn.it FROM ER MODELS TO THE ENTITY MODEL Fausto Giunchiglia and Mattia Fumagalli Date (2014-October) Technical Report # DISI-14-014 From

More information

3.4 Data-Centric workflow

3.4 Data-Centric workflow 3.4 Data-Centric workflow One of the most important activities in a S-DWH environment is represented by data integration of different and heterogeneous sources. The process of extract, transform, and load

More information

Link Recommendation Method Based on Web Content and Usage Mining

Link Recommendation Method Based on Web Content and Usage Mining Link Recommendation Method Based on Web Content and Usage Mining Przemys law Kazienko and Maciej Kiewra Wroc law University of Technology, Wyb. Wyspiańskiego 27, Wroc law, Poland, kazienko@pwr.wroc.pl,

More information

Txt2vz: a new tool for generating graph clouds

Txt2vz: a new tool for generating graph clouds Txt2vz: a new tool for generating graph clouds HIRSCH, L and TIAN, D Available from Sheffield Hallam University Research Archive (SHURA) at: http://shura.shu.ac.uk/6619/

More information

Towards the integration of security patterns in UML Component-based Applications

Towards the integration of security patterns in UML Component-based Applications Towards the integration of security patterns in UML Component-based Applications Anas Motii 1, Brahim Hamid 2, Agnès Lanusse 1, Jean-Michel Bruel 2 1 CEA, LIST, Laboratory of Model Driven Engineering for

More information

TREC-3 Ad Hoc Retrieval and Routing. Experiments using the WIN System. Paul Thompson. Howard Turtle. Bokyung Yang. James Flood

TREC-3 Ad Hoc Retrieval and Routing. Experiments using the WIN System. Paul Thompson. Howard Turtle. Bokyung Yang. James Flood TREC-3 Ad Hoc Retrieval and Routing Experiments using the WIN System Paul Thompson Howard Turtle Bokyung Yang James Flood West Publishing Company Eagan, MN 55123 1 Introduction The WIN retrieval engine

More information

A New Measure of the Cluster Hypothesis

A New Measure of the Cluster Hypothesis A New Measure of the Cluster Hypothesis Mark D. Smucker 1 and James Allan 2 1 Department of Management Sciences University of Waterloo 2 Center for Intelligent Information Retrieval Department of Computer

More information

Using Statistical Properties of Text to Create. Metadata. Computer Science and Electrical Engineering Department

Using Statistical Properties of Text to Create. Metadata. Computer Science and Electrical Engineering Department Using Statistical Properties of Text to Create Metadata Grace Crowder crowder@cs.umbc.edu Charles Nicholas nicholas@cs.umbc.edu Computer Science and Electrical Engineering Department University of Maryland

More information

Publishing Model for Web Applications: A User-Centered Approach

Publishing Model for Web Applications: A User-Centered Approach 226 Paiano, Mangia & Perrone Chapter XII Publishing Model for Web Applications: A User-Centered Approach Roberto Paiano University of Lecce, Italy Leonardo Mangia University of Lecce, Italy Vito Perrone

More information

TREC 2016 Dynamic Domain Track: Exploiting Passage Representation for Retrieval and Relevance Feedback

TREC 2016 Dynamic Domain Track: Exploiting Passage Representation for Retrieval and Relevance Feedback RMIT @ TREC 2016 Dynamic Domain Track: Exploiting Passage Representation for Retrieval and Relevance Feedback Ameer Albahem ameer.albahem@rmit.edu.au Lawrence Cavedon lawrence.cavedon@rmit.edu.au Damiano

More information

Information Retrieval (Part 1)

Information Retrieval (Part 1) Information Retrieval (Part 1) Fabio Aiolli http://www.math.unipd.it/~aiolli Dipartimento di Matematica Università di Padova Anno Accademico 2008/2009 1 Bibliographic References Copies of slides Selected

More information

Routing and Ad-hoc Retrieval with the. Nikolaus Walczuch, Norbert Fuhr, Michael Pollmann, Birgit Sievers. University of Dortmund, Germany.

Routing and Ad-hoc Retrieval with the. Nikolaus Walczuch, Norbert Fuhr, Michael Pollmann, Birgit Sievers. University of Dortmund, Germany. Routing and Ad-hoc Retrieval with the TREC-3 Collection in a Distributed Loosely Federated Environment Nikolaus Walczuch, Norbert Fuhr, Michael Pollmann, Birgit Sievers University of Dortmund, Germany

More information

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISSN : 2456-3307 Some Issues in Application of NLP to Intelligent

More information
