

Keith A. Gatlin. Enhancing Cross-Language Retrieval of Comparable Corpora Through Thesaurus-Based Translation and Citation Indexing. A Master's paper for the M.S. in I.S. degree. April, pages. Advisor: Robert Losee.

This paper studies methods to enhance cross-language retrieval of domain-specific documents. English- and German-language comparable corpora are used as the subject of the study. A multilingual thesaurus is developed to facilitate query translation, and reference citations are indexed to provide a language-neutral method to retrieve documents. These new retrieval methods are tested against actual user queries to measure the improvement of retrieval quality over an existing Boolean system. Experimental results suggest that a manually produced thesaurus can greatly increase the recall of documents, while the use of the citation index leads to high-precision retrieval when compared to a standard Boolean system without these enhancements. Both methods provide cross-language retrieval of documents given monolingual search terms, thus automatically expanding the scope of a user's query.

Headings:
Information retrieval cross-language
Information retrieval comparable corpora
Thesaurus compilation
Citation indexing

ENHANCING CROSS-LANGUAGE RETRIEVAL OF COMPARABLE CORPORA THROUGH THESAURUS-BASED TRANSLATION AND CITATION INDEXING

by Keith A. Gatlin

A Master's paper submitted to the faculty of the School of Information and Library Science of the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Master of Science in Information Science.

Chapel Hill, North Carolina
April 2005

Approved by Robert Losee

TABLE OF CONTENTS

INTRODUCTION
BACKGROUND
THESAURUS CREATION AND EVALUATION
THESAURUS IMPLEMENTATION
REFERENCE CITATION INDEXING
METHODOLOGY
RESULTS
DISCUSSION
REFERENCES

INTRODUCTION

The problem of cross-language retrieval has become increasingly important as the volume and availability of machine-readable source data have increased. We now have vast amounts of computer-readable text in multiple languages. To provide convenient access to this information, a retrieval system must be able to return documents in one or more target languages for any query.

In this paper, I examine methods to enhance the cross-language retrieval of text from domain-specific English and German corpora. The focus of this study is on increasing retrieval effectiveness when English-language queries are used to retrieve documents from both corpora. As such, the target user of this system is a person who is most comfortable phrasing queries in English but has enough knowledge of German to interpret the German-language results.

The corpora examined in this study are composed of 90,000 English and German catalog entries extracted from online sources. The catalog entries are domain specific and represent comparable corpora. Thus, they share the same concepts and ideas, but documents in one corpus are not direct translations of those in the other.

Without the help of a cross-language retrieval system, users must take one of two approaches to retrieving documents from the corpora: 1) phrase the search query using only language-neutral terms so it will match documents in both languages, or 2) express the query using terms in more than one language. Both methods are unsatisfactory. The first method will almost never achieve perfect recall, as documents are rarely so similar as to allow for such a one-size-fits-all query. The second method, on the other hand, can achieve very high recall, but it assumes the user is very familiar with both languages (and the domain-specific terms within each), an unlikely scenario. Existing search log evidence suggests that most users employ the first retrieval method. Therefore, we can assume that users are not retrieving all relevant documents in the corpora.

Another barrier to retrieval is the nature of information representation in the corpora. Most catalog entries include citations to major reference works in the body of the text. Because of the domain-specific nature of the corpora, these citations can be crucial to locating relevant documents. Users will often know which reference number they are seeking when they perform a search. However, because the documents are captured from various sources, they do not use a consistent citation format for these references. Therefore, users can never be sure which format to use for their query, even if they know exactly which reference they are seeking.

The problem this paper attempts to solve is how to create a bridge between the documents to join similar concepts, terms, and reference citations. The methods I propose to achieve this are twofold. First, a multilingual thesaurus can be used to translate English query terms into their German counterparts. This translation will automatically expand a user's query to include German terms. Second, the language-neutral reference citations can be identified and indexed separately from the textual portion of the catalog entry. This will allow users to input reference queries in a uniform format and have them matched against the citation index rather than the document text, without the risk of format mismatches.

BACKGROUND

One of the most troublesome areas of cross-language IR, and one that has surfaced many times in previous studies, is its need for advanced disambiguation of terms (Rogati and Yang, 2004). For a system to be successful, we must be able to specify the correct translation for a term having more than one meaning. This sentiment is further reinforced by Ballesteros and Croft (1998), who link the success of cross-language retrieval with its ability to resolve term ambiguity.

This ambiguity can be at least partially avoided if we can make assumptions about the contents of the corpora. If the corpora are focused on the same narrow domain, as they are in this study, then term meanings are unlikely to vary with context. In this situation, a thesaurus can serve as a controlled vocabulary, specifying the synonyms, translations, and preferred variants for a standard set of commonly encountered terms. With the thesaurus as a base, terms are constrained to their domain-specific meanings, thus avoiding the problem of ambiguity.

Multilingual thesauri have been used extensively in previous cross-language IR research. Most experiments have used thesauri as a method to translate query terms on the fly, thus expanding the query to include one or more target languages (Ballesteros and Croft, 1996). The foundation of this method is automatic query expansion, a technique long used in IR systems to add relevant terms to a query (Qiu and Frei, 1993). When terms are added to an existing query, we would expect the number of documents returned to increase. This reasoning has been applied to multilingual retrieval, where query expansion is used to append translations of query terms to the user's original input (Han, Fujii, and Croft, 1994).

This query expansion method operates separately on individual terms and does not provide a context-sensitive translation of the query. Research has shown that such dictionary-based translations tend to work better for short, targeted queries than for long ones (Oard, 1998). This constraint is not a problem in the context of this study, because the documents themselves are very short (averaging about 100 terms), and users' queries are unlikely to contain long, complex ideas.

The performance of a thesaurus-based cross-language IR system depends most heavily on the coverage and accuracy of the underlying thesaurus. The UMLS metathesaurus has been successfully applied as an automated method for cross-language retrieval (Eichmann, Ruiz, and Srinivasan, 1998). However, such high-quality thesauri are not always readily available for a given application. In this case, the thesaurus must be custom-built. Soergel (1997) provides a framework for building multilingual thesauri. He emphasizes the user-centered approach to indexing, in which actual user queries and interests are used as a basis for thesaurus construction. Sager (1990) advocates a similar approach to term compilation and stresses the importance of high-quality translations that are reversible (that is, approachable from either language).

To quickly produce a thesaurus based on available corpora, we might consider automated methods. Attempts to automatically construct monolingual thesauri have met with some success (Jing and Croft, 1994). However, the creation of a multilingual thesaurus is much more difficult, especially when we need to ensure that the thesaurus contains a domain-specific, controlled vocabulary in two languages. Although some parts of the thesaurus construction process can be automated (i.e., term compilation), the actual translation and evaluation process requires manual effort by someone with knowledge of the domain (Soergel, 1997). We can therefore expect that the thesaurus construction process will not be a straightforward, data-driven task. Instead, it will require terminological research and some familiarity with both the English and German terms related to the field.

Citation analysis and indexing have long been studied as methods to find relationships among documents (Small, 1973). Garfield (1979) suggested bibliographic citations as a basis for retrieval. Naturally, citation indexing can be extended to cross-language applications as well, assuming that the same citations are used among documents in different languages. The biggest challenge in citation indexing is the parsing of the citations, which may appear differently depending on their context and source. Lawrence et al. (1999) discuss this non-trivial problem in the context of autonomously identifying citations in Web-based articles. One of the techniques suggested is the development of heuristics based on regular expressions to handle the variations in citation styles. This is a relatively standard technique that can be augmented by term frequency analysis and lists of commonly encountered citation components (authors, journal titles, etc.).

Once a thesaurus and citation index have been built, they must be integrated into a retrieval system. One method to achieve this is to map Boolean operators to SQL queries that retrieve documents from a relational database. Grossman et al. (1997) discuss the advantages of using SQL queries to retrieve both unstructured and structured data. In this study, we can use a similar method to convert user queries for both unstructured text and structured reference citations into SQL queries. This method allows us to support Boolean operators and easily integrate structured information into user searches.
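The mapping from Boolean operators to SQL described above can be illustrated with a minimal sketch. It is not the study's implementation: the table name `entries`, the column `body`, the helper `boolean_to_sql`, and the sample rows are all assumptions made for illustration, and substring matching with `LIKE` stands in for whatever text matching the real system used.

```python
import sqlite3


def boolean_to_sql(and_groups):
    """Build a parameterized WHERE clause from a query in conjunctive form.

    and_groups is a list of lists: terms inside one group are ORed,
    and the groups themselves are ANDed together.
    """
    clauses, params = [], []
    for group in and_groups:
        ors = " OR ".join("body LIKE ?" for _ in group)
        clauses.append(f"({ors})")
        params.extend(f"%{term}%" for term in group)
    return " AND ".join(clauses), params


# Hypothetical in-memory catalog with invented sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO entries (body) VALUES (?)",
    [
        ("olive branch, silver",),
        ("Zweig, Silber",),
        ("plain bronze piece",),
    ],
)

# (branch OR zweig) AND (silver OR silber)
where, params = boolean_to_sql([["branch", "zweig"], ["silver", "silber"]])
rows = conn.execute(
    f"SELECT id FROM entries WHERE {where} ORDER BY id", params
).fetchall()
matched_ids = [row[0] for row in rows]
```

Passing the terms as bound parameters rather than interpolating them into the SQL string keeps user input out of the statement text. SQLite's `LIKE` ignores case for ASCII letters by default, which is why the lowercase term `zweig` matches `Zweig` here.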

THESAURUS CREATION AND EVALUATION

The most important quality of a multilingual thesaurus is that it includes all concepts relevant to a domain as they exist in each of the languages (Soergel, 1997). For this experiment, our intent is to improve retrieval within a relatively limited domain. Therefore, we do not have to capture every concept, just the most important ones. In addition, we know that concepts in German are not always expressed in lexically similar ways as the same concepts in English. This is a major concern for the creators of general thesauri. However, our application is targeted at comparable corpora, where the concepts expressed in one corpus have reliable and discernible counterparts in the other corpus. Users of the system are expected to know the vocabulary of the domain (i.e., a controlled vocabulary), and this assumption greatly decreases the possible scope of the thesaurus.

Construction of a multilingual thesaurus typically begins with an analysis of search requests, common document terms, and other thesauri (Soergel, 1997). The goal of this process is to create a list of the most important terms related to a particular domain. As a starting point, I concentrated on a frequency analysis of user query terms as extracted from a search log. This ensures that the thesaurus will cover, at the very least, the concepts that have been most frequently requested by users.

The next step in the process is to group terms by category (e.g., geographical locations, proper names) and identify synonyms within each language. My term frequency analysis of the search query log uncovered several series of terms relevant to the domain. Many of these terms fit into some framework, or subset, of the domain. For example, series of chronological events, names, and places arise when the terms are grouped by topic. I was able to use both my domain knowledge and some existing thesauri and indices to help fill in missing elements of these series.

Once a monolingual term list is available, the next step is to translate the terms into the target language (in this case, German). I carried out this task almost entirely by hand, using cross-language dictionaries to help find term translations. The validity of these translations depends solely on whether they actually appear in the target corpus. Therefore, we must ensure that any translations of the English terms are present in the German corpus. In addition, they must be used in the same context. Otherwise, the translations will not facilitate retrieval across the corpora.

After compiling the thesaurus, I exported it as a text file in a format the retrieval system can read. The thesaurus is arranged according to base terms in English. These are the preferred terms and are the ones most likely to be encountered in an English corpus. Each row of the thesaurus begins with a base term. After the base term, any applicable English synonyms are listed. Next come the German translations, with the preferred translation first, followed by any variations (such as spelling variants). The layout of the thesaurus thus allows the search system to match English query terms with thesaurus entries and expand them to include synonyms and German translations.

Ultimately, the success of a multilingual thesaurus will be reflected in the performance of the retrieval system into which it is integrated. Because of its query expansion effect, we would expect a good thesaurus to greatly increase the number of relevant results for any monolingual query. Another good measure of the thesaurus's effectiveness is its coverage, i.e., the proportion of users' query terms that have a match in the thesaurus. Finally, we can evaluate the accuracy of the thesaural translations to see if they reflect the true translation of a term as it appears in the corpus of the target language. This evaluation step should occur during thesaurus construction, possibly with the help of experts fluent in the target language.

In summary, I propose three methods of thesaurus evaluation:

1. Measure the number of results returned for a monolingual query both before and after thesaurus implementation.
2. Calculate the proportion of user query terms that appear in the thesaurus (and thus can be translated by it).
3. Evaluate thesaurus translations to ensure they are reversible and appear in the target corpus.

The first evaluation method will occur after the thesaurus is implemented and will be discussed in the Methodology section. The second and third evaluation methods should be undertaken as an ongoing part of the thesaurus compilation process. We would assume that a good thesaurus would cover as many user query terms as possible within the limits of time and expense. In addition, high-quality translations should be the basis for thesaurus development. Finally, evaluation must be carried out continuously, so the thesaurus can be updated in response to any changes in user queries or corpus content.
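The second evaluation method, coverage, reduces to a simple ratio over the query log. The sketch below shows one way it might be computed; the function name `thesaurus_coverage`, the case-insensitive matching, and the sample terms are assumptions for illustration and are not taken from the study.

```python
from collections import Counter


def thesaurus_coverage(query_terms, thesaurus_terms):
    """Return (coverage, missing) for a list of logged query terms.

    coverage is the share of query-term occurrences that have an entry
    in the thesaurus; missing lists the uncovered terms, most frequent
    first, as candidates for new thesaurus entries.
    """
    known = {term.lower() for term in thesaurus_terms}
    counts = Counter(term.lower() for term in query_terms)
    total = sum(counts.values())
    covered = sum(n for term, n in counts.items() if term in known)
    coverage = covered / total if total else 0.0
    missing = [term for term, _ in counts.most_common() if term not in known]
    return coverage, missing


# Invented log terms: four of the five occurrences are covered.
log_terms = ["branch", "eagle", "branch", "wreath", "Eagle"]
coverage, missing = thesaurus_coverage(log_terms, ["branch", "eagle"])
```

Counting occurrences rather than distinct terms weights the measure toward what users ask for most often, which matches the frequency-driven compilation described above. Counting distinct terms instead would be an equally defensible choice.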

THESAURUS IMPLEMENTATION

The system into which I implemented the thesaurus uses Boolean retrieval. It accepts queries with the AND, NOT, and OR operators. For each term or phrase in the query string, the system does a lookup in the thesaurus. Query expansion occurs only when a term or phrase has an exact match. In this case, the original query term is expanded to include the preferred synonyms and translations appearing in the thesaurus. Some research in this area has assigned weights to the terms and phrases used in the query expansion process, but experimentation is required to find optimum values for the weights (Jing and Croft, 1994). The system in this experiment is much simpler and uses no weighting for the terms. Instead, the query expansion uses the Boolean OR operator to add translated terms or phrases to the user's original query. For example, a user query with the English noun "branch" would become "branch OR zweig" after the thesaurus lookup and translation. This Boolean query would then be mapped to an equivalent SQL query to retrieve results from the underlying relational database.

For this study, the translation process occurs automatically in the background, expanding user terms to include entries from the thesaurus. Users do not see the expanded query, nor are they allowed to choose which of the suggested query terms to include. As such, this system does not offer a method for user feedback.
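The lookup-and-expand step can be sketched as follows. The paper states only that each thesaurus row holds a base term, then English synonyms, then German translations; the tab-separated fields with semicolon-separated values used here are a hypothetical file format, and the helper names are invented. The sketch handles single-word terms only, whereas the system described above also matches phrases.

```python
OPERATORS = {"AND", "OR", "NOT"}


def load_thesaurus(lines):
    """Parse rows of the assumed form: base<TAB>synonyms<TAB>translations."""
    thesaurus = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Pad so rows without synonyms or translations still unpack.
        base, synonyms, translations = (line.split("\t") + ["", ""])[:3]
        expansions = [t for t in (synonyms + ";" + translations).split(";") if t]
        thesaurus[base.lower()] = expansions
    return thesaurus


def expand_term(term, thesaurus):
    """OR a term with its thesaurus entries; unknown terms pass through."""
    expansions = thesaurus.get(term.lower(), [])
    if not expansions:
        return term
    return "(" + " OR ".join([term] + expansions) + ")"


def expand_query(query, thesaurus):
    """Expand every non-operator token of a whitespace-separated query."""
    return " ".join(
        token if token in OPERATORS else expand_term(token, thesaurus)
        for token in query.split()
    )


thesaurus = load_thesaurus(["branch\t\tzweig"])
expanded = expand_query("branch AND coin", thesaurus)
```

The parentheses around each expanded group are an addition of this sketch. They keep the inserted OR from changing the precedence of the operators the user typed.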

REFERENCE CITATION INDEXING

As a method to bridge between languages, reference citations can be very powerful. In the domain of this study, these references appear in catalog entries to refer the reader to standard, paper-based sources. For example, in the legal domain, Congressional bills are often cited with an abbreviation and a number, such as "H. R. 145", which refers to bill number 145 in the House of Representatives. Another example of such a citation is a reference to a specific Bible verse, as in "1 Cor. 1:13". In both examples, the numerical portions of the reference citations are language-neutral and can be extracted from a document in any language. The German and English corpora in this study take their references from similar, well-known sources. As a result, reference numbers provide a good example of a language-neutral bridge between documents in different languages.

The trouble with the reference citations as they exist in catalog entries is in their formatting: documents captured from different sources tend to use different citation styles. Returning to the example of Congressional bills, some sources use punctuation in their citations, such as "H. R. 326" or "S. 120", while other sources leave out the periods and use different spacing, such as "HR 326" or "S 120". If users do not phrase their search queries exactly as citations appear in the source documents, then the retrieval system will not find them.

To address this problem, I chose the seven most frequently cited reference works, removed their reference citations from the corpora, and indexed them separately from the document text. I carried out this process by generating a list of regular expressions that would match the numerical portion of each reference citation. The citations were then placed in their own database fields so they can be searched separately from the document text. This index of citations is similar to database normalization, in that its goal is to extract atomic, structured data from an unstructured text field.

Despite their ability to serve as a bridge between documents in different languages, reference citations do have one drawback: they are not always unique. Therefore, they may not produce the desired documents if used alone in a search query. Instead, they often need to be used in combination with other query terms to identify relevant catalog entries. For example, when referring to bills under consideration in the U.S. Senate, a citation such as "S. 648" might be used in the text of a document. However, this citation is incomplete when taken out of context (i.e., when extracted programmatically). To refer to a unique bill, the reference citation must be accompanied by the number of the Congress, such as "109th Congress", because bill numbers are reset before each new Congress. For this study, we can assume that users of the system are familiar with these references and their limitations, and that they will phrase their queries accordingly.
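To make the regular-expression step concrete, here is a minimal sketch that normalizes the Congressional bill styles used as examples above. It covers only that illustrative case. The actual reference works and expressions used in the study are not given, and the pattern and function names are assumptions.

```python
import re

# Matches "H. R. 326", "H.R. 326", "HR 326", "S. 120" and "S 120".
BILL_PATTERN = re.compile(r"\b(H\.?\s*R|S)\.?\s*(\d+)\b")


def extract_citations(text):
    """Return (chamber, number) pairs in a single canonical form."""
    citations = []
    for match in BILL_PATTERN.finditer(text):
        # Drop periods and spaces so "H. R." and "HR" index identically.
        chamber = re.sub(r"[.\s]", "", match.group(1))
        citations.append((chamber, int(match.group(2))))
    return citations


punctuated = extract_citations("See H. R. 326 and S. 120.")
unpunctuated = extract_citations("See HR 326 and S 120")
```

Both inputs yield the same pairs, so a query entered in either style can be matched against one indexed field. As noted above, the bill number alone is not unique across Congresses, so a fuller index would need to store that context as well.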

METHODOLOGY

The methodology I used to test and evaluate retrieval performance is focused on the two improvements I made to the system. First, I measured the performance of the multilingual thesaurus. Second, I gauged how well the system performs when using the index of reference citations to retrieve documents.

To adequately test the multilingual thesaurus, we must evaluate it based on the metrics discussed in the thesaurus compilation process. We are most interested in how successfully the thesaurus uses query expansion to find new documents. This can be measured by counting the number of documents returned for each search query. We can compare the result of this test to the old search method (which did not use a thesaurus) to see how the use of thesaural translations increases document recall. To carry out this evaluation, I randomly selected 200 search queries from the existing query log. For each query, I ran a search using both the old and new systems and recorded the number of hits.

The second evaluation component focuses on the effect of reference citations in cross-language retrieval. To test the effect of these references, I randomly selected 50 user search queries that include a reference citation. I then broke these queries into two components: the numerical citation, and any other search terms that were part of the query. I used the numerical portion as an input to search for matches among the reference entries in the database. The other search terms were input into the thesaurus-based translation system. For a query to match a document, both the numerical reference citation and the query terms must have matches in the database. In this way, we take advantage of the thesaural query expansion as well as the index of citations. For each query, I ensured that enough information was available to guarantee that the reference citation would be unique. This restriction allows us to judge the relevance of the results, because we know which catalog entries a user was trying to find based on their query. Therefore, for the list of results of each query, we can calculate the number of relevant and nonrelevant documents.
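Counting relevant and nonrelevant documents per query leads directly to a precision figure. The sketch below shows the calculation under one assumption: that the average is taken per query (each query's precision weighted equally), which the paper does not specify. The function names and document IDs are invented for illustration.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant to the query."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0
    return len(retrieved & set(relevant)) / len(retrieved)


def mean_precision(results):
    """Average per-query precision over (retrieved, relevant) pairs."""
    scores = [precision(retrieved, relevant) for retrieved, relevant in results]
    return sum(scores) / len(scores) if scores else 0.0


# Two invented queries: one fully precise, one half precise.
single = precision([1, 2, 3, 4], [1, 2, 3])
average = mean_precision([([1, 2], [1, 2]), ([3, 4], [3])])
```

A query that retrieves nothing is scored as 0.0 here. Excluding such queries from the average would be another reasonable convention and would raise the reported figure.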

RESULTS

The thesaurus-based query expansion system improved the search results for 37 of the 200 randomly selected queries (18.5 percent). No changes were observed for the other 163 queries. Of those queries that saw improved results, the number of hits increased by an average of 115 percent, so the thesaurus more than doubled the average number of documents retrieved. Although I did not evaluate the relevance of each retrieved document, initial results indicate that the query expansion of English terms is very effective at returning German documents that would not have been retrieved without query translation. Therefore, we can confidently say that the thesaurus does a good job of expanding search results when it can translate at least one query term.

Inevitably, though, not all user queries have matches in the thesaurus. Some queries simply contain concepts not covered by the thesaurus. Others do not match thesaurus entries because of misspellings. Analysis of these non-matching terms is valuable, as they may prove to be candidates for addition to the thesaurus, either as new entries or as extensions of an existing entry.

Whereas the thesaurus evaluation offers information about the recall of the retrieval system, the reference citations can be employed to evaluate its precision. The results of this portion of the evaluation were very good. Of the 50 citation-specific search queries examined, all but two produced at least some relevant results. Overall, the 50 queries achieved very high precision: on average, 87 percent of the retrieved documents were relevant to the query. As with the evaluation of the thesaurus-based query expansion, the documents retrieved in this test were highly representative of the corpora from which they were drawn; that is, they consisted of both German and English documents with varying styles of citations. From these results, we can conclude that the index of reference citations greatly improves cross-language retrieval because of its ability to reconcile differing reference styles. When implemented alongside the thesaurus-based translation system, this citation index provides a reliable, language-neutral method to retrieve specific catalog entries from the corpora.

DISCUSSION

This study has demonstrated how cross-language retrieval of German and English documents can be enhanced by the use of a multilingual thesaurus and an index of reference citations. The thesaurus compilation process was completed almost entirely with manual methods. Manual compilation, though very labor intensive, gives a thesaurus several important properties. First, the thesaurus was built with the help of domain-specific knowledge of the corpora. This allows us to ensure the quality of translations and, at least subjectively, judge the completeness of the thesaurus entries. In addition, we can limit the scope of the thesaurus to cover only the most important domain concepts. These properties allow a manually constructed thesaurus to fit well into the framework of such a narrowly defined domain, where general thesauri or machine translation would not be able to handle the specialized vocabulary.

Like thesaurus-based translations, the index of reference citations is an attempt to bridge between the English and German documents. Reference citations are a particularly powerful retrieval method because they represent structured information within unstructured document text. That is, the references cited in a document appear in a structured format that we can extract and interpret programmatically. This structure does show subtle variations among documents (reflecting the differences in citation styles), but most can be matched with the use of regular expressions. We can therefore parse a document for the numerical portion of a citation and index this separately from the document text, providing a key on which to find documents containing a particular citation. In any corpus of unstructured text, such citations are valuable tools for retrieval. If they can be extracted and indexed apart from the document text in a normalized fashion, then we can rely on simple database queries rather than retrieval algorithms to match documents to a query.

Several aspects of the implemented retrieval methods could benefit from improvement. To increase the number of user queries that can be translated by the thesaurus, term compilation and thesaurus evaluation have to be ongoing processes. As the coverage of the thesaurus increases, cross-language retrieval performance will similarly improve. The query expansion process might also benefit from user feedback; instead of being an automatic, behind-the-scenes process, the system could show users suggested synonyms and translations. This function would allow users to customize their search by choosing which terms to add to a query or by editing the suggested terms. Finally, the scope of the reference citations index could be expanded to include more reference works. As implemented, the system indexes the top seven citation sources. We could greatly expand the coverage of this resource by indexing other, less frequently cited reference works.

As a result of this study, I recommend that designers of domain-specific, cross-language retrieval systems carefully evaluate the potential for developing a custom thesaurus and citation index to enhance retrieval performance. Within a narrow domain, thesaurus-based translations of important query terms can provide an extremely powerful method to present users with documents that otherwise would not have been returned by a monolingual query. Similarly, language-neutral reference citations should be exploited wherever possible by developing parsing techniques that can identify citations and index them. This method allows users who are familiar with standard references to find documents in multiple languages.

REFERENCES

Ballesteros, L. and W. B. Croft. (1996). Dictionary methods for cross-lingual information retrieval. Proceedings of the 7th International DEXA Conference on Database and Expert Systems Applications.

Ballesteros, L. and W. B. Croft. (1998). Resolving ambiguity for cross-language retrieval. Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.

Eichmann, D., M. Ruiz, and P. Srinivasan. (1998). Cross-language information retrieval with the UMLS metathesaurus. Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.

Garfield, E. (1979). Citation indexing: Its theory and application in science, technology, and humanities. New York: Wiley-Interscience.

Grossman, D. A., O. Frieder, D. O. Holmes, and D. C. Roberts. (1997). Integrating structured data and text: A relational approach. Journal of the American Society for Information Science, 48(2).

Han, C., H. Fujii, and W. B. Croft. (1994). Automatic query expansion for Japanese text retrieval. Technical report, Department of Computer Science, University of Massachusetts, Amherst.

Jing, Y. and B. Croft. (1994). An association thesaurus for information retrieval. Proceedings of the Intelligent Multimedia Retrieval Systems and Management Conference.

Lawrence, S., C. L. Giles, and K. Bollacker. (1999). Digital libraries and autonomous citation indexing. IEEE Computer, 32(6).

Oard, D. W. (1998). A comparative study of query and document translation for cross-language information retrieval. Proceedings of the 3rd Conference of the Association for Machine Translation in the Americas.

Qiu, Y. and H. P. Frei. (1993). Concept based query expansion. Proceedings of the 16th International Conference on Research and Development in IR (SIGIR).

Rogati, M. and Y. Yang. (2004). Resource Selection for Domain Specific CLIR. Proceedings of the 2004 International Conference on Research and Development in IR.

Sager, J. C. (1990). A Practical Course in Terminology Processing. Amsterdam: John Benjamins Publishing Company.

Small, H. (1973). Co-citation in scientific literature: A new measure of the relationship between two documents. Journal of the American Society for Information Science, 24.

Soergel, D. (1997). Multilingual thesauri in cross-language text and speech retrieval. Proceedings of the AAAI Symposium on Cross-Language Text and Speech Retrieval.


A RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH A RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH A thesis Submitted to the faculty of the graduate school of the University of Minnesota by Vamshi Krishna Thotempudi In partial fulfillment of the requirements

More information

INFORMATION RETRIEVAL SYSTEM: CONCEPT AND SCOPE

INFORMATION RETRIEVAL SYSTEM: CONCEPT AND SCOPE 15 : CONCEPT AND SCOPE 15.1 INTRODUCTION Information is communicated or received knowledge concerning a particular fact or circumstance. Retrieval refers to searching through stored information to find

More information

Reconsidering DCRM in the light of RDA: A Discussion Paper

Reconsidering DCRM in the light of RDA: A Discussion Paper Reconsidering DCRM in the light of RDA: A Discussion Paper I. Introduction The DCRM documents acknowledge both a historical and a normative relationship with AACR2. As the accepted standard for the cataloging

More information

Effect of log-based Query Term Expansion on Retrieval Effectiveness in Patent Searching

Effect of log-based Query Term Expansion on Retrieval Effectiveness in Patent Searching Effect of log-based Query Term Expansion on Retrieval Effectiveness in Patent Searching Wolfgang Tannebaum, Parvaz Madabi and Andreas Rauber Institute of Software Technology and Interactive Systems, Vienna

More information

Taxonomies and controlled vocabularies best practices for metadata

Taxonomies and controlled vocabularies best practices for metadata Original Article Taxonomies and controlled vocabularies best practices for metadata Heather Hedden is the taxonomy manager at First Wind Energy LLC. Previously, she was a taxonomy consultant with Earley

More information

Document Structure Analysis in Associative Patent Retrieval

Document Structure Analysis in Associative Patent Retrieval Document Structure Analysis in Associative Patent Retrieval Atsushi Fujii and Tetsuya Ishikawa Graduate School of Library, Information and Media Studies University of Tsukuba 1-2 Kasuga, Tsukuba, 305-8550,

More information

Noida institute of engineering and technology,greater noida

Noida institute of engineering and technology,greater noida Impact Of Word Sense Ambiguity For English Language In Web IR Prachi Gupta 1, Dr.AnuragAwasthi 2, RiteshRastogi 3 1,2,3 Department of computer Science and engineering, Noida institute of engineering and

More information

Fundamentals of STEP Implementation

Fundamentals of STEP Implementation Fundamentals of STEP Implementation David Loffredo loffredo@steptools.com STEP Tools, Inc., Rensselaer Technology Park, Troy, New York 12180 A) Introduction The STEP standard documents contain such a large

More information

Question 1: What is a code walk-through, and how is it performed?

Question 1: What is a code walk-through, and how is it performed? Question 1: What is a code walk-through, and how is it performed? Response: Code walk-throughs have traditionally been viewed as informal evaluations of code, but more attention is being given to this

More information

Semantic Annotation for Semantic Social Networks. Using Community Resources

Semantic Annotation for Semantic Social Networks. Using Community Resources Semantic Annotation for Semantic Social Networks Using Community Resources Lawrence Reeve and Hyoil Han College of Information Science and Technology Drexel University, Philadelphia, PA 19108 lhr24@drexel.edu

More information

Using a Medical Thesaurus to Predict Query Difficulty

Using a Medical Thesaurus to Predict Query Difficulty Using a Medical Thesaurus to Predict Query Difficulty Florian Boudin, Jian-Yun Nie, Martin Dawes To cite this version: Florian Boudin, Jian-Yun Nie, Martin Dawes. Using a Medical Thesaurus to Predict Query

More information

Automatic Translation in Cross-Lingual Access to Legislative Databases

Automatic Translation in Cross-Lingual Access to Legislative Databases Automatic Translation in Cross-Lingual Access to Legislative Databases Catherine Bounsaythip, Aarno Lehtola, Jarno Tenni VTT Information Technology P. Box 1201, FIN-02044 VTT, Finland Phone: +358 9 456

More information

Making Sense Out of the Web

Making Sense Out of the Web Making Sense Out of the Web Rada Mihalcea University of North Texas Department of Computer Science rada@cs.unt.edu Abstract. In the past few years, we have witnessed a tremendous growth of the World Wide

More information

Access ERIC from the GOS-ICH Library website: hhttps://

Access ERIC from the GOS-ICH Library website: hhttps:// The ERIC (Educational Resources Information Center) database is sponsored by the U.S. Department of Education to provide access to educational-related literature. ERIC provides coverage of journal articles,

More information

Searchers Selection of Search Keys: III. Searching Styles

Searchers Selection of Search Keys: III. Searching Styles Searchers Selection of Search Keys: III. Searching Styles Raya Fidel Graduate School of Library and information Science, University of Washington, Seattle, WA 98195 Individual searching style has a primary

More information

Information Retrieval Tools for Efficient Data Searching using Big Data

Information Retrieval Tools for Efficient Data Searching using Big Data ISSN: 2393-8528 Contents lists available at www.ijicse.in International Journal of Innovative Computer Science & Engineering Volume 4 Issue 4; July-August-2017; Page No. 06-12 Information Retrieval Tools

More information

Sheffield University and the TREC 2004 Genomics Track: Query Expansion Using Synonymous Terms

Sheffield University and the TREC 2004 Genomics Track: Query Expansion Using Synonymous Terms Sheffield University and the TREC 2004 Genomics Track: Query Expansion Using Synonymous Terms Yikun Guo, Henk Harkema, Rob Gaizauskas University of Sheffield, UK {guo, harkema, gaizauskas}@dcs.shef.ac.uk

More information

Access IBSS from the ICH Library website:

Access IBSS from the ICH Library website: The International Bibliography of the Social Sciences (IBSS), produced by the London School of Economics and Political Science, includes over 3 million references to journal articles, books, reviews and

More information

21. Search Models and UIs for IR

21. Search Models and UIs for IR 21. Search Models and UIs for IR INFO 202-10 November 2008 Bob Glushko Plan for Today's Lecture The "Classical" Model of Search and the "Classical" UI for IR Web-based Search Best practices for UIs in

More information

MIRACLE at ImageCLEFmed 2008: Evaluating Strategies for Automatic Topic Expansion

MIRACLE at ImageCLEFmed 2008: Evaluating Strategies for Automatic Topic Expansion MIRACLE at ImageCLEFmed 2008: Evaluating Strategies for Automatic Topic Expansion Sara Lana-Serrano 1,3, Julio Villena-Román 2,3, José C. González-Cristóbal 1,3 1 Universidad Politécnica de Madrid 2 Universidad

More information

This literature review provides an overview of the various topics related to using implicit

This literature review provides an overview of the various topics related to using implicit Vijay Deepak Dollu. Implicit Feedback in Information Retrieval: A Literature Analysis. A Master s Paper for the M.S. in I.S. degree. April 2005. 56 pages. Advisor: Stephanie W. Haas This literature review

More information

Joining Collaborative and Content-based Filtering

Joining Collaborative and Content-based Filtering Joining Collaborative and Content-based Filtering 1 Patrick Baudisch Integrated Publication and Information Systems Institute IPSI German National Research Center for Information Technology GMD 64293 Darmstadt,

More information

Applying the KISS Principle for the CLEF- IP 2010 Prior Art Candidate Patent Search Task

Applying the KISS Principle for the CLEF- IP 2010 Prior Art Candidate Patent Search Task Applying the KISS Principle for the CLEF- IP 2010 Prior Art Candidate Patent Search Task Walid Magdy, Gareth J.F. Jones Centre for Next Generation Localisation School of Computing Dublin City University,

More information

FACETs. Technical Report 05/19/2010

FACETs. Technical Report 05/19/2010 F3 FACETs Technical Report 05/19/2010 PROJECT OVERVIEW... 4 BASIC REQUIREMENTS... 4 CONSTRAINTS... 5 DEVELOPMENT PROCESS... 5 PLANNED/ACTUAL SCHEDULE... 6 SYSTEM DESIGN... 6 PRODUCT AND PROCESS METRICS...

More information

Query Modifications Patterns During Web Searching

Query Modifications Patterns During Web Searching Bernard J. Jansen The Pennsylvania State University jjansen@ist.psu.edu Query Modifications Patterns During Web Searching Amanda Spink Queensland University of Technology ah.spink@qut.edu.au Bhuva Narayan

More information

Identification and Classification of A/E/C Web Sites and Pages

Identification and Classification of A/E/C Web Sites and Pages Construction Informatics Digital Library http://itc.scix.net/ paper w78-2002-34.content Theme: Title: Author(s): Institution(s): E-mail(s): Abstract: Keywords: Identification and Classification of A/E/C

More information

R 2 D 2 at NTCIR-4 Web Retrieval Task

R 2 D 2 at NTCIR-4 Web Retrieval Task R 2 D 2 at NTCIR-4 Web Retrieval Task Teruhito Kanazawa KYA group Corporation 5 29 7 Koishikawa, Bunkyo-ku, Tokyo 112 0002, Japan tkana@kyagroup.com Tomonari Masada University of Tokyo 7 3 1 Hongo, Bunkyo-ku,

More information

INFORMATION RETRIEVAL SYSTEM USING FUZZY SET THEORY - THE BASIC CONCEPT

INFORMATION RETRIEVAL SYSTEM USING FUZZY SET THEORY - THE BASIC CONCEPT ABSTRACT INFORMATION RETRIEVAL SYSTEM USING FUZZY SET THEORY - THE BASIC CONCEPT BHASKAR KARN Assistant Professor Department of MIS Birla Institute of Technology Mesra, Ranchi The paper presents the basic

More information

Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data

Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data American Journal of Applied Sciences (): -, ISSN -99 Science Publications Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data Ibrahiem M.M. El Emary and Ja'far

More information

The Library's Website,

The Library's Website, 1 A Short English Guide to The Library's Website, and to Database Searches CONTENTS: A: THE LIBRARY'S HOMEPAGE... 2 1) THE LIBRARY'S HOMEPAGE...2 2) DATABASES OPENING PAGE...3 3) DATABASES BY SUBJECTS

More information

Query Expansion of Zero-Hit Subject Searches: Using a Thesaurus in Conjunction with NLP Techniques

Query Expansion of Zero-Hit Subject Searches: Using a Thesaurus in Conjunction with NLP Techniques Query Expansion of Zero-Hit Subject Searches: Using a Thesaurus in Conjunction with NLP Techniques Sarantos Kapidakis 1, Anna Mastora 1, and Manolis Peponakis 2 1 Laboratory on Digital Libraries & Electronic

More information

TEXT PREPROCESSING FOR TEXT MINING USING SIDE INFORMATION

TEXT PREPROCESSING FOR TEXT MINING USING SIDE INFORMATION TEXT PREPROCESSING FOR TEXT MINING USING SIDE INFORMATION Ms. Nikita P.Katariya 1, Prof. M. S. Chaudhari 2 1 Dept. of Computer Science & Engg, P.B.C.E., Nagpur, India, nikitakatariya@yahoo.com 2 Dept.

More information

In the recent past, the World Wide Web has been witnessing an. explosive growth. All the leading web search engines, namely, Google,

In the recent past, the World Wide Web has been witnessing an. explosive growth. All the leading web search engines, namely, Google, 1 1.1 Introduction In the recent past, the World Wide Web has been witnessing an explosive growth. All the leading web search engines, namely, Google, Yahoo, Askjeeves, etc. are vying with each other to

More information

Chapter 1: The Cochrane Library Search Tour

Chapter 1: The Cochrane Library Search Tour Chapter : The Cochrane Library Search Tour Chapter : The Cochrane Library Search Tour This chapter will provide an overview of The Cochrane Library Search: Learn how The Cochrane Library new search feature

More information

Modeling Crisis Management System With the Restricted Use Case Modeling Approach

Modeling Crisis Management System With the Restricted Use Case Modeling Approach Modeling Crisis Management System With the Restricted Use Case Modeling Approach Gong Zhang 1, Tao Yue 2, and Shaukat Ali 3 1 School of Computer Science and Engineering, Beihang University, Beijing, China

More information

COCHRANE LIBRARY. Contents

COCHRANE LIBRARY. Contents COCHRANE LIBRARY Contents Introduction... 2 Learning outcomes... 2 About this workbook... 2 1. Getting Started... 3 a. Finding the Cochrane Library... 3 b. Understanding the databases in the Cochrane Library...

More information

CA Productivity Accelerator 12.1 and Later

CA Productivity Accelerator 12.1 and Later CA Productivity Accelerator 12.1 and Later Localize Content Localize Content Once you have created content in one language, you might want to translate it into one or more different languages. The Developer

More information

WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS

WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS 1 WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS BRUCE CROFT NSF Center for Intelligent Information Retrieval, Computer Science Department, University of Massachusetts,

More information

This session will provide an overview of the research resources and strategies that can be used when conducting business research.

This session will provide an overview of the research resources and strategies that can be used when conducting business research. Welcome! This session will provide an overview of the research resources and strategies that can be used when conducting business research. Many of these research tips will also be applicable to courses

More information

A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2

A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2 A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2 1 Department of Electronics & Comp. Sc, RTMNU, Nagpur, India 2 Department of Computer Science, Hislop College, Nagpur,

More information

A Comparison between Users free-text queries and RILM index terms. Shuheng Wu Queens College, CUNY Yun F. Henshaw RILM

A Comparison between Users free-text queries and RILM index terms. Shuheng Wu Queens College, CUNY Yun F. Henshaw RILM A Comparison between Users free-text queries and RILM index Shuheng Wu Queens College, CUNY Yun F. Henshaw RILM 1 Introduction Research questions Study design Agenda Data analysis results & suggestions

More information

irnational Standard 5963

irnational Standard 5963 5 1 3 8 5 DO irnational Standard 5963 INTERNATIONAL ORGANIZATION FOR STANDARDIZATION«ME)KflyHAPOflHAn 0PrAHM3ALlHH F1O CTAHflAPTL13AU.Hl

More information

Quoogle: A Query Expander for Google

Quoogle: A Query Expander for Google Quoogle: A Query Expander for Google Michael Smit Faculty of Computer Science Dalhousie University 6050 University Avenue Halifax, NS B3H 1W5 smit@cs.dal.ca ABSTRACT The query is the fundamental way through

More information

Enterprise Multimedia Integration and Search

Enterprise Multimedia Integration and Search Enterprise Multimedia Integration and Search José-Manuel López-Cobo 1 and Katharina Siorpaes 1,2 1 playence, Austria, 2 STI Innsbruck, University of Innsbruck, Austria {ozelin.lopez, katharina.siorpaes}@playence.com

More information

E B S C O h o s t U s e r G u i d e P s y c I N F O

E B S C O h o s t U s e r G u i d e P s y c I N F O E B S C O h o s t U s e r G u i d e P s y c I N F O PsycINFO User Guide Last Updated: 1/11/12 Table of Contents What is PsycINFO... 3 What is EBSCOhost... 3 System Requirements...3 Choosing Databases to

More information

Text Mining. Munawar, PhD. Text Mining - Munawar, PhD

Text Mining. Munawar, PhD. Text Mining - Munawar, PhD 10 Text Mining Munawar, PhD Definition Text mining also is known as Text Data Mining (TDM) and Knowledge Discovery in Textual Database (KDT).[1] A process of identifying novel information from a collection

More information

Wikipedia and the Web of Confusable Entities: Experience from Entity Linking Query Creation for TAC 2009 Knowledge Base Population

Wikipedia and the Web of Confusable Entities: Experience from Entity Linking Query Creation for TAC 2009 Knowledge Base Population Wikipedia and the Web of Confusable Entities: Experience from Entity Linking Query Creation for TAC 2009 Knowledge Base Population Heather Simpson 1, Stephanie Strassel 1, Robert Parker 1, Paul McNamee

More information

Implementation of a High-Performance Distributed Web Crawler and Big Data Applications with Husky

Implementation of a High-Performance Distributed Web Crawler and Big Data Applications with Husky Implementation of a High-Performance Distributed Web Crawler and Big Data Applications with Husky The Chinese University of Hong Kong Abstract Husky is a distributed computing system, achieving outstanding

More information

A Model for Information Retrieval Agent System Based on Keywords Distribution

A Model for Information Retrieval Agent System Based on Keywords Distribution A Model for Information Retrieval Agent System Based on Keywords Distribution Jae-Woo LEE Dept of Computer Science, Kyungbok College, 3, Sinpyeong-ri, Pocheon-si, 487-77, Gyeonggi-do, Korea It2c@koreaackr

More information

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University CS473: CS-473 Course Review Luo Si Department of Computer Science Purdue University Basic Concepts of IR: Outline Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR Terminologies and

More information

EVALUATION OF SEARCHER PERFORMANCE IN DIGITAL LIBRARIES

EVALUATION OF SEARCHER PERFORMANCE IN DIGITAL LIBRARIES DEFINING SEARCH SUCCESS: EVALUATION OF SEARCHER PERFORMANCE IN DIGITAL LIBRARIES by Barbara M. Wildemuth Associate Professor, School of Information and Library Science University of North Carolina at Chapel

More information

Searching the Evidence in the Cochrane Library

Searching the Evidence in the Cochrane Library CAMBRIDGE UNIVERSITY LIBRARY MEDICAL LIBRARY Searching the Evidence Searching the Evidence in the Cochrane Library January 2014 (due for revision July 2014) Searching the Evidence 1. How to access The

More information

Information Retrieval

Information Retrieval Multimedia Computing: Algorithms, Systems, and Applications: Information Retrieval and Search Engine By Dr. Yu Cao Department of Computer Science The University of Massachusetts Lowell Lowell, MA 01854,

More information

Cognitive Walkthrough Evaluation

Cognitive Walkthrough Evaluation Columbia University Libraries / Information Services Digital Library Collections (Beta) Cognitive Walkthrough Evaluation by Michael Benowitz Pratt Institute, School of Library and Information Science Executive

More information

Search Interface for Z39.50 Compliant Online Catalogs Over The Internet

Search Interface for Z39.50 Compliant Online Catalogs Over The Internet Search Interface for Z39.50 Compliant Online Catalogs Over The Internet Danny C.C. POO Teck Kang TOH School of Computing National University of Singapore Lower Kent Ridge Road, Singapore 119260 dpoo@comp.nus.edu.sg

More information

Knowledge Retrieval. Franz J. Kurfess. Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A.

Knowledge Retrieval. Franz J. Kurfess. Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A. Knowledge Retrieval Franz J. Kurfess Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A. 1 Acknowledgements This lecture series has been sponsored by the European

More information

Extracting Visual Snippets for Query Suggestion in Collaborative Web Search

Extracting Visual Snippets for Query Suggestion in Collaborative Web Search Extracting Visual Snippets for Query Suggestion in Collaborative Web Search Hannarin Kruajirayu, Teerapong Leelanupab Knowledge Management and Knowledge Engineering Laboratory Faculty of Information Technology

More information

BUILDING A CONCEPTUAL MODEL OF THE WORLD WIDE WEB FOR VISUALLY IMPAIRED USERS

BUILDING A CONCEPTUAL MODEL OF THE WORLD WIDE WEB FOR VISUALLY IMPAIRED USERS 1 of 7 17/01/2007 10:39 BUILDING A CONCEPTUAL MODEL OF THE WORLD WIDE WEB FOR VISUALLY IMPAIRED USERS Mary Zajicek and Chris Powell School of Computing and Mathematical Sciences Oxford Brookes University,

More information

CABI Training Materials. Ovid Silver Platter (SP) platform. Simple Searching of CAB Abstracts and Global Health KNOWLEDGE FOR LIFE.

CABI Training Materials. Ovid Silver Platter (SP) platform. Simple Searching of CAB Abstracts and Global Health KNOWLEDGE FOR LIFE. CABI Training Materials Ovid Silver Platter (SP) platform Simple Searching of CAB Abstracts and Global Health www.cabi.org KNOWLEDGE FOR LIFE Contents The OvidSP Database Selection Screen... 3 The Ovid

More information

Organizing Information. Organizing information is at the heart of information science and is important in many other

Organizing Information. Organizing information is at the heart of information science and is important in many other Dagobert Soergel College of Library and Information Services University of Maryland College Park, MD 20742 Organizing Information Organizing information is at the heart of information science and is important

More information

A model of information searching behaviour to facilitate end-user support in KOS-enhanced systems

A model of information searching behaviour to facilitate end-user support in KOS-enhanced systems A model of information searching behaviour to facilitate end-user support in KOS-enhanced systems Dorothee Blocks Hypermedia Research Unit School of Computing University of Glamorgan, UK NKOS workshop

More information

INFORMATION RETRIEVAL SYSTEMS: Theory and Implementation

INFORMATION RETRIEVAL SYSTEMS: Theory and Implementation INFORMATION RETRIEVAL SYSTEMS: Theory and Implementation THE KLUWER INTERNATIONAL SERIES ON INFORMATION RETRIEVAL Series Editor W. Bruce Croft University of Massachusetts Amherst, MA 01003 Also in the

More information

Inter and Intra-Document Contexts Applied in Polyrepresentation

Inter and Intra-Document Contexts Applied in Polyrepresentation Inter and Intra-Document Contexts Applied in Polyrepresentation Mette Skov, Birger Larsen and Peter Ingwersen Department of Information Studies, Royal School of Library and Information Science Birketinget

More information

Enhancing E-Journal Access In A Digital Work Environment

Enhancing E-Journal Access In A Digital Work Environment Enhancing e-journal access in a digital work environment Foo, S. (2006). Singapore Journal of Library & Information Management, 34, 31-40. Enhancing E-Journal Access In A Digital Work Environment Schubert

More information

Web Information Retrieval using WordNet

Web Information Retrieval using WordNet Web Information Retrieval using WordNet Jyotsna Gharat Asst. Professor, Xavier Institute of Engineering, Mumbai, India Jayant Gadge Asst. Professor, Thadomal Shahani Engineering College Mumbai, India ABSTRACT

More information

VISUALIZING NP-COMPLETENESS THROUGH CIRCUIT-BASED WIDGETS

VISUALIZING NP-COMPLETENESS THROUGH CIRCUIT-BASED WIDGETS University of Portland Pilot Scholars Engineering Faculty Publications and Presentations Shiley School of Engineering 2016 VISUALIZING NP-COMPLETENESS THROUGH CIRCUIT-BASED WIDGETS Steven R. Vegdahl University

More information

From Scratch to the Web: Terminological Theses at the University of Innsbruck

From Scratch to the Web: Terminological Theses at the University of Innsbruck Peter Sandrini University of Innsbruck From Scratch to the Web: Terminological Theses at the University of Innsbruck Terminology Diploma Theses (TDT) have been well established in the training of translators

More information

A World Wide Web-based HCI-library Designed for Interaction Studies

A World Wide Web-based HCI-library Designed for Interaction Studies A World Wide Web-based HCI-library Designed for Interaction Studies Ketil Perstrup, Erik Frøkjær, Maria Konstantinovitz, Thorbjørn Konstantinovitz, Flemming S. Sørensen, Jytte Varming Department of Computing,

More information

Online Expansion of Rare Queries for Sponsored Search

Online Expansion of Rare Queries for Sponsored Search Online Expansion of Rare Queries for Sponsored Search Peter Ciccolo, Evgeniy Gabrilovich, Vanja Josifovski, Don Metzler, Lance Riedel, Jeff Yuan Yahoo! Research 1 Sponsored Search 2 Sponsored Search in

More information

VIDEO SEARCHING AND BROWSING USING VIEWFINDER

VIDEO SEARCHING AND BROWSING USING VIEWFINDER VIDEO SEARCHING AND BROWSING USING VIEWFINDER By Dan E. Albertson Dr. Javed Mostafa John Fieber Ph. D. Student Associate Professor Ph. D. Candidate Information Science Information Science Information Science

More information

1.0 Abstract. 2.0 TIPSTER and the Computing Research Laboratory. 2.1 OLEADA: Task-Oriented User- Centered Design in Natural Language Processing

1.0 Abstract. 2.0 TIPSTER and the Computing Research Laboratory. 2.1 OLEADA: Task-Oriented User- Centered Design in Natural Language Processing Oleada: User-Centered TIPSTER Technology for Language Instruction 1 William C. Ogden and Philip Bernick The Computing Research Laboratory at New Mexico State University Box 30001, Department 3CRL, Las

More information

QUERY EXPANSION USING WORDNET WITH A LOGICAL MODEL OF INFORMATION RETRIEVAL

QUERY EXPANSION USING WORDNET WITH A LOGICAL MODEL OF INFORMATION RETRIEVAL QUERY EXPANSION USING WORDNET WITH A LOGICAL MODEL OF INFORMATION RETRIEVAL David Parapar, Álvaro Barreiro AILab, Department of Computer Science, University of A Coruña, Spain dparapar@udc.es, barreiro@udc.es

More information

Performance of relational database management

Performance of relational database management Building a 3-D DRAM Architecture for Optimum Cost/Performance By Gene Bowles and Duke Lambert As systems increase in performance and power, magnetic disk storage speeds have lagged behind. But using solidstate

More information

Enhanced retrieval using semantic technologies:

Enhanced retrieval using semantic technologies: Enhanced retrieval using semantic technologies: Ontology based retrieval as a new search paradigm? - Considerations based on new projects at the Bavarian State Library Dr. Berthold Gillitzer 28. Mai 2008

More information

STRUCTURE-BASED QUERY EXPANSION FOR XML SEARCH ENGINE

STRUCTURE-BASED QUERY EXPANSION FOR XML SEARCH ENGINE STRUCTURE-BASED QUERY EXPANSION FOR XML SEARCH ENGINE Wei-ning Qian, Hai-lei Qian, Li Wei, Yan Wang and Ao-ying Zhou Computer Science Department Fudan University Shanghai 200433 E-mail: wnqian@fudan.edu.cn

More information

Web Query Translation with Representative Synonyms in Cross Language Information Retrieval

Web Query Translation with Representative Synonyms in Cross Language Information Retrieval Web Query Translation with Representative Synonyms in Cross Language Information Retrieval August 25, 2005 Bo-Young Kang, Qing Li, Yun Jin, Sung Hyon Myaeng Information Retrieval and Natural Language Processing

More information

Prior Art Retrieval Using Various Patent Document Fields Contents

Prior Art Retrieval Using Various Patent Document Fields Contents Prior Art Retrieval Using Various Patent Document Fields Contents Metti Zakaria Wanagiri and Mirna Adriani Fakultas Ilmu Komputer, Universitas Indonesia Depok 16424, Indonesia metti.zakaria@ui.edu, mirna@cs.ui.ac.id

More information

Semantic Extensions to Syntactic Analysis of Queries Ben Handy, Rohini Rajaraman
