Deliverable 6.1 Results of a Workshop on Roadmap Activities


DEEPTHOUGHT
Hybrid Deep and Shallow Methods for Knowledge-Intensive Information Extraction
Deliverable 6.1: Results of a Workshop on Roadmap Activities
The Consortium April

Project ref. no.:
Project acronym:
Project full title: Hybrid Deep and Shallow Methods for Knowledge-Intensive Information Extraction
Security (distribution level): Public
Contractual date of delivery: M6: April 2003
Actual date of delivery: M6: April 15, 2003
Deliverable number: 6.1
Deliverable name: Results of a workshop on roadmap activities
Type: Report
Status & version: Final
Number of pages: 36
WP contributing to the deliverable: WP6
WP / Task responsible: WP6
Other contributors:
Author(s): Melanie Siegel
EC Project Officer: Yves Paternoster
Keywords: Roadmap, natural language processing
Abstract: The deliverable contains a report on the results of a workshop on roadmaps for natural language processing in February in Berlin.

Table of Contents

1 Executive Summary
2 General Idea
3 Agenda of the Workshop
4 Workshop Report
    4.1 Introduction
    4.2 Innovative NLP Application Ideas
        Language Learning
            Interactive e-language learning (presented by Frederique Segond)
            Language learning (presented by Dan Flickinger)
            Language learning support (presented by Melanie Siegel)
        Personal Data Organization
            Voice Assistant (presented by Gregor Erbach)
            Intelligent capture of conversations and meetings (presented by Andreas Eisele)
        Document Authoring
            Document Authoring (presented by Manfred Stede)
            Creation of Structured Knowledge (presented by Andrew Bredenkamp)
        Intelligent Information Access
            Ubiquitous Business Intelligence (presented by Luca Dini)
            Context Matching (presented by Fabio Pianesi)
            Communication Management (presented by Klaus Netter)
            Job Search (presented by Melanie Siegel)
            Dynamic Newspaper (presented by Andreas Eisele)
            Voice-based access to encyclopaedic knowledge (presented by Andreas Eisele)
    4.3 Technologies underlying the proposed applications
        Grammar-based generation
        Data mining technologies
        Large coverage robust deep processing
        Discourse and Dialogue Processing
        Grammar and style checking
        Automatic Hyperlinking
        4.3.7 Information Retrieval
        Information Extraction
        Language guessing
        Lexicon
        Morphological analysis
        Machine Translation
        Ontologies
        Part-Of-Speech Tagging
        Question-answering
        Search engines
        Semantic Web technologies
        Speech Recognition
        Spell checking
        Statistical NLP
        Summarization
        Terminology extraction
        Web services
    Summary
Impact on the Project
    Overview of the Project
    Applying the Workshop Results to

1 Executive Summary

The deliverable contains a report on the results of a workshop on roadmaps for natural language processing. The workshop was held on 23 February 2003 in Berlin. Participants came from the consortium as well as from other public institutions and industry.

Consortium participants:
o Hans Uszkoreit (Saarland University)
o Melanie Siegel (Saarland University)
o Ulrich Callmeier (Saarland University)
o John Carroll (University of Sussex)
o Luca Dini (CELI s.r.l.)
o Andreas Eisele (Saarland University)
o Dan Flickinger (Stanford University)
o Klaus Netter (XtraMind GmbH, Saarbrücken)

Participants from outside the consortium:
o Andrew Bredenkamp (Acrolinx GmbH, Berlin)
o Tania Avgustinova (DFKI GmbH, Saarbrücken)
o Stephan Busemann (DFKI GmbH, Saarbrücken)
o Gregor Erbach (Saarland University)
o Alex Fang (University of Sussex)
o Petter Haugereid (NTNU, Trondheim)
o Steven Krauwer (Universiteit Utrecht)
o Fabio Pianesi (ITC-IRST, Trento)
o Frederique Segond (XRCE, Grenoble)
o Manfred Stede (University of Potsdam)

The participation of two members of the ELSNET initiative (Steven Krauwer and Stephan Busemann) documents the close cooperation with the broader roadmap activities of the European Network of Language and Speech.

2 General Idea

Technology roadmaps have served as a useful instrument for high-tech research, development and marketing planning since the eighties. A roadmap comprises an analysis of the present situation, a vision of where we want to be ten years from now, and a number of milestones that help in setting intermediate goals and in measuring progress towards them. To be of true value, it should also contain dependencies among technologies, decisions and milestones on the one hand and between scientific disciplines and industrial sectors on the other.

The workshop brought together computational linguists and computer scientists to report advances in human language technology and their application to knowledge management. The project aims to establish a roadmap for Language and Information Technologies for the next decade.

The starting point for our workshop was innovative applications for language technology in the future. We asked the participants to describe very briefly applications that might be possible or desirable in some years. Innovation was the most important criterion in generating these ideas. During the discussion, we tried to answer the following questions:
o What problems does the application solve?
o What does the application look like?
o What kinds of technologies and resources are basic for this application and when will we be able to provide them?
o Who will be the users of the application?
o Who will we possibly cooperate with?
o What are the chances for this application?
o When could the application and its technological bases be realized?
o Do we want this kind of application?
o What strategies should we adopt to influence this technology?

Based on the discussion results, we will set up a roadmap for language technology and possible applications.

3 Agenda of the Workshop

o Hans Uszkoreit: Introduction to the Roadmap Idea
o Stephan Busemann: How to Write Roadmaps for Natural Language Processing
o Frederique Segond: Application Language Learning
o Dan Flickinger: Application Language Learning
o Gregor Erbach: Application Voice Assistant
o Manfred Stede: Application Document Authoring
o Andrew Bredenkamp: Application Creation of Structured Knowledge
o Luca Dini: Application Ubiquitous Business Intelligence
o Fabio Pianesi: Application Context Matching
o Klaus Netter: Application Communication Management
o Andreas Eisele: Applications Dynamic Newspaper, Voice-based access to encyclopaedic knowledge, Intelligent capture of conversations and meetings

4 Workshop Report

4.1 Introduction

Technology roadmaps have served as a useful instrument for high-tech research, development and marketing planning since the eighties. A roadmap comprises an analysis of the present situation, a vision of where we want to be ten years from now, and a number of milestones that help in setting intermediate goals and in measuring progress towards them. To be of true value, it should also contain dependencies among technologies, decisions and milestones on the one hand and between scientific disciplines and industrial sectors on the other.

The workshop brought together computational linguists and computer scientists both to report advances in human language technology and their application to knowledge management, and to establish a roadmap for Language and Information Technologies for the next decade. Building on the experience of previous ELSNET roadmap workshops, we tried a new format that allowed for short, interesting presentations and intensive discussions.

The starting point for our workshop was innovative applications for language technology in the future. We asked the participants to describe very briefly applications that might be possible or desirable in some years. The most important criterion in generating these ideas was innovation. We did not want system demonstrations of products that are already available and on sale, but new ideas and perspectives for the future, building on ideas already under discussion in the research community. During the discussion, we tried to answer the following questions:
o What problems does the application solve?
o What does the application look like?
o Who will be the users of the application?
o What kinds of technologies and resources are basic for this application and when will we be able to provide them?
o When could the application and its technological bases be realized?
o Do we want this kind of application?
o What are the chances for this application and how can we influence these?
o What strategies should we adopt to influence this technology?
o Who will we possibly cooperate with?

Participants of the Roadmap Workshop were: Tania Avgustinova, Andrew Bredenkamp, Stephan Busemann, Ulrich Callmeier, John Carroll, Luca Dini, Andreas Eisele, Gregor Erbach, Alex Fang, Dan Flickinger, Petter Haugereid, Steven Krauwer, Klaus Netter, Fabio Pianesi, Frederique Segond, Melanie Siegel, Manfred Stede and Hans Uszkoreit.

This report presents the innovative application ideas and the answers to the questions listed above. The underlying technologies are listed and their usage in the applications is explained. The presentations on applications and technologies are summarized and conclusions are drawn. These are applied to the project ideas. Stephan Busemann presented the ELSNET roadmap tool, which allows users to set up and visualize roadmaps. Based on the discussion results, we will set up a roadmap for language technology and possible applications.

4.2 Innovative NLP Application Ideas

Language Learning

Interactive e-language learning (presented by Frederique Segond)

Applications and users

What problems does the application solve?
The application is a system for game-based language learning that uses virtual-reality ideas and is based on the notion of a scenario. It is personalized towards the user:
o thematically
o concerning language ability
o concerning the reason to learn a specific language (scenario)
Information access on the web is added. Interactivity is driven by NLP tools as well as VR. The system puts more emphasis on improving people's performance than their competence.

What does the application look like?
There are robots and avatars with language ability in a virtual environment. The user has the possibility to chat. Exercises (for instance fill-in-a-gap) concerning speech acts and grammar, as well as interactive activities, are included. The users can learn both a work process and a language. The products can easily be adapted to different audiences: schools, life-long training, edutainment, and companies, which can work specifically on their own documents thanks to NLP technologies (part of the personalisation aspect).

Who will be the users of the application?
Schools, ordinary people in life-long training, the general public (edutainment), companies.

Technological development

What kinds of technologies and resources are basic for this application and when will we be able to provide them?
o Speech recognition for different types of people (elderly, kids, ...)
o Voice over IP
o Web cam
o Virtual reality
o Information retrieval
o Language guesser
o Dictionary lookup in context, incl. syntactic and semantic disambiguation
o Spell checking
o Morphology
o Grammar checking
o Multilingual meta-search engines
Basically all resources are already there. As far as NLP technologies are concerned, speech technologies of better quality would be an asset. The same goes for more reliable grammar checkers and semantic technologies.

When could the application and its technological bases be realized?
It already exists. It can be improved with the technologies listed above.

Steps towards realization

Do we want this kind of application?
Personally, yes.

What are the chances for this application and how can we influence these?
Judging from the reactions of language learning specialists and from the partnership proposals already on the table, this type of application has a good chance of succeeding in the near future. It is still seen as a bit visionary, mostly because teachers need to be trained in the technologies. In general, NLP can be a good asset for the e-learning market, and not only for language learning. That said, the role of NLP in the market will remain modest (some components here and there). The way to influence this is probably to proselytize and to inform e-learning actors (solution integrators and distributors) about linguistic technologies.

What strategies should we adopt to influence this technology?
Again, establish contact with other research domains and communities, such as 3D communities, publishers and teachers (for pedagogy).

Who will we possibly cooperate with?
Teachers, content publishers, language schools, e-learning companies.

Language learning (presented by Dan Flickinger)

Applications and users

What problems does the application solve?
The application is a system to correct the syntax of students of English. Teachers in California are not allowed to speak languages other than English to students in class, a restriction that an automatic language learning system should help to overcome. The system is intended for a classroom setting. In addition, educational testing institutions need to evaluate the language ability of learners.

What does the application look like?
The system uses a template structure for a dialogue topic and a mix of grammar-based and stochastic processing methods to robustly analyse the student's input (their turn in the dialogue), and provides advice and corrections on that input. Language competence is tested automatically. The system is adapted to the native language of the students.

Who will be the users of the application?
Students of English as a second language.

Technological development

What kinds of technologies and resources are basic for this application and when will we be able to provide them?
o Robust parsing
o Grammar-based generation
o Discourse-level processing

When could the application and its technological bases be realized?

Steps towards realization

What are the chances for this application and how can we influence these?
There is a significant financial market, at least for English language testing.

Language learning support (presented by Melanie Siegel)

Applications and users

What problems does the application solve?
Modern language learning requires dynamic adaptation to language changes (lexicon, topics, grammar). User-adapted material would improve motivation.

What does the application look like?
The user determines which text sort, topic and language level s/he is interested in. The system adapts its teaching material accordingly:
o The user chooses the text or texts s/he wants to work with.
o These texts are annotated with hyperlinks containing vocabulary, translations (adapted to the user's needs and native language) and trees.
o Vocabulary is extracted from the texts; exercises are generated, based on earlier sessions of the student.
o Grammar exercises are generated from text material and based on earlier sessions.
o Questions about the text are generated automatically.
o The text is summarized.
o A search for grammatical phenomena in texts, text databases or the Internet is included.
o Grammar and style checking supports the user.

Who will be the users of the application?
Language learners.

Technological development

What kinds of technologies and resources are basic for this application and when will we be able to provide them?
o Large coverage morphology.
o Terminology extraction.
o Large coverage robust deep processing.
o Machine Translation.
o Hyperlinking.
o Grammar and style checking.

When could the application and its technological bases be realized?

Steps towards realization

What are the chances for this application and how can we influence these?
There should be a market for efficient and motivating language learning. It would be necessary to cooperate with language teachers who have experience and new ideas for motivating language education.

What strategies should we adopt to influence this technology?
Build prototypes and start projects.

Who will we possibly cooperate with?
Language teachers.

Personal Data Organization

Voice Assistant (presented by Gregor Erbach)

Applications and users

What problems does the application solve?
Information and service access without a mouse or keyboard, especially for mobile or hands-free situations.

What does the application look like?
For the user, it can be just a (mobile) phone or wearable PC, probably with a headset. The user simply talks into the microphone to access information and services. The application asks questions for clarification or to narrow down the search, if necessary. The application will take over some of the functions of a semi-intelligent secretary.

Who will be the users of the application?
Average persons.

Technological development

What kinds of technologies and resources are basic for this application and when will we be able to provide them?
o Large-vocabulary speech recognition
o For web search: intelligent ways of partitioning large sets of search results, and verbalizing the partitions (needed for narrowing down the search in an interactive dialogue)
o Improved question-answering capabilities
o Open-domain information extraction from heterogeneous document collections
o Dialogues about the user's document collection, about very large document spaces (web search results), and about web services

When could the application and its technological bases be realized?
[...] but intermediate results can be of great use.

Steps towards realization

What are the chances for this application and how can we influence these?
Unlikely in the full form, as AI is probably required.

Who will we possibly cooperate with?
ASR vendors, telcos, LT researchers.

Intelligent capture of conversations and meetings (presented by Andreas Eisele)

Applications and users

What problems does the application solve?
During professional conversations and meetings, a large amount of important knowledge is presented and exchanged, but this knowledge is quite difficult to access for non-participants, and even for participants, who often have diverging memories of what was said.

What does the application look like?
Conversation is captured with cameras and microphones and made accessible via searchable transcriptions. Optionally, long contributions can be presented in summarized form or annotated with marginal keywords, so that it is easier to browse long records for the interesting parts.

Who will be the users of the application?
Professionals who work in variable teams and who want to optimise the efficiency of their meetings and work flow.

Technological development

What kinds of technologies and resources are basic for this application and when will we be able to provide them?
o Sound and video recording techniques
o Time-aligned storage of a meeting and the presentations that were given in it
o Voice indexing, based on large-vocabulary, noise-tolerant ASR

When could the application and its technological bases be realized?
One could start now, but high-quality ASR in a noisy environment and good summarization may not be feasible until as late as 2015.

Steps towards realization

Do we want this kind of application?
Yes, if individuals retain the right to censor or post-edit their own contributions.

What are the chances for this application and how can we influence these?
Fairly high, as soon as ASR and meeting transcripts are sufficiently good.

What strategies should we adopt to influence this technology?
Start projects aiming at improved accessibility of existing archives.

Who will we possibly cooperate with?
Vendors of video-conferencing tools, TV channels (who would also like to make their archives more accessible), ASR providers.

Document Authoring

Document Authoring (presented by Manfred Stede)

Applications and users

What problems does the application solve?
Help with writing a letter or presentation in any language.

What does the application look like?
The application contains grammar and style checking. It supplies information about punctuation, text conventions (text structure, ...), words and idioms. It is multilingual and integrated into a text processing system. Voice control is possible. The documents are active (linguistic units can be clicked on). The system inserts standard phrases if wanted. A language learning facility is integrated.

Who will be the users of the application?
Anyone who writes frequently, in various languages.

Technological development

What kinds of technologies and resources are basic for this application and when will we be able to provide them?
o Lexical resources, word nets plus idioms
o Grammar
o Text structure library
o Rule-based and statistical analysis
o Implemented theory of style
o Integration of keyboard and voice
o Integration of regular typing, template filling and canned text.

When could the application and its technological bases be realized?
Some ingredients are already realized in professional document authoring environments; they have to be enhanced by LT components and also brought to standard word processing software. An optimistic guess: the application development could be done within 5 years.

Steps towards realization

Do we want this kind of application?
Sure! For academic writing, for example, it will greatly improve the quality of publications by authors who do not write in their mother tongue.

What are the chances for this application and how can we influence these?
The system should suggest, but never prescribe. It should appear and disappear on a click and should be user-adaptable.

What strategies should we adopt to influence this technology?
Push the word processing vendors toward integrating more LT, and make the potential benefits transparent to potential users. Tackle individual questions and parts of the overall task in research projects.

Creation of Structured Knowledge (presented by Andrew Bredenkamp)

Applications and users

What problems does the application solve?
The system creates structured information. While the user writes a text, an intelligent agent automatically indexes, categorizes and hyperlinks it. The application structures information given by a user.

Technological development

What kinds of technologies and resources are basic for this application and when will we be able to provide them?
o NE recognition
o Normalization
o Disambiguation
o Hyperlinking
o Indexing
o Summarization.

When could the application and its technological bases be realized?

Intelligent Information Access

Ubiquitous Business Intelligence (presented by Luca Dini)

With the renewed interest in quantitative techniques for business decision-making, the application will give the user the possibility of understanding markets, products, customers and opportunities along a contextualized and dynamically evolving dimension.

Applications and users

What problems does the application solve?
When a decision has to be taken in a business context, quantitative information is crucial. Quantitative information must be 1) available and 2) easy to consult. In business life, people are confronted with problems that are very often new:
o "How would the British market react to this product?"
o "What's the attitude of Spanish people towards this new mobile technology?"
o "What's the best selling period for a French wine on the Italian market?"
The application will be able to answer all these kinds of questions, either immediately if the data is already available in a structured format, or with a delay of hours or days if the data needs to be extracted from text and passed to a database before data mining technologies can be applied.

What does the application look like?
The application will need a device for textual input (ranging from SMS, e-mails and web text areas to the graphical interface of a program) and a graphical device for output, where results can be visualized as graphs, tables or navigable spaces.

Who will be the users of the application?
Mostly managers, in companies ranging from SMEs to multinationals.

Technological development

What kinds of technologies and resources are basic for this application and when will we be able to provide them?
o Information extraction for decoding the question.
o Adaptive data mining technologies for building the optimal solution strategy.
o Language-based web discovery for populating missing data from naturally occurring text. This in turn includes advanced spidering technologies, database technologies and in particular multilingual information extraction from web pages.
o Semantic Web technologies for retrieving data already available in a machine-digestible format.
o Web services and service negotiation to gather needed data from on-line vendors.

When could the application and its technological bases be realized?
Two years after the Semantic Web has become a reality and Web Services an industrial fact. While the availability of the Semantic Web is not a necessary condition from a technological point of view, it will serve to prepare the market for applications of the kind proposed here.

Steps towards realization

Do we want this kind of application?
This application will increase the need for advanced processing in several languages. The information extraction field will particularly benefit from it.

What are the chances for this application and how can we influence these?
It is hard to see how the NLP community can influence the development of the application. The driving force must be service companies active in the business intelligence and data mining domain. As strategic corporate consultants, they often have the power to convince management to rely on less traditional technologies.

What strategies should we adopt to influence this technology?
Push research on advanced comprehension technologies. Couple them with advanced zoning techniques. Define standards through which applications can exchange requests for information gathering and the results of a gathering phase. Interact with web discovery technologies in order to enhance the discovery of relevant sources.

Who will we possibly cooperate with?
Data mining experts; Artificial Intelligence experts (or "problem solving experts"); Semantic Web/XML experts; system integrators.

Context Matching (presented by Fabio Pianesi)

Applications and users

What problems does the application solve?
Autonomous communities within an organization create their own (partial) conceptualisations of the world. They have local ontologies. The problem is to exchange meaningful information and knowledge among local communities and in co-operations. The local representations are subject to change, are managed autonomously and sometimes disappear.

What does the application look like?
It answers the question: given concepts K1 and K2 in two different schemas, what is their relation? The relations are domain and context dependent. A theory of domain is included. The system extracts candidate senses using a lexical database (wordnet), disambiguates and filters the senses, and contextualizes them. The resulting matching problem is solved as a SAT problem. The application is multilingual and consists of a module that can be embedded in knowledge-management systems, search engines, etc.

Who will be the users of the application?
Potentially everyone.

Technological development

What kinds of technologies and resources are basic for this application and when will we be able to provide them?
o Parsing
o POS tagging
o Morphological analysis
o Word sense disambiguation (WSD)
o Multilingual aligned repositories of senses, specialised terminologies, knowledge technologies

When could the application and its technological bases be realized?
Basics are already implemented. Simple ontologies can be available in [...]. More complex representation schemas shall then be exploited.
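To illustrate how the relation between two concepts K1 and K2 can be decided as a satisfiability problem, the following is a minimal sketch, not the presented system. It assumes that each concept has already been turned into a propositional formula over disambiguated senses and that background knowledge from the lexical database is given as axioms. The senses, axioms and function names are hypothetical, and a brute-force truth-table check stands in for a real SAT solver.

```python
from itertools import product

def entails(atoms, axioms, goal):
    """True if every assignment satisfying all axioms also satisfies goal.

    This is the same as checking that (axioms AND NOT goal) is unsatisfiable.
    """
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if all(axiom(world) for axiom in axioms) and not goal(world):
            return False  # a satisfying counterexample exists
    return True

def relation(atoms, axioms, k1, k2):
    """Classify the relation between two concept formulas k1 and k2."""
    k1_implies_k2 = entails(atoms, axioms, lambda w: (not k1(w)) or k2(w))
    k2_implies_k1 = entails(atoms, axioms, lambda w: (not k2(w)) or k1(w))
    if k1_implies_k2 and k2_implies_k1:
        return "equivalent"
    if k1_implies_k2:
        return "less general"
    if k2_implies_k1:
        return "more general"
    if entails(atoms, axioms, lambda w: not (k1(w) and k2(w))):
        return "disjoint"
    return "unknown"

# Hypothetical senses and background axioms (illustration only).
ATOMS = ["image", "picture", "tuscany", "italy", "france"]
AXIOMS = [
    lambda w: w["image"] == w["picture"],        # synonymous senses
    lambda w: (not w["tuscany"]) or w["italy"],  # Tuscany is part of Italy
    lambda w: not (w["italy"] and w["france"]),  # Italy and France do not overlap
]

# Concepts from two different (hypothetical) schemas.
images_of_tuscany = lambda w: w["image"] and w["tuscany"]
pictures_of_italy = lambda w: w["picture"] and w["italy"]
pictures_of_france = lambda w: w["picture"] and w["france"]

print(relation(ATOMS, AXIOMS, images_of_tuscany, pictures_of_italy))
print(relation(ATOMS, AXIOMS, images_of_tuscany, pictures_of_france))
```

The truth-table loop is exponential in the number of senses, so it is only suitable for toy examples; a realistic module would pass the same formulas to a dedicated SAT solver.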

Communication Management (presented by Klaus Netter)

Applications and users

What problems does the application solve?
The system supports the communication between customer and company. It addresses the problem of e-mail overflow in companies, where information is available but must be found.

What does the application look like?
It is an auto-response system. It supports service agents in a company and escalates e-mails to experts, if necessary. The system automatically determines the optimal way to respond to a request (based on customer, time and expert knowledge). It is multilingual.

Who will be the users of the application?
Companies.

Technological development

What kinds of technologies and resources are basic for this application and when will we be able to provide them?
o Advanced information retrieval
o Parsing, with sufficient analysis power to understand the nature of a customer request.
o Generation

When could the application and its technological bases be realized?
Some basics are already available. An ideal version is expected in [...].

Job Search (presented by Melanie Siegel)

Applications and users

What problems does the application solve?
People who search for jobs have problems with job announcements that do not literally correspond to their job description, or that use job designations they do not even know. Searching for a job in the German job centre is only possible with a code number that designates a concrete description, but does not include similar jobs. (There is no such code number for computational linguists.)

What does the application look like?
The user writes a CV and personal profile. The system searches for jobs that fit the person in various sources available online (newsgroups, newspapers, internet sites), in different countries if necessary.

Who will be the users of the application?
Job searchers, job/career advisers.

Technological development

What kinds of technologies and resources are basic for this application and when will we be able to provide them?
o Deep processing of CVs, personal profiles and job announcements of various types.
o Cross-language information extraction, template filling.

When could the application and its technological bases be realized?
First versions in [...].

Steps towards realization

Do we want this kind of application?
Yes, there are no objections.

What are the chances for this application and how can we influence these?
Chances should be good, as there are always many people searching for jobs and many new job titles arise.

What strategies should we adopt to influence this technology?
Further push information extraction technology.

Who will we possibly cooperate with?
Employment centres, job/career advisers.

Dynamic Newspaper (presented by Andreas Eisele)

Applications and users

What problems does the application solve?
Newspapers offer only a tiny selection of the available information that would be interesting to the reader. Typical readers read only a small fraction of the articles, namely those that cover topics of their special interest, but then would like to know more background about these topics.

What does the application look like?
The dynamic newspaper is distributed electronically and read via handheld devices, electronic paper, or some other novel display technology. Published articles contain hyperlinks to background information. Additional background information can be requested using QA-type systems, possibly involving voice (plus pen-based pointing) input. The interpretation of questions can profit from the context given in the article and in the preceding dialogue. Alternatively, a conventional paper-based newspaper can be made dynamic with the help of a specialized pointing device that is able to locate the exact position of the pointer on a printed page, plus some auxiliary display (or speech) device. The reading pattern of a reader can be mined for the user's interest profile, which can then be used to generate individually tailored versions of the newspaper.

Who will be the users of the application?
Average newspaper readers.

Technological development

What kinds of technologies and resources are basic for this application and when will we be able to provide them?
o Display technologies like electronic paper or lightweight/flexible flat screens
o Infrastructure for (wireless) distribution of contents
o Techniques for automatic document enrichment
o Multi-document summarization
o Broad-coverage QA technology
o Voice-based dialogue technology

When could the application and its technological bases be realized?
Soon, if users are willing to use relatively heavy devices and/or pay relatively high prices for accessing the information. Advanced QA and dialogue facilities could be available perhaps in [...].

Steps towards realization

Do we want this kind of application?
Yes, if the hardware is resistant to spilled coffee and crumbs.

What are the chances for this application and how can we influence these?
Web-based newspapers and magazines already exist, although their market penetration is still low. Research into broad-coverage QA and spoken dialogue technology could make these systems more attractive.

What strategies should we adopt to influence this technology?
Start projects to improve the required NLP technology. One can start with huge existing news archives such as the recently released Gigaword corpus.

Who will we possibly cooperate with?
Newspaper publishers, news agencies and providers of novel display technologies.
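The idea of mining a reading pattern for an interest profile and using it to tailor the newspaper can be sketched as follows. This is an illustrative assumption rather than anything presented at the workshop: the reading log format (topic, seconds spent), the function names and the ranking rule are all hypothetical.

```python
from collections import Counter

def interest_profile(reading_log):
    """Aggregate reading time per topic and normalise to shares that sum to 1."""
    seconds = Counter()
    for topic, secs in reading_log:
        seconds[topic] += secs
    total = sum(seconds.values())
    if total == 0:
        return {}
    return {topic: secs / total for topic, secs in seconds.items()}

def tailor(articles, profile, k=3):
    """Rank articles by the reader's interest in their topic and keep the top k."""
    ranked = sorted(articles,
                    key=lambda article: profile.get(article["topic"], 0.0),
                    reverse=True)
    return ranked[:k]

# Hypothetical reading log: (topic, seconds spent reading).
log = [("sports", 30), ("politics", 90), ("sports", 30), ("science", 50)]
profile = interest_profile(log)

# Hypothetical articles of the next edition.
edition = [
    {"title": "Cup final preview", "topic": "sports"},
    {"title": "Budget debate", "topic": "politics"},
    {"title": "Fashion week", "topic": "fashion"},
    {"title": "New telescope", "topic": "science"},
]
for article in tailor(edition, profile, k=2):
    print(article["title"])
```

A real system would need richer signals than reading time per topic, for example the questions a reader asks or the links followed, but the same profile-then-rank structure would apply.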

Voice-based access to encyclopaedic knowledge (presented by Andreas Eisele)

Applications and users

What problems does the application solve?
Encyclopaedias in the form of books are heavy and tedious to use; computer-based encyclopaedias currently require a keyboard and a large display.

What does the application look like?
The application runs on a PDA or mobile phone with a graphics display. The user can ask a simple question, like "When was Napoleon born?" or "What are the most important languages in India?", to which the system gives a concise answer and offers to explain further details, and/or enters into a clarification dialogue if the question was ambiguous. Answers are displayed on a small but high-resolution screen, and summaries are spoken. Both forms of presentation contain (visible or audible) pointers to additional information. The user can follow these links by pen- or voice-based input or combinations thereof, or can ask free-form follow-up questions, which are interpreted in the context of the dialogue so far.

Who will be the users of the application?
Ordinary people, including pupils and students.

Technological development

What kinds of technologies and resources are basic for this application and when will we be able to provide them?
o Very-large-vocabulary ASR (speaker-adaptive, context-aware).
o Question-answering and dialogue techniques.
o Further miniaturization of memory devices.
o Better displays in portable devices.
o Infrastructure for (wireless) access to non-local content.

When could the application and its technological bases be realized?
First versions by 2005, the ultimate version around [...]

Steps towards realization

What are the chances for this application and how can we influence these?
Chances are fairly high; we could influence them by starting soon to build good prototypes.

What strategies should we adopt to influence this technology?
One should adopt a staged development/early deployment strategy, in which the easier parts of the functionality are marketed first and fund the R&D required for the more sophisticated parts.

Who will we possibly cooperate with?
ASR vendors, vendors of hand-held devices, content providers, LT researchers.

4.3 Technologies underlying the proposed applications

This chapter lists the technologies that underlie the presented application ideas. The year given for each technology is an estimate by the presenters of the application ideas. The question answered was: when will the technology be available to the public in a form that can be integrated into a complex, commercial application? Where available, a technology definition from the language technology information centre LT-World is attached.

Grammar-based generation

Year: before 2006
Used in applications: Language Learning, Communication Management

In Language Learning, correct sentences have to be generated from the students' incorrect sentences. In Communication Management, generation is used for the automatic production of responses.

Definition in LT-World: The field of Natural Language Generation (NLG) is concerned with building computer software systems which can produce meaningful texts in human languages from some underlying non-linguistic representation of information. For document production, NLG systems use knowledge about human languages and possibly the application domain. NLG components are used for e.g. automatic report generation, authoring, concept-to-speech and machine translation systems.

See also the corresponding HLT-Survey section.

Data mining technologies

Used in application: Ubiquitous Business Intelligence

Adaptive data mining technologies are used in Ubiquitous Business Intelligence for building the optimal solution strategy.

Definition in LT-World: Text data mining concerns the application of data mining (knowledge discovery in databases, KDD) to unstructured textual data. The goal of data mining is to discover or derive new information from data, finding patterns across datasets, and/or separating signal from noise. Core text mining algorithms decompose text into meaningful chunks that can then be used for true data mining purposes.

Large coverage robust deep processing

Year: 2003, technology is already available; 2006 with better coverage and robustness for some languages, others follow.
Used in applications: Language Learning Support, Language Learning, Job Search, Communication Management, Context Matching, Document Authoring

In Language Learning Support, deep processing is used for the search for examples of grammatical phenomena in texts, text annotations and automatic text summarization. In Language Learning, deep processing is used for robust analysis and grammar checking of the student's input. The Job Search application needs deep processing for CVs, personal profiles and job announcements of various types. The Communication Management application needs deep processing for high-precision information extraction; the processing needs sufficient analysis power to understand the nature of a customer request. In Context Matching, it is used for disambiguation. The Document Authoring application uses deep processing for grammar and style checking and integrated language learning.

All applications using deep processing insist on large coverage, robustness and multilinguality. In order to achieve coverage and robustness, it will be useful to combine deep processing methods with shallow ones, such as stochastic models. For multilinguality, one might consider using HPSG-based grammars.

Definition in LT-World: Parsing (from Latin "pars orationis" = part of speech) is the syntactic analysis of languages. Natural Language Parsing is the syntactic analysis of natural languages, such as Finnish or Chinese. The objective of Natural Language Parsing is to determine parts of sentences (such as verbs, noun phrases, or relative clauses), and the relationships between them (such as subject or object). Unlike parsing of formally defined artificial languages (such as Java or predicate logic), parsing of natural languages presents problems due to ambiguity, and the productive and creative use of language.

See also the corresponding HLT-Survey section.
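The ambiguity problem named in the definition above can be made concrete with a small sketch. The following is an illustrative CKY-style chart that counts the parses of a sentence under a toy context-free grammar; the grammar, the lexicon and the function name `count_parses` are assumptions made for this example and have nothing to do with the HPSG grammars discussed in the report.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # prepositional phrase attached to the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # prepositional phrase attached to the noun phrase
    ("PP", "P", "NP"),
]

# Toy lexicon: word -> possible categories.
LEXICON = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"], "a": ["Det"],
    "man": ["N"], "telescope": ["N"], "with": ["P"],
}


def count_parses(words, start="S"):
    """Count the distinct parse trees of `words` rooted in `start`."""
    n = len(words)
    # chart[(i, j)][category] = number of ways to derive words[i:j] as category
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for category in LEXICON.get(word, []):
            chart[(i, i + 1)][category] += 1
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # split point
                for parent, left, right in BINARY_RULES:
                    left_count = chart[(i, k)].get(left, 0)
                    right_count = chart[(k, j)].get(right, 0)
                    if left_count and right_count:
                        chart[(i, j)][parent] += left_count * right_count
    return chart[(0, n)].get(start, 0)


ambiguous = count_parses("I saw the man with a telescope".split())
unambiguous = count_parses("I saw the man".split())
```

Under this toy grammar the first sentence receives two analyses, because "with a telescope" can attach either to the verb phrase or to the noun phrase, while the second receives one. With a broad-coverage grammar the number of analyses per sentence grows quickly, which is one reason the report suggests combining deep processing with stochastic models.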

4.3.4 Discourse and Dialogue Processing

Year: before 2006 with enough quality for Language Learning, 2050 for the Voice Assistant.
Used in applications: Voice Assistant, Language Learning, Document Authoring, Dynamic Newspaper

The Voice Assistant application uses discourse and dialogue processing for dialogues about the user's document collection, about very large document spaces (web search results), and about web services. This could be reached in [...] In Language Learning, it is used to set up a template structure for a dialogue topic, which might be reached earlier.

Definition in LT-World: A Discourse is a piece of language including more than one sentence. A Dialogue is a linguistic exchange involving more than one participant. Discourse and dialogue therefore encompass almost all non-local phenomena in language, but in particular discourse coherence, anaphoric dependencies, dialogue structure and the relation between questions and answers. The most obvious practical application in this area is dialogue systems and, more recently, spoken dialogue systems.

See also the corresponding HLT-Survey section.

Grammar and style checking

Year: basics in 2003, better performance in 2010
Used in applications: Language Learning Support, Language Learning, Document Authoring

Both Language Learning applications use grammar and style checking for the correction and evaluation of the student's input. The Document Authoring application uses it to support the author of a document in correcting his/her mistakes in syntax or text conventions. The applications require multilinguality.

Definition in LT-World: Language Checking comprises technologies used to detect and/or correct erroneous or inconsistent language use in documents. The scope of language checking technology ranges from general error correction, as performed by spell checkers and grammar checkers, to the implementation of corporate styles and terminology control (controlled language). Benefits of controlled languages are the enhancement of consistency within and across documents and the reduction of ambiguity and vagueness, yielding documents which are easier to process by both humans and machines.

See also the corresponding HLT-Survey section.

Automatic Hyperlinking

Year: 2005
Used in applications: Language Learning Support, Creation of Structured Knowledge

Language Learning Support uses automatic hyperlinking for the annotation of texts with hyperlinks containing vocabulary, translations and trees. In the application Creation of Structured Knowledge, structured information is created in such a way that, while a text is being written, an intelligent agent automatically indexes, categorizes and hyperlinks it.

Information Retrieval

Year: 2003
Used in applications: Language Learning, Communication Management

Information retrieval technologies are used in the Language Learning application and the Communication Management application for information access on the web, where a lot of information is available but must be found. The current mechanisms are already sufficient for Language Learning, but must be extended for use in Communication Management.

Definition in LT-World: Cross-language information retrieval means using queries in one language to search for documents in a different language. Multilingual information retrieval is a broader term, which includes the case where queries in different languages are used, but only for searching documents in the same language.

See also the corresponding HLT-Survey section.

Information Extraction

Year: 2003, some technology already available; 2018 with better performance.
Used in applications: Ubiquitous Business Intelligence, Voice Assistant, Job Search, Creation of Structured Knowledge

The application Ubiquitous Business Intelligence needs information extraction for decoding the questions sent to the system and multilingual information extraction from web pages. A basic feature of the Voice Assistant application is information and service access; it needs open-domain information extraction from heterogeneous document collections. The Job Search machine searches various online sources for jobs that fit the person.

Here, cross-lingual information extraction is useful. The Creation of Structured Knowledge needs NE recognition techniques that are already available.

Definition in LT-World: The goal of information extraction (IE) is to build systems that find and link relevant information from natural language text, ignoring irrelevant information. The information of interest is typically pre-specified in the form of uninstantiated frame-like structures, also called templates. The templates are domain and task specific. The major task of an IE system is then the identification of the relevant parts of the text which are used to fill a template's slots.

See also the corresponding HLT-Survey section.

Language guessing

Year: 2003
Used in application: Language Learning

Language guessing is used in the Language Learning application for guessing the student's native language.

Lexicon

Year: 2003
Used in applications: Language Learning, Document Authoring

Lexicon technologies are used in the Language Learning application for dictionary lookup in context, including syntactic and semantic disambiguation. In Document Authoring, lexical resources, word nets and idioms are needed.

Morphological analysis

Year: 2003. Morphological analysis is available for some languages; more languages and better performance have to follow.
Used in applications: Language Learning Support, Language Learning, Context Matching

Large-coverage multilingual morphological analysis is used in the Language Learning applications for text annotation with hyperlinks containing vocabulary, translations (adapted to the user's needs and native language) and trees, for vocabulary extraction from texts, and for the generation of grammar exercises. The Context Matching application uses the technology for the set-up of local ontologies.
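The use of morphological analysis for vocabulary extraction mentioned above can be sketched with a toy analyzer. This is only an illustration under strong assumptions: the four-entry stem lexicon, the handful of English suffix rules and the function names `analyze` and `extract_vocabulary` are invented for this example, and a large-coverage multilingual analyzer of the kind the applications require works very differently.

```python
# Hypothetical toy lexicon: stem -> part of speech.
STEMS = {"walk": "verb", "read": "verb", "book": "noun", "text": "noun"}

# Hypothetical suffix rules: (suffix, part of speech, inflectional feature).
SUFFIX_RULES = [
    ("ing", "verb", "present participle"),
    ("ed", "verb", "past"),
    ("s", "verb", "3rd person singular"),
    ("s", "noun", "plural"),
    ("", "verb", "base form"),
    ("", "noun", "singular"),
]


def analyze(word):
    """Return all (stem, part_of_speech, feature) analyses the toy lexicon licenses."""
    word = word.lower()
    analyses = []
    for suffix, pos, feature in SUFFIX_RULES:
        if suffix and not word.endswith(suffix):
            continue
        stem = word[:len(word) - len(suffix)]
        # Accept the analysis only if the stem is listed with a matching category.
        if STEMS.get(stem) == pos:
            analyses.append((stem, pos, feature))
    return analyses


def extract_vocabulary(text):
    """Collect the distinct (stem, part_of_speech) pairs found in a text."""
    vocabulary = set()
    for token in text.split():
        for stem, pos, _feature in analyze(token.strip(".,;:!?")):
            vocabulary.add((stem, pos))
    return sorted(vocabulary)


vocabulary = extract_vocabulary("She walked and reads the books.")
```

Words outside the toy lexicon ("She", "and", "the") simply receive no analysis, which shows why coverage is the central requirement for this technology.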

Definition in LT-World: The technologies for or the process of tracing the inflectional, derivational, and compounding processes in the formation of a given word in order to determine properties such as stem form, part-of-speech and inflectional information. As a crucial preprocessing step, morphological analysis is used in virtually all fields of natural language processing.

See also the corresponding pages in the HLT-Survey section.

Machine Translation

Year: 2003, as some technology is already available. MT will undergo continuous improvement; 2012 with good performance for some language pairs. Other languages have to follow.
Used in application: Language Learning Support

The application Language Learning Support uses machine translation for annotating texts with translations (adapted to the user's needs and native language).

Definition in LT-World: See also the corresponding HLT-Survey section.

Ontologies

Year: simple ontologies in 2004, more complex representation schemas: ??
Used in application: Context Matching

The Context Matching application is based on the observation that communities have their own local (language-specific) ontologies, which must be matched when the communities communicate and co-operate. Multilingual aligned repositories of senses, specialised terminologies and knowledge technologies are needed. Local multilingual ontologies have to be interconnected. The technology to extract candidate senses using a lexical database (wordnet), to disambiguate and filter senses, and to contextualize senses must be set up.

Definition in LT-World: What are Ontologies? And why are they important for NLP? From a theoretical point of view, ontology is the metaphysical study of the nature of being and existence. In practice, an ontology is normally viewed as a formal representation of all semantic objects and their connections in a Universe of Discourse. Mapping these semantic objects onto language units (words, phrases, text segments, etc.) is the task of semantic processing in NLP.

Part-Of-Speech Tagging

Year: 2003, technology is already available
Used in application: Context Matching

POS tagging is used in the Context Matching application as a pre-processing step for lexical disambiguation.

Definition in LT-World: The technologies for or the process of determining the correct part-of-speech tag for a word given its local context. The task comprises disambiguation of multiple part-of-speech tags and guessing of the correct part-of-speech tag for unknown words. Part-of-speech tagging is frequently used as a pre-processing step for shallow and deep parsers.

See also the related HLT-Survey section.

Question-answering

Year: first versions in 2005, improved question answering capabilities in 2009, advanced QA facilities perhaps in [...]
Used in applications: Voice Assistant, Dynamic Newspaper, Voice-Based Access to Encyclopaedic Knowledge

The Voice Assistant application needs improved question answering capabilities to access information and services. The application will ask questions for clarification or to narrow down the search, if necessary. In the Dynamic Newspaper application, additional background information on newspaper articles can be requested using question-answering type systems. Broad-coverage QA technology is needed here.

Definition in LT-World: Answer extraction (AE) aims at retrieving those exact passages of a document that directly answer a given user question. AE is more ambitious than information retrieval and information extraction in that the retrieval results are phrases, not entire documents, and in that the queries may be arbitrarily specific. It is less ambitious than full-fledged question


More information

THINGS YOU NEED TO KNOW ABOUT USER DOCUMENTATION DOCUMENTATION BEST PRACTICES

THINGS YOU NEED TO KNOW ABOUT USER DOCUMENTATION DOCUMENTATION BEST PRACTICES 5 THINGS YOU NEED TO KNOW ABOUT USER DOCUMENTATION DOCUMENTATION BEST PRACTICES THIS E-BOOK IS DIVIDED INTO 5 PARTS: 1. WHY YOU NEED TO KNOW YOUR READER 2. A USER MANUAL OR A USER GUIDE WHAT S THE DIFFERENCE?

More information

Sustainability of Text-Technological Resources

Sustainability of Text-Technological Resources Sustainability of Text-Technological Resources Maik Stührenberg, Michael Beißwenger, Kai-Uwe Kühnberger, Harald Lüngen, Alexander Mehler, Dieter Metzing, Uwe Mönnich Research Group Text-Technological Overview

More information

Maximizing the Value of STM Content through Semantic Enrichment. Frank Stumpf December 1, 2009

Maximizing the Value of STM Content through Semantic Enrichment. Frank Stumpf December 1, 2009 Maximizing the Value of STM Content through Semantic Enrichment Frank Stumpf December 1, 2009 What is Semantics and Semantic Processing? Content Knowledge Framework Technology Framework Search Text Images

More information

Information Retrieval CSCI

Information Retrieval CSCI Information Retrieval CSCI 4141-6403 My name is Anwar Alhenshiri My email is: anwar@cs.dal.ca I prefer: aalhenshiri@gmail.com The course website is: http://web.cs.dal.ca/~anwar/ir/main.html 5/6/2012 1

More information

Cheshire 3 Framework White Paper: Implementing Support for Digital Repositories in a Data Grid Environment

Cheshire 3 Framework White Paper: Implementing Support for Digital Repositories in a Data Grid Environment Cheshire 3 Framework White Paper: Implementing Support for Digital Repositories in a Data Grid Environment Paul Watry Univ. of Liverpool, NaCTeM pwatry@liverpool.ac.uk Ray Larson Univ. of California, Berkeley

More information

May Read&Write 5 Gold for Mac Beginners Guide

May Read&Write 5 Gold for Mac Beginners Guide May 2012 Read&Write 5 Gold for Mac Beginners Guide Read&Write 5 Gold for Mac INTRODUCTION... 3 SPEECH... 4 SPELLING... 6 PREDICTION... 8 DICTIONARY... 10 PICTURE DICTIONARY... 12 SOUNDS LIKE AND CONFUSABLE

More information

Work-ready skills in Business, Administration and IT

Work-ready skills in Business, Administration and IT Work-ready skills in Business, Administration and IT We believe in learning At the core of everything we do is the desire to make a measurable impact on improving people s lives through learning. Pearson

More information

6.001 Notes: Section 15.1

6.001 Notes: Section 15.1 6.001 Notes: Section 15.1 Slide 15.1.1 Our goal over the next few lectures is to build an interpreter, which in a very basic sense is the ultimate in programming, since doing so will allow us to define

More information

Assessing the Quality of Natural Language Text

Assessing the Quality of Natural Language Text Assessing the Quality of Natural Language Text DC Research Ulm (RIC/AM) daniel.sonntag@dfki.de GI 2004 Agenda Introduction and Background to Text Quality Text Quality Dimensions Intrinsic Text Quality,

More information

PROJECT FINAL REPORT. Tel: Fax:

PROJECT FINAL REPORT. Tel: Fax: PROJECT FINAL REPORT Grant Agreement number: 262023 Project acronym: EURO-BIOIMAGING Project title: Euro- BioImaging - Research infrastructure for imaging technologies in biological and biomedical sciences

More information

Information Extraction Techniques in Terrorism Surveillance

Information Extraction Techniques in Terrorism Surveillance Information Extraction Techniques in Terrorism Surveillance Roman Tekhov Abstract. The article gives a brief overview of what information extraction is and how it might be used for the purposes of counter-terrorism

More information

Please note: Only the original curriculum in Danish language has legal validity in matters of discrepancy. CURRICULUM

Please note: Only the original curriculum in Danish language has legal validity in matters of discrepancy. CURRICULUM Please note: Only the original curriculum in Danish language has legal validity in matters of discrepancy. CURRICULUM CURRICULUM OF 1 SEPTEMBER 2008 FOR THE BACHELOR OF ARTS IN INTERNATIONAL COMMUNICATION:

More information

Mission-Critical Customer Service. 10 Best Practices for Success

Mission-Critical  Customer Service. 10 Best Practices for Success Mission-Critical Email Customer Service 10 Best Practices for Success Introduction When soda cans and chocolate wrappers start carrying email contact information, you know that email-based customer service

More information

An Ontology Based Question Answering System on Software Test Document Domain

An Ontology Based Question Answering System on Software Test Document Domain An Ontology Based Question Answering System on Software Test Document Domain Meltem Serhatli, Ferda N. Alpaslan Abstract Processing the data by computers and performing reasoning tasks is an important

More information

The Metadata Assignment and Search Tool Project. Anne R. Diekema Center for Natural Language Processing April 18, 2008, Olin Library 106G

The Metadata Assignment and Search Tool Project. Anne R. Diekema Center for Natural Language Processing April 18, 2008, Olin Library 106G The Metadata Assignment and Search Tool Project Anne R. Diekema Center for Natural Language Processing April 18, 2008, Olin Library 106G Anne Diekema DEE-ku-ma Assistant Research Professor School of Information

More information

Whole World OLIF. Version 3.0 of the Versatile Language Data Format

Whole World OLIF. Version 3.0 of the Versatile Language Data Format Whole World OLIF Version 3.0 of the Versatile Language Data Format Christian Lieske, SAP Language Services, SAP AG Susan McCormick, PhD, Linguistic Consultant tekom Annual Conference, November 2007 Caveat:

More information

It s time for a semantic engine!

It s time for a semantic engine! It s time for a semantic engine! Ido Dagan Bar-Ilan University, Israel 1 Semantic Knowledge is not the goal it s a primary mean to achieve semantic inference! Knowledge design should be derived from its

More information

XML in the bipharmaceutical

XML in the bipharmaceutical XML in the bipharmaceutical sector XML holds out the opportunity to integrate data across both the enterprise and the network of biopharmaceutical alliances - with little technological dislocation and

More information

A Short Introduction to CATMA

A Short Introduction to CATMA A Short Introduction to CATMA Outline: I. Getting Started II. Analyzing Texts - Search Queries in CATMA III. Annotating Texts (collaboratively) with CATMA IV. Further Search Queries: Analyze Your Annotations

More information

A Method for Semi-Automatic Ontology Acquisition from a Corporate Intranet

A Method for Semi-Automatic Ontology Acquisition from a Corporate Intranet A Method for Semi-Automatic Ontology Acquisition from a Corporate Intranet Joerg-Uwe Kietz, Alexander Maedche, Raphael Volz Swisslife Information Systems Research Lab, Zuerich, Switzerland fkietz, volzg@swisslife.ch

More information

To search and summarize on Internet with Human Language Technology

To search and summarize on Internet with Human Language Technology To search and summarize on Internet with Human Language Technology Hercules DALIANIS Department of Computer and System Sciences KTH and Stockholm University, Forum 100, 164 40 Kista, Sweden Email:hercules@kth.se

More information

Writing for the web and SEO. University of Manchester Humanities T4 Guides Writing for the web and SEO Page 1

Writing for the web and SEO. University of Manchester Humanities T4 Guides Writing for the web and SEO Page 1 Writing for the web and SEO University of Manchester Humanities T4 Guides Writing for the web and SEO Page 1 Writing for the web and SEO Writing for the web and SEO... 2 Writing for the web... 3 Change

More information

clarin:el an infrastructure for documenting, sharing and processing language data

clarin:el an infrastructure for documenting, sharing and processing language data clarin:el an infrastructure for documenting, sharing and processing language data Stelios Piperidis, Penny Labropoulou, Maria Gavrilidou (Athena RC / ILSP) the problem 19/9/2015 ICGL12, FU-Berlin 2 use

More information

EISAS Enhanced Roadmap 2012

EISAS Enhanced Roadmap 2012 [Deliverable November 2012] I About ENISA The European Network and Information Security Agency (ENISA) is a centre of network and information security expertise for the EU, its Member States, the private

More information

Quagmire or Goldmine?

Quagmire or Goldmine? The World-Wide Wide Web: Quagmire or Goldmine? Oren Etzioni [Comm. of the ACM, Nov 1996] Presentation Credits: Shabnam Sobti 30 - OCT - 2002 WWW - Quagmire or Goldmine? 1 Agenda Prelude: The Internet Story

More information

Digital Newsletter. Editorial. Second Review Meeting in Brussels

Digital Newsletter. Editorial. Second Review Meeting in Brussels Editorial The aim of this newsletter is to inform scientists, industry as well as older people in general about the achievements reached within the HERMES project. The newsletter appears approximately

More information

A tool for Entering Structural Metadata in Digital Libraries

A tool for Entering Structural Metadata in Digital Libraries A tool for Entering Structural Metadata in Digital Libraries Lavanya Prahallad, Indira Thammishetty, E.Veera Raghavendra, Vamshi Ambati MSIT Division, International Institute of Information Technology,

More information

arxiv: v1 [cs.hc] 14 Nov 2017

arxiv: v1 [cs.hc] 14 Nov 2017 A visual search engine for Bangladeshi laws arxiv:1711.05233v1 [cs.hc] 14 Nov 2017 Manash Kumar Mandal Department of EEE Khulna University of Engineering & Technology Khulna, Bangladesh manashmndl@gmail.com

More information

An Oracle White Paper October Oracle Social Cloud Platform Text Analytics

An Oracle White Paper October Oracle Social Cloud Platform Text Analytics An Oracle White Paper October 2012 Oracle Social Cloud Platform Text Analytics Executive Overview Oracle s social cloud text analytics platform is able to process unstructured text-based conversations

More information

ACCESSING DATABASE USING NLP

ACCESSING DATABASE USING NLP ACCESSING DATABASE USING NLP Pooja A.Dhomne 1, Sheetal R.Gajbhiye 2, Tejaswini S.Warambhe 3, Vaishali B.Bhagat 4 1 Student, Computer Science and Engineering, SRMCEW, Maharashtra, India, poojadhomne@yahoo.com

More information

BHL-EUROPE: Biodiversity Heritage Library for Europe. Jana Hoffmann, Henning Scholz

BHL-EUROPE: Biodiversity Heritage Library for Europe. Jana Hoffmann, Henning Scholz Nimis P. L., Vignes Lebbe R. (eds.) Tools for Identifying Biodiversity: Progress and Problems pp. 43-48. ISBN 978-88-8303-295-0. EUT, 2010. BHL-EUROPE: Biodiversity Heritage Library for Europe Jana Hoffmann,

More information

Object-oriented Compiler Construction

Object-oriented Compiler Construction 1 Object-oriented Compiler Construction Extended Abstract Axel-Tobias Schreiner, Bernd Kühl University of Osnabrück, Germany {axel,bekuehl}@uos.de, http://www.inf.uos.de/talks/hc2 A compiler takes a program

More information

What is this Song About?: Identification of Keywords in Bollywood Lyrics

What is this Song About?: Identification of Keywords in Bollywood Lyrics What is this Song About?: Identification of Keywords in Bollywood Lyrics by Drushti Apoorva G, Kritik Mathur, Priyansh Agrawal, Radhika Mamidi in 19th International Conference on Computational Linguistics

More information

Q: Is there often any change in length when translating from a LOTE to English? Why?

Q: Is there often any change in length when translating from a LOTE to English? Why? Translation Service Instructions Q: Is there often any change in length when translating from a LOTE to English? Why? A: Typically, the target texts could be slightly lengthier than the source texts. Sometimes

More information

Dmesure: a readability platform for French as a foreign language

Dmesure: a readability platform for French as a foreign language Dmesure: a readability platform for French as a foreign language Thomas François 1, 2 and Hubert Naets 2 (1) Aspirant F.N.R.S. (2) CENTAL, Université Catholique de Louvain Presentation at CLIN 21 February

More information

Natural Language Processing as Key Component to Successful Information Products

Natural Language Processing as Key Component to Successful Information Products Natural Language Processing as Key Component to Successful Information Products Yves Schabes Teragram Corporation Tera monster in Greek 2 40 (=1,099,511,627,766) 10 12 (one trillion) gram something written

More information

Introduction

Introduction Introduction EuropeanaConnect All-Staff Meeting Berlin, May 10 12, 2010 Welcome to the All-Staff Meeting! Introduction This is a quite big meeting. This is the end of successful project year Project established

More information

Requirements Engineering

Requirements Engineering Requirements Engineering An introduction to requirements engineering Gerald Kotonya and Ian Sommerville G. Kotonya and I. Sommerville 1998 Slide 1 Objectives To introduce the notion of system requirements

More information

1.0 Abstract. 2.0 TIPSTER and the Computing Research Laboratory. 2.1 OLEADA: Task-Oriented User- Centered Design in Natural Language Processing

1.0 Abstract. 2.0 TIPSTER and the Computing Research Laboratory. 2.1 OLEADA: Task-Oriented User- Centered Design in Natural Language Processing Oleada: User-Centered TIPSTER Technology for Language Instruction 1 William C. Ogden and Philip Bernick The Computing Research Laboratory at New Mexico State University Box 30001, Department 3CRL, Las

More information

Conversational Knowledge Graphs. Larry Heck Microsoft Research

Conversational Knowledge Graphs. Larry Heck Microsoft Research Conversational Knowledge Graphs Larry Heck Microsoft Research Multi-modal systems e.g., Microsoft MiPad, Pocket PC TV Voice Search e.g., Bing on Xbox Task-specific argument extraction (e.g., Nuance, SpeechWorks)

More information

LIDER Survey. Overview. Number of participants: 24. Participant profile (organisation type, industry sector) Relevant use-cases

LIDER Survey. Overview. Number of participants: 24. Participant profile (organisation type, industry sector) Relevant use-cases LIDER Survey Overview Participant profile (organisation type, industry sector) Relevant use-cases Discovering and extracting information Understanding opinion Content and data (Data Management) Monitoring

More information

User Task Automator. Himanshu Prasad 1, P. Geetha Priya 2, S.Manjunatha 3, B.H Namratha 4 and Rekha B. Venkatapur 5 1,2,3&4

User Task Automator. Himanshu Prasad 1, P. Geetha Priya 2, S.Manjunatha 3, B.H Namratha 4 and Rekha B. Venkatapur 5 1,2,3&4 Asian Journal of Engineering and Applied Technology ISSN: 2249-068X Vol. 6 No. 1, 2017, pp.40-44 The Research Publication, www.trp.org.in Himanshu Prasad 1, P. Geetha Priya 2, S.Manjunatha 3, B.H Namratha

More information

TRANSANA and Chapter 8 Retrieval

TRANSANA and Chapter 8 Retrieval TRANSANA and Chapter 8 Retrieval Chapter 8 in Using Software for Qualitative Research focuses on retrieval a crucial aspect of qualitatively coding data. Yet there are many aspects of this which lead to

More information

CoE CENTRE of EXCELLENCE ON DATA WAREHOUSING

CoE CENTRE of EXCELLENCE ON DATA WAREHOUSING in partnership with Overall handbook to set up a S-DWH CoE: Deliverable: 4.6 Version: 3.1 Date: 3 November 2017 CoE CENTRE of EXCELLENCE ON DATA WAREHOUSING Handbook to set up a S-DWH 1 version 2.1 / 4

More information

Annotation by category - ELAN and ISO DCR

Annotation by category - ELAN and ISO DCR Annotation by category - ELAN and ISO DCR Han Sloetjes, Peter Wittenburg Max Planck Institute for Psycholinguistics P.O. Box 310, 6500 AH Nijmegen, The Netherlands E-mail: Han.Sloetjes@mpi.nl, Peter.Wittenburg@mpi.nl

More information

The Pluralistic Usability Walk-Through Method S. Riihiaho Helsinki University of Technology P.O. Box 5400, FIN HUT

The Pluralistic Usability Walk-Through Method S. Riihiaho Helsinki University of Technology P.O. Box 5400, FIN HUT The Pluralistic Usability Walk-Through Method S. Riihiaho Helsinki University of Technology P.O. Box 5400, FIN-02015 HUT sirpa.riihiaho@hut.fi Abstract Pluralistic usability walkthrough is a usability

More information

Semantic Technologies for Nuclear Knowledge Modelling and Applications

Semantic Technologies for Nuclear Knowledge Modelling and Applications Semantic Technologies for Nuclear Knowledge Modelling and Applications D. Beraha 3 rd International Conference on Nuclear Knowledge Management 7.-11.11.2016, Vienna, Austria Why Semantics? Machines understanding

More information

Qualitative Data Analysis Software. A workshop for staff & students School of Psychology Makerere University

Qualitative Data Analysis Software. A workshop for staff & students School of Psychology Makerere University Qualitative Data Analysis Software A workshop for staff & students School of Psychology Makerere University (PhD) January 27, 2016 Outline for the workshop CAQDAS NVivo Overview Practice 2 CAQDAS Before

More information