Modeling Slang-style Word Formation for Retrieving Evaluative Information


Atsushi Fujii
Graduate School of Library, Information and Media Studies
University of Tsukuba
1-2 Kasuga, Tsukuba, Japan

Abstract

The volume of evaluative information for a specific item, such as opinions about an organization or reviews of a product, has been increasing rapidly on the World Wide Web. This trend has had a significant impact on both producers and consumers. However, the information overload problem on the Web makes it time-consuming to identify relevant information. We propose a method for retrieving evaluative documents for an item. In evaluative documents, the item in question is often represented by a slang-style coined name, such as "Micro$oft" referring to Microsoft, chosen for the purposes of euphemism or wordplay. To generate coined names for an item automatically, we have modeled slang-style word formation for Japanese. Coined names are used to query the Web, enabling us to retrieve evaluative documents that cannot be retrieved by existing methods. We show the effectiveness of our method experimentally.

1 Introduction

The volume of evaluative information for a specific item, such as opinions about an organization or reviews for a product, has been increasing rapidly on the World Wide Web. This trend has a significant impact on both producers and consumers. While a company may assess its reputation by analyzing customers' opinions, a customer may compare reviews before choosing a product. However, because a simple method for Web search, such as using the name of a target item as a query, usually retrieves a large number of extraneous pages, it is time-consuming for users to identify pages that satisfy their information needs. Because evaluative information usually contains subjective descriptions, a number of methods for sentiment analysis can potentially alleviate this information overload problem.
Existing methods for sentiment analysis can be divided into three approaches: distinguishing between subjective and objective descriptions in texts (Eguchi and Lavrenko, 2006; Riloff and Wiebe, 2003), classifying subjective descriptions into bipolar categories (Hu and Liu, 2004; Turney, 2002) or multipoint scale categories (Pang and Lee, 2005), and summarizing subjective descriptions (Fujii and Ishikawa, 2006; Hu and Liu, 2004; Liu et al., 2005). Among these approaches, the distinction between subjective and objective descriptions is the most straightforward solution to retrieving evaluative information. However, because existing methods rely on evaluative expressions associated with sentiment or subjectivity, such as "excellent" or "service is bad", as clues, subjective descriptions that do not contain these expressions cannot be retrieved.

We propose a new method for retrieving evaluative documents for an item. The contribution of our research is that, to overcome the limitation of existing methods, we explore a new feature for sentiment analysis. In evaluative documents on the Web, the item in question is often represented by a slang-style coined name, such as "Micro$oft" referring to Microsoft, for the purposes of euphemism or wordplay. This implies that by using a slang-style coined name for an item as a query, we can identify evaluative documents that cannot be retrieved by existing methods.

In brief, given the name of a target item, our method automatically generates slang-style alternative names for that item and uses those names to query the Web. To realize this method, we need to model slang-style word formation, for which we currently target only Japanese. Because Japanese uses different types of characters, such as the Kanji, Katakana, and Hiragana alphabets, and other characters such as numerals, the mechanism of word formation in Japanese is complicated and thus our method can potentially be applied to other languages in the future.

Although slang has been a subject of linguistics, the purpose of past research was to analyze and classify slang words in terms of specific properties, such as usage and word formation. Our research is the first exploration of utilizing slang words for information retrieval and sentiment analysis.

Section 2 outlines our method for retrieving evaluative documents. Section 3 describes our method for generating slang-style names. Section 4 describes the experiments and discusses the results obtained.

2 Retrieval Method for Evaluative Documents

Given the name of a target item, our retrieval method performs the following two steps.
(1) We generate slang-style names for the input item.
(2) For each generated name, we search the Web for pages in which that name appears.

Our method does not classify the retrieved pages into semantic orientations, such as positive or negative. For this purpose, the existing classification methods in Section 1 can be utilized. However, because slang words are often used for criticism, our method tends to retrieve negative documents. We discuss this tendency in Section 4. In the current retrieval interface, users are allowed to select the slang-style names to be used as a query; otherwise all the generated names are automatically used for retrieval purposes.

For step (1), we have modeled slang-style word formation in Japanese, which will be elaborated in Section 3. Our method for generating slang-style names uses both a target name and its pronunciation. If a target name consists of only Katakana or Hiragana, both of which comprise phonograms, its pronunciation is represented by the name itself. However, if a target name contains Kanji, which comprises ideograms, in principle we consult a dictionary for the pronunciation.
In practice, however, because target names are usually proper nouns and the out-of-vocabulary problem is therefore crucial, a user is requested to provide the pronunciation of a target name in Katakana or Hiragana.

For step (2), an existing search engine on the Web can be used without any modification. However, after the retrieval, we discard the pages that do not include the generated query name as it is, because irrelevant pages are often retrieved if the query name contains a special symbol that the search engine treats as a wildcard character, such as an asterisk (*).

3 Generating Slang-style Names

3.1 Overview

To generate slang-style names for an item, we have identified types of slang-style word formation in Japanese and developed an automatic generation method for each type. Although a number of traditional references in linguistics have identified types of word formation in Japanese slang (Maeda, 1922; Nomura and Koike, 1992), our focus is so-called Internet slang, and there are thus a number of word formation types that are not identified in traditional studies. Certain types of Internet slang are associated with word processing methods in computers. For example, many users make deliberate typographical errors to generate an unusual sequence of characters. In view of this background, we performed a preliminary study, in which we collected slang words from Japanese Web sites and identified their types in terms of word formation.

However, not all slang types are desirable for our purpose. For example, abbreviation, which is typically used to generate both general and slang words, cannot be effective in retrieving only evaluative documents. In addition, not all slang types can be realized in an automatic method with high accuracy. For example, "Micro$oft" looks similar to Microsoft and may be associated with a company chasing a profit.
Similarly, a slang word for a company's name may be associated with the personality or physical characteristics of its president. To model such highly intelligent association or inference accurately, we would require a knowledge-intensive method using a number of rules and heuristics. However, as the first step of our research, we currently focus only on word formation types that can be realized with straightforward algorithms and publicly available dictionaries. As a result, our initial work targets the following six types of word formation: blank, partial romanization, typographical similarity, character-type conversion, input-mode error, and Japanese-conversion error. For each word formation type, we describe the definition and the method for generating slang-style names in the following sections.

3.2 Blank

One or more characters in an original name are not printed and are replaced with a special symbol, for which "*" (an asterisk) is often used in English and "○" (a circle) is often used in Japanese. In principle, we can replace an arbitrary number of characters in the original name with any symbol. However, in practice we replace only a single character with "○", to restrict the number of names generated. Thus, for a name consisting of N characters, we generate N slang-style names. For example, for ソフトバンク (/sofutobanku/), which is the name of an information technology company in Japan, we generate six names, such as ○フトバンク and ソ○トバンク. This method can also be used for names in English. For example, for "softbank", which is the English name for ソフトバンク, we can generate "○oftbank" and "s○ftbank". Throughout this paper, we use slashes to indicate the pronunciation of Japanese words in Roman characters.

Figure 1: Correspondences for typographically similar Katakana characters. Arrows denote whether the replacement is unidirectional or bidirectional. Using Figure 1, for ソフトバンク we can generate ソフトパンク (/sofutopanku/) or ンフトハソク (/nfutohasoku/).

3.3 Partial Romanization

One or more segments in an original name are romanized and only the first Roman character for each segment is used. Example names for ソフトバンク are Sフトバンク and ソFトバンク. However, as in Section 3.2, to restrict the number of names generated, we romanize only a single character in the original name. To convert a Japanese character into its Roman representation, we use correspondences between Japanese and Roman characters (footnote 1). If the target character is a Katakana or Hiragana character, we simply consult these correspondences for its Roman representation. However, if the target character is a Kanji character, we use the pronunciation of the target name.
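The blank and partial-romanization generators of Sections 3.2 and 3.3 can be sketched as follows. This is a minimal illustration rather than the implementation used in the paper: the function names, the `BLANK` constant, and the six-entry `ROMAJI` table are assumptions made for this example, and the table covers only the characters of the running example. The Kanji case, which relies on the user-supplied pronunciation, is not covered.

```python
# Toy Katakana-to-Roman table (hypothetical); it covers only the characters
# of the running example. A real system would load a full correspondence table.
ROMAJI = {"ソ": "so", "フ": "fu", "ト": "to", "バ": "ba", "ン": "n", "ク": "ku"}

BLANK = "○"  # circle symbol; "*" is the usual choice for English names


def blank_names(name, symbol=BLANK):
    """Replace exactly one character with the blank symbol.

    A name of N characters yields N slang-style names.
    """
    return [name[:i] + symbol + name[i + 1:] for i in range(len(name))]


def partial_romanization_names(name, table=ROMAJI):
    """Replace exactly one kana with the first Roman letter of its romanization."""
    names = []
    for i, ch in enumerate(name):
        roman = table.get(ch)
        if roman:  # characters missing from the toy table are skipped
            names.append(name[:i] + roman[0].upper() + name[i + 1:])
    return names


if __name__ == "__main__":
    print(blank_names("ソフトバンク"))
    print(blank_names("softbank", symbol="*"))
    print(partial_romanization_names("ソフトバンク"))
```

Restricting both generators to a single replaced character keeps the number of candidate queries linear in the length of the name, which is the restriction stated in Sections 3.2 and 3.3.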
3.4 Typographical Similarity One or more characters in an original name are replaced with another character that is typographically similar to the character in question. For example, (/n/) and (/shi/) may be replaced with (/so/) and (/tsu/), respectively. 1 utashiro/perl/scripts/romkan pl/ We have empirically identified 44 pairs of typographically similar Katakana characters, and replace an arbitrary number of Katakana characters in the original name with their counterpart characters. Figure 1 shows the 44 typographically similar pairs, in which arrows denote whether the re- 3.5 Character-type Conversion Katakana characters in an original name are entirely or partially replaced with Hiragana characters and vice versa. An example name for is, in which is written in Hiragana characters. We use the pronunciation of the input name represented by Hiragana. We segment the original name into two segments with an arbitrary position and convert one of the segments into Katakana. Thus, for a name consisting of N characters, we consider N 1 segmentations. We use the EUC-JP code, in which Hiragana and Katakana can mutually be converted based on the character codes. 3.6 Input-mode Error The background of this word formation type should perhaps be explained. In typical front-end processors (FEPs) for Japanese, which help users to input Japanese characters, users are requested to choose Japanese or non-japanese mode. In either mode, users are allowed to input ASCII characters as indicated by the keyboard. However, in the Japanese mode, an input string is regarded as the romanization of the pronunciation of a Japanese word and will be converted into a plausible word in the Japanese alphabets. If a user intends to input an English word, but mistakenly chooses the Japanese mode, an input string is entirely or partially converted into

4 Japanese and the resultant string may look like an unusual combination of characters. Thus, users can purposefully choose the wrong mode to generate a slang-style name. If a user chooses the Japanese mode to input softbank, the resultant string can be ft k, in which so, ba, and n coincidently correspond to romanized Japanese moras. If a user chooses the non- Japanese mode to input Japanese characters, the resultant string is simply a romanization of the intended Japanese word. We use two different methods independently. If a target name is in Japanese, such as candidates for each segment. We retain the segments that do not correspond to a Japanese word, we simply convert the target name into its Roman representation, for which we use the romanization method in Section 3.3. However, if a Unlike the other methods in Sections , as Hiragana characters. target name is not in Japanese, we read the constituent characters in the target name sequentially this method often generates a plausible name in which usually generate unusual Japanese strings, and convert combinations of characters that are the Japanese that corresponds to an existing item. If same as the Roman representation for Japanese we use these names to query the Web, we cannot retrieve evaluative documents for a target item, moras into their corresponding Hiragana characters. Although in most cases a mora is a single but retrieve homepages for different items. For example, for (/fujiya/), which is a confec- vowel, such as (/a/), or a combination of one or more consonants followed by a vowel, such as tionery company in Japan, our method generated (/so/) and (/kya/), (/n/) consists of a single consonant. If we read the character n, we must read the next character to determine the resultant Hiragana character. If the next character is a vowel, we convert a combination of n and the vowel into the corresponding Hiragana characters, such as (/na/) or (/ni/) ; otherwise we convert n into (/n/). 
3.7 Japanese-conversion Error As explained in Section 3.6, in the Japanese mode, an input string is converted into one or more Japanese words with the same pronunciation as the input characters. However, because more than one Japanese word often corresponds to the same pronunciation, most Japanese FEPs use disambiguation methods and also allow users to choose a correct Japanese word from more than one candidate. This problem is crucial because Kanji comprises ideograms. Users can choose incorrect Japanese words purposefully, to generate unusual words and often play on the double meaning. For, an example word is, in which (/sofu/), (/to/), and (/banku/) mean grand father, and, and great pain, respectively. We use the SKK dictionary 2 for Japanese FEPs, which defines Japanese words and their pronunciation in Hiragana. We use this dictionary to segment the input Hiragana pronunciation and to derive possible Japanese words for each segment. In principle, we consider all possible segmentations of the input Hiragana pronunciation by consulting the SKK dictionary, and derive all possible Japanese words for each segment. However, to restrict the number of names generated, we currently segment the input into only two segments and use up to three Japanese word (/fujiya/), which is the name of a hotel. To resolve this problem, we check whether a generated name corresponds to an existing item, and if it does, we discard the name. We use a query classification method (Fujii, 2008), which automatically identifies whether a query is informational or navigational. While an informational query is used to obtain information in general, a navigational query is used to retrieve one or more representative pages for a known item, such as a homepage for a company or product. In other words, a navigational query is usually the name of an existing item and thus we discard the generated names classified as navigational. 
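The generators of Sections 3.5 to 3.7 can be sketched in the same spirit. Everything below is an illustrative assumption rather than the paper's implementation: the function names are hypothetical, `MORA` and `TOY_DICT` are tiny hand-made tables standing in for a full romanization table and the SKK dictionary, and the Hiragana-to-Katakana shift uses Unicode code points (an offset of 0x60) instead of the EUC-JP codes mentioned in Section 3.5. The sketch also assumes that each of the N - 1 split points yields two names (either segment converted) and that a split is skipped when neither segment has a dictionary entry.

```python
from itertools import product

# Toy tables for illustration only (hypothetical entries).
MORA = {"a": "あ", "so": "そ", "ba": "ば", "na": "な", "ni": "に", "kya": "きゃ"}
TOY_DICT = {"ふじ": ["富士", "藤"], "や": ["屋", "矢"], "そふ": ["祖父"]}
VOWELS = "aiueo"


def to_katakana(text):
    """Shift Hiragana code points (U+3041..U+3096) by 0x60 to reach Katakana."""
    return "".join(
        chr(ord(c) + 0x60) if "\u3041" <= c <= "\u3096" else c for c in text
    )


def character_type_names(hiragana):
    """Section 3.5: split at each of the N - 1 positions, convert one segment."""
    names = []
    for i in range(1, len(hiragana)):
        head, tail = hiragana[:i], hiragana[i:]
        names.append(to_katakana(head) + tail)
        names.append(head + to_katakana(tail))
    return names


def input_mode_error(ascii_name, mora=MORA):
    """Section 3.6: read an ASCII name as if it were typed in Japanese mode."""
    out, i = [], 0
    while i < len(ascii_name):
        for size in (3, 2, 1):  # try the longest romanized mora first
            chunk = ascii_name[i:i + size]
            if len(chunk) == size and chunk in mora:
                out.append(mora[chunk])
                i += size
                break
        else:
            ch, nxt = ascii_name[i], ascii_name[i + 1:i + 2]
            # "n" followed by anything but a vowel becomes the moraic nasal;
            # any other unmatched character is kept as typed.
            is_nasal = ch == "n" and (not nxt or nxt not in VOWELS)
            out.append("ん" if is_nasal else ch)
            i += 1
    return "".join(out)


def conversion_error_names(hiragana, dictionary, max_candidates=3):
    """Section 3.7: two-segment splits, up to three candidates per segment."""
    names = []
    for i in range(1, len(hiragana)):
        head, tail = hiragana[:i], hiragana[i:]
        heads = dictionary.get(head, [])[:max_candidates]
        tails = dictionary.get(tail, [])[:max_candidates]
        if not heads and not tails:
            continue  # assumption: skip splits with no dictionary word at all
        # A segment without a dictionary entry is retained as Hiragana.
        for h, t in product(heads or [head], tails or [tail]):
            names.append(h + t)
    return names


if __name__ == "__main__":
    print(character_type_names("そふとばんく"))
    print(input_mode_error("softbank"))
    print(conversion_error_names("ふじや", TOY_DICT))
```

With the toy dictionary, the pronunciation ふじや yields plausible-looking names built from real words, which illustrates why the navigational-query filter described above is applied to the output of this generator in particular.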
Because the above query classification method requires a collection of Web pages, we used the test collection produced for NTCIR-5. The target document set for NTCIR-5 consists of pages collected from the JP domain.

4 Experiments

4.1 Method

To evaluate the effectiveness of our retrieval method, we used the following three company names as targets: ソフトバンク (Softbank), アマゾン (Amazon), and 不二家 (Fujiya). For each target name, we also used its English name, as indicated in parentheses. Although in principle our method can be used with any type of item, such as a company or a product, we targeted only company names for the following two reasons. First, because the number of pages for a company is usually larger than that for a single product, the information overload problem is more crucial for company names than for product names. Second, because the cost of human judgment is prohibitive, it was necessary to restrict the number of target names.

While evaluative documents for Softbank and Fujiya are associated with a variety of their products, evaluative documents for Amazon are associated with its service for online shopping. Thus, an experiment for Amazon can be seen as either for a company or for a product.

For each target, we used the six methods in Section 3 to generate slang-style names. The numbers of names generated for Softbank, Amazon, and Fujiya were 44, 40, and 35, respectively. For each generated name, we used Yahoo! Japan to query the Web and retrieved up to the top 20 pages that contained the generated name. For each target, we also discarded duplicate pages. As a result, the numbers of pages retrieved for Softbank, Amazon, and Fujiya were 524, 474, and 416, respectively.

As the baseline method, for each target, we used its original name as a query. Because the number of pages retrieved for slang-style names for each target was approximately 500, we retrieved up to 500 pages for each target. We also discarded duplicate pages and the pages that did not contain the query. As a result, the numbers of pages retrieved for Softbank, Amazon, and Fujiya were 489, 428, and 449, respectively.

For each retrieved page, an assessor assigned one of three categories: positive evaluation (Pos), negative evaluation (Neg), and no evaluation (No). For each target, we calculated the retrieval accuracy (Acc), which is the ratio of the number of Pos and Neg pages to the total number of pages retrieved (Total), that is, Acc = (Pos + Neg) / Total.

Because our method does not determine priorities of different generated names, it produces more than one ranked document list for each target. Thus, our method cannot be evaluated by such common measures as Mean Average Precision and Mean Reciprocal Rank, which use the rank of each document in a single list.

4.2 Results and Discussion

Tables 1 and 2 show the retrieval accuracy and other figures for slang-style and original names, respectively. In Table 1, Fujiya and Fujiya* denote the results with and without the query classification method (Fujii, 2008) for filtering purposes. Using the filtering method, we discarded six generated names and successfully reduced the number of irrelevant pages while maintaining the number of evaluative pages. We did not use the filtering method for the other two targets, whose names consist of only Katakana characters.

Table 1: Retrieval accuracy for slang-style names.
Columns: Target, Pos, Neg, No, Total, Acc (%). Rows: Softbank, Amazon, Fujiya, Fujiya*.

Table 2: Retrieval accuracy for original names.
Columns: Target, Pos, Neg, No, Total, Acc (%). Rows: Softbank, Amazon, Fujiya.

Comparing the results in Tables 1 and 2, our method retrieved more evaluative pages and achieved a higher accuracy than the baseline method, irrespective of the target. Comparing the results for Fujiya and Fujiya* in Table 1, our filtering method was effective in improving the retrieval accuracy.

We investigated the number of evaluative pages retrieved by our method that could not be retrieved by existing methods. Our method retrieved 188 evaluative pages in total, of which 166 pages did not contain the original target name. These 166 pages could not be retrieved by existing methods that use the original name of a target and additional evaluative expressions.

Table 3 shows the number of names generated and the retrieval accuracy for each slang type: blank (BL), partial romanization (PR), typographical similarity (TS), character-type conversion (CC), input-mode error (IE), and Japanese-conversion error (JE). The number of names generated by IE is only two, because we did not use the names generated by IE for two of the targets, which are identical to their Japanese or English official names. Because we distinguish pages retrieved by different slang types in Table 3, the total number of pages in Table 3 is more than that in Table 1. In Table 3, BL retrieved the largest number of evaluative pages and achieved the highest accuracy.

Table 3: Retrieval accuracy for each slang type.
Columns: Type, #Names, Pos, Neg, No, Total, Acc (%). Rows: BL, PR, TS, CC, IE, JE.

For each negative page, we analyzed whether the description was emotional or rational. The numbers of emotional negative descriptions retrieved by the baseline method and our method were 5 and 39, respectively. If a user intends to find slanders against a company, retrieving emotional descriptions is crucial, and our method does so effectively.

To identify the reasons for errors, we analyzed the irrelevant pages retrieved by our method and found that, in those pages, the generated names used as queries often matched typographical errors, handle names, or Kanji characters in Chinese pages. Irrelevant pages in which a query matched a typographical error can be divided into intentional and unintentional cases. Those who try to increase the page views of a specific page often embed high-frequency typographical errors for a company or product in that page, so that a user who mistakenly uses an incorrect query may reach that page. We will analyze the characteristics of these pages for filtering purposes in the future.

5 Conclusion

We have proposed a method for retrieving evaluative documents for a specific item. Because evaluative documents often include slang-style coined names, we modeled slang-style word formation in Japanese to retrieve these pages. We also showed the effectiveness of our method experimentally.

Acknowledgments

This research was supported in part by MEXT Grant-in-Aid Scientific Research on Priority Area of New IT Infrastructure for the Information-explosion Era (Grant No ).

References

Koji Eguchi and Victor Lavrenko. 2006. Sentiment retrieval using generative models. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing.

Atsushi Fujii. 2008. Modeling anchor text and classifying queries to enhance Web document retrieval. In Proceedings of the 17th International World Wide Web Conference.

Atsushi Fujii and Tetsuya Ishikawa. 2006. A system for summarizing and visualizing arguments in subjective documents: Toward supporting decision making. In Proceedings of COLING-ACL Workshop on Sentiment and Subjectivity in Text.

Minqing Hu and Bing Liu. 2004. Mining and summarizing customer reviews. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

Bing Liu, Minqing Hu, and Junsheng Cheng. 2005. Opinion observer: Analyzing and comparing opinions on the Web. In Proceedings of the 14th International World Wide Web Conference.

Taro Maeda. 1922. Gairaigonokenkyuu (A Study on Loanwords). Iwanami Shoten publisher. (In Japanese).

Masaaki Nomura and Seiji Koike. 1992. Nihongojiten (An Encyclopedia for Japanese). Tokyodo publisher. (In Japanese).

Bo Pang and Lillian Lee. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics.

Ellen Riloff and Janyce Wiebe. 2003. Learning extraction patterns for subjective expressions. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing.

Peter D. Turney. 2002. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics.


Mahek Merchant 1, Ricky Parmar 2, Nishil Shah 3, P.Boominathan 4 1,3,4 SCOPE, VIT University, Vellore, Tamilnadu, India Sentiment Analysis of Web Scraped Product Reviews using Hadoop Mahek Merchant 1, Ricky Parmar 2, Nishil Shah 3, P.Boominathan 4 1,3,4 SCOPE, VIT University, Vellore, Tamilnadu, India Abstract As in the

More information

Clustering of Text and Image for Grouping Similar Contents

Clustering of Text and Image for Grouping Similar Contents University of Aizu, Graduation Thesis. August, 2003 s1070176 1 Clustering of Text and Image for Grouping Similar Contents of Web Data Keigo Hirai s1070176 Supervised by Prof. Ryuichi Oka Abstract 2 System

More information

Finding Context Paths for Web Pages

Finding Context Paths for Web Pages Finding Context Paths for Web Pages Yoshiaki Mizuuchi Keishi Tajima Department of Computer and Systems Engineering Kobe University, Japan ( Currently at NTT Data Corporation) Background (1/3) Aceess to

More information

Final Project Discussion. Adam Meyers Montclair State University

Final Project Discussion. Adam Meyers Montclair State University Final Project Discussion Adam Meyers Montclair State University Summary Project Timeline Project Format Details/Examples for Different Project Types Linguistic Resource Projects: Annotation, Lexicons,...

More information

NUS-I2R: Learning a Combined System for Entity Linking

NUS-I2R: Learning a Combined System for Entity Linking NUS-I2R: Learning a Combined System for Entity Linking Wei Zhang Yan Chuan Sim Jian Su Chew Lim Tan School of Computing National University of Singapore {z-wei, tancl} @comp.nus.edu.sg Institute for Infocomm

More information

Sentiment Analysis for Customer Review Sites

Sentiment Analysis for Customer Review Sites Sentiment Analysis for Customer Review Sites Chi-Hwan Choi 1, Jeong-Eun Lee 2, Gyeong-Su Park 2, Jonghwa Na 3, Wan-Sup Cho 4 1 Dept. of Bio-Information Technology 2 Dept. of Business Data Convergence 3

More information

Evaluation of the Document Categorization in Fixed-point Observatory

Evaluation of the Document Categorization in Fixed-point Observatory Evaluation of the Document Categorization in Fixed-point Observatory Yoshihiro Ueda Mamiko Oka Katsunori Houchi Service Technology Development Department Fuji Xerox Co., Ltd. 3-1 Minatomirai 3-chome, Nishi-ku,

More information

CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets

CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets Arjumand Younus 1,2, Colm O Riordan 1, and Gabriella Pasi 2 1 Computational Intelligence Research Group,

More information

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer Bing Liu Web Data Mining Exploring Hyperlinks, Contents, and Usage Data With 177 Figures Springer Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web

More information

Sparse unsupervised feature learning for sentiment classification of short documents

Sparse unsupervised feature learning for sentiment classification of short documents Sparse unsupervised feature learning for sentiment classification of short documents Simone Albertini Ignazio Gallo Alessandro Zamberletti University of Insubria Varese, Italy simone.albertini@uninsubria.it

More information

Improvement of Web Search Results using Genetic Algorithm on Word Sense Disambiguation

Improvement of Web Search Results using Genetic Algorithm on Word Sense Disambiguation Volume 3, No.5, May 24 International Journal of Advances in Computer Science and Technology Pooja Bassin et al., International Journal of Advances in Computer Science and Technology, 3(5), May 24, 33-336

More information

Feature and Sentiment based Linked Instance RDF Data towards Ontology based Review Categorization

Feature and Sentiment based Linked Instance RDF Data towards Ontology based Review Categorization , July 1-3, 2015, London, U.K. Feature and Sentiment based Linked Instance RDF Data towards Ontology based Review Categorization D. Teja Santosh, B. Vishnu Vardhan, Member, IAENG Abstract-Online reviews

More information

INFORMATION RETRIEVAL SYSTEM: CONCEPT AND SCOPE

INFORMATION RETRIEVAL SYSTEM: CONCEPT AND SCOPE 15 : CONCEPT AND SCOPE 15.1 INTRODUCTION Information is communicated or received knowledge concerning a particular fact or circumstance. Retrieval refers to searching through stored information to find

More information

TEXT CHAPTER 5. W. Bruce Croft BACKGROUND

TEXT CHAPTER 5. W. Bruce Croft BACKGROUND 41 CHAPTER 5 TEXT W. Bruce Croft BACKGROUND Much of the information in digital library or digital information organization applications is in the form of text. Even when the application focuses on multimedia

More information

NTU Approaches to Subtopic Mining and Document Ranking at NTCIR-9 Intent Task

NTU Approaches to Subtopic Mining and Document Ranking at NTCIR-9 Intent Task NTU Approaches to Subtopic Mining and Document Ranking at NTCIR-9 Intent Task Chieh-Jen Wang, Yung-Wei Lin, *Ming-Feng Tsai and Hsin-Hsi Chen Department of Computer Science and Information Engineering,

More information

Revealing the Modern History of Japanese Philosophy Using Digitization, Natural Language Processing, and Visualization

Revealing the Modern History of Japanese Philosophy Using Digitization, Natural Language Processing, and Visualization Revealing the Modern History of Japanese Philosophy Using Digitization, Natural Language Katsuya Masuda *, Makoto Tanji **, and Hideki Mima *** Abstract This study proposes a framework to access to the

More information

Using Self-Organizing Maps for Sentiment Analysis. Keywords Sentiment Analysis, Self-Organizing Map, Machine Learning, Text Mining.

Using Self-Organizing Maps for Sentiment Analysis. Keywords Sentiment Analysis, Self-Organizing Map, Machine Learning, Text Mining. Using Self-Organizing Maps for Sentiment Analysis Anuj Sharma Indian Institute of Management Indore 453331, INDIA Email: f09anujs@iimidr.ac.in Shubhamoy Dey Indian Institute of Management Indore 453331,

More information

CLEF-IP 2009: Exploring Standard IR Techniques on Patent Retrieval

CLEF-IP 2009: Exploring Standard IR Techniques on Patent Retrieval DCU @ CLEF-IP 2009: Exploring Standard IR Techniques on Patent Retrieval Walid Magdy, Johannes Leveling, Gareth J.F. Jones Centre for Next Generation Localization School of Computing Dublin City University,

More information

Designing a Semantic Ground Truth for Mathematical Formulas

Designing a Semantic Ground Truth for Mathematical Formulas Designing a Semantic Ground Truth for Mathematical Formulas Alan Sexton 1, Volker Sorge 1, and Masakazu Suzuki 2 1 School of Computer Science, University of Birmingham, UK, A.P.Sexton V.Sorge@cs.bham.ac.uk,

More information

Micro-blogging Sentiment Analysis Using Bayesian Classification Methods

Micro-blogging Sentiment Analysis Using Bayesian Classification Methods Micro-blogging Sentiment Analysis Using Bayesian Classification Methods Suhaas Prasad I. Introduction In this project I address the problem of accurately classifying the sentiment in posts from micro-blogs

More information

NTCIR-5 Query Expansion Experiments using Term Dependence Models

NTCIR-5 Query Expansion Experiments using Term Dependence Models NTCIR-5 Query Expansion Experiments using Term Dependence Models Koji Eguchi National Institute of Informatics 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan eguchi@nii.ac.jp Abstract This paper

More information

jprocessing Documentation

jprocessing Documentation jprocessing Documentation Release 0.1 Pulkit Kathuria Sep 17, 2017 Contents 1 1.1 Requirements 3 1.1 1.1.1 Links.............................................. 3 1.2 1.1.2 Install.............................................

More information

TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES

TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES Mu. Annalakshmi Research Scholar, Department of Computer Science, Alagappa University, Karaikudi. annalakshmi_mu@yahoo.co.in Dr. A.

More information

Query Structuring and Expansion with Two-stage Term Dependence for Japanese Web Retrieval

Query Structuring and Expansion with Two-stage Term Dependence for Japanese Web Retrieval Noname manuscript No. (will be inserted by the editor) Query Structuring and Expansion with Two-stage Term Dependence for Japanese Web Retrieval Koji Eguchi W. Bruce Croft Received: date / Accepted: date

More information

An Analysis of Image Retrieval Behavior for Metadata Type and Google Image Database

An Analysis of Image Retrieval Behavior for Metadata Type and Google Image Database An Analysis of Image Retrieval Behavior for Metadata Type and Google Image Database Toru Fukumoto Canon Inc., JAPAN fukumoto.toru@canon.co.jp Abstract: A large number of digital images are stored on the

More information

Feature Selecting Model in Automatic Text Categorization of Chinese Financial Industrial News

Feature Selecting Model in Automatic Text Categorization of Chinese Financial Industrial News Selecting Model in Automatic Text Categorization of Chinese Industrial 1) HUEY-MING LEE 1 ), PIN-JEN CHEN 1 ), TSUNG-YEN LEE 2) Department of Information Management, Chinese Culture University 55, Hwa-Kung

More information

Evaluating the Usefulness of Sentiment Information for Focused Crawlers

Evaluating the Usefulness of Sentiment Information for Focused Crawlers Evaluating the Usefulness of Sentiment Information for Focused Crawlers Tianjun Fu 1, Ahmed Abbasi 2, Daniel Zeng 1, Hsinchun Chen 1 University of Arizona 1, University of Wisconsin-Milwaukee 2 futj@email.arizona.edu,

More information

KINETIC TYPOGRAPHY STUDIES TODAY IN JAPAN

KINETIC TYPOGRAPHY STUDIES TODAY IN JAPAN The 2nd International Conference on Design Creativity (ICDC2012) Glasgow, UK, 18th-20th September 2012 KINETIC TYPOGRAPHY STUDIES TODAY IN JAPAN J.E.Lee IMCTS / Hokkaido University, Sapporo, Japan Abstract:

More information

KODANSHA'S FURIGANA JAPANESE DICTIONARY: JAPANESE-ENGLISH ENGLISH-JAPANESE BY MASATOSHI YOSHIDA, YOSHIKATSU NAKAMURA

KODANSHA'S FURIGANA JAPANESE DICTIONARY: JAPANESE-ENGLISH ENGLISH-JAPANESE BY MASATOSHI YOSHIDA, YOSHIKATSU NAKAMURA KODANSHA'S FURIGANA JAPANESE DICTIONARY: JAPANESE-ENGLISH ENGLISH-JAPANESE BY MASATOSHI YOSHIDA, YOSHIKATSU NAKAMURA DOWNLOAD EBOOK : KODANSHA'S FURIGANA JAPANESE DICTIONARY: JAPANESE-ENGLISH ENGLISH-JAPANESE

More information

Refinement of digitized documents through recognition of mathematical formulae

Refinement of digitized documents through recognition of mathematical formulae Refinement of digitized documents through recognition of mathematical formulae Toshihiro KANAHORI Research and Support Center on Higher Education for the Hearing and Visually Impaired, Tsukuba University

More information

Microsoft Research Asia at the NTCIR-10 Intent Task

Microsoft Research Asia at the NTCIR-10 Intent Task Microsoft Research Asia at the NTCIR-0 Intent Task Kosetsu Tsukuda Kyoto University tsukuda@dl.kuis.kyotou.ac.jp Zhicheng Dou Microsoft Research Asia zhichdou@microsoft.com Tetsuya Sakai Microsoft Research

More information

Multimedia Integration for Cooking Video Indexing

Multimedia Integration for Cooking Video Indexing Multimedia Integration for Cooking Video Indexing Reiko Hamada 1, Koichi Miura 1, Ichiro Ide 2, Shin ichi Satoh 3, Shuichi Sakai 1, and Hidehiko Tanaka 4 1 The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku,

More information

A User Preference Based Search Engine

A User Preference Based Search Engine A User Preference Based Search Engine 1 Dondeti Swedhan, 2 L.N.B. Srinivas 1 M-Tech, 2 M-Tech 1 Department of Information Technology, 1 SRM University Kattankulathur, Chennai, India Abstract - In this

More information

Best Customer Services among the E-Commerce Websites A Predictive Analysis

Best Customer Services among the E-Commerce Websites A Predictive Analysis www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 5 Issues 6 June 2016, Page No. 17088-17095 Best Customer Services among the E-Commerce Websites A Predictive

More information

Query Disambiguation from Web Search Logs

Query Disambiguation from Web Search Logs Vol.133 (Information Technology and Computer Science 2016), pp.90-94 http://dx.doi.org/10.14257/astl.2016. Query Disambiguation from Web Search Logs Christian Højgaard 1, Joachim Sejr 2, and Yun-Gyung

More information

Automatic Domain Partitioning for Multi-Domain Learning

Automatic Domain Partitioning for Multi-Domain Learning Automatic Domain Partitioning for Multi-Domain Learning Di Wang diwang@cs.cmu.edu Chenyan Xiong cx@cs.cmu.edu William Yang Wang ww@cmu.edu Abstract Multi-Domain learning (MDL) assumes that the domain labels

More information

Query Difficulty Prediction for Contextual Image Retrieval

Query Difficulty Prediction for Contextual Image Retrieval Query Difficulty Prediction for Contextual Image Retrieval Xing Xing 1, Yi Zhang 1, and Mei Han 2 1 School of Engineering, UC Santa Cruz, Santa Cruz, CA 95064 2 Google Inc., Mountain View, CA 94043 Abstract.

More information

Understanding the Query: THCIB and THUIS at NTCIR-10 Intent Task. Junjun Wang 2013/4/22

Understanding the Query: THCIB and THUIS at NTCIR-10 Intent Task. Junjun Wang 2013/4/22 Understanding the Query: THCIB and THUIS at NTCIR-10 Intent Task Junjun Wang 2013/4/22 Outline Introduction Related Word System Overview Subtopic Candidate Mining Subtopic Ranking Results and Discussion

More information

Towards open-domain QA. Question answering. TReC QA framework. TReC QA: evaluation

Towards open-domain QA. Question answering. TReC QA framework. TReC QA: evaluation Question ing Overview and task definition History Open-domain question ing Basic system architecture Watson s architecture Techniques Predictive indexing methods Pattern-matching methods Advanced techniques

More information

Selection of Best Match Keyword using Spoken Term Detection for Spoken Document Indexing

Selection of Best Match Keyword using Spoken Term Detection for Spoken Document Indexing Selection of Best Match Keyword using Spoken Term Detection for Spoken Document Indexing Kentaro Domoto, Takehito Utsuro, Naoki Sawada and Hiromitsu Nishizaki Graduate School of Systems and Information

More information

IRCE at the NTCIR-12 IMine-2 Task

IRCE at the NTCIR-12 IMine-2 Task IRCE at the NTCIR-12 IMine-2 Task Ximei Song University of Tsukuba songximei@slis.tsukuba.ac.jp Yuka Egusa National Institute for Educational Policy Research yuka@nier.go.jp Masao Takaku University of

More information

We Recommend: Recommender System based on Product Reviews

We Recommend: Recommender System based on Product Reviews IJSTE - International Journal of Science Technology & Engineering Volume 2 Issue 12 June 2016 ISSN (online): 2349-784X We Recommend: Recommender System based on Product Reviews Vedita Velingker PG. Student

More information

Exploring the use of Paragraph-level Annotations for Sentiment Analysis of Financial Blogs

Exploring the use of Paragraph-level Annotations for Sentiment Analysis of Financial Blogs Exploring the use of Paragraph-level Annotations for Sentiment Analysis of Financial Blogs Paul Ferguson 1, Neil O Hare 1, Michael Davy 2, Adam Bermingham 1, Scott Tattersall 3, Paraic Sheridan 2, Cathal

More information

Event Detection using Archived Smart House Sensor Data obtained using Symbolic Aggregate Approximation

Event Detection using Archived Smart House Sensor Data obtained using Symbolic Aggregate Approximation Event Detection using Archived Smart House Sensor Data obtained using Symbolic Aggregate Approximation Ayaka ONISHI 1, and Chiemi WATANABE 2 1,2 Graduate School of Humanities and Sciences, Ochanomizu University,

More information

Finding Topic-centric Identified Experts based on Full Text Analysis

Finding Topic-centric Identified Experts based on Full Text Analysis Finding Topic-centric Identified Experts based on Full Text Analysis Hanmin Jung, Mikyoung Lee, In-Su Kang, Seung-Woo Lee, Won-Kyung Sung Information Service Research Lab., KISTI, Korea jhm@kisti.re.kr

More information

Finding Neighbor Communities in the Web using Inter-Site Graph

Finding Neighbor Communities in the Web using Inter-Site Graph Finding Neighbor Communities in the Web using Inter-Site Graph Yasuhito Asano 1, Hiroshi Imai 2, Masashi Toyoda 3, and Masaru Kitsuregawa 3 1 Graduate School of Information Sciences, Tohoku University

More information

Automatic Wordnet Mapping: from CoreNet to Princeton WordNet

Automatic Wordnet Mapping: from CoreNet to Princeton WordNet Automatic Wordnet Mapping: from CoreNet to Princeton WordNet Jiseong Kim, Younggyun Hahm, Sunggoo Kwon, Key-Sun Choi Semantic Web Research Center, School of Computing, KAIST 291 Daehak-ro, Yuseong-gu,

More information

Question Answering Systems

Question Answering Systems Question Answering Systems An Introduction Potsdam, Germany, 14 July 2011 Saeedeh Momtazi Information Systems Group Outline 2 1 Introduction Outline 2 1 Introduction 2 History Outline 2 1 Introduction

More information

Chapter 8. Evaluating Search Engine

Chapter 8. Evaluating Search Engine Chapter 8 Evaluating Search Engine Evaluation Evaluation is key to building effective and efficient search engines Measurement usually carried out in controlled laboratory experiments Online testing can

More information

Improving Recognition through Object Sub-categorization

Improving Recognition through Object Sub-categorization Improving Recognition through Object Sub-categorization Al Mansur and Yoshinori Kuno Graduate School of Science and Engineering, Saitama University, 255 Shimo-Okubo, Sakura-ku, Saitama-shi, Saitama 338-8570,

More information

Classifying Twitter Data in Multiple Classes Based On Sentiment Class Labels

Classifying Twitter Data in Multiple Classes Based On Sentiment Class Labels Classifying Twitter Data in Multiple Classes Based On Sentiment Class Labels Richa Jain 1, Namrata Sharma 2 1M.Tech Scholar, Department of CSE, Sushila Devi Bansal College of Engineering, Indore (M.P.),

More information

Multilingual Information Processing for Digital Libraries

Multilingual Information Processing for Digital Libraries Multilingual Information Processing for Digital Libraries Akira Maeda Department of Computer Science, Ritsumeikan University 1-1-1 Noji-higashi, Kusatsu, Shiga 525-8577, Japan E-mail: amaeda@cs.ritsumei.ac.jp

More information

Challenge. Case Study. The fabric of space and time has collapsed. What s the big deal? Miami University of Ohio

Challenge. Case Study. The fabric of space and time has collapsed. What s the big deal? Miami University of Ohio Case Study Use Case: Recruiting Segment: Recruiting Products: Rosette Challenge CareerBuilder, the global leader in human capital solutions, operates the largest job board in the U.S. and has an extensive

More information

Retrieval in Texts with Traditional Mongolian Script Realizing Unicoded Traditional Mongolian Digital Library

Retrieval in Texts with Traditional Mongolian Script Realizing Unicoded Traditional Mongolian Digital Library Retrieval in Texts with Traditional Mongolian Script Realizing Unicoded Traditional Mongolian Digital Library Garmaabazar Khaltarkhuu and Akira Maeda Graduate School of Science and Engineering, Ritsumeikan

More information

Feature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Feature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani Feature Selection CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Dimensionality reduction Feature selection vs. feature extraction Filter univariate

More information

A Co-occurrence Graph-based Approach for Personal Name Alias Extraction from Anchor Texts

A Co-occurrence Graph-based Approach for Personal Name Alias Extraction from Anchor Texts A Co-occurrence Graph-based Approach for Personal Name Alias Extraction from Anchor Texts Danushka Bollegala The University of Tokyo 7-3-1, Hongo, Tokyo, 113-8656, Japan danushka@mi.ci.i.utokyo.ac.jp Yutaka

More information

CHAPTER 7 CONCLUSION AND FUTURE WORK

CHAPTER 7 CONCLUSION AND FUTURE WORK CHAPTER 7 CONCLUSION AND FUTURE WORK 7.1 Conclusion Data pre-processing is very important in data mining process. Certain data cleaning techniques usually are not applicable to all kinds of data. Deduplication

More information

Sentiment analysis under temporal shift

Sentiment analysis under temporal shift Sentiment analysis under temporal shift Jan Lukes and Anders Søgaard Dpt. of Computer Science University of Copenhagen Copenhagen, Denmark smx262@alumni.ku.dk Abstract Sentiment analysis models often rely

More information

Navigation Retrieval with Site Anchor Text

Navigation Retrieval with Site Anchor Text Navigation Retrieval with Site Anchor Text Hideki Kawai Kenji Tateishi Toshikazu Fukushima NEC Internet Systems Research Labs. 8916-47, Takayama-cho, Ikoma-city, Nara, JAPAN {h-kawai@ab, k-tateishi@bq,

More information

An Adaptive Query Processing Method according to System Environments in Database Broadcasting Systems

An Adaptive Query Processing Method according to System Environments in Database Broadcasting Systems An Query Processing Method according to System Environments in Database Broadcasting Systems M. KASHITA T. TERADA T. HARA Graduate School of Engineering, Cybermedia Center, Graduate School of Information

More information

NAME mendex Japanese index processor

NAME mendex Japanese index processor NAME mendex Japanese index processor SYNOPSIS mendex [-ilqrcgfejsu] [-s sty] [-d dic] [-o ind] [-t log] [-p no] [-I enc] [--help] [--] [idx0 idx1 idx2...] DESCRIPTION The program mendex is a general purpose

More information

Identifying and Ranking Possible Semantic and Common Usage Categories of Search Engine Queries

Identifying and Ranking Possible Semantic and Common Usage Categories of Search Engine Queries Identifying and Ranking Possible Semantic and Common Usage Categories of Search Engine Queries Reza Taghizadeh Hemayati 1, Weiyi Meng 1, Clement Yu 2 1 Department of Computer Science, Binghamton university,

More information

Transliteration of Tamil and Other Indic Scripts. Ram Viswanadha Unicode Software Engineer IBM Globalization Center of Competency, California, USA

Transliteration of Tamil and Other Indic Scripts. Ram Viswanadha Unicode Software Engineer IBM Globalization Center of Competency, California, USA Transliteration of Tamil and Other Indic Scripts Ram Viswanadha Unicode Software Engineer IBM Globalization Center of Competency, California, USA Main points of Powerpoint presentation This talk gives

More information

FOCUS ON: DATABASE MANAGEMENT

FOCUS ON: DATABASE MANAGEMENT EXCEL 2002 (XP) FOCUS ON: DATABASE MANAGEMENT December 16, 2005 ABOUT GLOBAL KNOWLEDGE, INC. Global Knowledge, Inc., the world s largest independent provider of integrated IT education solutions, is dedicated

More information

A Filtering System Based on Personal Profiles

A  Filtering System Based on Personal Profiles A E-mail Filtering System Based on Personal Profiles Masami Shishibori, Kazuaki Ando and Jun-ichi Aoe Department of Information Science & Intelligent Systems, The University of Tokushima 2-1 Minami-Jhosanjima-Cho,

More information

ABRIR at NTCIR-9 at GeoTime Task Usage of Wikipedia and GeoNames for Handling Named Entity Information

ABRIR at NTCIR-9 at GeoTime Task Usage of Wikipedia and GeoNames for Handling Named Entity Information ABRIR at NTCIR-9 at GeoTime Task Usage of Wikipedia and GeoNames for Handling Named Entity Information Masaharu Yoshioka Graduate School of Information Science and Technology, Hokkaido University N14 W9,

More information

Cross-Lingual Information Access and Its Evaluation

Cross-Lingual Information Access and Its Evaluation Cross-Lingual Information Access and Its Evaluation Noriko Kando Research and Development Department National Center for Science Information Systems (NACSIS), Japan URL: http://www.rd.nacsis.ac.jp/~{ntcadm,kando}/

More information

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p.

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. 6 What is Web Mining? p. 6 Summary of Chapters p. 8 How

More information

Design and Development of Japanese Law Translation Database System

Design and Development of Japanese Law Translation Database System Design and Development of Japanese Law Translation Database System TOYAMA Katsuhiko a, d, SAITO Daichi c, d, SEKINE Yasuhiro c, d, OGAWA Yasuhiro a, d, KAKUTA Tokuyasu b, d, KIMURA Tariho b, d and MATSUURA

More information

On the Effectiveness of Web Usage Mining for Page Recommendation and Restructuring

On the Effectiveness of Web Usage Mining for Page Recommendation and Restructuring On the Effectiveness of Web Usage Mining for Recommendation and Restructuring Hiroshi Ishikawa, Manabu Ohta, Shohei Yokoyama, Junya Nakayama, and Kaoru Katayama Tokyo Metropolitan University Abstract.

More information

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN:

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN: IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, 20131 Improve Search Engine Relevance with Filter session Addlin Shinney R 1, Saravana Kumar T

More information

Domain Specific Search Engine for Students

Domain Specific Search Engine for Students Domain Specific Search Engine for Students Domain Specific Search Engine for Students Wai Yuen Tang The Department of Computer Science City University of Hong Kong, Hong Kong wytang@cs.cityu.edu.hk Lam

More information

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition Ana Zelaia, Olatz Arregi and Basilio Sierra Computer Science Faculty University of the Basque Country ana.zelaia@ehu.es

More information

With A Little Help From Yelp

With A Little Help From Yelp With A Little Help From Yelp Caitlin Bonnar Computer Science University of Washington cbonnar@cs.uw.edu Felicia Cordeiro Computer Science University of Washington Felicia0@cs.uw.edu Julie Michelman Statistics

More information