Collecting Encyclopedic Knowledge Using the World Wide Web
Atsushi Fujii, Tetsuya Ishikawa
University of Library and Information Science

Abstract: This paper proposes a method to collect encyclopedic knowledge from the World Wide Web. For this purpose, we use linguistic patterns and text structures provided with HTML tags to extract text fragments containing term descriptions from Web pages. We then use a language model to discard extraneous descriptions, and a clustering method to summarize the resultant descriptions. We show the effectiveness of our method by way of experiments.

1 / World Wide Web Web Web [17, 18] HTML World Wide Web 20 50% [18] [8] [4] [13] Web Web [11, 12] [10, 24] Web [2, 9] Web Web
2 Web [5, 15] Web (1) (2) (3) (4) 4 1 (1) Web Web (2) Web HTML Web a) b) c) HTML <X>...</X> (3) Web 1 (4) 2 1 Web Web Web (1) (4) Web Web 2 2
3 Web Web Web 1: 2: 3.3 a) b) [16, 21] [17] CD-ROM 8 [20] 2 EDR 12 [19] 2,259 [23] HTML Web HTML 2 <H> <B>? <A> HTML
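The abstract states that text structures provided with HTML tags are used to extract text fragments containing term descriptions. The following is only a minimal sketch of that idea, not the authors' implementation. It assumes that a fragment is the text of one `<p>` or `<li>` element and that a fragment "contains" a term when a case-insensitive substring match succeeds; the names `ParagraphCollector` and `fragments_for` are hypothetical, and only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text of each <p> and <li> element (assumed fragment units)."""

    BLOCK_TAGS = {"p", "li"}

    def __init__(self):
        super().__init__()
        self.blocks = []     # finished text fragments, in document order
        self._buffer = None  # pieces of the block being read, or None outside a block

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self._flush()    # an unclosed <p> ends where the next block starts
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def _flush(self):
        if self._buffer is not None:
            # Join the pieces and normalise runs of whitespace.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.blocks.append(text)
        self._buffer = None

    def close(self):
        super().close()
        self._flush()        # keep a block left open at end of input


def fragments_for(term, html):
    """Return the fragments of `html` that mention `term` (hypothetical helper)."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    return [block for block in parser.blocks if term.lower() in block.lower()]
```

Calling `fragments_for("XML", page_source)` on a page returns the paragraph and list-item texts that mention the term. Inline markup such as `<b>` is flattened into the surrounding fragment, and text outside `<p>` and `<li>` (headings, for instance) is ignored in this sketch.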
4 HTML HTML 1 1. <P>...</P> </P> <P> 1 2. <UL>...</UL> 3. N. 3 N =3 3.4 (1)/ (0) 2 N N Web CMU-Cambridge [1] 2 [22] N perplexity 1, [14] 8 5 Hierarchical Bayesian Clustering: HBC [6] HBC HBC : 4.1 Web
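The abstract says a language model is used to discard extraneous descriptions, and the surviving text mentions N-gram perplexity and the CMU-Cambridge toolkit [1]. The sketch below illustrates perplexity-based filtering in general and does not reproduce the paper's setup. It swaps in a toy add-one-smoothed bigram model instead of the toolkit, and it assumes that the model is trained on description-style text and that a fragment is discarded when its perplexity exceeds a threshold. The names `train_bigram`, `perplexity`, and `keep_fluent`, and the threshold itself, are hypothetical.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams over tokenised sentences (toy stand-in for an N-gram toolkit)."""
    histories, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        histories.update(padded[:-1])            # how often each word is a history
        bigrams.update(zip(padded, padded[1:]))
    vocab = {word for tokens in sentences for word in tokens} | {"</s>"}
    return histories, bigrams, len(vocab) + 1    # +1 reserves mass for unseen words


def perplexity(tokens, model):
    """Per-token perplexity of `tokens` under the add-one-smoothed bigram model."""
    histories, bigrams, vocab_size = model
    padded = ["<s>"] + tokens + ["</s>"]
    log_prob = 0.0
    for prev, word in zip(padded, padded[1:]):
        prob = (bigrams[(prev, word)] + 1) / (histories[prev] + vocab_size)
        log_prob += math.log(prob)
    return math.exp(-log_prob / (len(padded) - 1))


def keep_fluent(candidates, model, threshold):
    """Keep candidates whose perplexity does not exceed the assumed threshold."""
    return [c for c in candidates if perplexity(c, model) <= threshold]
```

With a model trained on a few definition-like sentences, a fragment such as "sgml is a markup language" scores a lower perplexity than "click here to buy now", so a suitable threshold keeps the former and drops the latter. In practice the threshold would need tuning on held-out data.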
5 NACSIS [7] NACSIS goo Web % 27/ goo Web goo 1, Zipf % 292/ % 266/ NACSIS : NACSIS % 67.9% World Wide Web [3]
6 2: 27 Web Zipf 15 6, , ,399 3, ,3899, , ,6861,6943,141 1,682 10,078 1, , ,938 2, CD-ROM NACSIS

[1] Philip Clarkson and Ronald Rosenfeld. Statistical language modeling using the CMU-Cambridge toolkit. In Proceedings of EuroSpeech'97, pp. 2707-2710,
[2] Oren Etzioni. Moving up the information food chain. AI Magazine, Vol. 18, No. 2, pp. 11-18,
[3] Atsushi Fujii and Tetsuya Ishikawa. Applying machine translation to two-stage cross-language information retrieval. In Proceedings of the 4th Conference of the Association for Machine Translation in the Americas,
[4] Vasileios Hatzivassiloglou and Kathleen R. McKeown. Towards the automatic identification of adjectival scales: Clustering adjectives according to meaning. In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, pp. 172-182,
[5] Akihiro Inokuchi, Takashi Washio, Hiroshi Motoda, Kouhei Kumasawa, and Naohide Arai. Basket analysis for graph structured data. In Proceedings of the 3rd Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 420-431,
[6] Makoto Iwayama and Takenobu Tokunaga. Hierarchical Bayesian clustering for automatic text classification. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pp. 1322-1327,
[7] Noriko Kando, Kazuko Kuriyama, and Toshihiko Nozue. NACSIS test collection workshop (NTCIR-1). In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 299-300,
[8] Julian Kupiec and John Maxwell. Training stochastic grammars from unlabelled text corpora. In Workshop on Statistically-Based Natural Language Programming Techniques, AAAI Technical Reports WS
[9] Hougen Lu, Leon Sterling, and Alex Wyatt. Knowledge discovery in SportsFinder: An agent to extract sports results from the Web. In Proceedings of the 3rd Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 469-473,
[10] Andrew McCallum, Kamal Nigam, Jason Rennie, and Kristie Seymore. A machine learning approach to building domain-specific search engines. In Proceedings of the 16th International Joint Conference on Artificial Intelligence, pp. 662-667,
[11] Jian-Yun Nie, Michel Simard, Pierre Isabelle, and Richard Durand. Cross-language information retrieval based on parallel texts and automatic mining of parallel texts from the Web. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 74-81,
[12] Philip Resnik. Mining the Web for bilingual texts. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pp. 527-534,
[13] Frank Smadja, Kathleen R. McKeown, and Vasileios Hatzivassiloglou. Translating collocations for bilingual lexicons: A statistical approach. Computational Linguistics, Vol. 22, No. 1, pp. 1-38,
[14] ..,
[15] ,,,.., Vol. 15, No. 3, pp. 485-494,
[16] ,,,.., Vol. 7, No. 2, pp. 336-345,
[17] ,. Quest. 5, pp. 353-356,
[18] ,. Web. 6, pp. 296-299,
[19] .,
[20] . CD-ROM,
[21] ,,.. 5, pp. 124-127,
[22] . CD- '94-'95,
[23] ,,,,. version 1.5. Technical Report NAIST-IS-TR97007,,
[24] .. 6, pp. 447-450, 2000.