n-gram A Large-scale N-gram Search System Using Inverted Files Susumu Yata, 1, 2 Kazuhiro Morita, 1 Masao Fuketa 1 and Jun-ichi Aoe 1 TSUBAKI 2)
|
|
- Kellie Hill
- 6 years ago
- Views:
Transcription
1 n-gram 1, n-gram n-gram A Large-scale N-gram Search System Using Inverted Files Susumu Yata, 1, 2 Kazuhiro Morita, 1 Masao Fuketa 1 and Jun-ichi Aoe 1 Recently, a search system has been proposed for large-scale n-gram data. The system uses tries for indexing n-gram data because of its retrieval performance. However, the disk usage and the construction cost of the trie index are very large. In contrast, inverted files could be easily constructed but require random access to secondary storage in retrieval. Therefore, this paper reports the actual performance of inverted files for large-scale n-gram data. 1. 1) 1 University of Tokushima 2 JSPS Research Fellow TSUBAKI 2) Wikipedia Google n-gram 3),4) n-gram 5) 7) 8) 9) 10) Document Frequency DF 11) n-gram n-gram 100 GB n-gram 1-gram 6) MySQL 9) n-gram Minimal Prefix MP 12) 13),14) 1-gram n-gram k-gram k C k/ gram 4) GB 15) n-gram n-gram 2 3 n-gram 4 1 c 2009 Information Processing Society of Japan
2 1 D Table 1 Word segmentation of documents in D ID Document Segmented words (separated by / ) 1 <S> / / / / / / </S> 2 <S> / / / / / / </S> 3 <S> / / / / / / </S> ID DocID 1 </S> 1, 2, 3 2 <S> 1, 2, 3 3 1, 2, 3 2 Table 2 An inverted index using segmented words ID DocID 4 1, 2, 3 5 1, 2, 3 6 1, 2, ID DocID n-gram D = {d 1, d 2, d 3 } ID ID DocID 2 ID ID ID DocID 2 DocID 1, 2, 3 DocID 1 d 1 DocID Variable Byte Code DocID 15) 2.2 n-gram n-gram 4) 1-gram 3 D 2-gram Table 3 2-gram data extracted from documents in D ID ID Frequency 1 2, 6 <S> 3 2 3, 1 </S> 3 3 4, , , 7 1 ID ID Frequency 6 5, , , , , 4 1 n-gram D 2-gram 3 1-gram n-gram DocID n-gram n-gram n-gram n-gram 3 Solid State Disk SSD 4 3. n-gram 3.1 n-gram n-gram 2 c 2009 Information Processing Society of Japan
3 n-gram 40% ( 1 ) 1-gram ID ( 2 ) n-gram ID ( 3 ) n-gram n-gram 1 n-gram ( i ) Dictionary lookup : ID ( ii ) Index lookup : n-gram ( iii ) N-gram lookup : n-gram ( iv ) Decode : n-gram Index lookup n-gram n-gram 1 N-gram database : n-gram n-gram Inverted index : N-gram database n-gram ID ID dictionary : ID ID 16) ID 3.2 n-gram n-gram n-gram n-gram Query Inverted index Fig. 1 Dictionary lookup ID Index lookup dictionary N-gram addresses Encoded n-grams N-gram lookup Decode 1 n-gram An n-gram search system using an inverted index N-grams N-gram database Search result n-gram DocID n-gram 1-gram n-gram n-gram n-gram 3 n-gram n-gram 2 1 Index lookup N-gram lookup Database lookup Database lookup : n-gram 2 Inverted database 1 Inverted Index N-gram database Inverted database : n-gram ID ID 3 c 2009 Information Processing Society of Japan
4 4 n-gram kb Query Dictionary lookup ID dictionary Database lookup Decode Encoded n-grams N-grams Search result Table 4 Disk usage of large-scale n-gram search systems (kb) Dic Inverted Idx N-gram DB Inverted DB Total (a) Uncompressed 76,818 29,757, ,221, ,055,762 (b) Compressed 148,126 26,969,347 29,033, ,151,370 (c) Extended 148, ,720, ,720,463 Fig. 2 Inverted database 2 n-gram An n-gram search system using an inverted database 5 Table 5 Average, median and maximum search time (s) 1-gram 3-gram 5-gram Ave Med Max Ave Med Max Ave Med Max (a) hdd (b) hdd (b ) ssd (c) hdd n-gram n-gram n-gram n-gram 64-bit Linux Google n-gram 4) 1-gram 7-gram 32 n-gram 3 4 ( a ) Uncompressed : n-gram ( b ) Compressed : n-gram ( c ) Extended : n-gram 1-gram, 3-gram, 5-gram 1,000 Core 2 Quad 2.83 GHz, 8 GB RAM, 1.5 TB HDDx2 (RAID 0), 256 GB SSD 24 HDD Compressed SSD SSD b SSD b n-gram SSD a b n-gram 40 % % b b HDD SSD % SSD b c b 3 4 c 2009 Information Processing Society of Japan
5 10 3 Fig. 3 Inverted indices 10-4 (a) Uncompressed (b ) Compressed Inverted database 10-4 (c) Extended gram n-gram Dependencies between the number of results and the search time for unigram queries Inverted indices (a) Uncompressed (b ) Compressed 10 3 Fig. 4 Inverted database (c) Extended gram n-gram Dependencies between the number of results and the search time for 5-gram queries c b 1/10 b n-gram c HDD SSD b 1-gram n-gram 3 a b n-gram c n-gram 5-gram 4 3 a b n-gram 13) 6 Table 6 6 Differences between an inverted database and a trie index 13) 1-gram 7-gram 32 7-gram GB 490 GB 1 GB 24 4 GB gram AND gram AND SSD n-gram 5. n-gram 5 c 2009 Information Processing Society of Japan
6 n-gram 13) 1) Christopher D. Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. The MIT Press, Cambridge, Massachusetts, London, England, 6 edition, ) Keiji Shinzato, Tomohide Shibata, Daisuke Kawahara, Chikara Hashimoto, and Sadao Kurohashi. TSUBAKI: An Open Search Engine Infrastructure for Developing New Information Access Methodology. In IJCNLP2008, pp , Hyderabad, India, January ) Thorsten Brants and Alex Franz. Web 1T 5-gram Version 1. upenn.edu/catalog/catalogentry.jsp?catalogid=ldc2006t13, ),. Web N 1. GSK2007-C/catalog.html, ) Andrew Finch, Etienne Denoual, Hideo Okuma, Michael Paul, Hirofumi Yamamoto, Keiji Yasuda, Ruiqiang Zhang, and Eiichiro Sumita. The NICT/ATR Speech Translation System for IWSLT In IWSLT 2007, pp , Trento, Italy, October ) Marcello Federico and Mauro Cettolo. Efficient Handling of N-gram Language Models for Statistical Machine Translation. In the 2nd Workshop on Statistical Machine Translation, pp , Prague, Czech Republic, June ) Jing Zheng, Wen Wang, and Necip Fazil Ayan. Development of SRI s Translation Systems for Broadcast News and Broadcast Conversations. In Interspeech 2008, Brisbane, Australia, September ) Ghinwa F. Choueiter, Stephanie Seneff, and James R. Glass. New Word Acquisition Using Subword Modeling. In Interspeech 2007, pp , Antwerp, Belgium, August ) Andrew Carlson and Ian Fette. Memory-Based Context-Sensitive Spelling Correction at Web Scale. In ICMLA 2007, pp , Cincinnati, Ohio, USA, December ) Steven Bethard and James H. Martin. Learning Semantic Links from a Corpus of Parallel Temporal and Causal Relations. In ACL-08: HLT, Short Papers (Companion Volume), pp , Columbus, Ohio, USA, June ) Martin Klein and Michael L. Nelson. Correlation of Count and Document Frequency for Google N-Grams. In ECIR 2009, pp , Toulouse, France, April ) John A. Dundas III. Implementing Dynamic Minimal Prefix Tries. Software Practice and Experience, Vol.21, No.10, pp , October ). N Google 7. 14, pp , March ) Satoshi Sekine. A Linguistic Knowledge Discovery Tool: Very Large Ngram Database Search with Arbitrary Wildcards. In Coling 2008: Companion volume: Demonstrations, pp , Manchester, UK, August ) Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, ).., Vol. J71 D, No.9, pp , c 2009 Information Processing Society of Japan
Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information
Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Satoshi Sekine Computer Science Department New York University sekine@cs.nyu.edu Kapil Dalwani Computer Science Department
More informationA Retrieval Method for Double Array Structures by Using Byte N-Gram
A Retrieval Method for Double Array Structures by Using Byte N-Gram Masao Fuketa, Kazuhiro Morita, and Jun-Ichi Aoe Abstract Retrieving keywords requires speed and compactness. A trie is one of the data
More informationSummarizing Search Results using PLSI
Summarizing Search Results using PLSI Jun Harashima and Sadao Kurohashi Graduate School of Informatics Kyoto University Yoshida-honmachi, Sakyo-ku, Kyoto, 606-8501, Japan {harashima,kuro}@nlp.kuee.kyoto-u.ac.jp
More informationComparisons of Efficient Implementations for DAWG
Comparisons of Efficient Implementations for DAWG Masao Fuketa, Kazuhiro Morita, and Jun-ichi Aoe Abstract Key retrieval is very important in various applications. A trie and DAWG are data structures for
More informationExperimental Observations of Construction Methods for Double Array Structures Using Linear Functions
Experimental Observations of Construction Methods for Double Array Structures Using Linear Functions Shunsuke Kanda*, Kazuhiro Morita, Masao Fuketa, Jun-Ichi Aoe Department of Information Science and Intelligent
More informationImprovement of Web Search Results using Genetic Algorithm on Word Sense Disambiguation
Volume 3, No.5, May 24 International Journal of Advances in Computer Science and Technology Pooja Bassin et al., International Journal of Advances in Computer Science and Technology, 3(5), May 24, 33-336
More informationTSUBAKI: An Open Search Engine Infrastructure for Developing New Information Access Methodology
TSUBAKI: An Open Search Engine Infrastructure for Developing New Information Access Methodology Keiji Shinzato, Tomohide Shibata, Daisuke Kawahara, Chikara Hashimoto and Sadao Kurohashi Graduate School
More informationEfficient Data Structures for Massive N-Gram Datasets
Efficient ata Structures for Massive N-Gram atasets Giulio Ermanno Pibiri University of Pisa and ISTI-CNR Pisa, Italy giulio.pibiri@di.unipi.it Rossano Venturini University of Pisa and ISTI-CNR Pisa, Italy
More informationIntroduction to Text Mining. Hongning Wang
Introduction to Text Mining Hongning Wang CS@UVa Who Am I? Hongning Wang Assistant professor in CS@UVa since August 2014 Research areas Information retrieval Data mining Machine learning CS@UVa CS6501:
More informationEfficiency. Efficiency: Indexing. Indexing. Efficiency Techniques. Inverted Index. Inverted Index (COSC 488)
Efficiency Efficiency: Indexing (COSC 488) Nazli Goharian nazli@cs.georgetown.edu Difficult to analyze sequential IR algorithms: data and query dependency (query selectivity). O(q(cf max )) -- high estimate-
More informationBasic techniques. Text processing; term weighting; vector space model; inverted index; Web Search
Basic techniques Text processing; term weighting; vector space model; inverted index; Web Search Overview Indexes Query Indexing Ranking Results Application Documents User Information analysis Query processing
More informationPushing the Limits. ADSM Symposium Sheelagh Treweek September 1999 Oxford University Computing Services 1
Pushing the Limits ADSM Symposium Sheelagh Treweek sheelagh.treweek@oucs.ox.ac.uk September 1999 Oxford University Computing Services 1 Overview History of ADSM services at Oxford October 1995 - started
More informationAKIKO MANADA. The University of Electro-Communications 1-5-1, Chofugaoka, Chofu, Tokyo, , JAPAN
Curriculum Vitæ AKIKO MANADA The University of Electro-Communications 1-5-1, Chofugaoka, Chofu, Tokyo, 182-8585, JAPAN Email: amanada@uec.ac.jp WORK EXPERIENCE Assistant Professor: February 2012 Present
More informationTemporal Information Access (Temporalia-2)
Temporal Information Access (Temporalia-2) NTCIR-12 Core Task http://ntcirtemporalia.github.io Hideo Joho (University of Tsukuba) Adam Jatowt (Kyoto University) Roi Blanco (Yahoo! Research) Haitao Yu (University
More informationInformation Retrieval. Lecture 3 - Index compression. Introduction. Overview. Characterization of an index. Wintersemester 2007
Information Retrieval Lecture 3 - Index compression Seminar für Sprachwissenschaft International Studies in Computational Linguistics Wintersemester 2007 1/ 30 Introduction Dictionary and inverted index:
More informationFaster and Smaller N-Gram Language Models
Faster and Smaller N-Gram Language Models Adam Pauls Dan Klein Computer Science Division University of California, Berkeley {adpauls,klein}@csberkeleyedu Abstract N-gram language models are a major resource
More informationCS 102. Big Data. Spring Big Data Platforms
CS 102 Big Data Spring 2016 Big Data Platforms How Big is Big? The Data Data Sets 1000 2 (5.3 MB) Complete works of Shakespeare (text) 1000 3 (~5-500 GB) Your data 1000 4 (10 TB) Library of Congress (text)
More informationEECS 395/495 Lecture 3 Scalable Indexing, Searching, and Crawling
EECS 395/495 Lecture 3 Scalable Indexing, Searching, and Crawling Doug Downey Based partially on slides by Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze Announcements Project progress report
More informationHITSZ-ICRC: An Integration Approach for QA TempEval Challenge
HITSZ-ICRC: An Integration Approach for QA TempEval Challenge Yongshuai Hou, Cong Tan, Qingcai Chen and Xiaolong Wang Department of Computer Science and Technology Harbin Institute of Technology Shenzhen
More informationStoring the Web in Memory: Space Efficient Language Models with Constant Time Retrieval
Storing the Web in Memory: Space Efficient Language Models with Constant Time Retrieval David Guthrie Computer Science Department University of Sheffield D.Guthrie@dcs.shef.ac.uk Mark Hepple Computer Science
More informationIntroduction to Information Retrieval (Manning, Raghavan, Schutze)
Introduction to Information Retrieval (Manning, Raghavan, Schutze) Chapter 3 Dictionaries and Tolerant retrieval Chapter 4 Index construction Chapter 5 Index compression Content Dictionary data structures
More informationDepartment of Information, Evidence & Research. International Clinical Trials Registry Platform. ICTRP Search Portal Revisions Document
Department of Information, Evidence & Research International Clinical Trials Registry Platform ICTRP Search Portal Revisions Document Created on 18/02/2009 Last update on 11/1/2018 Table of contents Introduction..
More informationLattice Rescoring for Speech Recognition Using Large Scale Distributed Language Models
Lattice Rescoring for Speech Recognition Using Large Scale Distributed Language Models ABSTRACT Euisok Chung Hyung-Bae Jeon Jeon-Gue Park and Yun-Keun Lee Speech Processing Research Team, ETRI, 138 Gajeongno,
More informationA Compression Technique Based On Optimality Of LZW Code (OLZW)
2012 Third International Conference on Computer and Communication Technology A Compression Technique Based On Optimality Of LZW (OLZW) Utpal Nandi Dept. of Comp. Sc. & Engg. Academy Of Technology Hooghly-712121,West
More informationNatural Language Processing
Natural Language Processing Language Models Language models are distributions over sentences N gram models are built from local conditional probabilities Language Modeling II Dan Klein UC Berkeley, The
More informationIntroduction to Information Retrieval
Introduction to Information Retrieval WS 2008/2009 25.11.2008 Information Systems Group Mohammed AbuJarour Contents 2 Basics of Information Retrieval (IR) Foundations: extensible Markup Language (XML)
More informationQuickSpecs. PCIe Solid State Drives for HP Workstations
Introduction Storage technology with NAND media is outgrowing the bandwidth limitations of the SATA bus. New high performance Storage solutions will connect directly to the PCIe bus for revolutionary performance
More informationDashboard. 16 Jun Jun 2009 Comparing to: 16 Jun Jun % Bounce Rate. 62,921 Visits. 407,003 Page Views
JAWA CZ Owners Club Dashboard 16 Jun 2008-16 Jun 2009 Comparing to: 16 Jun 2007-15 Jun 2008 Previous: Visits Visits 400 400 200 200 16 June 2008 19 July 2008 21 August 2008 23 September 26 October 200
More informationCompression of the Stream Array Data Structure
Compression of the Stream Array Data Structure Radim Bača and Martin Pawlas Department of Computer Science, Technical University of Ostrava Czech Republic {radim.baca,martin.pawlas}@vsb.cz Abstract. In
More informationKyoto-NMT: a Neural Machine Translation implementation in Chainer
Kyoto-NMT: a Neural Machine Translation implementation in Chainer Fabien Cromières Japan Science and Technology Agency Kawaguchi-shi, Saitama 332-0012 fabien@pa.jst.jp Abstract We present Kyoto-NMT, an
More informationMultilingual Information Retrieval
Proposal for Tutorial on Multilingual Information Retrieval Proposed by Arjun Atreya V Shehzaad Dhuliawala ShivaKarthik S Swapnil Chaudhari Under the direction of Prof. Pushpak Bhattacharyya Department
More informationInformation Retrieval II
Information Retrieval II David Hawking 30 Sep 2010 Machine Learning Summer School, ANU Session Outline Ranking documents in response to a query Measuring the quality of such rankings Case Study: Tuning
More informationInformation Retrieval and Organisation
Information Retrieval and Organisation Dell Zhang Birkbeck, University of London 2015/16 IR Chapter 04 Index Construction Hardware In this chapter we will look at how to construct an inverted index Many
More informationGUJARAT TECHNOLOGICAL UNIVERSITY
GUJARAT TECHNOLOGICAL UNIVERSITY INFORMATION TECHNOLOGY DATA COMPRESSION AND DATA RETRIVAL SUBJECT CODE: 2161603 B.E. 6 th SEMESTER Type of course: Core Prerequisite: None Rationale: Data compression refers
More informationHow SPICE Language Modeling Works
How SPICE Language Modeling Works Abstract Enhancement of the Language Model is a first step towards enhancing the performance of an Automatic Speech Recognition system. This report describes an integrated
More informationSpoken Document Retrieval (SDR) for Broadcast News in Indian Languages
Spoken Document Retrieval (SDR) for Broadcast News in Indian Languages Chirag Shah Dept. of CSE IIT Madras Chennai - 600036 Tamilnadu, India. chirag@speech.iitm.ernet.in A. Nayeemulla Khan Dept. of CSE
More informationCALENDAR OF FILING DEADLINES AND SEC HOLIDAYS
CALENDAR OF FILING S AND SEC HOLIDAYS INFORMATION IN THIS CALENDAR HAS BEEN OBTAINED BY SOURCES BELIEVED TO BE RELIABLE, BUT CANNOT BE GUARANTEED FOR ACCURACY. PLEASE CONSULT WITH PROFESSIONAL COUNSEL
More informationDistributed Implementation of a Self-Organizing. Appliance Middleware
Distributed Implementation of a Self-Organizing Appliance Middleware soc-eusai 2005 Conference of Smart Objects & Ambient Intelligence October 12th-14th 2005 Grenoble, France Oral Session 6 - Middleware
More informationSSD. DEIM Forum 2014 D8-6 SSD I/O I/O I/O HDD SSD I/O
DEIM Forum 214 D8-6 SSD, 153 855 4-6-1 135 8548 3-7-5 11 843 2-1-2 E-mail: {keisuke,haya,yokoyama,kitsure}@tkl.iis.u-tokyo.ac.jp, miyuki@sic.shibaura-it.ac.jp SSD SSD HDD 1 1 I/O I/O I/O I/O,, OLAP, SSD
More informationInformation Retrieval and Organisation
Information Retrieval and Organisation Dell Zhang Birkbeck, University of London 2016/17 IR Chapter 00 Motivation What is Information Retrieval? The meaning of the term Information Retrieval (IR) can be
More informationQuickSpecs. PCIe Solid State Drives for HP Workstations
Solid State Drives for HP Workstations Overview Solid State Drives for HP Workstations Introduction Storage technology with NAND media is outgrowing the bandwidth limitations of the SATA bus. New high
More informationInformation Extraction of Important International Conference Dates using Rules and Regular Expressions
ISBN 978-93-84422-80-6 17th IIE International Conference on Computer, Electrical, Electronics and Communication Engineering (CEECE-2017) Pattaya (Thailand) Dec. 28-29, 2017 Information Extraction of Important
More informationOpenSFS Test Cluster Donation. Dr. Nihat Altiparmak, Assistant Professor Computer Engineering & Computer Science Department University of Louisville
OpenSFS Test Cluster Donation Dr. Nihat Altiparmak, Assistant Professor Computer Engineering & Computer Science Department University of Louisville 4/24/2018 Donation Details Jan 5, 2018 OpenSFS announced
More informationWorkshops. 1. SIGMM Workshop on Social Media. 2. ACM Workshop on Multimedia and Security
1. SIGMM Workshop on Social Media SIGMM Workshop on Social Media is a workshop in conjunction with ACM Multimedia 2009. With the growing of user-centric multimedia applications in the recent years, this
More informationCompressing and Decoding Term Statistics Time Series
Compressing and Decoding Term Statistics Time Series Jinfeng Rao 1,XingNiu 1,andJimmyLin 2(B) 1 University of Maryland, College Park, USA {jinfeng,xingniu}@cs.umd.edu 2 University of Waterloo, Waterloo,
More information2009. October. Semiconductor Business SAMSUNG Electronics
2009. October Semiconductor Business SAMSUNG Electronics Why SSD performance is faster than HDD? HDD has long latency & late seek time due to mechanical operation SSD does not have both latency and seek
More informationIndex extensions. Manning, Raghavan and Schütze, Chapter 2. Daniël de Kok
Index extensions Manning, Raghavan and Schütze, Chapter 2 Daniël de Kok Today We will discuss some extensions to the inverted index to accommodate boolean processing: Skip pointers: improve performance
More informationThe i-visto Gateway XG Uncompressed HDTV Multiple Transmission Technology for 10-Gbit/s Networks
The Gateway XG Uncompressed HDTV Multiple Transmission Technology for Networks Takeaki Mochida, Tetsuo Kawano, Tsuyoshi Ogura, and Keiji Harada Abstract The is a new product for the (Internet video studio
More informationA Space-Efficient Phrase Table Implementation Using Minimal Perfect Hash Functions
A Space-Efficient Phrase Table Implementation Using Minimal Perfect Hash Functions Marcin Junczys-Dowmunt Faculty of Mathematics and Computer Science Adam Mickiewicz University ul. Umultowska 87, 61-614
More informationLightweight Client-Side Chinese/Japanese Morphological Analyzer Based on Online Learning
Lightweight Client-Side Chinese/Japanese Morphological Analyzer Based on Online Learning Masato Hagiwara Satoshi Sekine Rakuten Institute of Technology, New York 215 Park Avenue South, New York, NY {masato.hagiwara,
More informationProcessing Posting Lists Using OpenCL
Processing Posting Lists Using OpenCL Advisor Dr. Chris Pollett Committee Members Dr. Thomas Austin Dr. Sami Khuri By Radha Kotipalli Agenda Project Goal About Yioop Inverted index Compression Algorithms
More informationINFO 4300 / CS4300 Information Retrieval. slides adapted from Hinrich Schütze s, linked from
INFO 4300 / CS4300 Information Retrieval slides adapted from Hinrich Schütze s, linked from http://informationretrieval.org/ IR 6: Index Compression Paul Ginsparg Cornell University, Ithaca, NY 15 Sep
More informationMarcello Federico MMT Srl / FBK Trento, Italy
Marcello Federico MMT Srl / FBK Trento, Italy Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 Page 207 Symbiotic Human and Machine Translation
More informationAdvance Indexing. Limock July 3, 2014
Advance Indexing Limock July 3, 2014 1 Papers 1) Gurajada, Sairam : "On-line index maintenance using horizontal partitioning." Proceedings of the 18th ACM conference on Information and knowledge management.
More informationIntroduction & Administrivia
Introduction & Administrivia Information Retrieval Evangelos Kanoulas ekanoulas@uva.nl Section 1: Unstructured data Sec. 8.1 2 Big Data Growth of global data volume data everywhere! Web data: observation,
More informationProceedings of the Meeting & workshop on Development of a National IT Strategy Focusing on Indigenous Content Development
Ministry of Science, Research & Technology Iranian Information & Documentation Center (Research Center) Proceedings of the Meeting & workshop on Development of a National IT Strategy Focusing on Indigenous
More informationMitel for Microsoft Dynamics CRM Client V5 Release Notes
Mitel for Microsoft Dynamics CRM Client V5 Release Notes February 08, 2018. Mitel for Microsoft Dynamics CRM Client V5 Release Notes Description: This Application Note Consists of the dates and version
More informationAdaptive String Dictionary Compression in In-Memory Column-Store Database Systems
Adaptive String Dictionary Compression in In-Memory Column-Store Database Systems Ingo Müller, Cornelius Ratsch, Franz Faerber / KIT / SAP AG March 26, 2014 EDBT, Athens, Greece Motivation: Column-Store
More informationCS506/606 - Topics in Information Retrieval
CS506/606 - Topics in Information Retrieval Instructors: Class time: Steven Bedrick, Brian Roark, Emily Prud hommeaux Tu/Th 11:00 a.m. - 12:30 p.m. September 25 - December 6, 2012 Class location: WCC 403
More informationData Block. Data Block. Copy A B C D P HDD 0 HDD 1 HDD 2 HDD 3 HDD 4 HDD 0 HDD 1
RAID Network RAID File System 1) Takashi MATSUMOTO 1) ( 101-8430 2{1{2 E-mail:tmatsu@nii.ac.jp) ABSTRACT. The NRFS is a brand-new kernel-level subsystem for a low-cost distributed le system with fault-tolerant
More informationExample. Section: PS 709 Examples of Calculations of Reduced Hours of Work Last Revised: February 2017 Last Reviewed: February 2017 Next Review:
Following are three examples of calculations for MCP employees (undefined hours of work) and three examples for MCP office employees. Examples use the data from the table below. For your calculations use
More informationNetwork Traffic Reduction by Hypertext Compression
Network Traffic Reduction by Hypertext Compression BEN CHOI & NAKUL BHARADE Computer Science, College of Engineering and Science Louisiana Tech University, Ruston, LA 71272, USA pro@benchoi.org Abstract.
More informationShrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India
International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISSN : 2456-3307 Some Issues in Application of NLP to Intelligent
More informationA Combined Semi-Pipelined Query Processing Architecture For Distributed Full-Text Retrieval
A Combined Semi-Pipelined Query Processing Architecture For Distributed Full-Text Retrieval Simon Jonassen and Svein Erik Bratsberg Department of Computer and Information Science Norwegian University of
More informationAustralian Journal of Basic and Applied Sciences
ISSN:1991-8178 Australian Journal of Basic and Applied Sciences Journal home page: www.ajbasweb.com A Review on Raid Levels Implementation and Comparisons P. Sivakumar and K. Devi Department of Computer
More informationA Statistical Study of ANY Resource Record Based DNS Query Request Packet Traffic
A Statistical Study of ANY Resource Record Based DNS Query Request Packet Traffic A Statistical Study of ANY Resource Record Based DNS Query Request Packet Traffic Yasuo Musashi,* Yuto Takeda,** Nobuhiro
More informationIntroduction to Information Retrieval. Hongning Wang
Introduction to Information Retrieval Hongning Wang CS@UVa What is information retrieval? 2 Why information retrieval Information overload It refers to the difficulty a person can have understanding an
More informationDNS Query Access and Backscattering SMTP Distributed Denial-of-Service Attack
DNS Query Access and Backscattering SMTP Distributed Denial-of-Service Attack Yasuo Musashi, Ryuichi Matsuba, and Kenichi Sugitani Center for Multimedia and Information Technologies, Kumamoto University,
More informationRecord Placement Based on Data Skew Using Solid State Drives
BPOE-5 Record Placement Based on Data Skew Using Solid State Drives Jun Suzuki 1,2, Shivaram Venkataraman 2, Sameer Agarwal 2, Michael Franklin 2, and Ion Stoica 2 1 Green Platform Research Laboratories,
More informationPI SERVER 2012 Do. More. Faster. Now! Copyri g h t 2012 OSIso f t, LLC.
PI SERVER 2012 Do. More. Faster. Now! Copyri g h t 2012 OSIso f t, LLC. AUGUST 7, 2007 APRIL 14, 2010 APRIL 24, 2012 Copyri g h t 2012 OSIso f t, LLC. 2 PI SERVER 2010 PERFORMANCE 2010 R3 Max Point Count
More informationWhite Paper. File System Throughput Performance on RedHawk Linux
White Paper File System Throughput Performance on RedHawk Linux By: Nikhil Nanal Concurrent Computer Corporation August Introduction This paper reports the throughput performance of the,, and file systems
More informationStabilizing Minimum Error Rate Training
Stabilizing Minimum Error Rate Training George Foster and Roland Kuhn National Research Council Canada first.last@nrc.gc.ca Abstract The most commonly used method for training feature weights in statistical
More informationPersonalized Search for TV Programs Based on Software Man
Personalized Search for TV Programs Based on Software Man 12 Department of Computer Science, Zhengzhou College of Science &Technology Zhengzhou, China 450064 E-mail: 492590002@qq.com Bao-long Zhang 3 Department
More informationIterative CKY parsing for Probabilistic Context-Free Grammars
Iterative CKY parsing for Probabilistic Context-Free Grammars Yoshimasa Tsuruoka and Jun ichi Tsujii Department of Computer Science, University of Tokyo Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033 CREST, JST
More informationCorso di Biblioteche Digitali
Corso di Biblioteche Digitali Vittore Casarosa casarosa@isti.cnr.it tel. 050-315 3115 cell. 348-397 2168 Ricevimento dopo la lezione o per appuntamento Valutazione finale 70-75% esame orale 25-30% progetto
More informationindex construct Overview Overview Recap How to construct index? Introduction Index construction Introduction to Recap
to to Information Retrieval Index Construct Ruixuan Li Huazhong University of Science and Technology http://idc.hust.edu.cn/~rxli/ October, 2012 1 2 How to construct index? Computerese term document docid
More informationNew Concept based Indexing Technique for Search Engine
Indian Journal of Science and Technology, Vol 10(18), DOI: 10.17485/ijst/2017/v10i18/114018, May 2017 ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645 New Concept based Indexing Technique for Search
More informationData acquisition system of COMPASS experiment - progress and future plans
Data acquisition system of COMPASS experiment - progress and future plans Faculty of Nuclear Sciences and Physical Engineering Czech Technical University in Prague & CERN COMPASS experiment COMPASS experiment
More informationDatabase Architecture 2 & Storage. Instructor: Matei Zaharia cs245.stanford.edu
Database Architecture 2 & Storage Instructor: Matei Zaharia cs245.stanford.edu Summary from Last Time System R mostly matched the architecture of a modern RDBMS» SQL» Many storage & access methods» Cost-based
More informationIATF Stakeholder Conference
IATF Stakeholder Conference 13 September 2017 Oberursel, Germany Rüdiger Funke (BMW Group) Number of certified sites against ISO/TS 16949 (and IATF 16949) 70,000 60,000 50,000 40,000 30,000 30,156 50,071
More informationTernary Tree Tree Optimalization for n-gram for n-gram Indexing
Ternary Tree Tree Optimalization for n-gram for n-gram Indexing Indexing Daniel Robenek, Jan Platoš, Václav Snášel Department Daniel of Computer Robenek, Science, Jan FEI, Platoš, VSB Technical Václav
More informationText Analytics. Index-Structures for Information Retrieval. Ulf Leser
Text Analytics Index-Structures for Information Retrieval Ulf Leser Content of this Lecture Inverted files Storage structures Phrase and proximity search Building and updating the index Using a RDBMS Ulf
More informationYear 10 OCR GCSE Computer Science (9-1)
01 4 th September 02 11 th September 03 18 th September Half Term 1 04 25 th September 05 2 nd October 06 9 th October 07 16 th October NA Students on in school Thursday PM and Friday Only Unit 1, Lesson
More informationResearch Topics in Information Retrieval
Research Topics in Information Retrieval Cristina Ribeiro Sérgio Nunes FEUP / INESC TEC Information Systems Research Group http://infolab.fe.up.pt Information Retrieval "Information retrieval (IR) is finding
More informationGTRC Hosting Infrastructure Reports
GTRC Hosting Infrastructure Reports GTRC 2012 1. Description - The Georgia Institute of Technology has provided a data hosting infrastructure to support the PREDICT project for the data sets it provides.
More informationIndex Construction Introduction to Information Retrieval INF 141 Donald J. Patterson
Index Construction Introduction to Information Retrieval INF 141 Donald J. Patterson Content adapted from Hinrich Schütze http://www.informationretrieval.org Index Construction Overview Introduction Hardware
More informationService withdrawal: Selected IBM ServicePac offerings
Announcement ZS09-0086, dated April 21, 2009 Service withdrawal: Selected IBM offerings Table of contents 1 Overview 9 Announcement countries 8 Withdrawal date Overview Effective April 21, 2009, IBM will
More informationFrom The European Library to The European Digital Library. Jill Cousins Inforum, Prague, May 2007
From The European Library to The European Digital Library Jill Cousins Inforum, Prague, May 2007 Timeline Past to Present Started as TEL a project funded by the EU and led by The British Library now fully
More informationAdaptec MaxIQ SSD Cache Performance Solution for Web Server Environments Analysis
Adaptec MaxIQ SSD Cache Performance Solution for Web Server Environments Analysis September 22, 2009 Page 1 of 7 Introduction Adaptec has requested an evaluation of the performance of the Adaptec MaxIQ
More informationIntroduction to. CS276: Information Retrieval and Web Search Christopher Manning and Prabhakar Raghavan. Lecture 4: Index Construction
Introduction to Information Retrieval CS276: Information Retrieval and Web Search Christopher Manning and Prabhakar Raghavan Lecture 4: Index Construction 1 Plan Last lecture: Dictionary data structures
More informationA Comparison of Memory Usage and CPU Utilization in Column-Based Database Architecture vs. Row-Based Database Architecture
A Comparison of Memory Usage and CPU Utilization in Column-Based Database Architecture vs. Row-Based Database Architecture By Gaurav Sheoran 9-Dec-08 Abstract Most of the current enterprise data-warehouses
More informationQuickSpecs. PCIe Solid State Drives for HP Workstations
Overview Introduction Storage technology with NAND media is outgrowing the bandwidth limitations of the SATA bus. New high performance Storage solutions will connect directly to the PCIe bus for revolutionary
More informationDawood Public School Computer Studies Course Outline for Class V
Dawood Public School Computer Studies Course Outline for 2017-2018 Class V Course book- Keyboard Computer Science with Application Software (V) Second edition (Oxford University Press) Month wise distribution
More informationCMMI Version 1.2. Josh Silverman Northrop Grumman
CMMI Version 1.2 Josh Silverman Northrop Grumman Topics The Concept of Maturity: Why CMMI? CMMI Overview/Aspects Version 1.2 Changes Sunsetting of Version 1.1 Training Summary The Concept of Maturity:
More informationTEXT PREPROCESSING FOR TEXT MINING USING SIDE INFORMATION
TEXT PREPROCESSING FOR TEXT MINING USING SIDE INFORMATION Ms. Nikita P.Katariya 1, Prof. M. S. Chaudhari 2 1 Dept. of Computer Science & Engg, P.B.C.E., Nagpur, India, nikitakatariya@yahoo.com 2 Dept.
More informationBefore you install a product, verify that no empty directory exists with the same name as the installation directory.
Informatica Identity Resolution Version 9.5.3 Release Notes September 2013 Copyright (c) 1998-2013 Informatica Corporation. All rights reserved. Contents Installation... 1 Installing to a Subdirectory....
More informationCS290N Summary Tao Yang
CS290N Summary 2015 Tao Yang Text books [CMS] Bruce Croft, Donald Metzler, Trevor Strohman, Search Engines: Information Retrieval in Practice, Publisher: Addison-Wesley, 2010. Book website. [MRS] Christopher
More informationAssimilation process for shortened constables pay scale produced by Greater Manchester Police
Assimilation process for shortened constables pay scale produced by Greater Manchester Police The current constables pay scale of 11 points will be reduced by 3 points over a 2 year period to create a
More informationQuery Difficulty Prediction for Contextual Image Retrieval
Query Difficulty Prediction for Contextual Image Retrieval Xing Xing 1, Yi Zhang 1, and Mei Han 2 1 School of Engineering, UC Santa Cruz, Santa Cruz, CA 95064 2 Google Inc., Mountain View, CA 94043 Abstract.
More informationImproving Throughput in Cloud Storage System
Improving Throughput in Cloud Storage System Chanho Choi chchoi@dcslab.snu.ac.kr Shin-gyu Kim sgkim@dcslab.snu.ac.kr Hyeonsang Eom hseom@dcslab.snu.ac.kr Heon Y. Yeom yeom@dcslab.snu.ac.kr Abstract Because
More information