Automatic Text Processing

Size: px
Start display at page:

Download "Automatic Text Processing"

Transcription

1 Automatic Text Processing The Transformation, Analysis, and Retrieval of Information by Computer Gerard Salton Cornell University Technlsche Univerariat Darmstadt FACHBEREICH1NFORMATJK BIBLIOTHE.K Invented.: Sachgebteta: Standort: TT ADDISON-WESLEY PUBLISHING COMPANY Reading, Massachusetts Menlo Park, California New York Don Mills, Ontario Wokingham, England Amsterdam Bonn Sydney Singapore Tokyo Madrid San Juan

2 Contents 2 Part 1: The Information-Processing Environment 1 The Information Environment Automatic Information Processing Types of Information Text Processing Speech Processing Graphics Processing Semantic and Behavioral Processing 8 The Computer Environment Computer Architecture Large versus Small Machines Sequential versus Parallel Computing Multiprocessor and Multicomputer Configurations Types of Computers Storage Technology Input-Output and Peripheral Equipment 23 { Terminal Equipment Printing Equipment Document Input Computer Networks Integrated Computing Systems 36

3 The Automated Office The Office Environment Analyzing Office Systems File-management Systems System Characteristics Relational Database Systems Relational Data Manipulations Data Security, Integrity, and Recovery 60 3*4 Office Display Systems Office-Information Retrieval 63 Part 2: Word Processing and File Access 71 Text Editing and Formatting Introduction Approaches to Word Processing Text Editing and Formatting Typical Processing Systems Off-line Text-Editing Systems Interactive Graphics-Editing Systems Automatic Typesetting Systems Typesetting Systems Automatic Typefont Design 98 Text Compression Statistical Language Characteristics 105 <$ Frequency Considerations 105 2> Entropy Measurements Rationale for Text Compression Text-Compression Methods 114 ^ Special-purpose Compression Systems Basic Fixed-Length Codes Restricted Variable-length Codes 119 5J.4 Variable-length Codes Word-Fragment Encoding 125 Text Encryption Basic Cryptographic Concepts Conventional Cryptographic Systems Sample Cryptographic Ciphers The Data Encryption Standard (DES) 146 «8» 6.5 Ciphers Based on Computationally Difficult Problems 149

4 7 8 9 Automatic File-Accessing Systems Basic Concepts Single-Key Searching: Sequential Search Single-Key Indexed Searches Tree Searching Balanced Search Trees Multiway Search Trees Hash-Table Access Indexed Searches for Multikey Access Bitmap Encoding for Multikey Access Multidimensional Access Structures 216 Part 3: Information-Retrieval Systems 227 Conventional Text-Retrieval Systems Database Management and Information Retrieval Text Retrieval Using Inverted Indexing Methods Extensions of the Inverted Index Operations Distance Constraints Term Weights Synonym Specifications Term Truncation Typical File Organization Optimization of Inverted-List Procedures Reducing the Number of Index Terms Quorum-level Searches Partial List Searching Text-scanning Systems General Considerations Elementary String Matching Fast String Matching Hardware Aids to Text Searching 266 Indexing Indexing Environment Indexing Aims Single-term Indexing Theories Term-frequency Considerations Term-discrimination Value Probabilistic Term Weighting Term Relationships in Indexing Term-phrase Formation Thesaurus-Group Generation A Blueprint for Automatic Indexing 303

5 10 II Advanced Information-Retrieval Models The Vector Space Model Basic Vector-processing Model Vector Modifications Automatic Document Classification General Considerations Hierarchical Cluster Generation Heuristic Clustering Methods Cluster Searching Probabilistic Retrieval Model Extended Boolean Retrieval Model Fuzzy Set Extensions Extended Boolean System Integrated System for Processing Text and Data Advanced Interface Systems 365 Part 4. Text Analysis and Language Processing 375 Language Analysis and Understanding The Linguistic Approach Dictionary Operations Morphological Decomposition Dictionary Types Syntactic Analysis Typical Syntactic-Analysis Systems Semantic Grammars Knowledge-based Processing Knowledge Structures Prospects for Knowledge-based Processing Specialized Language Processing Robust Parsing Sublanguage Analysis Natural-Language Interface to Information Systems Automatic Text Transformations Text Transformations Automatic Writing Aids Automatic Spelling Checkers Automatic Spelling Correction Syntax and Style Checking 436

6 Automatic Abstracting Systems Automatic Extracting Abstracting Based on Text Understanding Automatic Text Generation 448 { Approaches to Text Generation 448 J* Typical Text-generation Systems Automatic Translation Main Approaches Typical Machine-translation Systems 461 Paperless Information Systems Paperless Processing Processing Complex Documents Graphics Processing Basic Display Systems Object Transformations Picture Recognition Speech Processing Speech Synthesis Speech Recognition Automatic Teleconferencing Systems Electronic Mail and Messages Electronic Information Services Teletext Videotex Electronic Publications and the Electronic Library 507 Author Index 517 Subject Index 523 XIII

Introductory logic and sets for Computer scientists

Introductory logic and sets for Computer scientists Introductory logic and sets for Computer scientists Nimal Nissanke University of Reading ADDISON WESLEY LONGMAN Harlow, England II Reading, Massachusetts Menlo Park, California New York Don Mills, Ontario

More information

Designing the User Interface

Designing the User Interface Designing the User Interface Strategies for Effective Human-Computer Interaction Second Edition Ben Shneiderman The University of Maryland Addison-Wesley Publishing Company Reading, Massachusetts Menlo

More information

Win32 Network Programming

Win32 Network Programming Win32 Network Programming Windows 95 and Windows NT Network Programming Using MFC Ralph Davis TT Addison-Wesley Developers Press Reading, Massachusetts Menlo Park, California New York Don Mills, Ontario

More information

An Introduction to Object-Oriented Programming

An Introduction to Object-Oriented Programming An Introduction to Object-Oriented Programming Timothy Budd Oregon State University TT Addison-Wesley Publishing Company Reading, Massachusetts Menlo Park, California New York Don Mills, Ontario Wokingham,

More information

Mathematica for Scientists and Engineers

Mathematica for Scientists and Engineers Mathematica for Scientists and Engineers Thomas B. Bahder Addison-Wesley Publishing Company Reading, Massachusetts Menlo Park, California New York Don Mills, Ontario Wokingham, England Amsterdam Bonn Paris

More information

COMPUTER AND ROBOT VISION

COMPUTER AND ROBOT VISION VOLUME COMPUTER AND ROBOT VISION Robert M. Haralick University of Washington Linda G. Shapiro University of Washington A^ ADDISON-WESLEY PUBLISHING COMPANY Reading, Massachusetts Menlo Park, California

More information

System BIOS for IBM PCs, Compatibles, and EISA Computers, Second Edition

System BIOS for IBM PCs, Compatibles, and EISA Computers, Second Edition TECHNICAL REFERENCE SERIES System BIOS for IBM PCs, Compatibles, and EISA Computers, Second Edition The Complete Guide to ROM-Based System Software PHOENIX TECHNOLOGIES LTD. J TT Addison-Wesley Publishing

More information

Data Structures in C++ Using the Standard Template Library

Data Structures in C++ Using the Standard Template Library Data Structures in C++ Using the Standard Template Library Timothy Budd Oregon State University ^ ADDISON-WESLEY An imprint of Addison Wesley Longman, Inc. Reading, Massachusetts Harlow, England Menlo

More information

Programming. In Ada JOHN BARNES TT ADDISON-WESLEY

Programming. In Ada JOHN BARNES TT ADDISON-WESLEY Programming In Ada 2005 JOHN BARNES... TT ADDISON-WESLEY An imprint of Pearson Education Harlow, England London New York Boston San Francisco Toronto Sydney Tokyo Singapore Hong Kong Seoul Taipei New Delhi

More information

COMPUTER AND ROBOT VISION

COMPUTER AND ROBOT VISION VOLUME COMPUTER AND ROBOT VISION Robert M. Haralick University of Washington Linda G. Shapiro University of Washington T V ADDISON-WESLEY PUBLISHING COMPANY Reading, Massachusetts Menlo Park, California

More information

THE DESIGN AND ANALYSIS OF COMPUTER ALGORITHMS

THE DESIGN AND ANALYSIS OF COMPUTER ALGORITHMS 2008 AGI-Information Management Consultants May be used for personal purporses only or by libraries associated to dandelon.com network. THE DESIGN AND ANALYSIS OF COMPUTER ALGORITHMS Alfred V. Aho Bell

More information

Search Engines Information Retrieval in Practice

Search Engines Information Retrieval in Practice Search Engines Information Retrieval in Practice W. BRUCE CROFT University of Massachusetts, Amherst DONALD METZLER Yahoo! Research TREVOR STROHMAN Google Inc. ----- PEARSON Boston Columbus Indianapolis

More information

Advanced Programming in the UNIX Environment W. Richard Stevens

Advanced Programming in the UNIX Environment W. Richard Stevens Advanced Programming in the UNIX Environment W. Richard Stevens ADDISON-WESLEY PUBLISHING COMPANY Reading, Massachusetts Menlo Park, California New York Don Mills, Ontario Wokingham, England Amsterdam

More information

School of Computer Engineering. B.Eng. (Computer Science) Content of Subjects Applicable to Students Matriculating in 2011 or later

School of Computer Engineering. B.Eng. (Computer Science) Content of Subjects Applicable to Students Matriculating in 2011 or later B.Eng. (Computer Science) Content of Subjects Applicable to Students Matriculating in 2011 or later FIRST YEAR CZ1001 DISCRETE MATHEMATICS Elementary number theory; Sets; Predicate logic; Linear recurrence

More information

FUNDAMENTALS OF. Database S wctpmc. Shamkant B. Navathe College of Computing Georgia Institute of Technology. Addison-Wesley

FUNDAMENTALS OF. Database S wctpmc. Shamkant B. Navathe College of Computing Georgia Institute of Technology. Addison-Wesley FUNDAMENTALS OF Database S wctpmc SIXTH EDITION Ramez Elmasri Department of Computer Science and Engineering The University of Texas at Arlington Shamkant B. Navathe College of Computing Georgia Institute

More information

DATABASE SYSTEM CONCEPTS

DATABASE SYSTEM CONCEPTS DATABASE SYSTEM CONCEPTS HENRY F. KORTH ABRAHAM SILBERSCHATZ University of Texas at Austin McGraw-Hill, Inc. New York St. Louis San Francisco Auckland Bogota Caracas Lisbon London Madrid Mexico Milan Montreal

More information

Modern Information Retrieval

Modern Information Retrieval Modern Information Retrieval Ricardo Baeza-Yates Berthier Ribeiro-Neto ACM Press NewYork Harlow, England London New York Boston. San Francisco. Toronto. Sydney Singapore Hong Kong Tokyo Seoul Taipei. New

More information

Fundamentals of. Database Systems. Shamkant B. Navathe. College of Computing Georgia Institute of Technology PEARSON.

Fundamentals of. Database Systems. Shamkant B. Navathe. College of Computing Georgia Institute of Technology PEARSON. Fundamentals of Database Systems 5th Edition Ramez Elmasri Department of Computer Science and Engineering The University of Texas at Arlington Shamkant B. Navathe College of Computing Georgia Institute

More information

CJT^jL rafting Cm ompiler

CJT^jL rafting Cm ompiler CJT^jL rafting Cm ompiler ij CHARLES N. FISCHER Computer Sciences University of Wisconsin Madison RON K. CYTRON Computer Science and Engineering Washington University RICHARD J. LeBLANC, Jr. Computer Science

More information

An Introduction to Search Engines and Web Navigation

An Introduction to Search Engines and Web Navigation An Introduction to Search Engines and Web Navigation MARK LEVENE ADDISON-WESLEY Ал imprint of Pearson Education Harlow, England London New York Boston San Francisco Toronto Sydney Tokyo Singapore Hong

More information

Chapter 27 Introduction to Information Retrieval and Web Search

Chapter 27 Introduction to Information Retrieval and Web Search Chapter 27 Introduction to Information Retrieval and Web Search Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 27 Outline Information Retrieval (IR) Concepts Retrieval

More information

Real-Time Systems and Programming Languages

Real-Time Systems and Programming Languages Real-Time Systems and Programming Languages Ada, Real-Time Java and C/Real-Time POSIX Fourth Edition Alan Burns and Andy Wellings University of York * ADDISON-WESLEY An imprint of Pearson Education Harlow,

More information

Systems:;-'./'--'.; r. Ramez Elmasri Department of Computer Science and Engineering The University of Texas at Arlington

Systems:;-'./'--'.; r. Ramez Elmasri Department of Computer Science and Engineering The University of Texas at Arlington Data base 7\,T"] Systems:;-'./'--'.; r Modelsj Languages, Design, and Application Programming Ramez Elmasri Department of Computer Science and Engineering The University of Texas at Arlington Shamkant

More information

Programming in Python 3

Programming in Python 3 Programming in Python 3 A Complete Introduction to the Python Language Mark Summerfield.4.Addison-Wesley Upper Saddle River, NJ Boston Indianapolis San Francisco New York Toronto Montreal London Munich

More information

The Unified Modeling Language User Guide

The Unified Modeling Language User Guide The Unified Modeling Language User Guide Grady Booch James Rumbaugh Ivar Jacobson Rational Software Corporation TT ADDISON-WESLEY Boston San Francisco New York Toronto Montreal London Munich Paris Madrid

More information

1. Discovering Important Nodes through Graph Entropy The Case of Enron Database

1. Discovering Important Nodes through Graph Entropy The Case of Enron  Database 1. Discovering Important Nodes through Graph Entropy The Case of Enron Email Database ACM KDD 2005 Chicago, Illinois. 2. Optimizing Video Search Reranking Via Minimum Incremental Information Loss ACM MIR

More information

A Model for Information Retrieval Agent System Based on Keywords Distribution

A Model for Information Retrieval Agent System Based on Keywords Distribution A Model for Information Retrieval Agent System Based on Keywords Distribution Jae-Woo LEE Dept of Computer Science, Kyungbok College, 3, Sinpyeong-ri, Pocheon-si, 487-77, Gyeonggi-do, Korea It2c@koreaackr

More information

Chapter 3 - Text. Management and Retrieval

Chapter 3 - Text. Management and Retrieval Prof. Dr.-Ing. Stefan Deßloch AG Heterogene Informationssysteme Geb. 36, Raum 329 Tel. 0631/205 3275 dessloch@informatik.uni-kl.de Chapter 3 - Text Management and Retrieval Literature: Baeza-Yates, R.;

More information

DOTNET PROJECTS. DOTNET Projects. I. IEEE based IOT IEEE BASED CLOUD COMPUTING

DOTNET PROJECTS. DOTNET Projects. I. IEEE based IOT IEEE BASED CLOUD COMPUTING DOTNET PROJECTS I. IEEE based IOT 1. A Fuzzy Model-based Integration Framework for Vision-based Intelligent Surveillance Systems 2. Learning communities in social networks and their relationship with the

More information

Taming Text. How to Find, Organize, and Manipulate It MANNING GRANT S. INGERSOLL THOMAS S. MORTON ANDREW L. KARRIS. Shelter Island

Taming Text. How to Find, Organize, and Manipulate It MANNING GRANT S. INGERSOLL THOMAS S. MORTON ANDREW L. KARRIS. Shelter Island Taming Text How to Find, Organize, and Manipulate It GRANT S. INGERSOLL THOMAS S. MORTON ANDREW L. KARRIS 11 MANNING Shelter Island contents foreword xiii preface xiv acknowledgments xvii about this book

More information

Data Structures and Abstractions with Java

Data Structures and Abstractions with Java Global edition Data Structures and Abstractions with Java Fourth edition Frank M. Carrano Timothy M. Henry Data Structures and Abstractions with Java TM Fourth Edition Global Edition Frank M. Carrano University

More information

Mastering OSF/Motif Widgets

Mastering OSF/Motif Widgets Mastering OSF/Motif Widgets SECOND EDITION Donald L McMinds TV Addison-Wesley Publishing Company, Inc. Reading, Massachusetts Menlo Park, California New York Don Mills, Ontario Wokingham, England Amsterdam

More information

B.Eng. (Computer Science) Course Contents Applicable to Students Matriculating in 2016 onwards

B.Eng. (Computer Science) Course Contents Applicable to Students Matriculating in 2016 onwards FIRST YEAR B.Eng. (Computer Science) Course Contents Applicable to Students Matriculating in 2016 onwards CZ1003 INTRODUCTION TO COMPUTATIONAL THINKING Computing and Algorithms; Introduction to Python;

More information

Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data

Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data American Journal of Applied Sciences (): -, ISSN -99 Science Publications Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data Ibrahiem M.M. El Emary and Ja'far

More information

Information Retrieval

Information Retrieval Multimedia Computing: Algorithms, Systems, and Applications: Information Retrieval and Search Engine By Dr. Yu Cao Department of Computer Science The University of Massachusetts Lowell Lowell, MA 01854,

More information

PATTERN CLASSIFICATION AND SCENE ANALYSIS

PATTERN CLASSIFICATION AND SCENE ANALYSIS PATTERN CLASSIFICATION AND SCENE ANALYSIS RICHARD O. DUDA PETER E. HART Stanford Research Institute, Menlo Park, California A WILEY-INTERSCIENCE PUBLICATION JOHN WILEY & SONS New York Chichester Brisbane

More information

Image Processing, Analysis and Machine Vision

Image Processing, Analysis and Machine Vision Image Processing, Analysis and Machine Vision Milan Sonka PhD University of Iowa Iowa City, USA Vaclav Hlavac PhD Czech Technical University Prague, Czech Republic and Roger Boyle DPhil, MBCS, CEng University

More information

Foundations of Multidimensional and Metric Data Structures

Foundations of Multidimensional and Metric Data Structures Foundations of Multidimensional and Metric Data Structures Hanan Samet University of Maryland, College Park ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE

More information

Essentials of Database Management

Essentials of Database Management Essentials of Database Management Jeffrey A. Hoffer University of Dayton Heikki Topi Bentley University V. Ramesh Indiana University PEARSON Boston Columbus Indianapolis New York San Francisco Upper Saddle

More information

The Essential Guide to Video Processing

The Essential Guide to Video Processing The Essential Guide to Video Processing Second Edition EDITOR Al Bovik Department of Electrical and Computer Engineering The University of Texas at Austin Austin, Texas AMSTERDAM BOSTON HEIDELBERG LONDON

More information

INFORMATION RETRIEVAL SYSTEMS: Theory and Implementation

INFORMATION RETRIEVAL SYSTEMS: Theory and Implementation INFORMATION RETRIEVAL SYSTEMS: Theory and Implementation THE KLUWER INTERNATIONAL SERIES ON INFORMATION RETRIEVAL Series Editor W. Bruce Croft University of Massachusetts Amherst, MA 01003 Also in the

More information

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer Bing Liu Web Data Mining Exploring Hyperlinks, Contents, and Usage Data With 177 Figures Springer Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web

More information

MFC Internals. Inside the Microsoft Foundation Class Architecture. George Shepherd and Scot Wingo. Foreword by Dean D. McCrory.

MFC Internals. Inside the Microsoft Foundation Class Architecture. George Shepherd and Scot Wingo. Foreword by Dean D. McCrory. MFC Internals Inside the Microsoft Foundation Class Architecture George Shepherd and Scot Wingo Foreword by Dean D. McCrory HLuHB Darmstadt I III II III 13376492 Addison-Wesley Developers Press Reading,

More information

Digital System Design with SystemVerilog

Digital System Design with SystemVerilog Digital System Design with SystemVerilog Mark Zwolinski AAddison-Wesley Upper Saddle River, NJ Boston Indianapolis San Francisco New York Toronto Montreal London Munich Paris Madrid Capetown Sydney Tokyo

More information

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p.

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. 6 What is Web Mining? p. 6 Summary of Chapters p. 8 How

More information

AN INTRODUCTION TO FUZZY SETS Analysis and Design. Witold Pedrycz and Fernando Gomide

AN INTRODUCTION TO FUZZY SETS Analysis and Design. Witold Pedrycz and Fernando Gomide AN INTRODUCTION TO FUZZY SETS Analysis and Design Witold Pedrycz and Fernando Gomide A Bradford Book The MIT Press Cambridge, Massachusetts London, England Foreword - Preface Introduction xiii xxv xxi

More information

Department of Computer Science and Engineering B.E/B.Tech/M.E/M.Tech : B.E. Regulation: 2013 PG Specialisation : _

Department of Computer Science and Engineering B.E/B.Tech/M.E/M.Tech : B.E. Regulation: 2013 PG Specialisation : _ COURSE DELIVERY PLAN - THEORY Page 1 of 6 Department of Computer Science and Engineering B.E/B.Tech/M.E/M.Tech : B.E. Regulation: 2013 PG Specialisation : _ LP: CS6007 Rev. No: 01 Date: 27/06/2017 Sub.

More information

Introduction to Information Retrieval

Introduction to Information Retrieval Introduction to Information Retrieval Mohsen Kamyar چهارمین کارگاه ساالنه آزمایشگاه فناوری و وب بهمن ماه 1391 Outline Outline in classic categorization Information vs. Data Retrieval IR Models Evaluation

More information

Domain-Specific. Languages. Martin Fowler. AAddison-Wesley. Sydney Tokyo. With Rebecca Parsons

Domain-Specific. Languages. Martin Fowler. AAddison-Wesley. Sydney Tokyo. With Rebecca Parsons Domain-Specific Languages Martin Fowler With Rebecca Parsons AAddison-Wesley Upper Saddle River, NJ Boston Indianapolis San Francisco New York Toronto Montreal London Munich Paris Madrid Sydney Tokyo Singapore

More information

MariaDB Crash Course. A Addison-Wesley. Ben Forta. Upper Saddle River, NJ Boston. Indianapolis. Singapore Mexico City. Cape Town Sydney.

MariaDB Crash Course. A Addison-Wesley. Ben Forta. Upper Saddle River, NJ Boston. Indianapolis. Singapore Mexico City. Cape Town Sydney. MariaDB Crash Course Ben Forta A Addison-Wesley Upper Saddle River, NJ Boston Indianapolis San Francisco New York Toronto Montreal London Munich Paris Madrid Cape Town Sydney Tokyo Singapore Mexico City

More information

Text Mining: A Burgeoning technology for knowledge extraction

Text Mining: A Burgeoning technology for knowledge extraction Text Mining: A Burgeoning technology for knowledge extraction 1 Anshika Singh, 2 Dr. Udayan Ghosh 1 HCL Technologies Ltd., Noida, 2 University School of Information &Communication Technology, Dwarka, Delhi.

More information

Elementary IR: Scalable Boolean Text Search. (Compare with R & G )

Elementary IR: Scalable Boolean Text Search. (Compare with R & G ) Elementary IR: Scalable Boolean Text Search (Compare with R & G 27.1-3) Information Retrieval: History A research field traditionally separate from Databases Hans P. Luhn, IBM, 1959: Keyword in Context

More information

Efficiency. Efficiency: Indexing. Indexing. Efficiency Techniques. Inverted Index. Inverted Index (COSC 488)

Efficiency. Efficiency: Indexing. Indexing. Efficiency Techniques. Inverted Index. Inverted Index (COSC 488) Efficiency Efficiency: Indexing (COSC 488) Nazli Goharian nazli@cs.georgetown.edu Difficult to analyze sequential IR algorithms: data and query dependency (query selectivity). O(q(cf max )) -- high estimate-

More information

Eclipse Support for Using Eli and Teaching Programming Languages

Eclipse Support for Using Eli and Teaching Programming Languages Electronic Notes in Theoretical Computer Science 141 (2005) 189 194 www.elsevier.com/locate/entcs Eclipse Support for Using Eli and Teaching Programming Languages Anthony M. Sloane 1,2 Department of Computing

More information

James Mayfield! The Johns Hopkins University Applied Physics Laboratory The Human Language Technology Center of Excellence!

James Mayfield! The Johns Hopkins University Applied Physics Laboratory The Human Language Technology Center of Excellence! James Mayfield! The Johns Hopkins University Applied Physics Laboratory The Human Language Technology Center of Excellence! (301) 219-4649 james.mayfield@jhuapl.edu What is Information Retrieval? Evaluation

More information

MODERN DATABASE MANAGEMENT

MODERN DATABASE MANAGEMENT MODERN DATABASE MANAGEMENT FOURTH EDITION FRED R. MCFADDEN JEFFREY A. HOFFER r(\) THE BENJAMIN/CUMMINGS PUBLISHING COMPANY INC. REDWOOD CITY, CALIFORNIA MENLO PARK, CALIFORNIA READING, MASSACHUSETTS NEW

More information

BVRIT HYDERABAD College of Engineering for Women. Department of Computer Science and Engineering. Course Hand Out

BVRIT HYDERABAD College of Engineering for Women. Department of Computer Science and Engineering. Course Hand Out BVRIT HYDERABAD College of Engineering for Women Department of Computer Science and Engineering Course Hand Out Subject Name : Information Retrieval Systems Prepared by : Dr.G.Naga Satish, Associate Professor

More information

Part I: Data Mining Foundations

Part I: Data Mining Foundations Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web and the Internet 2 1.3. Web Data Mining 4 1.3.1. What is Data Mining? 6 1.3.2. What is Web Mining?

More information

INTRODUCTION TO LINEAR AND NONLINEAR PROGRAMMING

INTRODUCTION TO LINEAR AND NONLINEAR PROGRAMMING INTRODUCTION TO LINEAR AND NONLINEAR PROGRAMMING DAVID G. LUENBERGER Stanford University TT ADDISON-WESLEY PUBLISHING COMPANY Reading, Massachusetts Menlo Park, California London Don Mills, Ontario CONTENTS

More information

Query Optimization in Search Engines

Query Optimization in Search Engines Proc. 1 st International Conference on Machine Learning and Data Engineering (icmlde2017) 20-22 Nov 2017, Sydney, Australia ISBN: 978-0-6480147-3-7 Query Optimization in Search Engines Baisakhi 1, Nirjhar

More information

Domain-specific Concept-based Information Retrieval System

Domain-specific Concept-based Information Retrieval System Domain-specific Concept-based Information Retrieval System L. Shen 1, Y. K. Lim 1, H. T. Loh 2 1 Design Technology Institute Ltd, National University of Singapore, Singapore 2 Department of Mechanical

More information

Computer Architecture A Quantitative Approach

Computer Architecture A Quantitative Approach Computer Architecture A Quantitative Approach Third Edition John L. Hennessy Stanford University David A. Patterson University of California at Berkeley With Contributions by David Goldberg Xerox Palo

More information

Information Retrieval CS Lecture 01. Razvan C. Bunescu School of Electrical Engineering and Computer Science

Information Retrieval CS Lecture 01. Razvan C. Bunescu School of Electrical Engineering and Computer Science Information Retrieval CS 6900 Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Information Retrieval Information Retrieval (IR) is finding material of an unstructured

More information

Introduction to Computer Graphics

Introduction to Computer Graphics Introduction to Computer Graphics James D. Foley Georgia Institute of Technology Andries van Dam Brown University Steven K. Feiner Columbia University John F. Hughes Brown University Richard L. Phillips

More information

CIS 4930/6930 Spring 2014 Introduction to Data Science /Data Intensive Computing. University of Florida, CISE Department Prof.

CIS 4930/6930 Spring 2014 Introduction to Data Science /Data Intensive Computing. University of Florida, CISE Department Prof. CIS 4930/6930 Spring 2014 Introduction to Data Science /Data Intensive Computing University of Florida, CISE Department Prof. Daisy Zhe Wang Text To Knowledge IR and Boolean Search Text to Knowledge (IE)

More information

Syntax Analysis. Chapter 4

Syntax Analysis. Chapter 4 Syntax Analysis Chapter 4 Check (Important) http://www.engineersgarage.com/contributio n/difference-between-compiler-andinterpreter Introduction covers the major parsing methods that are typically used

More information

NETWORKING KEITH W. ROSS. Polytechnic Institute of NYU. Addison-Wesley

NETWORKING KEITH W. ROSS. Polytechnic Institute of NYU. Addison-Wesley COMPUTER FIFTH EDITION NETWORKING JAMES F. KUROSE University of Massachusetts, Amherst KEITH W. ROSS Polytechnic Institute of NYU Addison-Wesley New York Boston San Francisco London Toronto Sydney Tokyo

More information

PRACTICAL SPEECH USER INTERFACE DESIGN

PRACTICAL SPEECH USER INTERFACE DESIGN ; ; : : : : ; : ; PRACTICAL SPEECH USER INTERFACE DESIGN й fail James R. Lewis. CRC Press Taylor &. Francis Group Boca Raton London New York CRC Press is an imprint of the Taylor & Francis Group, an informa

More information

CS377: Database Systems Text data and information. Li Xiong Department of Mathematics and Computer Science Emory University

CS377: Database Systems Text data and information. Li Xiong Department of Mathematics and Computer Science Emory University CS377: Database Systems Text data and information retrieval Li Xiong Department of Mathematics and Computer Science Emory University Outline Information Retrieval (IR) Concepts Text Preprocessing Inverted

More information

Compiler Design Overview. Compiler Design 1

Compiler Design Overview. Compiler Design 1 Compiler Design Overview Compiler Design 1 Preliminaries Required Basic knowledge of programming languages. Basic knowledge of FSA and CFG. Knowledge of a high programming language for the programming

More information

CLASSIC DATA STRUCTURES IN JAVA

CLASSIC DATA STRUCTURES IN JAVA CLASSIC DATA STRUCTURES IN JAVA Timothy Budd Oregon State University Boston San Francisco New York London Toronto Sydney Tokyo Singapore Madrid Mexico City Munich Paris Cape Town Hong Kong Montreal CONTENTS

More information

Information Retrieval

Information Retrieval Information Retrieval CSC 375, Fall 2016 An information retrieval system will tend not to be used whenever it is more painful and troublesome for a customer to have information than for him not to have

More information

Formal Languages and Compilers Lecture I: Introduction to Compilers

Formal Languages and Compilers Lecture I: Introduction to Compilers Formal Languages and Compilers Lecture I: Introduction to Compilers Free University of Bozen-Bolzano Faculty of Computer Science POS Building, Room: 2.03 artale@inf.unibz.it http://www.inf.unibz.it/ artale/

More information

Framework Design Guidelines

Framework Design Guidelines Framework Design Guidelines Conventions, Idioms, and Patterns for Reusable.NET Libraries Krzysztof Cwalina Brad Abrams Addison-Wesley Upper Saddle River, NJ Boston Indianapolis San Francisco New York Toronto

More information

Contents. Foreword to Second Edition. Acknowledgments About the Authors

Contents. Foreword to Second Edition. Acknowledgments About the Authors Contents Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments About the Authors xxxi xxxv Chapter 1 Introduction 1 1.1 Why Data Mining? 1 1.1.1 Moving toward the Information Age 1

More information

Contents. Chapter 1 SPECIFYING SYNTAX 1

Contents. Chapter 1 SPECIFYING SYNTAX 1 Contents Chapter 1 SPECIFYING SYNTAX 1 1.1 GRAMMARS AND BNF 2 Context-Free Grammars 4 Context-Sensitive Grammars 8 Exercises 8 1.2 THE PROGRAMMING LANGUAGE WREN 10 Ambiguity 12 Context Constraints in Wren

More information

CS 6320 Natural Language Processing

CS 6320 Natural Language Processing CS 6320 Natural Language Processing Information Retrieval Yang Liu Slides modified from Ray Mooney s (http://www.cs.utexas.edu/users/mooney/ir-course/slides/) 1 Introduction of IR System components, basic

More information

Contents. List of Figures. List of Tables. Acknowledgements

Contents. List of Figures. List of Tables. Acknowledgements Contents List of Figures List of Tables Acknowledgements xiii xv xvii 1 Introduction 1 1.1 Linguistic Data Analysis 3 1.1.1 What's data? 3 1.1.2 Forms of data 3 1.1.3 Collecting and analysing data 7 1.2

More information

FPGAs: Instant Access

FPGAs: Instant Access FPGAs: Instant Access Clive"Max"Maxfield AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY TOKYO % ELSEVIER Newnes is an imprint of Elsevier Newnes Contents

More information

Software Architectures

Software Architectures Software Architectures Richard N. Taylor Information and Computer Science University of California, Irvine Irvine, California 92697-3425 taylor@ics.uci.edu http://www.ics.uci.edu/~taylor +1-949-824-6429

More information

Information Management (IM)

Information Management (IM) 1 2 3 4 5 6 7 8 9 Information Management (IM) Information Management (IM) is primarily concerned with the capture, digitization, representation, organization, transformation, and presentation of information;

More information

A Method for Semi-Automatic Ontology Acquisition from a Corporate Intranet

A Method for Semi-Automatic Ontology Acquisition from a Corporate Intranet A Method for Semi-Automatic Ontology Acquisition from a Corporate Intranet Joerg-Uwe Kietz, Alexander Maedche, Raphael Volz Swisslife Information Systems Research Lab, Zuerich, Switzerland fkietz, volzg@swisslife.ch

More information

UNIT-4 (COMPILER DESIGN)

UNIT-4 (COMPILER DESIGN) UNIT-4 (COMPILER DESIGN) An important part of any compiler is the construction and maintenance of a dictionary containing names and their associated values, such type of dictionary is called a symbol table.

More information

Stress-Free Success Using Microsoft WORD 2004

Stress-Free Success Using Microsoft WORD 2004 Stress-Free Success Using Microsoft WORD 2004 Lynn D. Brown Table of Contents Chapter 1 Getting Started 1.1 Symbols 5 1.2 Consistent Steps 6 1.3 Toolbars 7 1.4 Custom Toolbars 8 Chapter 2 Document Set-up

More information

Chapter 2 - Concepts and Definitions

Chapter 2 - Concepts and Definitions Prof. Dr.-Ing. Stefan Deßloch AG Heterogene Informationssysteme Geb. 36, Raum 329 Tel. 0631/205 3275 dessloch@informatik.uni-kl.de Chapter 2 - Concepts and Definitions Introduction and Requirements Database

More information

Structured Parallel Programming Patterns for Efficient Computation

Structured Parallel Programming Patterns for Efficient Computation Structured Parallel Programming Patterns for Efficient Computation Michael McCool Arch D. Robison James Reinders ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO

More information

The University of Jordan. Accreditation & Quality Assurance Center. Curriculum for Doctorate Degree

The University of Jordan. Accreditation & Quality Assurance Center. Curriculum for Doctorate Degree Accreditation & Quality Assurance Center Curriculum for Doctorate Degree 1. Faculty King Abdullah II School for Information Technology 2. Department Computer Science الدكتوراة في علم الحاسوب (Arabic).3

More information

Programming Guide. Aaftab Munshi Dan Ginsburg Dave Shreiner. TT r^addison-wesley

Programming Guide. Aaftab Munshi Dan Ginsburg Dave Shreiner. TT r^addison-wesley OpenGUES 2.0 Programming Guide Aaftab Munshi Dan Ginsburg Dave Shreiner TT r^addison-wesley Upper Saddle River, NJ Boston Indianapolis San Francisco New York Toronto Montreal London Munich Paris Madrid

More information

Name of the lecturer Doç. Dr. Selma Ayşe ÖZEL

Name of the lecturer Doç. Dr. Selma Ayşe ÖZEL Y.L. CENG-541 Information Retrieval Systems MASTER Doç. Dr. Selma Ayşe ÖZEL Information retrieval strategies: vector space model, probabilistic retrieval, language models, inference networks, extended

More information

SQL Queries. for. Mere Mortals. Third Edition. A Hands-On Guide to Data Manipulation in SQL. John L. Viescas Michael J. Hernandez

SQL Queries. for. Mere Mortals. Third Edition. A Hands-On Guide to Data Manipulation in SQL. John L. Viescas Michael J. Hernandez SQL Queries for Mere Mortals Third Edition A Hands-On Guide to Data Manipulation in SQL John L. Viescas Michael J. Hernandez r A TT TAddison-Wesley Upper Saddle River, NJ Boston Indianapolis San Francisco

More information

Word Processing. Delete text: Allows you to erase characters, words, lines, or pages as easily as you can cross them out on paper.

Word Processing. Delete text: Allows you to erase characters, words, lines, or pages as easily as you can cross them out on paper. Word Processing Practice Of all computer applications, word processing is the most common. To perform word processing, you need a computer, a special program called a word processor, and a printer. A word

More information

AIDAS: Incremental Logical Structure Discovery in PDF Documents

AIDAS: Incremental Logical Structure Discovery in PDF Documents AIDAS: Incremental Logical Structure Discovery in PDF Documents Anjo Anjewierden Social Science Informatics, University of Amsterdam Roetersstraat 15, 1018 WB Amsterdam, The Netherlands anjo@swi.psy.uva.nl

More information

Introduction to Information Retrieval

Introduction to Information Retrieval Introduction to Information Retrieval (Supplementary Material) Zhou Shuigeng March 23, 2007 Advanced Distributed Computing 1 Text Databases and IR Text databases (document databases) Large collections

More information

In fact, in many cases, one can adequately describe [information] retrieval by simply substituting document for information.

In fact, in many cases, one can adequately describe [information] retrieval by simply substituting document for information. LµŒ.y A.( y ý ó1~.- =~ _ _}=ù _ 4.-! - @ \{=~ = / I{$ 4 ~² =}$ _ = _./ C =}d.y _ _ _ y. ~ ; ƒa y - 4 (~šƒ=.~². ~ l$ y C C. _ _ 1. INTRODUCTION IR System is viewed as a machine that indexes and selects

More information

COMPUTATIONAL SEMANTICS WITH FUNCTIONAL PROGRAMMING JAN VAN EIJCK AND CHRISTINA UNGER. lg Cambridge UNIVERSITY PRESS

COMPUTATIONAL SEMANTICS WITH FUNCTIONAL PROGRAMMING JAN VAN EIJCK AND CHRISTINA UNGER. lg Cambridge UNIVERSITY PRESS COMPUTATIONAL SEMANTICS WITH FUNCTIONAL PROGRAMMING JAN VAN EIJCK AND CHRISTINA UNGER lg Cambridge UNIVERSITY PRESS ^0 Contents Foreword page ix Preface xiii 1 Formal Study of Natural Language 1 1.1 The

More information

DB2 SQL Tuning Tips for z/os Developers

DB2 SQL Tuning Tips for z/os Developers DB2 SQL Tuning Tips for z/os Developers Tony Andrews IBM Press, Pearson pic Upper Saddle River, NJ Boston Indianapolis San Francisco New York Toronto Montreal London Munich Paris Madrid Cape Town Sydney

More information

An Efficient Implementation of PATR for Categorial Unification Grammar

An Efficient Implementation of PATR for Categorial Unification Grammar An Efficient Implementation of PATR for Categorial Unification Grammar Todd Yampol Stanford University Lauri Karttunen Xerox PARC and CSLI 1 Introduction This paper describes C-PATR, a new C implementation

More information

Introduction to IR Systems: Supporting Boolean Text Search

Introduction to IR Systems: Supporting Boolean Text Search Introduction to IR Systems: Supporting Boolean Text Search Ramakrishnan & Gehrke: Chapter 27, Sections 27.1 27.2 CPSC 404 Laks V.S. Lakshmanan 1 Information Retrieval A research field traditionally separate

More information

Text Search and Similarity Search

Text Search and Similarity Search Text Search and Similarity Search PG 12.1 12.2, F.30 Dr. Chris Mayfield Department of Computer Science James Madison University Apr 03, 2017 Hello DBLP Database of CS journal articles and conference proceedings

More information