CNIPA, FUB and University of Rome Tor Vergata at TREC 2008 Legal Track

Similar documents
FUB, IASI-CNR and University of Tor Vergata at TREC 2008 Blog Track

Exploring the Query Expansion Methods for Concept Based Representation

SNUMedinfo at TREC CDS track 2014: Medical case-based retrieval task

BUPT at TREC 2009: Entity Track

Northeastern University in TREC 2009 Million Query Track

Using Model-Theoretic Invariants for Semantic Integration. Michael Gruninger NIST / Institute for Systems Research University of Maryland

Better Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web

ICTNET at Web Track 2010 Diversity Task

Multi-Modal Communication

Dana Sinno MIT Lincoln Laboratory 244 Wood Street Lexington, MA phone:

4. Lessons Learned in Introducing MBSE: 2009 to 2012

Service Level Agreements: An Approach to Software Lifecycle Management. CDR Leonard Gaines Naval Supply Systems Command 29 January 2003

PRIS at TREC2012 KBA Track

UMass at TREC 2006: Enterprise Track

Fondazione Ugo Bordoni at TREC 2003: robust and web track

COMPUTATIONAL FLUID DYNAMICS (CFD) ANALYSIS AND DEVELOPMENT OF HALON-REPLACEMENT FIRE EXTINGUISHING SYSTEMS (PHASE II)

Kathleen Fisher Program Manager, Information Innovation Office

FUDSChem. Brian Jordan With the assistance of Deb Walker. Formerly Used Defense Site Chemistry Database. USACE-Albuquerque District.

BIT at TREC 2009 Faceted Blog Distillation Task

Fondazione Ugo Bordoni at TREC 2004

Experiments on Related Entity Finding Track at TREC 2009 Qing Yang,Peng Jiang, Chunxia Zhang, Zhendong Niu

Introducing I3CON: The Information Interpretation and Integration Conference

A Methodology for End-to-End Evaluation of Arabic Document Image Processing Software

A Review of the 2007 Air Force Inaugural Sustainability Report

Experiments with ClueWeb09: Relevance Feedback and Web Tracks

DoD Common Access Card Information Brief. Smart Card Project Managers Group

Shallow Ocean Bottom BRDF Prediction, Modeling, and Inversion via Simulation with Surface/Volume Data Derived from X-Ray Tomography

Setting the Standard for Real-Time Digital Signal Processing Pentek Seminar Series. Digital IF Standardization

Empirically Based Analysis: The DDoS Case

Running CyberCIEGE on Linux without Windows

2013 US State of Cybercrime Survey

Concept of Operations Discussion Summary

ASSESSMENT OF A BAYESIAN MODEL AND TEST VALIDATION METHOD

Topology Control from Bottom to Top

Opinions in Federated Search: University of Lugano at TREC 2014 Federated Web Search Track

Concept Based Tie-breaking and Maximal Marginal Relevance Retrieval in Microblog Retrieval

University of Pittsburgh at TREC 2014 Microblog Track

ARINC653 AADL Annex Update

A Distributed Parallel Processing System for Command and Control Imagery

Defense Hotline Allegations Concerning Contractor-Invoiced Travel for U.S. Army Corps of Engineers' Contracts W912DY-10-D-0014 and W912DY-10-D-0024

Microsoft Research Asia at the Web Track of TREC 2009

Information, Decision, & Complex Networks AFOSR/RTC Overview

Cyber Threat Prioritization

Use of the Polarized Radiance Distribution Camera System in the RADYO Program

TNO and RUN at the TREC 2012 Contextual Suggestion Track: Recommending personalized touristic sights using Google Places

University of Twente at TREC 2009: Indexing half a billion web pages

Architecting for Resiliency: Army's Common Operating Environment (COE). SERC

DEVELOPMENT OF A NOVEL MICROWAVE RADAR SYSTEM USING ANGULAR CORRELATION FOR THE DETECTION OF BURIED OBJECTS IN SANDY SOILS

Advanced Numerical Methods for Numerical Weather Prediction

Accuracy of Computed Water Surface Profiles

SETTING UP AN NTP SERVER AT THE ROYAL OBSERVATORY OF BELGIUM

Space and Missile Systems Center

Balancing Transport and Physical Layers in Wireless Ad Hoc Networks: Jointly Optimal Congestion Control and Power Control

High-Assurance Security/Safety on HPEC Systems: an Oxymoron?

Combining fields for query expansion and adaptive query expansion

Monte Carlo Techniques for Estimating Power in Aircraft T&E Tests. Todd Remund Dr. William Kitto EDWARDS AFB, CA. July 2011

Vision Protection Army Technology Objective (ATO) Overview for GVSET VIP Day. Sensors from Laser Weapons Date: 17 Jul 09 UNCLASSIFIED

ENVIRONMENTAL MANAGEMENT SYSTEM WEB SITE (EMSWeb)

MODELING AND SIMULATION OF LIQUID MOLDING PROCESSES. Pavel Simacek Center for Composite Materials University of Delaware

Lessons Learned in Adapting a Software System to a Micro Computer

Dynamic Information Management and Exchange for Command and Control Applications

Dr. Stuart Dickinson Dr. Donald H. Steinbrecher Naval Undersea Warfare Center, Newport, RI May 10, 2011

ATCCIS Replication Mechanism (ARM)

Energy Security: A Global Challenge

The C2 Workstation and Data Replication over Disadvantaged Tactical Communication Links

75th Air Base Wing. Effective Data Stewarding Measures in Support of EESOH-MIS

C2-Simulation Interoperability in NATO

73rd MORSS CD Cover Page UNCLASSIFIED DISCLOSURE FORM CD Presentation

Technological Advances In Emergency Management

Corrosion Prevention and Control Database. Bob Barbin 07 February 2011 ASETSDefense 2011

David W. Hyde US Army Engineer Waterways Experiment Station Vicksburg, Mississippi ABSTRACT

Towards a Formal Pedigree Ontology for Level-One Sensor Fusion

NOT(Faster Implementation ==> Better Algorithm), A Case Study

Detection, Classification, & Identification of Objects in Cluttered Images

Shedding Light on the Graph Schema

Using Templates to Support Crisis Action Mission Planning

Wireless Connectivity of Swarms in Presence of Obstacles

Web Site update. 21st HCAT Program Review Toronto, September 26, Keith Legg

Components and Considerations in Building an Insider Threat Program

AFRL-ML-WP-TM

US Army Industry Day Conference Boeing SBIR/STTR Program Overview

SCIENTIFIC VISUALIZATION OF SOIL LIQUEFACTION

University of Waterloo: Logistic Regression and Reciprocal Rank Fusion at the Microblog Track

CENTER FOR ADVANCED ENERGY SYSTEM Rutgers University. Field Management for Industrial Assessment Centers Appointed By USDOE

Data Warehousing. HCAT Program Review Park City, UT July Keith Legg,

INTEGRATING LOCAL AND GLOBAL NAVIGATION IN UNMANNED GROUND VEHICLES

An Efficient Architecture for Ultra Long FFTs in FPGAs and ASICs

Basing a Modeling Environment on a General Purpose Theorem Prover

QCRI at TREC 2014: Applying the KISS principle for the TTG task in the Microblog Track

The Last Word on TLE. James Brownlow AIR FORCE TEST CENTER EDWARDS AFB, CA May, Approved for public release ; distribution is unlimited.

Distributed Real-Time Embedded Video Processing

ENHANCED FRAGMENTATION MODELING

32nd Annual Precise Time and Time Interval (PTTI) Meeting. Ed Butterline Symmetricom San Jose, CA, USA. Abstract

Speaker Verification Using SVM

Requirements for Scalable Application Specific Processing in Commercial HPEC

Washington University

CNIPA, FUB and University of Rome Tor Vergata at TREC 2008 Legal Track

Giambattista Amati (1), Marco Bianchi (2), Alessandro Celi (3), Mauro Draoli (2), Giorgio Gambosi (4,5), and Giovanni Stilo (5)

(1) Fondazione Ugo Bordoni (FUB), Rome, Italy. gba@fub.it
(2) Italian National Centre for ICT in the Public Administrations (CNIPA), Rome, Italy. marco.bianchi@cnipa.it, draoli@cnipa.it
(3) Computer Science Department, University of L'Aquila, L'Aquila, Italy. alessandro.celi@di.univaq.it
(4) Mathematics Department, University of Rome Tor Vergata, Rome, Italy. gambosi@mat.uniroma2.it
(5) NESTOR Laboratory, University of Rome Tor Vergata, Rome, Italy. stilo@nestor.uniroma2.it

1 Introduction

The TREC Legal track was introduced in TREC 2006 with the stated purpose "to evaluate the efficacy of automated support for review and production of electronic records in the context of litigation, regulation and legislation". The TREC Legal track 2008 ran three tasks: (1) an automatic ad hoc task, (2) an automatic relevance feedback task, and (3) an interactive task. We took part only in the automatic ad hoc task, and focused on the following issues:

1. Indexing. The CDIP test collection is characterized by a large number of unique terms due to OCR mistakes. We defined a term selection strategy to reduce the number of terms, as described in Section 2.
2. Querying. The analysis of past TREC Legal track results showed that the best retrieval strategies essentially returned a ranked list of the documents retrieved by the boolean query. As a consequence, we defined a strategy that boosts the scores of documents satisfying the final negotiated boolean query. Furthermore, we defined a method for the automatic construction of a weighted query from the request text, as reported in Section 3.
3. Estimation of the K value. We used a query performance prediction approach to estimate K values. The query weighting model we adopted is described in Section 4.

Submitted runs and their evaluation are reported in Section 5.

2 Indexing

The IIT Complex Document Information Processing (CDIP) test collection is based on a snapshot of the Tobacco Master Settlement Agreement (MSA) sub-collection of the LTDL.

The CDIP collection contains a very large number of unique terms due to OCR mistakes. For comparison, GOV2, one of the biggest test collections (426 GB), contains about 50 million unique terms for 25 million documents, whereas CDIP (57 GB of text) contains almost twice as many: 95 million unique terms for 7 million documents.

We used the Terrier (TERabyte RetrIEveR) Information Retrieval platform [1] with the following indexing configuration:

TrecDocTags.doctag=record
TrecDocTags.idtag=tid
TrecDocTags.process=ot
TrecDocTags.casesensitive=false
termpipelines=Stopwords,PorterStemmer

With this configuration all meta-data were ignored. In addition, words of more than 20 characters were excluded. With these settings the generated index would still have had 95 million terms and 14 GB of index structures (direct file, inverted file, and lexicon). We therefore removed from the index all terms with a very low collection frequency (25 occurrences or fewer), producing a lexicon of only 9 million terms. We noticed that the presence of OCR errors affects retrieval quality by altering both global and local term statistics.
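As an illustration of this term selection step, the following minimal Python sketch prunes a lexicon by term length and collection frequency. It is our sketch of the idea, not Terrier code; the data layout, thresholds, and function names are assumptions.

# Sketch of the term selection described above (illustrative, not Terrier code).
# Assumption: the lexicon is available as (term, collection_frequency) pairs.

MAX_TERM_LENGTH = 20  # words longer than this were excluded at indexing time
MIN_FREQUENCY = 26    # terms occurring 25 times or fewer are dropped

def keep_term(term: str, frequency: int) -> bool:
    """True if a lexicon entry survives both pruning rules."""
    return len(term) <= MAX_TERM_LENGTH and frequency >= MIN_FREQUENCY

def prune_lexicon(entries):
    """Filter an iterable of (term, frequency) pairs."""
    return [(t, f) for t, f in entries if keep_term(t, f)]

if __name__ == "__main__":
    lexicon = [("phosphate", 5120), ("xqzttr", 3), ("a" * 25, 400)]
    print(prune_lexicon(lexicon))  # only ("phosphate", 5120) survives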

3 Retrieval

We used a variation of DFRee, a parameter-free IR model provided by the Terrier platform. With this model we did not have to tune any parameters, so we could focus on the proposed methodology and evaluate the gain in performance with respect to the baseline.

We extracted the most informative terms from the top-returned documents, as is done in query expansion. In this expansion process, terms in the top-returned documents are weighted using a DFR term weighting model; in our experiments, the Bo1 (Bose-Einstein statistics) term weighting model fitted this collection best. We also tuned the number of top-returned documents and the number of expansion terms, obtaining the best performance with 5 documents and 10 terms.

3.1 Boolean re-rank

We boosted the scores of all documents d_i ∈ B, where B is the set of documents satisfying the final negotiated boolean query. The score s_i of a document d_i ∈ B was augmented by a value ŝ:

    s'_i = s_i + ŝ,   with ŝ = s_k and k = α · b,

where s_k is the score of the k-th ranked document, b is the number of documents in B, and α is a parameter. Notice that s_k varies from topic to topic, because b depends on the final negotiated boolean query. If α = 0, the score s_0 of the top-ranked document is added to all d ∈ B, and all these documents are shifted to the topmost positions while keeping the relative order computed by the DFR model. On the other hand, as α → ∞, s'_i → s_i.
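To make the boolean re-rank concrete, here is a minimal Python sketch under stated assumptions: the ranking is a list of (doc_id, score) pairs sorted by decreasing score, and boolean_hits is the set B of documents matching the final negotiated boolean query. The function and variable names are ours, not the authors'.

# Sketch of the boolean re-rank of Section 3.1 (our illustration,
# not the authors' actual code).

def boolean_rerank(ranking, boolean_hits, alpha=0.0):
    b = len(boolean_hits)
    k = int(alpha * b)            # rank of the reference document (0-indexed)
    k = min(k, len(ranking) - 1)  # guard against short rankings
    s_hat = ranking[k][1]         # score s_k of the k-th ranked document
    boosted = [(doc, score + s_hat if doc in boolean_hits else score)
               for doc, score in ranking]
    return sorted(boosted, key=lambda pair: pair[1], reverse=True)

# With alpha = 0, every document in B gains the top score s_0 and moves
# to the topmost positions, preserving the DFR order within B.
ranking = [("d1", 9.0), ("d2", 7.5), ("d3", 4.2), ("d4", 1.1)]
print(boolean_rerank(ranking, {"d3", "d4"}, alpha=0.0))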

3.2 Topic processing

According to the TREC Legal track 2008 guidelines, each topic is composed of four fields: RequestText, ProposalByDefendant, RejoinderByPlaintiff, and FinalQuery. The union of all these fields would produce a very long query. We therefore indexed all topics to obtain a query lexicon, with the aim of removing all query terms that are not informative. Then, for each topic, we weighted the remaining query terms by their number of occurrences in the four fields of the topic. As an example, the query produced for topic n. 52 is:

affect 1, agricultur 2, fertil 2, commerci 2, yield 4, rai 1, output 2, hpf 3, phosphoru 2, crop 6, greater 1, multipl 1, augment 1, increa 1, introduc 1, boost 3, fertiliz 3, soil 1, phosphat 5, doubl 1, high 4, purpos 1, tripl 1

4 Estimating K values

4.1 Query performance prediction

Robustness is an important measure of the retrieval performance of an Information Retrieval (IR) system; it refers in particular to how an IR system deals with poorly-performing queries. As stressed by Cronen-Townsend et al. [2], poorly-performing queries considerably hurt the effectiveness of an IR system, and this issue has become important in IR research. Moreover, the use of reliable query performance predictors is a step towards determining, for each query, the most suitable retrieval strategy. For example, in [3] query performance predictors were used to devise a selective decision methodology that avoids the failure of query expansion.

In order to predict the performance of a query, the first step is to differentiate highly-performing queries from poorly-performing ones, a problem that has recently attracted increasing research attention. In [2], Cronen-Townsend et al. suggested that query performance is correlated with the clarity of a query and accordingly used a clarity score as the predictor of query performance; in their work, the clarity score is defined as the Kullback-Leibler divergence of the query model from the collection model. In [3], Amati et al. proposed the notion of query difficulty to predict query performance; their basic idea is that the term weights obtained in query expansion provide evidence of the query performance.

We use a query performance prediction approach to estimate a correct value of K. The basic idea is quite simple: with an easy query we do not need to go deep into the ranking, because all the relevant documents are at the top, whereas with a hard query we need to go much deeper to find all the relevant documents:

    easy query => small K value
    hard query => large K value

4.2 Query weighting model

Since we did not have time to extract expanded query weights, we simply used the inverse document frequency:

    D_i = Σ_{t ∈ q_i} log(N / n_t)

where t is a term of the query q_i, n_t is the number of documents in which the term t appears, and N is the total number of documents in the collection. On TREC 2007 data, the correlation between D_i and K_i, with K_i the optimal value for F_1@K, was about 30%, whereas a higher correlation (37%) was observed between D_i and the estimated number of relevant documents.

4.3 Normalization variants

To turn the coefficient D_i into an estimated K value we used the Z-score [4]. For the new queries we formulated three different models:

1. K_O = (1 − (D − μ_idf) / σ_idf) · μ_k, where μ_idf and σ_idf are the mean and the standard deviation of all the D_i values, respectively, and μ_k is the mean of the optimal K values;
2. K_R = (1 − (D − μ_idf) / σ_idf) · μ_R, where μ_R is the mean of the cardinalities of the sets of relevant documents;
3. K_B = (1 − (D − μ_idf) / σ_idf) · μ_B, where μ_B is the mean of the cardinalities of the boolean sets B.
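The sketch below shows how the predictor and the normalization fit together, in the same Python style as the earlier sketches. The minus sign in the normalization follows our reconstruction of the OCR-damaged formulas above (a query with higher-than-average D_i is treated as easier and gets a smaller K); all names and numbers are illustrative.

import math

# Sketch of the K estimation of Sections 4.2 and 4.3 (our reconstruction
# of the formulas, not the authors' code).

def query_idf_score(query_terms, doc_freq, N):
    """D_i = sum over the query terms t of log(N / n_t)."""
    return sum(math.log(N / doc_freq[t]) for t in query_terms if t in doc_freq)

def estimate_k(D, mu_idf, sigma_idf, mu):
    """K = (1 - (D - mu_idf) / sigma_idf) * mu, where mu is the mean of the
    optimal K values (K_O), of the relevant-set sizes (K_R), or of the
    boolean-set sizes (K_B). Floored at 1 so K stays a valid cut-off."""
    z = (D - mu_idf) / sigma_idf
    return max(1, int((1.0 - z) * mu))

# Hypothetical numbers: rarer-than-average query terms give a high D,
# a positive z, and therefore a K below the mean.
doc_freq = {"phosphat": 12000, "crop": 90000, "yield": 150000}
N = 7_000_000
D = query_idf_score(["phosphat", "crop", "yield"], doc_freq, N)
print(estimate_k(D, mu_idf=12.0, sigma_idf=3.0, mu=20000))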

5 Submitted runs and results

For our participation in the ad hoc task we submitted the following runs:

- CTFrtSk is the compulsory baseline. It is computed using only the request text field as query. The K values are computed using the first model described in Section 4.3.
- CTFrtSkBr0 is computed using only the (typically one-sentence) request text field as query. The boolean re-rank strategy is applied with α = 0.0. The K values are computed using the first model described in Section 4.3.
- CTFggeSkBr0 is computed using the query elaboration method described in Section 3. The boolean re-rank strategy is applied with α = 0.0. The K values are computed using the first model described in Section 4.3.
- CTFgge4kBr0 is computed using the query elaboration method described in Section 3. The boolean re-rank strategy is applied with α = 0.0. The K values are set to 40,000 for each topic.
- CTFgge10kBr0 is computed using the query elaboration method described in Section 3. The boolean re-rank strategy is applied with α = 0.0. The K values are set to 100,000 for each topic.
- CTFggeBkBr0 is computed using the query elaboration method described in Section 3. The boolean re-rank strategy is applied with α = 0.0. The K values are computed using the third model described in Section 4.3.
- CTFggeBkBr1 is computed using the query elaboration method described in Section 3. The boolean re-rank strategy is applied with α = 1.0. The K values are computed using the third model described in Section 4.3.
- CTFggeRkBr0 is computed using the query elaboration method described in Section 3. The boolean re-rank strategy is applied with α = 0.0. The K values are computed using the second model described in Section 4.3.

Results of the submitted runs are shown in Table 1.

Table 1. Summary of the CTF Legal track runs. The first two columns are F_1 metrics, the last two HF_1 metrics.

Run label                           est F_1@K   est F_1@R   est HF_1@K_h   est HF_1@R_h
Best run of the ad hoc task 08        0.2204      0.2458       0.1064         0.1770
Median run of the ad hoc task 08      0.1429      0.1823       0.0702         0.1109
CTFrtSk (baseline)                    0.1282      0.1736       0.0481         0.0844
CTFrtSkBr0                            0.1346      0.1822       0.0723         0.1113
CTFggeSkBr0                           0.1544      0.2152       0.0811         0.1008
CTFgge4kBr0                           0.1849      0.2152       0.0818         0.1008
CTFgge10kBr0                          0.2160      0.2152       0.0788         0.1008
CTFggeBkBr0                           0.1800      0.2152       0.0799         0.1008
CTFggeBkBr1                           0.1856      0.2196       0.0824         0.1016
CTFggeRkBr0                           0.1785      0.2152       0.0947         0.1008

With respect to the main metric (estimated F_1@K), our baseline (CTFrtSk) scores 0.1282, and six of the eight submitted runs are above the median of the ad hoc task of the Legal track 2008. Applying the boolean re-rank to the baseline (CTFrtSkBr0) improves its performance by +4.99%, and adding the query elaboration method (CTFggeSkBr0) improves it by +20.43% with respect to the baseline. Considering only the runs in which the K value is computed on the basis of the query complexity, the best improvement over the baseline is +44.77% (CTFggeBkBr1). This is not our best result, however: the best run is CTFgge10kBr0, which improves the baseline performance by +68.48% using a fixed K for each topic, and falls only 2.03% short of the best of all 64 runs submitted to the ad hoc task of the Legal track 2008.
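As a quick arithmetic check of the relative improvements quoted above, computed as (run score − baseline score) / baseline score on the estimated F_1@K column: for CTFgge10kBr0, (0.2160 − 0.1282) / 0.1282 ≈ 0.6849, i.e. the reported +68.48%.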

With respect to the estimated HF_1@K_h metric, the baseline scores 0.0481, and seven of the eight submitted runs are above the median of the ad hoc task of the Legal track 2008. Our best run on this metric is CTFggeRkBr0, which scores 0.0947, an improvement of +96.88% over the baseline; it falls 12.35% short of the best of all 64 runs submitted to the ad hoc task of the Legal track 2008.

We have already started further experiments aimed at better analyzing the relationship between the D_i values introduced in Section 4.2 and the retrieval performance.

References

1. I. Ounis, G. Amati, V. Plachouras, B. He, C. Macdonald, and C. Lioma. Terrier: A High Performance and Scalable Information Retrieval Platform. In Proceedings of the ACM SIGIR'06 Workshop on Open Source Information Retrieval (OSIR 2006), 2006.
2. S. Cronen-Townsend, Y. Zhou, and W. B. Croft. Predicting query performance. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 299-306, Tampere, Finland, 2002.
3. G. Amati, C. Carpineto, and G. Romano. Query difficulty, robustness, and selective application of query expansion. In Advances in Information Retrieval: Proceedings of the 26th European Conference on IR Research (ECIR 2004), pages 127-137, Sunderland, UK, 2004.
4. E. Altman. Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. Journal of Finance, 1968.