Analyzing a Large Corpus of Crowdsourced Plagiarism

Size: px
Start display at page:

Download "Analyzing a Large Corpus of Crowdsourced Plagiarism"

Transcription

1 Analyzing a Large Corpus of Crowdsourced Plagiarism Master s Thesis Michael Völske Fakultät Medien Bauhaus-Universität Weimar Supervised by: Prof. Dr. Benno Stein Prof. Dr. Volker Rodehorst Advisors: Dr. Steven Burrows Dr. Matthias Hagen Dr. Martin Potthast

2 Analyzing a Large Corpus of Crowdsourced Plagiarism Master s Thesis Outline Introduction The Webis-TRC-12 Dataset Categorizing Crowdsourced Text Reuse Search Missions For Source Retrieval Summary

3 Introduction Modeling Text Reuse From the Web Search Copy & Paste Paraphrase Search I m Feeling Lucky [Potthast 2011] 3 [ ] Michael Völske 2013

4 Introduction Previous Plagiarism Corpora Source Selection Automatic Excerpt Automatic Modification Insertion Automatically generated artificial plagiarism PAN-PC-10/11: > documents; > plagiarism cases Automatic transformations do not preserve semantics 4 [ ] Michael Völske 2013

5 The Webis-TRC-12 Dataset 5 [ ] Michael Völske 2013

6 The Webis-TRC-12 Dataset Construction Overview Topic Author Editor ChatNoir SE ClueWeb API Revision log Query log ClueWeb 6 [ ] Michael Völske 2013

7 The Webis-TRC-12 Dataset Construction Overview: Topics Topic Author Editor ChatNoir SE ClueWeb API Revision log Query log ClueWeb 7 [ ] Michael Völske 2013

8 The Webis-TRC-12 Dataset Construction Overview: Topics Example topic: Obama s family. Write about President Barack Obama s family history, including genealogy, national origins, places and dates of birth, etc. Where did Barack Obama s parents and grandparents come from? Also include a brief biography of Obama s mother. Based on TREC Web Track topics topics, 297 essays Target essay length: 5000 words 8 [ ] Michael Völske 2013

9 The Webis-TRC-12 Dataset Construction Overview: Authors Topic Author Editor ChatNoir SE ClueWeb API Revision log Query log ClueWeb 9 [ ] Michael Völske 2013

10 The Webis-TRC-12 Dataset Construction Overview: Authors Number of Essays Author ID Crowdsourcing: 27 total Professional writers hired on odesk + volunteers Fluent English speakers 10 [ ] Michael Völske 2013

11 The Webis-TRC-12 Dataset Construction Overview: Authors Number of Essays Author Demographics (n=12) Age (Median) 37 Years Writing (Median) 8 Academic degree Postgrad 33% Undergrad 25% English Native 67% Second Language 33% Author ID Crowdsourcing: 27 total Professional writers hired on odesk + volunteers Fluent English speakers 11 [ ] Michael Völske 2013

12 The Webis-TRC-12 Dataset Construction Overview: Sources Topic Author Editor ChatNoir SE ClueWeb API Revision log Query log ClueWeb 12 [ ] Michael Völske 2013

13 The Webis-TRC-12 Dataset Construction Overview: Sources ClueWeb09: 500 million English pages Representative sample of the web Commonly used in search engine evaluation (TREC) 13 [ ] Michael Völske 2013

14 The Webis-TRC-12 Dataset Construction Overview: Search Engine Topic Author Editor ChatNoir SE ClueWeb API Revision log Query log ClueWeb 14 [ ] Michael Völske 2013

15 The Webis-TRC-12 Dataset Construction Overview: Search Engine [Potthast et al. 2012a] Used for source retrieval Indexes ClueWeb Records fine-grained interaction log [example] 15 [ ] Michael Völske 2013

16 The Webis-TRC-12 Dataset Construction Overview: Editor Topic Author Editor ChatNoir SE ClueWeb API Revision log Query log ClueWeb 16 [ ] Michael Völske 2013

17 The Webis-TRC-12 Dataset Construction Overview: Editor [Potthast et al. 2012b] 17 [ ] Michael Völske 2013

18 The Webis-TRC-12 Dataset Construction Overview: Editor [Potthast et al. 2012b] Custom web-based rich text editor Records sources of re-used text passages New revision every 300ms of inactivity Detailed revision history [example] 18 [ ] Michael Völske 2013

19 The Webis-TRC-12 Dataset Three Main Data Sources Topic Author Editor ChatNoir SE ClueWeb API Revision log Query log ClueWeb 19 [ ] Michael Völske 2013

20 The Webis-TRC-12 Dataset Research Questions 20 [ ] Michael Völske 2013

21 The Webis-TRC-12 Dataset Research Questions 1. Different text reuse approaches distinguishable? 2. Relation to existing plagiarism categorizations? 3. Influence of text reuse task on search engine interaction? 4. Detectable via query log analysis? We expect new research insights and impacts to text reuse detection query formulation paraphrasing 21 [ ] Michael Völske 2013

22 The Webis-TRC-12 Dataset Research Questions 1. Different text reuse approaches distinguishable? 2. Relation to existing plagiarism categorizations? 3. Influence of text reuse task on search engine interaction? 4. Detectable via query log analysis? We expect new research insights and impacts to text reuse detection query formulation paraphrasing 22 [ ] Michael Völske 2013

23 The Webis-TRC-12 Dataset Research Questions 1. Different text reuse approaches distinguishable? 2. Relation to existing plagiarism categorizations? 3. Influence of text reuse task on search engine interaction? 4. Detectable via query log analysis? We expect new research insights and impacts to text reuse detection query formulation paraphrasing 23 [ ] Michael Völske 2013

24 Categorizing Crowdsourced Text Reuse 24 [ ] Michael Völske 2013

25 Categorizing Crowdsourced Text Reuse Build-Up Versus Boil-Down Reuse Build-up reuse (left) versus boil-down reuse (right). text length (y-axis) over text revision (x-axis) colors: different source documents (original text is white) blue dots: position of the writer s last edit Build-up: 45%; boil-down: 40%; mixed: 12% 25 [ ] Michael Völske 2013

26 Categorizing Crowdsourced Text Reuse Build-Up Versus Boil-Down Reuse Build-up reuse (left) versus boil-down reuse (right). text length (y-axis) over text revision (x-axis) colors: different source documents (original text is white) blue dots: position of the writer s last edit Build-up: 45%; boil-down: 40%; mixed: 12% 26 [ ] Michael Völske 2013

27 Categorizing Crowdsourced Text Reuse Build-Up Versus Boil-Down Reuse Author 6 (12 topics) Author 20 (9 topics) Author 21 (21 topics) Text length Text length Text length Edit number Edit number Edit number Build-up reuse: Averaged editing histories by authors. one author per plot gray lines: individual essays black line: average 27 [ ] Michael Völske 2013

28 Categorizing Crowdsourced Text Reuse Build-Up Versus Boil-Down Reuse Author 2 (66 topics) Author 7 (20 topics) Author 24 (27 topics) Text length Text length Text length Edit number Edit number Edit number Boil-down reuse: Averaged editing histories by authors. one author per plot gray lines: individual essays black line: average 28 [ ] Michael Völske 2013

29 Categorizing Crowdsourced Text Reuse Classification Scheme for Text Reuse Find-Replace Remix Clone, Ctrl-C Mashup Classification Scheme for Text Reuse. types of plagiarism as distinguished by Turnitin [Turnitin 2012] interpret text reuse (plagiarism) as a combination of two factors: paraphrasing and interleaving 29 [ ] Michael Völske 2013

30 Categorizing Crowdsourced Text Reuse Classification Scheme for Text Reuse high Find-Replace Remix Paraphrasing low Clone, Ctrl-C Mashup low Interleaving high Classification Scheme for Text Reuse. types of plagiarism as distinguished by Turnitin [Turnitin 2012] interpret text reuse (plagiarism) as a combination of two factors: paraphrasing and interleaving 30 [ ] Michael Völske 2013

31 Categorizing Crowdsourced Text Reuse Classification Scheme for Text Reuse high Paraphrasing Find-Replace Remix Quantify: N-Gram similarity and ratio of passages to sources [details] Measure for all essays low Clone, Ctrl-C low Interleaving Mashup high Hypothesis: will show evidence of authors individual text reuse styles Classification Scheme for Text Reuse. types of plagiarism as distinguished by Turnitin [Turnitin 2012] interpret text reuse (plagiarism) as a combination of two factors: paraphrasing and interleaving 31 [ ] Michael Völske 2013

32 Categorizing Crowdsourced Text Reuse Classification Scheme for Text Reuse high Find-Replace Remix Paraphrasing Clone, Ctrl-C Mashup low low Interleaving high Median 5-gram passage similarity (paraphrasing) Passages per source (interleaving) Author (topic count) A0 (2) A14 (1) A1 (3) A15 (3) A2 (66) A16 (1) A3 (1) A17 (33) A4 (1) A18 (32) A5 (37) A6 (11) A7 (20) A19 (7) A20 (8) A21 (21) A8 (1) A22 (1) A9 (1) A23 (1) A10 (12) A11 (1) A12 (1) A24 (27) A25 (1) A26 (1) A13 (1) Classification Scheme for Text Reuse. types of plagiarism as distinguished by Turnitin [Turnitin 2012] interpret text reuse (plagiarism) as a combination of two factors: paraphrasing and interleaving 32 [ ] Michael Völske 2013

33 Categorizing Crowdsourced Text Reuse Classification Scheme for Text Reuse Median 5-gram passage similarity (paraphrasing) Passages per source (interleaving) Author (essay count) A0 (2) A14 (1) A1 (3) A15 (3) A2 (66) A16 (1) A3 (1) A17 (33) A4 (1) A18 (32) A5 (37) A6 (11) A7 (20) A19 (7) A20 (8) A21 (21) A8 (1) A22 (1) A9 (1) A23 (1) A10 (12) A11 (1) A12 (1) A24 (27) A25 (1) A26 (1) A13 (1) 33 [ ] Michael Völske 2013

34 Categorizing Crowdsourced Text Reuse Classification Scheme for Text Reuse Median 5-gram passage similarity (paraphrasing) Passages per source (interleaving) Author (essay count) A0 (2) A14 (1) A1 (3) A15 (3) A2 (66) A16 (1) A3 (1) A17 (33) A4 (1) A18 (32) A5 (37) A6 (11) A7 (20) A19 (7) A20 (8) A21 (21) A8 (1) A22 (1) A9 (1) A23 (1) A10 (12) A11 (1) A12 (1) A24 (27) A25 (1) A26 (1) A13 (1) 34 [ ] Michael Völske 2013

35 Categorizing Crowdsourced Text Reuse Classification Scheme for Text Reuse Median 5-gram passage similarity (paraphrasing) Passages per source (interleaving) Author (essay count) A0 (2) A14 (1) A1 (3) A15 (3) A2 (66) A16 (1) A3 (1) A17 (33) A4 (1) A18 (32) A5 (37) A6 (11) A7 (20) A19 (7) A20 (8) A21 (21) A8 (1) A22 (1) A9 (1) A23 (1) A10 (12) A11 (1) A12 (1) A24 (27) A25 (1) A26 (1) A13 (1) 35 [ ] Michael Völske 2013

36 Categorizing Crowdsourced Text Reuse Classification Scheme for Text Reuse Median 5-gram passage similarity (paraphrasing) Passages per source (interleaving) Author (essay count) A0 (2) A14 (1) A1 (3) A15 (3) A2 (66) A16 (1) A3 (1) A17 (33) A4 (1) A18 (32) A5 (37) A6 (11) A7 (20) A19 (7) A20 (8) A21 (21) A8 (1) A22 (1) A9 (1) A23 (1) A10 (12) A11 (1) A12 (1) A24 (27) A25 (1) A26 (1) A13 (1) 36 [ ] Michael Völske 2013

37 Categorizing Crowdsourced Text Reuse Classification Scheme for Text Reuse Median 5-gram passage similarity (paraphrasing) Passages per source (interleaving) Author (essay count) A0 (2) A14 (1) A1 (3) A15 (3) A2 (66) A16 (1) A3 (1) A17 (33) A4 (1) A18 (32) A5 (37) A6 (11) A7 (20) A19 (7) A20 (8) A21 (21) A8 (1) A22 (1) A9 (1) A23 (1) A10 (12) A11 (1) A12 (1) A24 (27) A25 (1) A26 (1) A13 (1) 37 [ ] Michael Völske 2013

38 Search Missions For Source Retrieval 38 [ ] Michael Völske 2013

39 Search Missions For Source Retrieval Distribution of Queries Over Time A B C D E F Distribution of queries over time. fraction of posed queries (y-axis) over elapsed time (x-axis) between the first query until essay completion each cell represents one of 150 essays the numbers denote the total amount of posed queries the cells are sorted by area under the curve 39 [ ] Michael Völske 2013

40 Search Missions For Source Retrieval Distribution of Queries Over Time A B C D E F Distribution of queries over time. fraction of posed queries (y-axis) over elapsed time (x-axis) between the first query until essay completion each cell represents one of 150 essays the numbers denote the total amount of posed queries the cells are sorted by area under the curve 40 [ ] Michael Völske 2013

41 Search Missions For Source Retrieval Distribution of Queries Over Time Distribution of queries over time. fraction of posed queries (y-axis) over elapsed time (x-axis) between the first query until essay completion each cell represents one of 150 essays the numbers denote the total amount of posed queries the cells are sorted by area under the curve 41 [ ] Michael Völske 2013

42 Search Missions For Source Retrieval Correlation of Editing and Querying 1 Author 5 (18 topics) Author 20 (9 topics) Author 2 (33 topics) Author 24 (13 topics) Correlation of editing and querying behavior. averaged editing histories by authors [plots] distribution of queries over time [plots] 42 [ ] Michael Völske 2013

43 Summary 1. Novel quality of the Webis-TRC-12 dataset of crowdsourced text reuse 2. Evidence of two fundamental editing strategies: build-up & boil-down 3. New classification scheme for documents in a text reuse corpus 4. Relationship between editing behavior and search engine use Future Work 1. Interleaving and paraphrasing in the time dimension 2. Authors text reuse strategies across multiple documents 3. Paraphrasing study: track individual passages over time Thank you for your attention! 43 [ ] Michael Völske 2013

44 Summary 1. Novel quality of the Webis-TRC-12 dataset of crowdsourced text reuse 2. Evidence of two fundamental editing strategies: build-up & boil-down 3. New classification scheme for documents in a text reuse corpus 4. Relationship between editing behavior and search engine use Future Work 1. Interleaving and paraphrasing in the time dimension 2. Authors text reuse strategies across multiple documents 3. Paraphrasing study: track individual passages over time Thank you for your attention! 44 [ ] Michael Völske 2013

45 Summary 1. Novel quality of the Webis-TRC-12 dataset of crowdsourced text reuse 2. Evidence of two fundamental editing strategies: build-up & boil-down 3. New classification scheme for documents in a text reuse corpus 4. Relationship between editing behavior and search engine use Future Work 1. Interleaving and paraphrasing in the time dimension 2. Authors text reuse strategies across multiple documents 3. Paraphrasing study: track individual passages over time Thank you for your attention! 45 [ ] Michael Völske 2013

46

47 References [Potthast 2011] Martin Potthast. Technologies for Reusing Text from the Web. Dissertation, Bauhaus-Universität Weimar, December [Potthast et al. 2012a] Martin Potthast, Matthias Hagen, Benno Stein, Jan Graßegger, Maximilian Michel, Martin Tippmann, and Clement Welsch. ChatNoir: A Search Engine for the ClueWeb09 Corpus. SIGIR 2012, Portland, Oregon, August [Potthast et al. 2012b] Martin Potthast, Tim Gollub, Matthias Hagen, Jan Graßegger, Johannes Kiesel, Maximilian Michel, Arnd Oberländer, Martin Tippmann, Alberto Barrón-Cedeño, Parth Gupta, Paolo Rosso, and Benno Stein. Overview of the 4th International Competition on Plagiarism Detection. CLEF 2012 Evaluation Labs and Workshop Working Notes Papers, September [<]

48 Example topic: Obama s family. Write about President Barack Obama s family history, including genealogy, national origins, places and dates of birth, etc. Where did Barack Obama s parents and grandparents come from? Also include a brief biography of Obama s mother. [<]

49 Example topic: Obama s family. Write about President Barack Obama s family history, including genealogy, national origins, places and dates of birth, etc. Where did Barack Obama s parents and grandparents come from? Also include a brief biography of Obama s mother. Original topic 001 of the TREC Web Track 2009: Query. obama family tree Description. Find information on President Barack Obama s family history, including genealogy, national origins, places and dates of birth, etc. Sub-topic 1. Find the TIME magazine photo essay Barack Obama s Family Tree. Sub-topic 2. Where did Barack Obama s parents and grandparents come from? Sub-topic 3. Find biographical information on Barack Obama s mother. [<]

50 Type Rank Frequency Severity Clone 1 1 Exact copy of another author s work Mashup 2 3 A mix of material copied verbatim from several sources Ctrl-C 3 2 Significant portions of text copied from a single source Remix 4 9 Paraphrasing from several sources and making the content fit together seamlessly Recycle 5 5 Self-plagiarism Re-Tweet 6 10 Proper citation, but closely follows a single source Find-Replace 7 7 Near copy of a single source, with key phrases changed Aggregator 8 4 Proper citation, but (almost) no original work 404 Error 9 6 Citations to non-existent or inaccurate information about sources Hybrid 10 8 Combining properly cited sources with plagiarism in one paper [Turnitin 2012] [<]

51 Details: Paraphrasing & Interleaving Classification Scheme for Text Reuse high Find-Replace Remix How to quantify? Paraphrasing Measure at the passage level Passage: Block of text reused from the same source Paraphrasing: simple N-Gram similarity low Clone, Ctrl-C Mashup low Interleaving high Classification Scheme for Text Reuse. types of plagiarism as distinguished by Turnitin [Turnitin 2012] interpret text reuse (plagiarism) as a combination of two factors: paraphrasing and interleaving [<]

52 Details: Paraphrasing & Interleaving Classification Scheme for Text Reuse high Find-Replace Remix Three passages: Paraphrasing low Clone, Ctrl-C Mashup low Interleaving high Classification Scheme for Text Reuse. types of plagiarism as distinguished by Turnitin [Turnitin 2012] interpret text reuse (plagiarism) as a combination of two factors: paraphrasing and interleaving [<]

53 Details: Paraphrasing & Interleaving Classification Scheme for Text Reuse high Find-Replace Remix Paraphrasing: N-Gram similarity Paraphrasing low Clone, Ctrl-C Mashup low Interleaving high Classification Scheme for Text Reuse. types of plagiarism as distinguished by Turnitin [Turnitin 2012] interpret text reuse (plagiarism) as a combination of two factors: paraphrasing and interleaving [<]

54 Details: Paraphrasing & Interleaving Classification Scheme for Text Reuse high Find-Replace Remix Paraphrasing: N-Gram similarity Paraphrasing low Clone, Ctrl-C Mashup low Interleaving high Classification Scheme for Text Reuse. types of plagiarism as distinguished by Turnitin [Turnitin 2012] interpret text reuse (plagiarism) as a combination of two factors: paraphrasing and interleaving [<]

55 Details: Paraphrasing & Interleaving Classification Scheme for Text Reuse high Find-Replace Remix Paraphrasing: N-Gram similarity Paraphrasing low Clone, Ctrl-C Mashup low Interleaving high Classification Scheme for Text Reuse. types of plagiarism as distinguished by Turnitin [Turnitin 2012] interpret text reuse (plagiarism) as a combination of two factors: paraphrasing and interleaving [<]

56 Details: Paraphrasing & Interleaving Classification Scheme for Text Reuse high Paraphrasing Find-Replace Remix Paraphrasing: N-Gram similarity ϕ n (N c, N s ) := N c N s N c N c : N-Grams in the passage N s : N-Grams in the source low Clone, Ctrl-C Mashup We choose n = 5. low Interleaving high Classification Scheme for Text Reuse. types of plagiarism as distinguished by Turnitin [Turnitin 2012] interpret text reuse (plagiarism) as a combination of two factors: paraphrasing and interleaving [<]

57 Details: Paraphrasing & Interleaving Classification Scheme for Text Reuse high Paraphrasing Find-Replace Remix Interleaving: Passages per source C: passages S: sources pps(c, S) := C S low Clone, Ctrl-C Mashup low Interleaving high Classification Scheme for Text Reuse. types of plagiarism as distinguished by Turnitin [Turnitin 2012] interpret text reuse (plagiarism) as a combination of two factors: paraphrasing and interleaving [<]

Exploratory Search Missions for TREC Topics

Exploratory Search Missions for TREC Topics Exploratory Search Missions for TREC Topics EuroHCIR 2013 Dublin, 1 August 2013 Martin Potthast Matthias Hagen Michael Völske Benno Stein Bauhaus-Universität Weimar www.webis.de Dataset for studying text

More information

Overview of the 5th International Competition on Plagiarism Detection

Overview of the 5th International Competition on Plagiarism Detection Overview of the 5th International Competition on Plagiarism Detection Martin Potthast, Matthias Hagen, Tim Gollub, Martin Tippmann, Johannes Kiesel, and Benno Stein Bauhaus-Universität Weimar www.webis.de

More information

Crowdsourcing Interaction Logs to Understand Text Reuse from the Web

Crowdsourcing Interaction Logs to Understand Text Reuse from the Web In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 1212-1221, August 2013. ACL. Crowdsourcing Interaction Logs to Understand Text Reuse

More information

Improving Synoptic Querying for Source Retrieval

Improving Synoptic Querying for Source Retrieval Improving Synoptic Querying for Source Retrieval Notebook for PAN at CLEF 2015 Šimon Suchomel and Michal Brandejs Faculty of Informatics, Masaryk University {suchomel,brandejs}@fi.muni.cz Abstract Source

More information

Overview of the 6th International Competition on Plagiarism Detection

Overview of the 6th International Competition on Plagiarism Detection Overview of the 6th International Competition on Plagiarism Detection Martin Potthast, Matthias Hagen, Anna Beyer, Matthias Busse, Martin Tippmann, and Benno Stein Bauhaus-Universität Weimar www.webis.de

More information

Diverse Queries and Feature Type Selection for Plagiarism Discovery

Diverse Queries and Feature Type Selection for Plagiarism Discovery Diverse Queries and Feature Type Selection for Plagiarism Discovery Notebook for PAN at CLEF 2013 Šimon Suchomel, Jan Kasprzak, and Michal Brandejs Faculty of Informatics, Masaryk University {suchomel,kas,brandejs}@fi.muni.cz

More information

Expanded N-Grams for Semantic Text Alignment

Expanded N-Grams for Semantic Text Alignment Expanded N-Grams for Semantic Text Alignment Notebook for PAN at CLEF 2014 Samira Abnar, Mostafa Dehghani, Hamed Zamani, and Azadeh Shakery School of ECE, College of Engineering, University of Tehran,

More information

Webis at the TREC 2012 Session Track

Webis at the TREC 2012 Session Track Webis at the TREC 2012 Session Track Extended Abstract for the Conference Notebook Matthias Hagen, Martin Potthast, Matthias Busse, Jakob Gomoll, Jannis Harder, and Benno Stein Bauhaus-Universität Weimar

More information

Hashing and Merging Heuristics for Text Reuse Detection

Hashing and Merging Heuristics for Text Reuse Detection Hashing and Merging Heuristics for Text Reuse Detection Notebook for PAN at CLEF-2014 Faisal Alvi 1, Mark Stevenson 2, Paul Clough 2 1 King Fahd University of Petroleum & Minerals, Saudi Arabia, 2 University

More information

Predicting Retrieval Success Based on Information Use for Writing Tasks

Predicting Retrieval Success Based on Information Use for Writing Tasks This is a post-print version of the article: Vakkari P., Völske M., Potthast M., Hagen M., Stein B. (2018) Predicting Retrieval Success Based on Information Use for Writing Tasks. In: Méndez E., Crestani

More information

Similarity Overlap Metric and Greedy String Tiling at PAN 2012: Plagiarism Detection

Similarity Overlap Metric and Greedy String Tiling at PAN 2012: Plagiarism Detection Similarity Overlap Metric and Greedy String Tiling at PAN 2012: Plagiarism Detection Notebook for PAN at CLEF 2012 Arun kumar Jayapal The University of Sheffield, UK arunkumar.jeyapal@gmail.com Abstract

More information

Supervised Ranking for Plagiarism Source Retrieval

Supervised Ranking for Plagiarism Source Retrieval Supervised Ranking for Plagiarism Source Retrieval Notebook for PAN at CLEF 2013 Kyle Williams, Hung-Hsuan Chen, and C. Lee Giles, Information Sciences and Technology Computer Science and Engineering Pennsylvania

More information

Supporting Scholarly Search with Keyqueries

Supporting Scholarly Search with Keyqueries Supporting Scholarly Search with Keyqueries Matthias Hagen Anna Beyer Tim Gollub Kristof Komlossy Benno Stein Bauhaus-Universität Weimar matthias.hagen@uni-weimar.de @matthias_hagen ECIR 2016 Padova, Italy

More information

Candidate Document Retrieval for Web-scale Text Reuse Detection

Candidate Document Retrieval for Web-scale Text Reuse Detection Candidate Document Retrieval for Web-scale Text Reuse Detection Matthias Hagen Benno Stein Bauhaus-Universität Weimar matthias.hagen@uni-weimar.de SPIRE 2011 Pisa, Italy October 19, 2011 Matthias Hagen,

More information

New Issues in Near-duplicate Detection

New Issues in Near-duplicate Detection New Issues in Near-duplicate Detection Martin Potthast and Benno Stein Bauhaus University Weimar Web Technology and Information Systems Motivation About 30% of the Web is redundant. [Fetterly 03, Broder

More information

Interpreting Turnitin Originality Reports

Interpreting Turnitin Originality Reports Interpreting Turnitin Originality Reports This guide will provide a brief introduction to using and interpreting Originality Reports in Turnitin via Canvas. Turnitin (TII) Originality reports allow you

More information

Three way search engine queries with multi-feature document comparison for plagiarism detection

Three way search engine queries with multi-feature document comparison for plagiarism detection Three way search engine queries with multi-feature document comparison for plagiarism detection Notebook for PAN at CLEF 2012 Šimon Suchomel, Jan Kasprzak, and Michal Brandejs Faculty of Informatics, Masaryk

More information

The Winning Approach to Text Alignment for Text Reuse Detection at PAN 2014

The Winning Approach to Text Alignment for Text Reuse Detection at PAN 2014 The Winning Approach to Text Alignment for Text Reuse Detection at PAN 2014 Notebook for PAN at CLEF 2014 Miguel A. Sanchez-Perez, Grigori Sidorov, Alexander Gelbukh Centro de Investigación en Computación,

More information

Plagiarism Detection. Lucia D. Krisnawati

Plagiarism Detection. Lucia D. Krisnawati Plagiarism Detection Lucia D. Krisnawati Overview Introduction to Automatic Plagiarism Detection (PD) External PD Intrinsic PD Evaluation Framework Have You Heard, Seen or Used These? The Available PD

More information

Author Masking through Translation

Author Masking through Translation Author Masking through Translation Notebook for PAN at CLEF 2016 Yashwant Keswani, Harsh Trivedi, Parth Mehta, and Prasenjit Majumder Dhirubhai Ambani Institute of Information and Communication Technology

More information

Detection of Text Plagiarism and Wikipedia Vandalism

Detection of Text Plagiarism and Wikipedia Vandalism Detection of Text Plagiarism and Wikipedia Vandalism Benno Stein Bauhaus-Universität Weimar www.webis.de Keynote at SEPLN, Valencia, 8. Sep. 2010 The webis Group Benno Stein Dennis Hoppe Maik Anderka Martin

More information

Author Verification: Exploring a Large set of Parameters using a Genetic Algorithm

Author Verification: Exploring a Large set of Parameters using a Genetic Algorithm Author Verification: Exploring a Large set of Parameters using a Genetic Algorithm Notebook for PAN at CLEF 2014 Erwan Moreau 1, Arun Jayapal 2, and Carl Vogel 3 1 moreaue@cs.tcd.ie 2 jayapala@cs.tcd.ie

More information

Classifying and Ranking Search Engine Results as Potential Sources of Plagiarism

Classifying and Ranking Search Engine Results as Potential Sources of Plagiarism Classifying and Ranking Search Engine Results as Potential Sources of Plagiarism Kyle Williams, Hung-Hsuan Chen, C. Lee Giles Information Sciences and Technology, Computer Science and Engineering Pennsylvania

More information

Dynamically Adjustable Approach through Obfuscation Type Recognition

Dynamically Adjustable Approach through Obfuscation Type Recognition Dynamically Adjustable Approach through Obfuscation Type Recognition Notebook for PAN at CLEF 2015 Miguel A. Sanchez-Perez, Alexander Gelbukh, and Grigori Sidorov Centro de Investigación en Computación,

More information

TE Teacher s Edition PE Pupil Edition Page 1

TE Teacher s Edition PE Pupil Edition Page 1 Standard 4 WRITING: Writing Process Students discuss, list, and graphically organize writing ideas. They write clear, coherent, and focused essays. Students progress through the stages of the writing process

More information

Question Answering Systems

Question Answering Systems Question Answering Systems An Introduction Potsdam, Germany, 14 July 2011 Saeedeh Momtazi Information Systems Group Outline 2 1 Introduction Outline 2 1 Introduction 2 History Outline 2 1 Introduction

More information

Contextual Information Retrieval Using Ontology-Based User Profiles

Contextual Information Retrieval Using Ontology-Based User Profiles Contextual Information Retrieval Using Ontology-Based User Profiles Vishnu Kanth Reddy Challam Master s Thesis Defense Date: Jan 22 nd, 2004. Committee Dr. Susan Gauch(Chair) Dr.David Andrews Dr. Jerzy

More information

*Note: To find a complete list of sources, click on the List of Sources link in the top portion of every Biography Resource Center search screen.

*Note: To find a complete list of sources, click on the List of Sources link in the top portion of every Biography Resource Center search screen. Biography Resource Center Navigation Guide OVERVIEW The Biography Resource Center (BioRC) is a comprehensive database of biographical information on over 380,000 people from throughout history, around

More information

Rishiraj Saha Roy and Niloy Ganguly IIT Kharagpur India. Monojit Choudhury and Srivatsan Laxman Microsoft Research India India

Rishiraj Saha Roy and Niloy Ganguly IIT Kharagpur India. Monojit Choudhury and Srivatsan Laxman Microsoft Research India India Rishiraj Saha Roy and Niloy Ganguly IIT Kharagpur India Monojit Choudhury and Srivatsan Laxman Microsoft Research India India ACM SIGIR 2012, Portland August 15, 2012 Dividing a query into individual semantic

More information

Query Session Detection as a Cascade

Query Session Detection as a Cascade Query Session Detection as a Cascade Extended Abstract Matthias Hagen, Benno Stein, and Tino Rüb Bauhaus-Universität Weimar .@uni-weimar.de Abstract We propose a cascading method

More information

Natural Language Processing. SoSe Question Answering

Natural Language Processing. SoSe Question Answering Natural Language Processing SoSe 2017 Question Answering Dr. Mariana Neves July 5th, 2017 Motivation Find small segments of text which answer users questions (http://start.csail.mit.edu/) 2 3 Motivation

More information

SacCT Creating a Turnitin Assignment How to Guide

SacCT Creating a Turnitin Assignment How to Guide SacCT Creating a Turnitin Assignment How to Guide HOW TO GUIDE WHAT IS A TURNITIN ASSIGNMENT CALIFORNIA STATE UNIVERSITY, SACRAMENTO Turnitin is an online plagiarism detection resource that is available

More information

Online Compiler with Plagiarism Checker

Online Compiler with Plagiarism Checker Volume 118 No. 18 2018, 1547-1555 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu ijpam.eu Online Compiler with Plagiarism Checker Aarthi G.V. 1, a) 2, b),,

More information

code pattern analysis of object-oriented programming languages

code pattern analysis of object-oriented programming languages code pattern analysis of object-oriented programming languages by Xubo Miao A thesis submitted to the School of Computing in conformity with the requirements for the degree of Master of Science Queen s

More information

Data Mining. 3.5 Lazy Learners (Instance-Based Learners) Fall Instructor: Dr. Masoud Yaghini. Lazy Learners

Data Mining. 3.5 Lazy Learners (Instance-Based Learners) Fall Instructor: Dr. Masoud Yaghini. Lazy Learners Data Mining 3.5 (Instance-Based Learners) Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction k-nearest-neighbor Classifiers References Introduction Introduction Lazy vs. eager learning Eager

More information

Social Behavior Prediction Through Reality Mining

Social Behavior Prediction Through Reality Mining Social Behavior Prediction Through Reality Mining Charlie Dagli, William Campbell, Clifford Weinstein Human Language Technology Group MIT Lincoln Laboratory This work was sponsored by the DDR&E / RRTO

More information

Research Question: Do people perform better on a math task after observing Impressionist art or modern art?

Research Question: Do people perform better on a math task after observing Impressionist art or modern art? MEDIA LAB TUTORIAL Research Question: Do people perform better on a math task after observing Impressionist art or modern art? All of the graphics you will use should be stored in a folder that you will

More information

Adaptive Algorithm for Plagiarism Detection: The Best-performing Approach at PAN 2014 Text Alignment Competition

Adaptive Algorithm for Plagiarism Detection: The Best-performing Approach at PAN 2014 Text Alignment Competition Adaptive Algorithm for Plagiarism Detection: The Best-performing Approach at PAN 2014 Text Alignment Competition Miguel A. Sanchez-Perez, Alexander Gelbukh, and Grigori Sidorov Centro de Investigación

More information

A Deep Relevance Matching Model for Ad-hoc Retrieval

A Deep Relevance Matching Model for Ad-hoc Retrieval A Deep Relevance Matching Model for Ad-hoc Retrieval Jiafeng Guo 1, Yixing Fan 1, Qingyao Ai 2, W. Bruce Croft 2 1 CAS Key Lab of Web Data Science and Technology, Institute of Computing Technology, Chinese

More information

Multi-label classification using rule-based classifier systems

Multi-label classification using rule-based classifier systems Multi-label classification using rule-based classifier systems Shabnam Nazmi (PhD candidate) Department of electrical and computer engineering North Carolina A&T state university Advisor: Dr. A. Homaifar

More information

Exploration of Fuzzy C Means Clustering Algorithm in External Plagiarism Detection System

Exploration of Fuzzy C Means Clustering Algorithm in External Plagiarism Detection System Exploration of Fuzzy C Means Clustering Algorithm in External Plagiarism Detection System N. Riya Ravi, K. Vani and Deepa Gupta Abstract With the advent of World Wide Web, plagiarism has become a prime

More information

Repurposing Benchmark Corpora for Reconstructing Provenance

Repurposing Benchmark Corpora for Reconstructing Provenance Repurposing Benchmark Corpora for Reconstructing Provenance Sara Magliacane, Paul Groth Department of Computer Science VU University Amsterdam s.magliacane@vu.nl, p.t.groth@vu.nl Abstract. Provenance is

More information

Alexander Calder Surrealist Sculptor. By Klara Schmitt

Alexander Calder Surrealist Sculptor. By Klara Schmitt Alexander Calder Surrealist Sculptor By Klara Schmitt Table of Contents 2 Creative Brief...3 Target Audience...4 Content Outline...6 Navigation Map...7 Wireframes and Design Comps...8 Styles...34 Credits...35

More information

The information here will help you avoid violating university policy about copying material written by others.

The information here will help you avoid violating university policy about copying material written by others. Blackboard and its plagiarism checking tool SafeAssign will fully retire throughout all components of UTHealth by fall 2016. At that point, all UTHealth courses will be taught in Canvas using Turnitin

More information

International Graduate School of Genetic and Molecular Epidemiology (GAME) Computing Notes and Introduction to Stata

International Graduate School of Genetic and Molecular Epidemiology (GAME) Computing Notes and Introduction to Stata International Graduate School of Genetic and Molecular Epidemiology (GAME) Computing Notes and Introduction to Stata Paul Dickman September 2003 1 A brief introduction to Stata Starting the Stata program

More information

Automatically Generating Queries for Prior Art Search

Automatically Generating Queries for Prior Art Search Automatically Generating Queries for Prior Art Search Erik Graf, Leif Azzopardi, Keith van Rijsbergen University of Glasgow {graf,leif,keith}@dcs.gla.ac.uk Abstract This report outlines our participation

More information

Deep Learning based Authorship Identification

Deep Learning based Authorship Identification Deep Learning based Authorship Identification Chen Qian Tianchang He Rao Zhang Department of Electrical Engineering Stanford University, Stanford, CA 94305 cqian23@stanford.edu th7@stanford.edu zhangrao@stanford.edu

More information

A RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH

A RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH A RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH A thesis Submitted to the faculty of the graduate school of the University of Minnesota by Vamshi Krishna Thotempudi In partial fulfillment of the requirements

More information

A Study of MatchPyramid Models on Ad hoc Retrieval

A Study of MatchPyramid Models on Ad hoc Retrieval A Study of MatchPyramid Models on Ad hoc Retrieval Liang Pang, Yanyan Lan, Jiafeng Guo, Jun Xu, Xueqi Cheng Institute of Computing Technology, Chinese Academy of Sciences Text Matching Many text based

More information

Turnitin Instructor Guide

Turnitin Instructor Guide Table of Contents Turnitin Overview... 1 Recommended Steps... 2 IMPORTANT! Do NOT Copy Turnitin Assignments... 2 Browser Requirements... 2 Creating a Turnitin Assignment... 2 Editing a Turnitin Assignment...

More information

EVALUATION AND IMPLEMENTATION OF N -GRAM-BASED ALGORITHM FOR FAST TEXT COMPARISON

EVALUATION AND IMPLEMENTATION OF N -GRAM-BASED ALGORITHM FOR FAST TEXT COMPARISON Computing and Informatics, Vol. 36, 2017, 887 907, doi: 10.4149/cai 2017 4 887 EVALUATION AND IMPLEMENTATION OF N -GRAM-BASED ALGORITHM FOR FAST TEXT COMPARISON Maciej Wielgosz, Pawe l Szczepka, Pawe l

More information

Informativeness for Adhoc IR Evaluation:

Informativeness for Adhoc IR Evaluation: Informativeness for Adhoc IR Evaluation: A measure that prevents assessing individual documents Romain Deveaud 1, Véronique Moriceau 2, Josiane Mothe 3, and Eric SanJuan 1 1 LIA, Univ. Avignon, France,

More information

Fine-grained Software Version Control Based on a Program s Abstract Syntax Tree

Fine-grained Software Version Control Based on a Program s Abstract Syntax Tree Master Thesis Description and Schedule Fine-grained Software Version Control Based on a Program s Abstract Syntax Tree Martin Otth Supervisors: Prof. Dr. Peter Müller Dimitar Asenov Chair of Programming

More information

Lab Benchmarking Symposium Introduction. Alison Farmer

Lab Benchmarking Symposium Introduction. Alison Farmer Lab Benchmarking Symposium Introduction Alison Farmer Purpose of the Symposium Describe state of lab benchmarking today Introduce the I 2 SL Lab Benchmarking Working Group Reveal latest updates to Labs21

More information

The Labs21 Benchmarking Tool

The Labs21 Benchmarking Tool I 2 SL Tools for Sustainability Success: The Labs21 Benchmarking Tool Alison Farmer Outline Why we care about benchmarking The Labs21 Benchmarking Tool New upgrades by the I 2 SL Benchmarking Working Group

More information

Natural Language Processing SoSe Question Answering. (based on the slides of Dr. Saeedeh Momtazi)

Natural Language Processing SoSe Question Answering. (based on the slides of Dr. Saeedeh Momtazi) Natural Language Processing SoSe 2015 Question Answering Dr. Mariana Neves July 6th, 2015 (based on the slides of Dr. Saeedeh Momtazi) Outline 2 Introduction History QA Architecture Outline 3 Introduction

More information

Overview of Classification Subtask at NTCIR-6 Patent Retrieval Task

Overview of Classification Subtask at NTCIR-6 Patent Retrieval Task Overview of Classification Subtask at NTCIR-6 Patent Retrieval Task Makoto Iwayama *, Atsushi Fujii, Noriko Kando * Hitachi, Ltd., 1-280 Higashi-koigakubo, Kokubunji, Tokyo 185-8601, Japan makoto.iwayama.nw@hitachi.com

More information

Semantic Estimation for Texts in Software Engineering

Semantic Estimation for Texts in Software Engineering Semantic Estimation for Texts in Software Engineering 汇报人 : Reporter:Xiaochen Li Dalian University of Technology, China 大连理工大学 2016 年 11 月 29 日 Oscar Lab 2 Ph.D. candidate at OSCAR Lab, in Dalian University

More information

Introduction to Microsoft Office PowerPoint 2010

Introduction to Microsoft Office PowerPoint 2010 Introduction to Microsoft Office PowerPoint 2010 TABLE OF CONTENTS Open PowerPoint 2010... 1 About the Editing Screen... 1 Create a Title Slide... 6 Save Your Presentation... 6 Create a New Slide... 7

More information

A Study on Query Expansion with MeSH Terms and Elasticsearch. IMS Unipd at CLEF ehealth Task 3

A Study on Query Expansion with MeSH Terms and Elasticsearch. IMS Unipd at CLEF ehealth Task 3 A Study on Query Expansion with MeSH Terms and Elasticsearch. IMS Unipd at CLEF ehealth Task 3 Giorgio Maria Di Nunzio and Alexandru Moldovan Dept. of Information Engineering University of Padua giorgiomaria.dinunzio@unipd.it,alexandru.moldovan@studenti.unipd.it

More information

Scanning methods and language modeling for binary switch typing

Scanning methods and language modeling for binary switch typing Scanning methods and language modeling for binary switch typing Brian Roark, Jacques de Villiers, Christopher Gibbons and Melanie Fried-Oken Center for Spoken Language Understanding Child Development &

More information

Master Project. Various Aspects of Recommender Systems. Prof. Dr. Georg Lausen Dr. Michael Färber Anas Alzoghbi Victor Anthony Arrascue Ayala

Master Project. Various Aspects of Recommender Systems. Prof. Dr. Georg Lausen Dr. Michael Färber Anas Alzoghbi Victor Anthony Arrascue Ayala Master Project Various Aspects of Recommender Systems May 2nd, 2017 Master project SS17 Albert-Ludwigs-Universität Freiburg Prof. Dr. Georg Lausen Dr. Michael Färber Anas Alzoghbi Victor Anthony Arrascue

More information

QDA Miner. Addendum v2.0

QDA Miner. Addendum v2.0 QDA Miner Addendum v2.0 QDA Miner is an easy-to-use qualitative analysis software for coding, annotating, retrieving and reviewing coded data and documents such as open-ended responses, customer comments,

More information

Overview of the Patent Retrieval Task at the NTCIR-6 Workshop

Overview of the Patent Retrieval Task at the NTCIR-6 Workshop Overview of the Patent Retrieval Task at the NTCIR-6 Workshop Atsushi Fujii, Makoto Iwayama, Noriko Kando Graduate School of Library, Information and Media Studies University of Tsukuba 1-2 Kasuga, Tsukuba,

More information

MMC 4936 Web GIS for Journalists

MMC 4936 Web GIS for Journalists Florida International University FIU Digital Commons Course Syllabi Special Collections and University Archives Fall 2014 MMC 4936 Web GIS for Journalists Susan Jacobson Journalism and Mass Communications

More information

Control Invitation

Control Invitation Online Appendices Appendix A. Invitation Emails Control Invitation Email Subject: Reviewer Invitation from JPubE You are invited to review the above-mentioned manuscript for publication in the. The manuscript's

More information

TEXT CHAPTER 5. W. Bruce Croft BACKGROUND

TEXT CHAPTER 5. W. Bruce Croft BACKGROUND 41 CHAPTER 5 TEXT W. Bruce Croft BACKGROUND Much of the information in digital library or digital information organization applications is in the form of text. Even when the application focuses on multimedia

More information

NYU CSCI-GA Fall 2016

NYU CSCI-GA Fall 2016 1 / 45 Information Retrieval: Personalization Fernando Diaz Microsoft Research NYC November 7, 2016 2 / 45 Outline Introduction to Personalization Topic-Specific PageRank News Personalization Deciding

More information

Writing Reports with Report Designer and SSRS 2014 Level 1

Writing Reports with Report Designer and SSRS 2014 Level 1 Writing Reports with Report Designer and SSRS 2014 Level 1 Duration- 2days About this course In this 2-day course, students are introduced to the foundations of report writing with Microsoft SQL Server

More information

Get the most value from your surveys with text analysis

Get the most value from your surveys with text analysis SPSS Text Analysis for Surveys 3.0 Specifications Get the most value from your surveys with text analysis The words people use to answer a question tell you a lot about what they think and feel. That s

More information

Supplementary Material

Supplementary Material Supplementary Material Figure 1S: Scree plot of the 400 dimensional data. The Figure shows the 20 largest eigenvalues of the (normalized) correlation matrix sorted in decreasing order; the insert shows

More information

NVIvo 11 workshop. Hui Bian Office for Faculty Excellence

NVIvo 11 workshop. Hui Bian Office for Faculty Excellence 1 NVIvo 11 workshop Hui Bian Office for Faculty Excellence Contact information 2 Email: bianh@ecu.edu Phone: 328-5428 1001 Joyner Library, room 1006 Website: http://core.ecu.edu/ofe/st atisticsresearch/

More information

A Language Independent Author Verifier Using Fuzzy C-Means Clustering

A Language Independent Author Verifier Using Fuzzy C-Means Clustering A Language Independent Author Verifier Using Fuzzy C-Means Clustering Notebook for PAN at CLEF 2014 Pashutan Modaresi 1,2 and Philipp Gross 1 1 pressrelations GmbH, Düsseldorf, Germany {pashutan.modaresi,

More information

Geometric Techniques. Part 1. Example: Scatter Plot. Basic Idea: Scatterplots. Basic Idea. House data: Price and Number of bedrooms

Geometric Techniques. Part 1. Example: Scatter Plot. Basic Idea: Scatterplots. Basic Idea. House data: Price and Number of bedrooms Part 1 Geometric Techniques Scatterplots, Parallel Coordinates,... Geometric Techniques Basic Idea Visualization of Geometric Transformations and Projections of the Data Scatterplots [Cleveland 1993] Parallel

More information

Nick Terkay CSCI 7818 Web Services 11/16/2006

Nick Terkay CSCI 7818 Web Services 11/16/2006 Nick Terkay CSCI 7818 Web Services 11/16/2006 Ning? Start-up co-founded by Marc Andreeson, the co- founder of Netscape. October 2005 Ning is an online platform for painlessly creating web apps in a jiffy.

More information

Research Article. August 2017

Research Article. August 2017 International Journals of Advanced Research in Computer Science and Software Engineering ISSN: 2277-128X (Volume-7, Issue-8) a Research Article August 2017 English-Marathi Cross Language Information Retrieval

More information

Scientific Graphing in Excel 2013

Scientific Graphing in Excel 2013 Scientific Graphing in Excel 2013 When you start Excel, you will see the screen below. Various parts of the display are labelled in red, with arrows, to define the terms used in the remainder of this overview.

More information

Learning to Reweight Terms with Distributed Representations

Learning to Reweight Terms with Distributed Representations Learning to Reweight Terms with Distributed Representations School of Computer Science Carnegie Mellon University August 12, 215 Outline Goal: Assign weights to query terms for better retrieval results

More information

E-Marefa User Guide. "Arab Theses and Dissertations"

E-Marefa User Guide. Arab Theses and Dissertations E-Marefa User Guide "Arab Theses and Dissertations" Table of Contents What is E-Marefa Database.3 System Requirements 3 Inside this User Guide 3 Access to E-Marefa Database.....4 Choosing Database to Search.5

More information

MDSWriter: Annotation tool for creating high-quality. multi-document summarization corpora.

MDSWriter: Annotation tool for creating high-quality. multi-document summarization corpora. MDSWriter: Annotation tool for creating high-quality multi-document summarization corpora Christian M. Meyer, Darina Benikova, Margot Mieskes, and Iryna Gurevych Research Training Group AIPHES Ubiquitous

More information

QueryLines: Approximate Query for Visual Browsing

QueryLines: Approximate Query for Visual Browsing MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com QueryLines: Approximate Query for Visual Browsing Kathy Ryall, Neal Lesh, Tom Lanning, Darren Leigh, Hiroaki Miyashita and Shigeru Makino TR2005-015

More information

Information Search in Web Archives

Information Search in Web Archives Information Search in Web Archives Miguel Costa Advisor: Prof. Mário J. Silva Co-Advisor: Prof. Francisco Couto Department of Informatics, Faculty of Sciences, University of Lisbon PhD thesis defense,

More information

Introduction April 27 th 2016

Introduction April 27 th 2016 Social Web Mining Summer Term 2016 1 Introduction April 27 th 2016 Dr. Darko Obradovic Insiders Technologies GmbH Kaiserslautern d.obradovic@insiders-technologies.de Outline for Today 1.1 1.2 1.3 1.4 1.5

More information

Lab Activity #2- Statistics and Graphing

Lab Activity #2- Statistics and Graphing Lab Activity #2- Statistics and Graphing Graphical Representation of Data and the Use of Google Sheets : Scientists answer posed questions by performing experiments which provide information about a given

More information

Deep Character-Level Click-Through Rate Prediction for Sponsored Search

Deep Character-Level Click-Through Rate Prediction for Sponsored Search Deep Character-Level Click-Through Rate Prediction for Sponsored Search Bora Edizel - Phd Student UPF Amin Mantrach - Criteo Research Xiao Bai - Oath This work was done at Yahoo and will be presented as

More information

Query Heartbeat: A Strange Property of Keyword Queries on the Web

Query Heartbeat: A Strange Property of Keyword Queries on the Web Query Heartbeat: A Strange Property of Keyword Queries on the Web Karthik.B.R Aditya Ramana Rachakonda Dr. Srinath Srinivasa International Institute of Information Technology, Bangalore. December 16, 2008

More information

Practice Questions. AGENDA: Brief Library Overview + APA style ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Practice Questions. AGENDA: Brief Library Overview + APA style ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ RESEARCH SKILLS INSTRUCTION - Library Class Outline Centennial Libraries homepage http://library.centennialcollege.ca/ - Liz Dobson & Eric Wilding COMM 170-28 & 29 October 2015 2015 Sept AGENDA: Brief

More information

Extracting Information from Social Networks

Extracting Information from Social Networks Extracting Information from Social Networks Reminder: Social networks Catch-all term for social networking sites Facebook microblogging sites Twitter blog sites (for some purposes) 1 2 Ways we can use

More information

Aggregates Geometric Parameters

Aggregates Geometric Parameters Aggregates Geometric Parameters On-Line or Lab Measurement of Size and Shapes by 3D Image Analysis Terry Stauffer Application Note SL-AN-48 Revision A Provided By: Microtrac Particle Characterization Solutions

More information

Entity and Knowledge Base-oriented Information Retrieval

Entity and Knowledge Base-oriented Information Retrieval Entity and Knowledge Base-oriented Information Retrieval Presenter: Liuqing Li liuqing@vt.edu Digital Library Research Laboratory Virginia Polytechnic Institute and State University Blacksburg, VA 24061

More information

This research aims to present a new way of visualizing multi-dimensional data using generalized scatterplots by sensitivity coefficients to highlight

This research aims to present a new way of visualizing multi-dimensional data using generalized scatterplots by sensitivity coefficients to highlight This research aims to present a new way of visualizing multi-dimensional data using generalized scatterplots by sensitivity coefficients to highlight local variation of one variable with respect to another.

More information

EXCEL SPREADSHEET TUTORIAL

EXCEL SPREADSHEET TUTORIAL EXCEL SPREADSHEET TUTORIAL Note to all 200 level physics students: You will be expected to properly format data tables and graphs in all lab reports, as described in this tutorial. Therefore, you are responsible

More information

DELL EMC UNITY: DATA REDUCTION

DELL EMC UNITY: DATA REDUCTION DELL EMC UNITY: DATA REDUCTION Overview ABSTRACT This white paper is an introduction to the Dell EMC Unity Data Reduction feature. It provides an overview of the feature, methods for managing data reduction,

More information

Illinois State University Events Calendar

Illinois State University Events Calendar Illinois State University Events Calendar Adding an Event To add an event to the Illinois State University Events Calendar, visit Events.IllinoisState.edu and scroll to the footer to login. Click the Copyright

More information

TREC-7 Experiments at the University of Maryland Douglas W. Oard Digital Library Research Group College of Library and Information Services University

TREC-7 Experiments at the University of Maryland Douglas W. Oard Digital Library Research Group College of Library and Information Services University TREC-7 Experiments at the University of Maryland Douglas W. Oard Digital Library Research Group College of Library and Information Services University of Maryland, College Park, MD 20742 oard@glue.umd.edu

More information

HOLY SPIRIT UNIVERSITY OF KASLIK LIBRARY HELP GUIDES. USEK. Student Manual: Detecting and Deterring plagiarism in student work

HOLY SPIRIT UNIVERSITY OF KASLIK LIBRARY HELP GUIDES. USEK. Student Manual: Detecting and Deterring plagiarism in student work HOLY SPIRIT UNIVERSITY OF KASLIK LIBRARY HELP GUIDES Turnitin @ USEK Student Manual: Detecting and Deterring plagiarism in student work Updated September 14, 2018 OUTLINE 1. Definition of plagiarism 1

More information

Tilburg University. Authoritative re-ranking of search results Bogers, A.M.; van den Bosch, A. Published in: Advances in Information Retrieval

Tilburg University. Authoritative re-ranking of search results Bogers, A.M.; van den Bosch, A. Published in: Advances in Information Retrieval Tilburg University Authoritative re-ranking of search results Bogers, A.M.; van den Bosch, A. Published in: Advances in Information Retrieval Publication date: 2006 Link to publication Citation for published

More information

Query Session Detection as a Cascade

Query Session Detection as a Cascade Query Session Detection as a Cascade Matthias Hagen Benno Stein Tino Rüb Bauhaus-Universität Weimar matthias.hagen@uni-weimar.de CIKM 2011 Glasgow, Scotland October 25, 2011 Hagen, Stein, Rüb Query Session

More information

Canvas Student Guide. The Office of Online Learning Massasoit Community College

Canvas Student Guide. The Office of Online Learning Massasoit Community College Canvas Student Guide The Office of Online Learning Massasoit Community College www.massasoit.edu TABLE OF CONTENTS What is Canvas?... 1 Computer and Browser Requirements... 1 Mobile Support... 1 Accessing

More information

ResPubliQA 2010

ResPubliQA 2010 SZTAKI @ ResPubliQA 2010 David Mark Nemeskey Computer and Automation Research Institute, Hungarian Academy of Sciences, Budapest, Hungary (SZTAKI) Abstract. This paper summarizes the results of our first

More information