Studying the Impact of Text Summarization on Contextual Advertising

Similar documents
Experimenting Text Summarization Techniques for Contextual Advertising

Tansu Alpcan C. Bauckhage S. Agarwal

Extraction of Web Image Information: Semantic or Visual Cues?

Chapter 27 Introduction to Information Retrieval and Web Search

Chapter 6: Information Retrieval and Web Search. An introduction

Large Scale Chinese News Categorization. Peng Wang. Joint work with H. Zhang, B. Xu, H.W. Hao

On Identifying Disaster-Related Tweets: Matching-based or Learning based?

Introduction to Web Clustering

Chrome based Keyword Visualizer (under sparse text constraint) SANGHO SUH MOONSHIK KANG HOONHEE CHO

Video annotation based on adaptive annular spatial partition scheme

Hierarchical Link Analysis for Ranking Web Data

Fusing Document, Collection and Label Graph-based Representations with Word Embeddings for Text Classification. June 8, 2018

Learning Ontology-Based User Profiles: A Semantic Approach to Personalized Web Search

Multimedia Information Systems

Feature LDA: a Supervised Topic Model for Automatic Detection of Web API Documentations from the Web

Basic Tokenizing, Indexing, and Implementation of Vector-Space Retrieval

Automatic Summarization

Contextual Information Retrieval Using Ontology-Based User Profiles

Weighted Suffix Tree Document Model for Web Documents Clustering

Automatic Cluster Number Selection using a Split and Merge K-Means Approach

Application of rough ensemble classifier to web services categorization and focused crawling

A Deep Relevance Matching Model for Ad-hoc Retrieval

Learning Similarity Metrics for Event Identification in Social Media. Hila Becker, Luis Gravano

Text Analytics (Text Mining)

Overview. Lab 2: Information Retrieval. Assignment Preparation. Data. .. Fall 2015 CSC 466: Knowledge Discovery from Data Alexander Dekhtyar..

Reproducible & Transparent Computational Science with Galaxy. Jeremy Goecks The Galaxy Team

SIFT, BoW architecture and one-against-all Support vector machine

Text Analytics (Text Mining)

Outline. Structures for subject browsing. Subject browsing. Research issues. Renardus

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India

Domain-specific Concept-based Information Retrieval System

SHOT-BASED OBJECT RETRIEVAL FROM VIDEO WITH COMPRESSED FISHER VECTORS. Luca Bertinetto, Attilio Fiandrotti, Enrico Magli

CS490W. Text Clustering. Luo Si. Department of Computer Science Purdue University

Information Retrieval. (M&S Ch 15)

Information Retrieval and Web Search

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p.

Automated Online News Classification with Personalization

PTE : Predictive Text Embedding through Large-scale Heterogeneous Text Networks

Supervised classification of law area in the legal domain

A Framework for Summarization of Multi-topic Web Sites

Cluster Evaluation and Expectation Maximization! adapted from: Doug Downey and Bryan Pardo, Northwestern University

A Document Graph Based Query Focused Multi- Document Summarizer

User Profiling for Interest-focused Browsing History

Extractive Text Summarization Techniques

WEB SPAM IDENTIFICATION THROUGH LANGUAGE MODEL ANALYSIS

Recognition. Topics that we will try to cover:

Approach Research of Keyword Extraction Based on Web Pages Document

Project Report on winter

Landmark Recognition: State-of-the-Art Methods in a Large-Scale Scenario

A novel supervised learning algorithm and its use for Spam Detection in Social Bookmarking Systems

MODELLING DOCUMENT CATEGORIES BY EVOLUTIONARY LEARNING OF TEXT CENTROIDS

Three things everyone should know to improve object retrieval. Relja Arandjelović and Andrew Zisserman (CVPR 2012)

Automatic summarization of video data

Representation/Indexing (fig 1.2) IR models - overview (fig 2.1) IR models - vector space. Weighting TF*IDF. U s e r. T a s k s

Approaches to Mining the Web

Concept-Based Document Similarity Based on Suffix Tree Document

Ranking Error-Correcting Output Codes for Class Retrieval

Introduction to Information Retrieval

Fig 1. Overview of IE-based text mining framework

Information Retrieval: Retrieval Models

Query Expansion using Wikipedia and DBpedia

TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES

Today s topic CS347. Results list clustering example. Why cluster documents. Clustering documents. Lecture 8 May 7, 2001 Prabhakar Raghavan

Keywords Machine learning, Traffic classification, feature extraction, signature generation, cluster aggregation.

What is this Song About?: Identification of Keywords in Bollywood Lyrics

Exploiting Internal and External Semantics for the Clustering of Short Texts Using World Knowledge

Automated Application Signature Generation Using LASER and Cosine Similarity

Information Retrieval

Query Languages. Berlin Chen Reference: 1. Modern Information Retrieval, chapter 4

Published in: Proceedings of the 20th ACM international conference on Information and knowledge management

Social Search Introduction to Information Retrieval INF 141/ CS 121 Donald J. Patterson

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer

Clustering (COSC 488) Nazli Goharian. Document Clustering.

Chapter 2. Architecture of a Search Engine

Web Information Retrieval using WordNet

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, 2013 ISSN:

Heading-aware Snippet Generation for Web Search

Open Research Online The Open University s repository of research publications and other research outputs

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University

A Document-centered Approach to a Natural Language Music Search Engine

Tag-based Social Interest Discovery

Developing Focused Crawlers for Genre Specific Search Engines

60-538: Information Retrieval

A Family of Contextual Measures of Similarity between Distributions with Application to Image Retrieval

Information Retrieval. Information Retrieval and Web Search

Automatic Identification of User Goals in Web Search [WWW 05]

Link Recommendation Method Based on Web Content and Usage Mining

Focused crawling: a new approach to topic-specific Web resource discovery. Authors

Reducing Over-generation Errors for Automatic Keyphrase Extraction using Integer Linear Programming

CS47300: Web Information Search and Management

Impact of Term Weighting Schemes on Document Clustering A Review

Clustering Web Documents using Hierarchical Method for Efficient Cluster Formation

Clustering (COSC 416) Nazli Goharian. Document Clustering.

International Journal of Advanced Research in Computer Science and Software Engineering

2009 M. Elena Renda. A Personalized Information Search Assistant

Clustering Results. Result List Example. Clustering Results. Information Retrieval

Utilizing Probase in Open Directory Project-based Text Classification

Mining Web Data. Lijun Zhang

MURDOCH RESEARCH REPOSITORY

Keywords Data alignment, Data annotation, Web database, Search Result Record

Transcription:

Studying the Impact of Text Summarization on Contextual Advertising G. Armano, A. Giuliani, and E. Vargiu Intelligent Agents and Soft-Computing Group Dept. of Electrical and Electronic Engineering University of Cagliari, Italy email: alessandro.giuliani@diee.unica.it

Outline Online Advertising and Contextual Advertising Text Summarization Methods Evaluation The Adopted Contextual Advertising System Experimental Results Conclusions

Online Advertising Form of promotion that uses the WWW for delivering marketing messages to attract customers

Contextual Advertising Web Page Ad

Contextual Advertising

Contextual Advertising Task of placing ads within the content of a Web page Ad AdNetwork Network Ads Web page Users

Classic approaches High computation in data, resources, and time Offline advertising

Current approaches W W W highly dynamic! Real time advertising! Need of reducing data text summarization TEXT

Focus: Text Summarization Summaries: obtained by single or multiple documents preserve important information short

Focus: TS in CA Extract vs. abstract lists fragments of text vs. re-phrases content coherently in CA Extractive techniques!!! Low computation Reliance on single documents Simple summaries (but effective) Easier!

TS: Classic Approaches Kolcz's methods selection of meaningful paragraphs: Title First Paragraph First and Second Paragraphs First and Last Paragraphs Paragraph with most Keywords Paragraph with most Title Words

TS in CA: Our Proposal Input: HTML code need of additional features Adoption of title of the web page! Our techniques: Title and First Paragraph Title and First Two Paragraphs Title, First and Last Paragraphs Most Title Words and Keywords Paragraphs

TS: Experimental Results T F2P FLP MK MT TFP TFP TF2P TFLP MTK Precision 0.798 0.606 0.699 0.745 0.702 0.717 0.802 0.822 0.832 0.766 Recall 0.692 0.581 0.673 0.719 0.587 0.568 0.772 0.789 0.801 0.699 F-measure 0.729 0.593 0.686 0.732 0.639 0.634 0.787 0.805 0.816 0.731 Avg. Terms 3 13 24 24 25 15 16 27 26 34 Adding information about the title improves the performances Each novel method have better performances than classic methods The TFLP provides the best performance, as FLP does for the classic techniques

CA: The Adopted System Taxonomy Pre-processor Pre-processor Filtered HTML document Text summarizer Text summarizer Bag of words Classifier Classifier HTML document Bag of words Classification features

CA: The Adopted System Bag of words Classification features Matcher Matcher PAGE Bag of words Classification features Ads Repository

Preprocessor Taxonomy Pre-processor Filtered HTML document Text summarizer Bag of words Classifier HTML document Bag of words Text extraction, stop-words removing, stemming Classification features

Text Summarizer Taxonomy Pre-processor Filtered HTML document Text summarizer Bag of words Classifier HTML document Bag of words Summary of a web page Classification features

Text Summarizer Implemented for each analysed technique Output Vector-based representation of BoW (TF-IDF) Term 1 Summary Term 3 Term 2

Classifier Taxonomy Pre-processor Filtered HTML document Text summarizer Bag of words Classifier HTML document Bag of words Hierarchical classification Classification features

Classifier Semantic centroid-based classification Term 1 Summary Term 3 α Term 2 Class Centroid Output Vector-based representation of CF of page and ads

Matcher Bag of words Classification features PAGE Bag of words Matcher Matcher σ Score Classification features AD P, A = sim BoW P, A simcf P, A

The Impact of TS Adoption of each TS technique in the system and comparison Adopted Dataset (BankSearch)

Experimental results p@1 p@2 p@3 p@4 p@5 T 0.680 0.675 0.652 0.639 0.595 FP 0.488 0.476 0.448 0.421 0.391 F2P 0.613 0.609 0.588 0.558 0.514 FLP 0.674 0.653 0.617 0.582 0.546 MK 0.631 0.620 0.581 0.541 0.500 MT 0.640 0.617 0.610 0.586 0.547 TFP 0.744 0.707 0.691 0.669 0.637 TF2P 0.740 0.723 0.721 0.712 0.678 TFLP 0.768 0.750 0.729 0.701 0.663 MTK 0.711 0.698 0.685 0.663 0.608 Title leads to an improvement of performances

Experimental results TFLP 0,8 0,75 Precision 0,7 0,65 p@1 0,6 p@3 0,55 p@5 0,5 p@2 0,45 p@4 0,4 0,35 0 0,1 0,2 0,3 0,4 0,5 α 0,6 0,7 0,8 0,9 1

Conclusions We presented a comparative study on TS applied to CA. We propose several techniques that improve the classic methods We evaluated the corresponding CA systems. Experimental results confirm the intuition that adopting TS techniques allows to improve performances in term of precision. As for future directions, further experiments are under way, with the adoption of larger dataset extracted by DMOZ.

Thanks for your attention!