Social Tagging. Kristina Lerman. USC Information Sciences Institute Thanks to Anon Plangprasopchok for providing material for this lecture.

Social Web
The Social Web is a platform for people to create, organize, and share information, e.g., essembly, Bugzilla, Delicious.

Create Information
People create content (resources):
- Text posts: blogs, Twitter, ...
- Images: Flickr, Picasa, ...
- Videos: YouTube, Vimeo, ...
- News stories: Digg, Reddit, Slashdot, ...
- Bookmarks: Delicious, CiteULike, BibSonomy, ...
- Personal profiles: Facebook, MySpace, ...
- Maps: OpenStreetMap, ...
- Locations: Foursquare, ...

Organize Information
People organize resources:
- Annotate with metadata
  - Tags: descriptive labels
  - Geotags: geographic coordinates
- Add to folders: organize content within personal hierarchies, e.g., sets and collections on Flickr
- Other types of metadata may include
  - Discussions, comments, reviews
  - Ratings, votes, ...
Social tagging is the most popular form of annotation.

Social Tagging: Delicious
[Figure: a Delicious bookmark, labeling the content (web page), the user, and the tags.]

Social Tagging: Flickr
[Figure: a Flickr photo page for a rainbow bee-eater (Merops ornatus). It labels the submitter, private albums (sets such as "Mackay May 2008" and "Birds"), public groups (pools such as "Field Guide: Birds of the World", "Australian Birds", and "Birds of Queensland"), tags (e.g., Rainbow bee-eater, Merops ornatus, Australia, Queensland, Mackay Gardens), and discussion.]

Share Information
People share resources:
- Social networks: broadcast to social connections
  - Friends on Facebook; fans/followers on Twitter, Digg, ...
  - Group affiliations
- Hotlists: emerge from collective activity, e.g., Digg front page, Flickr Explore, Flickr Trends

Social Networks: Facebook

Social Networks: Flickr

Harvesting Knowledge from Social Tagging
- Resource-resource (R-R) graph: PageRank
- User-user (U-U) graph: social network analysis
- Resource-user-tag (R-U-T) hypergraph: harvesting knowledge from social tagging
[Figure: resources (a web page, a photo) linked to the users who annotated them and to their tags.]
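The R-U-T hypergraph can be stored directly as (resource, user, tag) triples, with one triple per tag assignment. The minimal Python sketch below uses a small hypothetical set of triples and derives two ordinary graphs from it:
- a U-U graph, in which users are linked by the resources they both bookmarked;
- a graph of resources linked by the tags they share.

The slide's R-R graph for PageRank is built from hyperlinks. The tag-sharing projection here is a different, illustrative way of relating resources.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical tag assignments: each (resource, user, tag) triple is one hyperedge.
triples = [
    ("geocoder.us", "alice", "geocoding"),
    ("geocoder.us", "alice", "maps"),
    ("geocoder.us", "bob", "geocoding"),
    ("geocoder.ca", "bob", "geocoding"),
    ("flickr.com", "carol", "photos"),
]

def project_users(triples):
    """U-U graph: edge weight = number of resources two users both bookmarked."""
    users_of = defaultdict(set)
    for resource, user, _tag in triples:
        users_of[resource].add(user)
    weights = defaultdict(int)
    for users in users_of.values():
        for u, v in combinations(sorted(users), 2):
            weights[(u, v)] += 1
    return dict(weights)

def project_resources(triples):
    """Resource graph: edge weight = number of distinct tags two resources share."""
    tags_of = defaultdict(set)
    for resource, _user, tag in triples:
        tags_of[resource].add(tag)
    weights = {}
    for r, s in combinations(sorted(tags_of), 2):
        shared = len(tags_of[r] & tags_of[s])
        if shared:
            weights[(r, s)] = shared
    return weights

print(project_users(triples))      # {('alice', 'bob'): 1}
print(project_resources(triples))  # {('geocoder.ca', 'geocoder.us'): 1}
```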

Overview
- Harvesting knowledge from social tagging
- Structure of Collaborative Tagging Systems
  - Statistics of tagging activity
  - Consensus about the meaning of a document quickly emerges from the opinions of many users
- Exploiting Social Annotation for Automatic Resource Discovery
  - Learn hidden topics in a collection of tagged documents
  - Use hidden topics to find relevant documents

Social Tagging
- Tags are labels attached to content
  - Chosen from an uncontrolled, personal vocabulary
  - Help users browse, filter, and search information more efficiently
- Collaborative/social tagging
  - Anyone can attach labels to resources (not only experts or producers of content)
  - Collectively, tags represent a semantic annotation of a resource (an alternative to the Semantic Web)

Tagging and Taxonomies
- Taxonomy: hierarchical, exclusive organization of objects
  - Linnaean classification: felidae > panthera > tiger; felidae > felis > cat
  - File system: articles about cats in Africa could be filed under c:\articles\cats, c:\articles\africa, c:\articles\africa\cats, or c:\articles\cats\africa
  - Must search multiple folders to find all relevant content
- Tagging: non-hierarchical, inclusive organization of objects
  - Articles tagged cat and africa are found by the queries africa, cats, and cats AND africa
  - But the search will not find articles tagged cheetah
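Under tagging, an item can carry several labels at once, so a conjunctive query reduces to a set test. The minimal sketch below uses hypothetical articles:
- article1 is found by cats, by africa, and by cats AND africa, without anyone choosing a single folder for it;
- article4, which is tagged cheetah, is missed by the cats query. This is the vocabulary gap the slide points out.

```python
# Hypothetical articles and their tags; tagging is a flat, inclusive labeling.
article_tags = {
    "article1": {"cats", "africa"},
    "article2": {"cats"},
    "article3": {"africa"},
    "article4": {"cheetah", "africa"},
}

def search(article_tags, *query):
    """Return the articles that carry every tag in the query (Boolean AND)."""
    wanted = set(query)
    return sorted(a for a, tags in article_tags.items() if wanted <= tags)

print(search(article_tags, "cats"))            # ['article1', 'article2']
print(search(article_tags, "africa"))          # ['article1', 'article3', 'article4']
print(search(article_tags, "cats", "africa"))  # ['article1']
```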

Kinds of Tags
- What content is about (topic): identify who or what the document is about, e.g., cat, africa
- What it is: what kind of thing it is, e.g., article, blog, book
- Who owns it: who owns or created the content, e.g., nikographer
- Refining categories: refine or qualify existing categories, especially numbers
- Qualities or characteristics: express opinion, e.g., funny, interesting
- Self-reference: e.g., mystuff
- Task organizing: e.g., toread, jobsearch

Social Tagging Dimensions
- Tagging rights: who can tag?
  - Self-tagging: only the resource owner (blog posts; Flickr, by convention)
  - Free-for-all: anyone can tag a resource (Delicious)
- Consolidation: is tag generation assisted?
  - Blind tagging: the user enters tags independently of other users
  - Suggestive tagging: the system suggests tags based on annotations of other users
- Resource type
  - Text: Web pages, blog posts, bibliographic material, ...
  - Multimedia: images, videos, ...
- Source of content
  - User-owned, e.g., images on Flickr
  - Scavenged from the Web, e.g., Delicious
- Connectivity: links between users
  - Reciprocity: undirected links (Facebook) vs. directed links (Flickr, Delicious)
  - Link type: friend relationship vs. contact (on Flickr) shows degree of trust

User Motivations
What are users' motivations to tag?
- Organizational: mark items for future personal retrieval
- Social: mark items for others to find, e.g., concert photos on Flickr
  - Can result in spamming
- Express opinion, e.g., a funny tag on a video
Collective value emerges from the tagging decisions of individual users.
How can users be incentivized to contribute high-quality annotations?

Social Tagging on del.icio.us
- Social bookmarking site del.icio.us
  - Users can tag any Web page (URL)
  - Delicious suggests tags based on existing tags for the URL
  - Delicious aggregates popular tags
  - Anyone can see the bookmarks of others
  - Users can create social links
- Value of social tagging
  - Users bookmark for their own benefit: organization, retrieval
  - A useful public good emerges: tag suggestions; lists of popular URLs and tags (hotlists)
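The slide does not say how Delicious ranks its suggestions. The minimal sketch below assumes the simplest rule: suggest the tags that earlier users applied most often to the same URL. The bookmark data is hypothetical.

```python
from collections import Counter

def suggest_tags(bookmarks, url, k=3):
    """Suggest the k tags most often applied to `url` by earlier bookmarks.

    bookmarks: iterable of (user, url, set_of_tags).
    Ties are broken alphabetically so the output is deterministic.
    """
    counts = Counter()
    for _user, u, tags in bookmarks:
        if u == url:
            counts.update(tags)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _count in ranked[:k]]

# Hypothetical bookmarks of one URL.
bookmarks = [
    ("alice", "http://geocoder.us", {"geocoding", "maps", "api"}),
    ("bob", "http://geocoder.us", {"geocoding", "maps"}),
    ("carol", "http://geocoder.us", {"geocoding", "usa"}),
]
print(suggest_tags(bookmarks, "http://geocoder.us"))  # ['geocoding', 'maps', 'api']
```

Showing earlier users' tags in this way is one channel through which users can imitate each other's tag choices.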

Tagging on del.icio.us
[Figure: a del.icio.us bookmark, labeling the content (web page), the user, and the tags.]

Dynamics of del.icio.us
- Delicious dynamics [Golder & Huberman]
  - User activity
  - Tag vocabulary growth
- Datasets
  - Bookmarks collected over 4 days in June 2005
  - Sample of users who posted bookmarks in this period

Dynamics of User Interests
- Tags reflect how a user's interests and knowledge change over time
  - tag1 and tag2 are consistent interests of the user
  - tag3 is a new interest, or a new way to differentiate between concepts/interests
[Figure: number of times each tag (tag1, tag2, tag3) has been used vs. bookmark number.]
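The plot described on this slide is a cumulative count: for each tag, it shows how many times the user has applied that tag so far, as a function of bookmark number. The minimal sketch below computes those series for a hypothetical bookmark history. The consistent interests (tag1, tag2) grow early and steadily. The new interest (tag3) stays at zero and only starts growing late.

```python
from collections import Counter

def cumulative_usage(bookmark_tags, tracked):
    """For each tracked tag, the number of times it has been used after each bookmark.

    bookmark_tags: one user's bookmarks in time order, each a set of tags.
    """
    counts = Counter()
    series = {tag: [] for tag in tracked}
    for tags in bookmark_tags:
        counts.update(tags)
        for tag in tracked:
            series[tag].append(counts[tag])  # a Counter returns 0 for unseen tags
    return series

history = [{"tag1"}, {"tag1", "tag2"}, {"tag2"}, {"tag1", "tag3"}, {"tag3"}]
print(cumulative_usage(history, ["tag1", "tag2", "tag3"]))
# {'tag1': [1, 2, 2, 3, 3], 'tag2': [0, 1, 2, 2, 2], 'tag3': [0, 0, 0, 1, 2]}
```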

Stable Patterns in Tagging
- Consider a single URL as it is tagged by more users
  - Each tag's proportion represents the combined description of the URL by many users
  - After ~100 bookmarks, the relative frequency of each tag is fixed
[Figure: tag proportion (with respect to all tags) vs. number of bookmarks for the URL.]
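The plotted quantity is each tag's share of all tag uses on one URL, recomputed after every bookmark. The minimal sketch below computes it. To generate data, it assumes a hypothetical model in which each user includes tag t independently with a fixed probability. This is an illustrative assumption, not the model from the study.

```python
import random
from collections import Counter

def tag_proportions(bookmarks):
    """After each bookmark of one URL, return each tag's share of all tag uses so far."""
    counts = Counter()
    history = []
    for tags in bookmarks:
        counts.update(tags)
        total = sum(counts.values())
        # Guard against leading bookmarks that carry no tags.
        history.append({t: n / total for t, n in counts.items()} if total else {})
    return history

def simulate_bookmarks(preferences, n, seed=0):
    """Hypothetical model: each user includes tag t with probability preferences[t]."""
    rng = random.Random(seed)
    return [{t for t, p in preferences.items() if rng.random() < p} for _ in range(n)]

prefs = {"geocoding": 0.9, "maps": 0.6, "api": 0.3}
history = tag_proportions(simulate_bookmarks(prefs, 500))
for i in (10, 100, 500):
    print(i, {t: round(share, 2) for t, share in sorted(history[i - 1].items())})
```

Early shares fluctuate. As bookmarks accumulate, they approach the ratios of the assumed probabilities (0.5, about 0.33, and about 0.17). This illustrates why proportions, rather than raw counts, are the quantity that stabilizes.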

Findings
- Consensus about a URL's topics emerges quickly, after ~100 users bookmark it
  - URLs do not have to become popular for tags to be useful
  - Minority opinions can stably coexist with popular ones
  - Can be used to categorize/organize URLs
- Reasons for consensus
  - Imitation: users imitate the tag selection of others
    - But stable patterns also exist for less common tags (not shown to users)
  - Shared knowledge: can we learn it?

Learning from Social Tagging/Annotation
- Annotations by an individual user may be inaccurate and incomplete
- Annotations from many different users may complement each other, making them meaningful in aggregate
- Goal: learn concepts from social annotations created by many users

Learning Concepts from Tags
[Figure: two photos tagged jaguar (credited "By sparky2000" and "By A lion Rohrs"): does jaguar mean Animal or Car?]

Goal of Learning Algorithm
- Group semantically related tags and resources
- A group corresponds to a concept
[Figure: resources and their tags grouped into concepts such as Animal, Car, and Flower.]

Challenges in Learning from Annotations
- Sparse data: 4-7 tags per bookmark; 3.74 tags per photo [Rattenbury07+]
- Ambiguity: jaguar (car vs. animal)
- Polysemy: window (hole in a wall vs. the glass pane that resides in it)
- Synonymy: kid vs. child
- Disagreement: cats\africa vs. africa\cats
- Different levels of specificity: dog vs. beagle
- Multiple facets: a bird tagged by appearance, location, or scientific/colloquial name

Document Modeling Approaches
- Bag-of-words: tf-idf
  - Document as a vector of word frequencies
  - Small reduction in document description length
  - Does not handle synonymy and polysemy
- Latent semantic indexing (LSI)
  - Identifies the subspace of tf-idf that captures most of the variance in a corpus
  - Reduction in document description length (# principal components)
  - Handles synonymy and, partially, polysemy
- Topic modeling: pLSI, LDA
  - Documents as random mixtures over (hidden) topics, where each topic is a distribution over words
  - Large reduction in description length (# topics)
- Inference
  - Given a document corpus, estimate the parameters of the model
  - Compute the distribution of hidden topics given the document
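Below is a minimal bag-of-words sketch in pure Python. It assumes raw-count tf and idf = log(N / df), one of several common weightings, and uses hypothetical toy tag lists. It shows both weaknesses named above:
- Synonymy is missed: the kid and child documents match only through books.
- Polysemy is over-counted: the two jaguar documents (car vs. animal) look exactly as similar.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each tokenized document to a sparse {term: tf * idf} vector."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

docs = [
    ["kid", "books"],
    ["child", "books"],
    ["jaguar", "car"],
    ["jaguar", "cat"],
]
vecs = tfidf_vectors(docs)
print(round(cosine(vecs[0], vecs[1]), 3))  # 0.2  (synonyms kid/child not matched)
print(round(cosine(vecs[2], vecs[3]), 3))  # 0.2  (both senses of jaguar matched)
print(cosine(vecs[0], vecs[2]))            # 0.0
```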

A Stochastic Process of Word Generation
- pLSI (Hofmann99); LDA (Blei03+)
[Figure: for a document (r), topics (z) are drawn from the possible topics, and each topic generates words (t) from the possible words.]
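LDA's generative story can be written out directly. The minimal sketch below makes these assumptions:
- two hypothetical topic-word distributions over a tiny vocabulary;
- a symmetric Dirichlet prior alpha on each document's topic mixture (pLSI has no such prior);
- a fixed document length, for simplicity.

```python
import random

def sample_dirichlet(rng, alpha, k):
    """Draw from a symmetric Dirichlet(alpha) over k outcomes via normalized gamma variates."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(rng, topic_words, vocab, alpha, length):
    """Generate one document (resource) r following LDA's generative process.

    1. Draw the document's topic mixture theta ~ Dirichlet(alpha).
    2. For each word slot, draw a topic z ~ theta, then a word t ~ topic_words[z].
    """
    theta = sample_dirichlet(rng, alpha, len(topic_words))
    words = []
    for _ in range(length):
        z = rng.choices(range(len(topic_words)), weights=theta)[0]
        t = rng.choices(vocab, weights=topic_words[z])[0]
        words.append(t)
    return theta, words

vocab = ["travel", "flights", "airline", "map", "latitude", "zip"]
# Hypothetical topic-word distributions (each row sums to 1). In full LDA,
# these rows are themselves drawn from a Dirichlet(beta) prior.
topic_words = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # a "travel" topic
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.3],  # a "geography" topic
]
rng = random.Random(42)
theta, words = generate_document(rng, topic_words, vocab, alpha=0.5, length=8)
print([round(p, 2) for p in theta], words)
```

Inference runs this story in reverse. Given only the words, it estimates theta for each document and the topic-word rows.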

Learned Topics
High-probability words in each topic:
- travel, flights, airline, flight, airlines, guide, aviation, ...
- map, maps, world, earth, latitude, longitude, directions, address, geography, distance, zip, usa, gmaps, atlas, ...
- video, download, bittorrent, p2p, youtube, media, torrent, torrents, movies, ...

Apply LDA to Tagging
- Resources play the role of documents
- Tags play the role of words
- LDA groups resources and tags into topics such as Animal, Car, and Flower
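The slides do not say how the LDA model is fit. One standard inference method is collapsed Gibbs sampling. The pure-Python sketch below uses that technique, which is not necessarily the one used in this work. Resources are the documents and their tags are the words. The toy resources and the hyperparameters (alpha, beta, iters) are hypothetical choices. The top tags per topic are printed, analogous to the high-probability word lists on the Learned Topics slide.

```python
import random

def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of resources, each a list of tag tokens.
    Returns (theta, phi): per-resource topic mixtures and per-topic tag distributions.
    """
    rng = random.Random(seed)
    vocab = sorted({t for doc in docs for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    n_vocab = len(vocab)
    ndk = [[0] * n_topics for _ in docs]            # tokens in resource d assigned to topic k
    nkw = [[0] * n_vocab for _ in range(n_topics)]  # times tag w is assigned to topic k
    nk = [0] * n_topics                             # tokens assigned to topic k
    z = []                                          # current topic of every token

    def move(d, w, k, delta):
        ndk[d][k] += delta
        nkw[k][w] += delta
        nk[k] += delta

    for d, doc in enumerate(docs):                  # random initial assignments
        z.append([rng.randrange(n_topics) for _ in doc])
        for t, k in zip(doc, z[d]):
            move(d, index[t], k, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, t in enumerate(doc):
                w = index[t]
                move(d, w, z[d][i], -1)             # remove this token from the counts
                # p(z = k | all other assignments) is proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta)
                weights = [(ndk[d][k] + alpha) * (nkw[k][w] + beta) / (nk[k] + n_vocab * beta)
                           for k in range(n_topics)]
                z[d][i] = rng.choices(range(n_topics), weights=weights)[0]
                move(d, w, z[d][i], +1)

    theta = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
             for d, doc in enumerate(docs)]
    phi = [{vocab[w]: (nkw[k][w] + beta) / (nk[k] + n_vocab * beta) for w in range(n_vocab)}
           for k in range(n_topics)]
    return theta, phi

# Hypothetical tagged resources.
resources = [
    ["jaguar", "cat", "wildlife", "animal"],
    ["lion", "cat", "wildlife", "safari"],
    ["jaguar", "car", "engine", "auto"],
    ["porsche", "car", "auto", "engine"],
    ["rose", "flower", "garden"],
    ["tulip", "flower", "garden", "spring"],
]
theta, phi = lda_gibbs(resources, n_topics=3)
for k, dist in enumerate(phi):
    print(k, sorted(dist, key=dist.get, reverse=True)[:4])
```

Each occurrence of jaguar receives its own topic assignment. Its uses in the car and animal resources can therefore land in different topics, which is how topic models address the ambiguity noted earlier. On a corpus this small, the exact topics depend on the seed.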

Application to Resource Discovery
- Resource discovery: given a seed source, find other data sources that provide the same functionality
  - e.g., find geocoders like http://geocoder.us, which returns the geographic coordinates of a specified US address
- Benefits
  - Increase the robustness of information integration (II) applications: if http://geocoder.us fails, substitute another source
  - Increase the coverage of II applications: http://geocoder.ca geocodes US AND Canadian addresses

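Source Discovery and Modeling [Ambite et al, 2009]
[Figure: a pipeline that starts from a seed URL (http://wunderground.com) and runs through four stages:
- discovery, which yields candidate sources such as unisys and anotherws;
- invocation & extraction;
- semantic typing;
- source modeling.
Background knowledge feeds the pipeline: sample input values (e.g., 90254) for invocation; patterns and domain types for semantic typing (e.g., unisys(zip,temp,humidity,...)); and definitions of known sources for source modeling (e.g., unisys(zip,temp,...) :- weather(zip,,temp,hi,lo)).]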

Exploiting Social Annotation for Resource Discovery
Approach: use topic modeling of social annotations obtained from Delicious to find sources similar to a given seed URL.
- From the seed URL, obtain an annotation corpus (users, URLs, tags) of candidate URLs from Delicious
- Learn concepts with a probabilistic model, e.g., LDA, giving each URL's distribution over concepts
- Compute the similarity of each candidate URL to the seed
- Rank candidates by similarity to the seed
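The slides do not name the similarity measure. The minimal sketch below assumes each URL is represented by its LDA topic distribution. It also assumes Jensen-Shannon divergence as the measure, a common choice for comparing probability distributions; a smaller divergence means more similar. The topic distributions are hypothetical.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Symmetric Jensen-Shannon divergence, bounded above by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def rank_by_similarity(seed_topics, candidate_topics):
    """Order candidate URLs from most to least similar to the seed."""
    return sorted(candidate_topics,
                  key=lambda url: js_divergence(seed_topics, candidate_topics[url]))

# Hypothetical topic distributions over 3 topics (geography, travel, video).
seed = [0.8, 0.15, 0.05]
candidates = {
    "http://geocoder.ca": [0.75, 0.2, 0.05],
    "http://example.com/video": [0.05, 0.05, 0.9],
    "http://example.com/maps": [0.6, 0.3, 0.1],
}
print(rank_by_similarity(seed, candidates))
# ['http://geocoder.ca', 'http://example.com/maps', 'http://example.com/video']
```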

Corpus of Annotated Resources
Crawling strategy:
- For each seed, retrieve its 20 most popular tags
- For each tag, retrieve sources annotated with the same tag
- For each source, retrieve all tags
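The three crawling steps can be sketched as one function over lookup callables. The callables popular_tags, urls_with_tag, and tags_of are hypothetical stand-ins, not a real Delicious API. The toy in-memory data below replaces the live site.

```python
def build_corpus(seed_url, popular_tags, urls_with_tag, tags_of, n_tags=20):
    """Crawl a tag-annotation corpus around a seed URL.

    popular_tags(url) -> tags of the URL, most popular first   (hypothetical)
    urls_with_tag(tag) -> URLs annotated with the tag           (hypothetical)
    tags_of(url) -> all tags applied to the URL                 (hypothetical)
    """
    corpus = {}
    for tag in popular_tags(seed_url)[:n_tags]:  # step 1: the seed's most popular tags
        for url in urls_with_tag(tag):           # step 2: sources sharing that tag
            if url not in corpus:
                corpus[url] = tags_of(url)       # step 3: all tags of each source
    return corpus

# Toy in-memory stand-in for the Delicious data (hypothetical).
annotations = {
    "http://geocoder.us": ["geocoding", "maps", "api"],
    "http://geocoder.ca": ["geocoding", "canada", "maps"],
    "http://example.com/weather": ["weather", "forecast"],
}

def popular_tags(url):
    return annotations.get(url, [])

def urls_with_tag(tag):
    return [u for u, tags in annotations.items() if tag in tags]

def tags_of(url):
    return annotations.get(url, [])

corpus = build_corpus("http://geocoder.us", popular_tags, urls_with_tag, tags_of)
print(sorted(corpus))  # ['http://geocoder.ca', 'http://geocoder.us']
```

The seed is annotated with its own popular tags, so it is normally collected along with the candidates. Its topic distribution is then available for the similarity step.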

Topic Modeling of Social Annotations
- Use LDA to learn 80 topics in each corpus
- The distribution over topics is used to compute the similarity of a target URL to the seed

Source Discovery Results
- Manually label the top 100 URLs ranked by similarity to the seed URL
- Compare to Google's functionality for finding similar URLs
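The slides do not state which metric summarizes the manual labels. A common choice for a labeled ranking is precision at k: the fraction of the top k results judged relevant. The minimal sketch below computes it for two rankings. The judgments are hypothetical and are not the study's results.

```python
def precision_at_k(labels, k):
    """Fraction of the top-k ranked URLs judged relevant.

    labels: booleans in rank order (True = manually judged relevant).
    Positions beyond the end of the ranking count as not relevant.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(labels[:k]) / k

# Hypothetical relevance judgments for two rankings of candidates for the same seed.
ours = [True, True, False, True, True, False, True, False, False, True]
baseline = [True, False, False, True, False, False, False, True, False, False]
for k in (5, 10):
    print(k, precision_at_k(ours, k), precision_at_k(baseline, k))
```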

Source Discovery Results

Discussion
- Users express their knowledge through the tags they create while annotating content
- Apply document modeling techniques to social annotation data
  - Infer hidden topics in annotated data
  - Use topics for the source discovery task
  - Outperforms standard Web search
- Next: extract more complex types of knowledge from social annotations
  - Sentiment
  - Folksonomies