Social Tagging Kristina Lerman USC Information Sciences Institute Thanks to Anon Plangprasopchok for providing material for this lecture.
Social Web is a platform for people to create, organize, and share information (examples: essembly, Bugzilla, delicious)
Create Information
  People create content (resources)
    Text posts: blogs, Twitter
    Images: Flickr, Picasa
    Videos: YouTube, Vimeo
    News stories: Digg, Reddit, Slashdot
    Bookmarks: Delicious, CiteULike, Bibsonomy
    Personal profiles: Facebook, MySpace
    Maps: OpenStreetMap
    Locations: Foursquare
Organize Information
  People organize resources
    Annotate with metadata
      tags: descriptive labels
      geotags: geographic coordinates
    Add to folders: organize content within personal hierarchies
      E.g., sets and collections on Flickr
    Other types of metadata may include
      Discussions, comments, reviews
      Ratings, votes
  Social tagging is the most popular form of annotation
Social Tagging: Delicious Content (webpage) User Tags
Social Tagging: Flickr
  [Figure: a Flickr photo page annotated with the following elements]
    Submitter
    Private albums: Mackay May 2008 (Set), Birds (Set)
    Public groups: Birds (Pool), Canberra (Pool), Field Guide: Birds of the World (Pool), Birds, Birds, Birds (Pool), BIRDPIX (3/day) (Pool), Australian Birds (Pool), Birds Kingfishers, Pittas, and Bee-eaters (Pool), Birds of Queensland (Pool)
    Tags: Rainbow bee-eater Merops ornatus Australia Queensland Mackay Gardens
    Discussion
Share Information
  People share resources
    Social networks: broadcast to social connections
      Friends on Facebook; fans/followers on Twitter, Digg
      Group affiliations
    Hotlists: emerge from collective activity
      E.g., Digg front page, Flickr Explore, Flickr Trends
Social Networks: Facebook
Social Networks: Flickr
Harvesting Knowledge from Social Tagging
  Resources: R-R graph (PageRank)
  Users: U-U graph (social network analysis)
  Tags: R-U-T hypergraph (harvesting knowledge from social tagging)
  [Figure: a resource (web page or photo), the user who annotated it, and the tags applied]
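One way to picture the R-U-T hypergraph is as a list of (resource, user, tag) triples. The sketch below is illustrative only: the triples, user names, and the `project` helper are assumptions, not data or code from the lecture. It collapses the triples into per-resource tag counts, which is the view used when many users' annotations are aggregated.

```python
from collections import defaultdict

# Hypothetical (resource, user, tag) triples; all values are made up.
annotations = [
    ("photo1", "alice", "bird"),
    ("photo1", "bob", "australia"),
    ("geocoder.us", "alice", "maps"),
    ("geocoder.us", "carol", "geocoding"),
    ("geocoder.us", "bob", "maps"),
]

def project(triples):
    """Collapse R-U-T triples into per-resource tag counts and user sets."""
    tag_counts = defaultdict(lambda: defaultdict(int))
    users_of = defaultdict(set)
    for resource, user, tag in triples:
        tag_counts[resource][tag] += 1
        users_of[resource].add(user)
    return tag_counts, users_of

tag_counts, users_of = project(annotations)
print(dict(tag_counts["geocoder.us"]))
print(sorted(users_of["geocoder.us"]))
```

Keeping the user in each triple preserves who said what; dropping it gives the resource-tag counts used by document models.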
Overview
  Harvesting knowledge from social tagging
    Structure of Collaborative Tagging Systems
      Statistics of tagging activity
      Consensus about the meaning of a document quickly emerges from the opinions of many users
    Exploiting Social Annotation for Automatic Resource Discovery
      Learn hidden topics in a collection of tagged documents
      Use hidden topics to find relevant documents
Social Tagging
  Tags are labels attached to content
    Chosen from an uncontrolled personal vocabulary
    Help users browse, filter, and search information more efficiently
  Collaborative/social tagging
    Anyone can attach labels to resources (not only experts or producers of content)
    Collectively, tags represent a semantic annotation of a resource (an alternative to the Semantic Web)
Tagging and Taxonomies
  Taxonomy: hierarchical, exclusive organization of objects
    Linnaean classification
      felidae > panthera > tiger
      felidae > felis > cat
    File system: articles about cats in Africa
      c:\articles\cats
      c:\articles\africa
      c:\articles\africa\cats
      c:\articles\cats\africa
      Search multiple folders to find all relevant content
  Tagging: non-hierarchical, inclusive organization of objects
    Articles tagged cat, africa
      Queries: africa; cats; cats AND africa
    But a tag query will not find articles tagged with cheetah
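The contrast with folders can be sketched with a small inverted index from tags to items. This is a minimal illustration under assumed data: the article ids and the `build_index` and `search_all` helpers are hypothetical. A query such as cats AND africa becomes a set intersection, and the last line shows the limitation noted above: an article tagged only cheetah is not returned for cats.

```python
from collections import defaultdict

# Hypothetical articles and their tags.
articles = {
    "a1": {"cats", "africa"},
    "a2": {"cats"},
    "a3": {"africa"},
    "a4": {"cheetah", "africa"},
}

def build_index(tagged):
    """Map each tag to the set of items that carry it."""
    index = defaultdict(set)
    for item, tags in tagged.items():
        for tag in tags:
            index[tag].add(item)
    return index

def search_all(index, *tags):
    """Return items carrying every one of the given tags (logical AND)."""
    sets = [index.get(tag, set()) for tag in tags]
    return set.intersection(*sets) if sets else set()

index = build_index(articles)
print(sorted(search_all(index, "cats", "africa")))
print(sorted(search_all(index, "cats")))  # a4 is missing: it is tagged cheetah
```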
Kinds of Tags
  What content is about (topic)
    Identify who or what the document is about: cat, africa
  What it is
    What kind of thing it is: article, blog, book
  Who owns it
    Who owns/created the content: nikographer
  Refining categories
    Refine or qualify categories, especially numbers
  Identify qualities or characteristics
    Express opinion: funny, interesting
  Self-reference
    mystuff
  Task organizing
    toread, jobsearch
Social Tagging Dimensions
  Tagging rights: who can tag?
    Self-tagging: only the resource owner (blog posts, Flickr by convention)
    Free-for-all: anyone can tag a resource (Delicious)
  Consolidation: assisted tag generation?
    Blind tagging: user enters tags independently of other users
    Suggestive tagging: system suggests tags based on annotations of other users
  Resource type
    Text: Web pages, blog posts, bibliographic material
    Multimedia: images, videos
  Source of content
    User-owned, e.g., images on Flickr
    Scavenged from the Web, e.g., Delicious
  Connectivity: links between users
    Reciprocity: undirected links (Facebook) vs. directed (Flickr, Delicious)
    Link type: friend relationship vs. contact (on Flickr) shows degree of trust
User Motivations
  What are users' motivations to tag?
    Organizational
      Mark items for future personal retrieval
    Social
      Mark items for others to find, e.g., concert photos on Flickr
        Can result in spamming
      Express opinion, e.g., funny tag on a video
  Collective value emerges from the tagging decisions of individual users
    How can users be incentivized to contribute high-quality annotations?
Social Tagging on del.icio.us
  Social bookmarking site del.icio.us
    Users can tag any Web page (URL)
    Delicious suggests tags based on existing tags for the URL
    Delicious aggregates popular tags
    Anyone can see the bookmarks of others
    Users can create social links
  Value of social tagging
    Users bookmark for their own benefit
      Organization
      Retrieval
    A useful public good emerges
      Tag suggestions
      List of popular URLs and tags (hotlists)
Tagging on del.icio.us Content (webpage) User Tags
Dynamics of del.icio.us
  Delicious dynamics [Golder & Huberman]
    User activity
    Tag vocabulary growth
  Datasets
    Bookmarks collected over 4 days in June 2005
    Sample of users who posted bookmarks in this period
Dynamics of User Interests
  Tags reflect how a user's interests and knowledge change over time
    Tag1 and Tag2 are consistent interests of the user
    Tag3 is a new interest
      Or a new way to differentiate between concepts/interests
  [Figure: number of times each tag (tag1, tag2, tag3) has been used vs. bookmark]
Stable Patterns in Tagging
  Consider a single URL
    As it is tagged by more users, each tag's proportion represents the combined description of the URL by many users
    After ~100 bookmarks, the relative frequency of each tag stabilizes
  [Figure: tag proportion (w.r.t. all tags) vs. number of bookmarks for the URL]
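The quantity plotted here, each tag's share of all tags applied to a URL so far, can be computed incrementally. The following is a minimal sketch on made-up bookmarks; `tag_proportions` is a hypothetical helper, and the four toy bookmarks are far too few to show the stabilization reported for ~100 bookmarks.

```python
from collections import Counter

def tag_proportions(bookmarks):
    """Yield, after each bookmark, every tag's share of all tags seen so far."""
    counts = Counter()
    for tags in bookmarks:
        counts.update(tags)
        total = sum(counts.values())
        yield {tag: n / total for tag, n in counts.items()}

# Hypothetical bookmarks of one URL, in posting order (one tag list per user).
bookmarks = [
    ["maps", "geocoding"],
    ["maps"],
    ["maps", "tools"],
    ["geocoding", "maps"],
]

history = list(tag_proportions(bookmarks))
final = history[-1]
for step, shares in enumerate(history, start=1):
    print(step, {tag: round(share, 2) for tag, share in shares.items()})
```

Plotting one curve per tag from `history` against the bookmark index would reproduce the kind of chart described on the slide.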
Findings
  Consensus about a URL's topics
    Emerges quickly, after ~100 users bookmark it
    URLs do not have to become popular for tags to be useful
    Minority opinions can stably coexist with popular ones
    Can be used to categorize/organize URLs
  Reasons for consensus
    Imitation: users imitate the tag selection of others
      But stable patterns also exist for less common tags (not shown to users)
    Shared knowledge
      Can we learn it?
Learning from Social Tagging/Annotation
  Annotations by an individual user may be inaccurate and incomplete
  Annotations from many different users may complement each other, making them meaningful in aggregate
  Goal: Learn concepts from social annotations created by many users
Learning Concepts from Tags
  Jaguar = ? Animal or Car
  [Figure: two photos tagged Jaguar, credited "By sparky2000" and "By A lion Rohrs"]
Goal of Learning Algorithm
  Group semantically related tags and resources
  A group ~ a concept
  [Figure: resources and tags grouped into concepts such as Animal, Car, Flower]
Challenges in Learning from Annotations
  Sparse data
    4-7 tags per bookmark; 3.74 tags per photo [Rattenbury07+]
  Ambiguity
    jaguar: car vs. animal
  Polysemy
    window: hole in a wall vs. the glass pane that resides in it
  Synonymy
    kid vs. child
  Disagreement
    cats\africa vs. africa\cats
  Different levels of specificity
    Dog vs. Beagle
  Multiple facets
    Bird tagged by appearance, location, scientific/colloquial name
Document Modeling Approaches
  Bag-of-words: tf-idf
    Document as a vector of word frequencies
    Small reduction in document description length
    Does not handle synonymy and polysemy
  Latent semantic indexing (LSI)
    Identifies the subspace of tf-idf features that captures most of the variance in a corpus
    Reduction in document description length (# principal components)
    Handles polysemy and synonymy
  Topic modeling: pLSI, LDA
    Documents as random mixtures over (hidden) topics, where each topic is a distribution over words
    Large reduction in description length (# topics)
  Inference
    Given a document corpus, estimate the parameters of the model
    Compute the distribution of hidden topics given the document
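As a concrete reference point for the bag-of-words approach, here is a minimal tf-idf sketch. It assumes one common weighting variant (term frequency normalized by document length, times log(N / document frequency)); the slide does not say which variant is meant, and the toy documents and the `tf_idf` helper are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one sparse tf-idf vector (a dict) per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    Documents are assumed to be non-empty lists of words.
    """
    n_docs = len(docs)
    df = Counter()
    for words in docs:
        df.update(set(words))  # count each word once per document
    vectors = []
    for words in docs:
        counts = Counter(words)
        vectors.append({
            w: (c / len(words)) * math.log(n_docs / df[w])
            for w, c in counts.items()
        })
    return vectors

# Hypothetical documents, each a bag of words (or tags).
docs = [
    ["cat", "africa", "cat"],
    ["cat", "car"],
    ["jaguar", "car"],
]
vectors = tf_idf(docs)
for vec in vectors:
    print({w: round(x, 3) for w, x in vec.items()})
```

Each distinct word gets its own dimension, so synonyms such as kid and child would never overlap, which is the limitation that LSI and topic models address.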
A Stochastic Process of Word Generation
  pLSI (Hofmann99); LDA (Blei03+)
  [Figure: document (r) -> topics (z), drawn from the possible topics -> generated words (t), drawn from the possible words]
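The generative story in the figure can be sketched in a few lines: for each word slot, draw a topic z from the document's topic distribution, then draw a word t from that topic's word distribution. The probabilities and the `generate_tags` helper below are made up for illustration. The sketch fixes the document's topic mixture by hand; full LDA additionally draws that mixture from a Dirichlet prior, which is omitted here.

```python
import random

def generate_tags(doc_topics, topic_words, n_words, rng):
    """Generate (topic, word) pairs for one document r.

    doc_topics:  {topic: P(z | r)}
    topic_words: {topic: {word: P(t | z)}}
    """
    topics = list(doc_topics)
    topic_weights = [doc_topics[z] for z in topics]
    generated = []
    for _ in range(n_words):
        # Step 1: pick a hidden topic for this word slot.
        z = rng.choices(topics, weights=topic_weights, k=1)[0]
        # Step 2: pick a word from the chosen topic's distribution.
        words = list(topic_words[z])
        word_weights = [topic_words[z][w] for w in words]
        t = rng.choices(words, weights=word_weights, k=1)[0]
        generated.append((z, t))
    return generated

# Hypothetical distributions; the numbers are illustrative only.
doc_topics = {"travel": 0.7, "maps": 0.3}
topic_words = {
    "travel": {"flights": 0.5, "airline": 0.3, "guide": 0.2},
    "maps": {"map": 0.6, "latitude": 0.2, "longitude": 0.2},
}

sample = generate_tags(doc_topics, topic_words, n_words=5, rng=random.Random(0))
print(sample)
```

Inference runs this process in reverse: given only the observed words, estimate the two sets of distributions.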
Learned Topics Possible Words Possible Topics High probability words in each topic: travel, flights, airline, flight, airlines, guide, aviation, map, maps, world, earth, latitude, longitude, directions, address, geography, distance, zip, usa, gmaps, atlas, video, download, bittorrent, p2p, youtube, media, torrent, torrents, movies,
Apply LDA to Tagging
  Resources play the role of documents
  Tags play the role of words
  [Figure: LDA groups tagged resources into concepts such as Animal, Car, Flower]
Application to Resource Discovery
  Resource discovery
    Given a seed source, find other data sources that provide the same functionality
    E.g., find geocoders like http://geocoder.us, which returns the geographic coordinates of a specified US address
  Benefits
    Increase robustness of II applications
      If http://geocoder.us fails, substitute another source
    Increase coverage of II applications
      http://geocoder.ca geocodes US and Canadian addresses
Source Discovery and Modeling [Ambite et al, 2009]
  [Figure: pipeline of discovery, invocation & extraction, semantic typing, and source modeling]
    Inputs: seed URL; background knowledge (sample input values, definition of known sources, sample values, patterns, domain types)
    Labels in the figure: unisys, anotherws, http://wunderground.com, 90254
    unisys(zip,temp, ) :-weather(zip,,temp,hi,lo)
    unisys(zip,temp,humidity, )
Exploiting Social Annotation for Resource Discovery
  Approach: Use topic modeling of social annotation obtained from Delicious to find sources similar to a given seed URL
  [Figure: pipeline]
    Seed URL
    Obtain annotation corpus from Delicious (users, URLs, tags)
    Probabilistic learning model, e.g., LDA, to learn concepts
    URL's distribution over concepts
    Compute URL similarity
    Candidates, ranked by similarity to the seed
Corpus of Annotated Resources
  Crawling strategy
    For each seed, retrieve its 20 most popular tags
    For each tag, retrieve sources annotated with the same tag
    For each source, retrieve all tags
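The three crawling steps can be sketched as a small function. Everything here is an assumption for illustration: `tags_of` and `sources_with` are hypothetical lookup callables standing in for whatever access to Delicious data is available, and the `POSTS` dictionary is made-up toy data, not a real API or real annotations.

```python
from collections import Counter

def build_corpus(seed, tags_of, sources_with, top_n=20):
    """Collect tagged sources around a seed URL.

    tags_of(url)      -> list of tags applied to url (with repeats)
    sources_with(tag) -> list of URLs annotated with tag
    Both callables are hypothetical stand-ins for a data source.
    """
    # Step 1: the seed's most popular tags.
    popular = [tag for tag, _ in Counter(tags_of(seed)).most_common(top_n)]
    corpus = {}
    for tag in popular:
        # Step 2: every source annotated with that tag.
        for source in sources_with(tag):
            if source not in corpus:
                # Step 3: all tags of that source, with counts.
                corpus[source] = Counter(tags_of(source))
    return corpus

# Toy in-memory data; the tags and the last two names are made up.
POSTS = {
    "http://geocoder.us": ["geocoding", "maps", "geocoding", "address"],
    "http://geocoder.ca": ["geocoding", "canada"],
    "maps-site": ["maps", "travel"],
    "video-site": ["video"],
}

corpus = build_corpus(
    "http://geocoder.us",
    tags_of=lambda url: POSTS.get(url, []),
    sources_with=lambda tag: [u for u, tags in POSTS.items() if tag in tags],
)
print(sorted(corpus))
```

The resulting mapping from source to tag counts is the bag-of-tags corpus that a topic model can be trained on.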
Topic Modeling of Social Annotations
  Use LDA to learn 80 topics in each corpus
  Distributions over topics are used to compute the similarity of a target URL to the seed
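Once each URL has a distribution over topics, ranking candidates reduces to comparing probability vectors. The slide does not name the similarity measure, so the sketch below assumes Jensen-Shannon divergence as one reasonable choice; the three-topic vectors (rather than 80) and the candidate names are made up.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q); terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, and 0 for identical distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def rank_by_similarity(seed_topics, candidates):
    """Order candidate URLs from most to least similar to the seed."""
    return sorted(candidates, key=lambda url: js_divergence(seed_topics, candidates[url]))

# Hypothetical topic distributions over 3 topics.
seed = [0.8, 0.1, 0.1]
candidates = {
    "a": [0.7, 0.2, 0.1],
    "b": [0.1, 0.1, 0.8],
    "c": [0.8, 0.1, 0.1],
}
ranked = rank_by_similarity(seed, candidates)
print(ranked)
```

Cosine similarity over the same vectors would be an equally simple alternative; the choice mainly affects how strongly small probabilities are weighted.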
Source Discovery Results
  Manually label the top 100 URLs ranked by similarity to the seed URL
  Compare to Google's "find similar URLs" functionality
Source Discovery Results
Discussion
  Users express their knowledge through the tags they create while annotating content
  Apply document modeling techniques to social annotation data
    Infer hidden topics in annotated data
    Use topics for the source discovery task
      Outperforms standard Web search
  Next
    Extract more complex types of knowledge from social annotations
      Sentiment
      Folksonomies