Searching for Familiar Strangers on Blogosphere: Problems and Challenges. N. Agarwal, H. Liu, J. Salerno, and P. Yu

Similar documents
CS 345A Data Mining Lecture 1. Introduction to Web Mining

Functionality, Challenges and Architecture of Social Networks

Chapter 3: Google Penguin, Panda, & Hummingbird

ALL-KANSAS NEWS WEBSITE CRITIQUE BOOKLET. This guide is designed to be an educational device SCHOOL NAME: NEW WEBSITE NAME: YEAR: ADVISER:

BUYER S GUIDE WEBSITE DEVELOPMENT

Blog was declared word of the year by the Merrian-Webster dictionary in It is an abbreviation for weblog, a term firstly used in 1997.

SEO: SEARCH ENGINE OPTIMISATION

Creating a Course Web Site

Beyond Ten Blue Links Seven Challenges

CURZON PR BUYER S GUIDE WEBSITE DEVELOPMENT

Vishnu Verrma. Calling Dreams, Full-time Blogging. Content Writing, Author & Contributor. e-commerce & Web Development, Developer

Disclaimer Reasonable care has been taken to ensure that the information presented in this book is accurate. However, the reader should understand

Refreshing Your Affiliate Website

Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullman

Introduction April 27 th 2016

Web Ecosystem. Jo, SanKu 조산구 차세대개발 KT. June 25, 2008

THE UIGARDEN PROJECT: A BILINGUAL WEBZINE Christina Li, Eleanor Lisney, Sean Liu UiGarden.net

EXECUTIVE OVERVIEW. Upgrading to Magento 2

Semantic Web and Web2.0. Dr Nicholas Gibbins

Keywords. The Foundation of your Internet Business.. By Eric Graudins: TheInternetBloke.com Worldwide Rights Reserved.

The Convio Online Marketing Nonprofit Benchmark Index Study. Executive Summary

Cybersecurity and the Board of Directors

Haku Inc. Founded 2014

Digital Insight PUSHING YOUR SEO TO ITS LIMITS

How to Create a Killer Resources Page (That's Crazy Profitable)

TDWI Data Modeling. Data Analysis and Design for BI and Data Warehousing Systems

WHAT YOU WILL LEARN PT ACADEMY

Basic Blogging Terms Prof. Anthony Collins, Instructor. What is a blog?

Using the Internet to Gather, Connect and as a place of Mission

Advanced Marketing Certification Training

Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University Infinite data. Filtering data streams

THE SECRET FOR MAKING MONEY ONLINE

Blogging at lawandmedicine. A. How to Do It

Graph Structure Over Time

12 Key Steps to Successful Marketing

Advanced Google Local Maps Ranking Strategies for Local SEO Agencies

Mendeley Help Guide. What is Mendeley? Mendeley is freemium software which is available

Network Self-determination: When building the Internet becomes a right

Predict Topic Trend in Blogosphere

Case Study: Best Strategy To Rank Your Content On Google

How to turn. Reviews into Revenue & Trends in Online Reputation

User s Guide Your Personal Profile and Settings Creating Professional Learning Communities

program self-assessment tool

Frequently Asked Questions- Communication, the Internet, Presentations Question 1: What is the difference between the Internet and the World Wide Web?

Version 11

How To Create Backlinks

Background Brief. The need to foster the IXPs ecosystem in the Arab region

Search Engine Optimization (SEO) Services

Tag-based Social Interest Discovery

Peninsula College Continuing Education - Website Design. Welcome to the new Web (2.0)! Enthusiasm is better than Perfectionism.

Weighted Alternating Least Squares (WALS) for Movie Recommendations) Drew Hodun SCPD. Abstract

seosummit seosummit April 24-26, 2017 Copyright 2017 Rebecca Gill & ithemes

Recipes. Marketing For Bloggers. List Building, Traffic, Money & More. A Free Guide by The Social Ms Page! 1 of! 24

What is SEO? { Search Engine Optimization }

Predicting Messaging Response Time in a Long Distance Relationship

The Establishment of Large Data Mining Platform Based on Cloud Computing. Wei CAI

Centrality Book. cohesion.

Inbound Website. How to Build an. Track 1 SEO and SOCIAL

Latent Space Model for Road Networks to Predict Time-Varying Traffic. Presented by: Rob Fitzgerald Spring 2017

Achieving a Secure and Resilient Cyber Ecosystem: A Way Ahead

Latest Press Release. Trap call with simple mobile

Strong signs your website needs a professional redesign

SEARCH ENGINE OPTIMIZATION ALWAYS, SOMETIMES, NEVER

Prayerful Living Singles User s Manual

TURNING METCALFE ON HIS HEAD: THE MULTIPLE and GROWING COSTS OF NETWORK EXCLUSION. Dr. Rahul Tongia, CMU/CSTEP Dr. Ernest J.

IPv6 Readiness in the Communication Service Provider Industry

Spice UK. Susan Hallam. Susan Hallam Page 1. Spice UK. Agenda for Today

KEYS TO BUILDING A WEBINAR ON-DEMAND STRATEGY

Introduction to Data Mining

Chapter 3. Organizational Design and Support Usability. Organizational Design and Support Usability (cont.) Managing Design Processes

Background Brief. The need to foster the IXPs ecosystem in the Arab region

Introduction to the Publish Content Module

THE USAGE OF VS.

One click away from Sustainable Consumption and Production

Communication in a Digital World

What Drives the Web 2.0 World: Search, Media, and Conversations. John Battelle UCB, Marti Hearst, Presiding

!!!!!! Portfolio Summary!! for more information July, C o n c e r t T e c h n o l o g y

Development, Define, and Design. Peninsula College Continuing Education - Website Development. Welcome to the new Web (2.0)!

HTML 5 and CSS 3, Illustrated Complete. Unit M: Integrating Social Media Tools

Open Research Online The Open University s repository of research publications and other research outputs

Your List Building Questions Answered

Chapter 1. Social Media and Social Computing. October 2012 Youn-Hee Han

6 counterintuitive strategies to put your list building efforts into overdrive

Opportunities from Open Source Search

UANP 6013 INFORMATICS IN SOCIETY

Artificial Neuron Modelling Based on Wave Shape

The Future of P2P. A Retrospective. Eric Armstrong President Kontiki, Inc.

U.S. Digital Video Benchmark Adobe Digital Index Q2 2014

Digital Marketing and Search Engines Optimization

Netiquette. IT Engineering II. IT Engineering II Instructor: Ali B. Hashemi

SEO For Bloggers: Learn How To Rank Your Blog Posts At The Top Of Google's Search Results (The SEO Series) By R. L.

Blogging Basics And Best Practices. Table of Contents

DEMAND INCREASE GROWTH

THE QUICK AND EASY GUIDE

In Search of Reusable Educational Resources in the Web

Databricks Delta: Bringing Unprecedented Reliability and Performance to Cloud Data Lakes

Getting Started Guide. Getting Started With Quick Blogcast. Setting up and configuring your blogcast site.

THALES DATA THREAT REPORT

Account Customer Portal Manual

Recommendation System for Location-based Social Network CS224W Project Report

Transcription:

Searching for Familiar Strangers on Blogosphere: Problems and Challenges N. Agarwal, H. Liu, J. Salerno, and P. Yu http://www.public.asu.edu/~huanliu/papers/ngdm07.pdf

An Ever Evolving Field Fragmentation alters Internet and presents new challenges to Next Generation Data Mining We present a problem encountered in our study The problem is about Web 2.0 Blogosphere Long Tail Connecting dots Challenges

Web 2.0 Universal information space People generate and enrich contents Low barriers for publishing and participating Information consumers are now also Producers, or Prosumers Even more people are joining for various reasons from expression to reputation

Blogosphere Blogosphere Blog site Blogger Blog post and comments (blog entries) Reverse chronologically ordered entries Single authored (Individual Blog Sites) Multi authored (Community blog sites)

A Growing Revelation: Short Head and Long Tail Few people are densely connected: Short Head Many people are sparsely connected: Long Tail Long Tail and Short Head tend to be equally important Examples of albums * Zipf s Law, Power Law, Law of the Vital Few are well-known examples The 80/20 rule becomes more skewed Short Head Long Tail * The Long Tail, C. Anderson

Who are Familiar Strangers? People who do not know each other but share some common patterns Real World Individuals can notice the presence of some others at a train platform Common pattern: leaving at the same time and place Blogosphere What you write is what you are Have similar blogging behavior, interests (e.g., movies, games, technology, and politics, etc.) Never cited (came across) each other

Bloggers in Long Tail Not returned as top hits by search engines Inordinately many Disconnected A Movie Review example Critics Short Head (e.g., nytimes.com) Movie Bloggers Long Tail Long Tail is the most lucrative test-bed for finding Familiar Strangers

Finding Research Groups across Disciplines with Similar Interest An example of `feature selection research Many groups in different disciplines are working on that without calling it feature selection Some working on Feature Selection but mean different things Web 2.0 Marketing 4Ps Numerous book/product reviews Another Web 2.0 application Finding who are the best matches you don t know

Web 2.0 Marketing 4Ps Personalization for the long tail Much easier to fish where the fish are Participation Creating a critical mass and generating crowd effect Peer-to-peer Online word of mouth Smooth information flow Predictive modeling Anticipation and recommendation

Connecting the Dots An example from The Long Tail book Touching the Void, 1988 Largely forgotten after its publication Into Thin Air, a decade later Its publication attracted a few readers who wrote reviews that connect this book to Touching the Void Touching the Void became a new sensation

Aggregating Niches in Long Tail A blogger with familiar-strangers will form a critical mass such that the understanding of one blogger gives us a sensible and representative glimpse to others, more data about familiar strangers can be collected for better customization and services (e.g., personalization and recommendation), the nuances among them present new business opportunities, and knowledge about them can facilitate predictive modeling and trend analysis.

Need for Aggregation Customized attention requires substantial data Majority of blog sites are in the Long Tail and are disconnected Aggregating the similar yet disconnected for obtaining critical mass Lack of data can result in irrelevant ads (see an example on the right) Increase participation Move from the Long Tail closer to the Short Head Smooth knowledge transfer between familiar strangers

Definition Given a blogger b, familiar strangers to b are a set of bloggers B = {b 1,b 2,,b n }, who share common patterns as b, like blogging on similar topics, but have not come across each other or are not related to each other. Familiarity: Blog posts

Definition Strangers: Partial strangers Total strangers b j is in b s Social Network b and b j have disjoint Social Networks b is in b j s Social Network

Different Types Organizational differences in the blogosphere eventuate disparate types of familiar stranger bloggers Community-level familiar strangers Networking-sitelevel familiar strangers Blogosphere-level familiar strangers

Community-Level Familiar MySpace has a community called A group for those who love history It has 38 members Two members, Maria and John blog profusely on the similar topic, but they are not in each other s social network Strangers

Networking-Site-Level 2 groups on MySpace, The Samurai (32 members) The Japanese Sword (84 members) Marc, top blogger on The Samurai and Jeff, top blogger on The Japanese Sword discuss about Japanese martial arts. Neither of them is in the other s social network. This implies, though being active locally and discussing on the same theme, the two bloggers are still strangers. Familiar Strangers

Blogosphere-Level Familiar 2 different social networking sites, MySpace and Orkut. The Samurai (32 members) from MySpace Samurai Sword (29 members) from Orkut Top bloggers from the respective communities in MySpace and Orkut, Marc and Anant, respectively, share the blogging theme but they are not in each others social network. The above example illustrates the existence of blogosphere-level familiar strangers. Strangers

Challenge Link analysis Exhaustive search is impractical One step-look ahead cannot go too far

Challenge Searching via blog posts Defining Similarity If we know exactly what we want, For the example of two books (TTV, ITA),

Challenge Using the context to help search What constitutes the context for a blogger Directed vs. broadcast search An intuitive approach is to integrate link-, similarity-, and context-based methods

Yet Another Challenge How to evaluate the work without standard ground truth? Data and data collection Experiments Performance metrics

Ending Remarks Finding familiar strangers on the blogosphere is challenging Aggregating familiar strangers can help find niches in the Long Tail We present more problems than solutions Long Tail is where abundant new opportunities are A related note: Workshop on Social Computing, Behavioral Modeling, Prediction http://www.public.asu.edu/~huanliu/sbp08