Tatiana Braescu Seminar Visual Analytics Autumn, Supervisor Prof. Dr. Andreas Kerren. Purpose of the presentation


Tatiana Braescu
Seminar Visual Analytics, Autumn 2007
Supervisor: Prof. Dr. Andreas Kerren

Social Analysis & Interaction: Summary

Purpose of the presentation

To present an overview of analysis and visualization techniques that reveal:
    who revises whom in Wikipedia
    us vs. them conflict patterns between groups of users in Wikipedia
    who's connected to who on bibliographic collaboration networks

To offer insights on design considerations for asynchronous collaboration in visual analysis environments:
    work parallelization
    communication
    social organization

Who revises Whom: Motivation

Who revises Whom in Wikipedia
Ulrik Brandes and Jürgen Lerner, Department of Computer & Information Science, University of Konstanz

    controversial topics (abortion, gun control)
    delicate historic events
    important political persons

Figure 1: Small part of the revision history of the page Gun politics in Wikipedia.

A set of analysis and visualization techniques that reveal:
    the dominant authors of a page
    the roles they play and the alters they confront

Tools to understand how authors collaborate in the presence of controversy.

Who revises Whom: Input data

Who revises Whom in Wikipedia
First step: revision network

Figure 1: Small part of the revision history of the page Gun politics.

    <page><title>Gun politics</title>...
    <revision><timestamp>2006-03-18T22:31:41Z</timestamp>
      <contributor><ip>24.12.208.181</ip></contributor>
      <comment>/* Self-defense */</comment>
    </revision>
    <revision><timestamp>2006-03-18T23:18:38Z</timestamp>
      <contributor><username>yaf</username></contributor>
      <comment>rv POV edit (discussion belongs on discussion page, not in article)</comment>
    </revision>
    <revision><timestamp>2006-03-19T02:39:25Z</timestamp>
      <contributor><ip>24.12.208.181</ip></contributor>
      <comment>/* General discussion of arguments */ Fact with cite. DO NOT DELETE WITHOUT VERY GOOD REASON!!!!!!! Different placement on page acceptable.</comment>
    </revision>
    <revision><timestamp>2006-03-19T02:52:41Z</timestamp>
      <contributor><username>mmx1</username></contributor>
      <comment>Wikipedia is not a collection of facts. This page is a summary of the arguments, not a place to make...

Figure 2: Six consecutive revisions of the page Gun politics in XML format.
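A minimal sketch of how a revision network could be derived from XML like the excerpt above. It assumes (the slides do not spell this out) that each revision adds one edge from its author to the author of the immediately preceding revision, and it uses a simplified, namespace-free sample; the function names `author_of` and `revision_edges` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Simplified sample modelled on the slide excerpt (comments omitted).
SAMPLE = """<page><title>Gun politics</title>
<revision><timestamp>2006-03-18T22:31:41Z</timestamp>
<contributor><ip>24.12.208.181</ip></contributor></revision>
<revision><timestamp>2006-03-18T23:18:38Z</timestamp>
<contributor><username>yaf</username></contributor></revision>
<revision><timestamp>2006-03-19T02:39:25Z</timestamp>
<contributor><ip>24.12.208.181</ip></contributor></revision>
<revision><timestamp>2006-03-19T02:52:41Z</timestamp>
<contributor><username>mmx1</username></contributor></revision>
</page>"""


def author_of(revision):
    """Return the username if present, otherwise the IP address."""
    contributor = revision.find("contributor")
    name = contributor.findtext("username")
    return name if name is not None else contributor.findtext("ip")


def revision_edges(page_xml):
    """Count (reviser, revised) pairs between consecutive revisions.

    Assumption: a revision is attributed as a revision *of* the
    immediately preceding author's version.
    """
    page = ET.fromstring(page_xml)
    # ISO 8601 timestamps in the same time zone sort correctly as strings.
    revisions = sorted(page.findall("revision"),
                       key=lambda r: r.findtext("timestamp"))
    authors = [author_of(r) for r in revisions]
    edges = Counter()
    for previous, current in zip(authors, authors[1:]):
        if previous != current:  # ignore an author revising themselves
            edges[(current, previous)] += 1
    return edges


if __name__ == "__main__":
    for (reviser, revised), weight in revision_edges(SAMPLE).items():
        print(f"{reviser} -> {revised} (weight {weight})")
```

As the conclusion slide notes, a network built this way does not check whose text was actually changed; consecutive position is only a proxy.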

Who revises Whom: Visualization

Who revises Whom in Wikipedia
Next step: visual analysis and visual representation
    What position do they take?
    How are they involved?

Visualization of a revision network determined from Gun politics and related pages.

Who revises Whom: Visualization

Who revises Whom in Wikipedia
Next step: filtering
    Restriction to time intervals
    Restriction to relevant sub-networks

Network clustering reveals a relevant sub-network of the revision network of Gun politics.

Filtering in time: a peak in the revision plot of Gun politics during 2003 was caused by authors who vanish in the global image.
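The two filtering steps can be sketched as follows. This is an illustration only: the edge list, the tuple layout `(reviser, revised, timestamp)`, and the function names are assumptions, and the sub-network is restricted to a given set of authors rather than found by the clustering the slide mentions.

```python
from datetime import datetime


def parse_ts(timestamp):
    """Parse a timestamp of the form 2006-03-18T22:31:41Z."""
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")


def restrict_to_interval(timed_edges, start, end):
    """Keep edges whose revision time lies in [start, end)."""
    return [e for e in timed_edges if start <= parse_ts(e[2]) < end]


def restrict_to_authors(timed_edges, authors):
    """Keep edges whose two endpoints both belong to the chosen authors."""
    return [e for e in timed_edges if e[0] in authors and e[1] in authors]


# Hypothetical edge list: (reviser, revised, timestamp of the revision).
TIMED_EDGES = [
    ("yaf", "24.12.208.181", "2006-03-18T23:18:38Z"),
    ("24.12.208.181", "yaf", "2006-03-19T02:39:25Z"),
    ("mmx1", "24.12.208.181", "2006-03-19T02:52:41Z"),
]

if __name__ == "__main__":
    day = restrict_to_interval(TIMED_EDGES,
                               datetime(2006, 3, 19), datetime(2006, 3, 20))
    print(len(day), "edges on 2006-03-19")
    sub = restrict_to_authors(TIMED_EDGES, {"yaf", "24.12.208.181"})
    print(len(sub), "edges between the two selected authors")
```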

Who revises Whom: Patterns

Who revises Whom in Wikipedia
Final step: identifying recurrent patterns of confrontation

Who revises Whom: Conclusion

Who revises Whom in Wikipedia
CONCLUSION
+ reveals the authors who are most involved in controversy
+ network visualizations show who confronts whom and who plays which role
+ identifies some recurrent patterns of confrontation
+ can be applied to Wikipedia articles in any language without the need to adapt NLP algorithms
- the revision network should take into account whose text has been changed during a revision
- the interpretation of the revisor vs. revised pattern can be quite different

Us vs. Them: Motivation

Us vs. Them: conflict patterns between groups of users in Wikipedia
Bongwon Suh, Ed H. Chi, Bryan A. Pendleton and Aniket Kittur, Palo Alto Research Center

Wikipedia has been growing at an exponential rate [5, 6, 18, 32, 47].

The overhead cost has increased dramatically:
    disagreements about article content
    procedures
    administrative issues
    spam
    vandalism
    conflict between user factions

A model of how conflicts occur in Wikipedia and how conflicts are resolved.

Us vs. Them: Methods

Us vs. Them: conflict patterns between groups of users in Wikipedia
The user conflict model: base and methods

The model is based on:
    users' editing history
    the relationships between user edits ("reverts")

A revert is defined as undoing the actions of another editor in whole or in part.

Two different methods:
    Data-driven: generates a small fingerprint of each revision
    User-labeled: captures partial reverts
Together they provide converging evidence on the true change in reverts over time.

Us vs. Them: User model

Us vs. Them: conflict patterns between groups of users in Wikipedia
The user conflict model: design choices and layout principles

Problems in using reverts to identify conflicts:
    multiple users are often involved in chains of reverts
    edit history is typically long and tedious to browse
    there are various types of reverts: the revert duel, self-reverts, reverts by multiple users

Design choices:
    Disregard Self Revert
    Degree of Conflict
    Conflict Group
    Identity Based Revert
    Immediate Revert Only

A force-directed graph layout algorithm: nodes are evenly distributed as an initial layout. When forces are deployed, nodes are rearranged into two user groups.
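A sketch of the data-driven method under stated assumptions: each revision is fingerprinted with a hash of its full text (MD5 here, chosen for illustration; the slide only says "a small fingerprint"), and a revision whose fingerprint matches an earlier one is treated as an identity revert of everything in between. Self-reverts are disregarded and an `immediate_only` switch mirrors the "Immediate Revert Only" choice. Partial reverts, which the user-labeled method captures, are not detected. The function names are hypothetical.

```python
import hashlib


def fingerprint(text):
    """Small, fixed-size fingerprint of a revision's full text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def find_reverts(revisions, immediate_only=False):
    """Return (reverter, reverted) pairs.

    revisions: chronological list of (author, text).
    A revision that restores the exact content of an earlier revision
    is counted as reverting the authors of the revisions in between.
    """
    last_seen = {}  # fingerprint -> index of the latest revision with it
    reverts = []
    for i, (author, text) in enumerate(revisions):
        fp = fingerprint(text)
        # last_seen[fp] == i - 1 would be a null edit, not a revert.
        if fp in last_seen and last_seen[fp] < i - 1:
            undone = range(last_seen[fp] + 1, i)
            if immediate_only:
                undone = range(i - 1, i)  # only the directly preceding edit
            for j in undone:
                reverted = revisions[j][0]
                if reverted != author:  # disregard self-reverts
                    reverts.append((author, reverted))
        last_seen[fp] = i
    return reverts


if __name__ == "__main__":
    history = [("A", "v1"), ("B", "v2"), ("C", "v3"), ("A", "v1")]
    print(find_reverts(history))
    print(find_reverts(history, immediate_only=True))
```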

Us vs. Them: Revert Graph

Us vs. Them: conflict patterns between groups of users in Wikipedia
The user conflict model visualization tool (Revert Graph)

Node types: Administrator, Anonymous user, Registered user

How to use the tool?
1. The user specifies an article
2. The revert history of the article is retrieved from the DB
3. The tool loads the group of users participating in editing this article
4. A node-link graph is formed and displayed
5. The simulation runs and the forces in the graph stabilize
6. Structures between users appear

Drilling down:
1. The user clicks on a specific user node
2. The tool shows a list of revert relationships
3. The user clicks on a specific relationship
4. The tool shows the revert history between those 2 users

Us vs. Them: Patterns

Us vs. Them: conflict patterns between groups of users in Wikipedia
Visualization tool (Revert Graph): user conflict patterns

Analysis: 901 high-conflict articles with more than 250 reverts
1. Pattern one: NODE CLUSTERS AND OPINION GROUPS
2. Pattern two: MEDIATION
3. Pattern three: FIGHTING VANDALISM
4. Pattern four: CONTROVERSIAL EDITOR

NODE CLUSTERS AND OPINION GROUPS: Revert Graph for the Wikipedia page on Dokdo. Labelled groups: primarily nonregistered users; mostly users with Korean point of view; mixed point of view; mostly users with Japanese point of view.
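Steps 4 to 6 above (form the graph, run the simulation until forces stabilize) can be illustrated with a small force-directed sketch. The force model is an assumption, not taken from the slides: every node is pulled weakly toward the centroid of all nodes, and two users linked by a revert push each other apart. The function name `revert_layout` and all parameter values are hypothetical.

```python
import math


def revert_layout(users, reverts, steps=200, attraction=0.01, repulsion=0.05):
    """Return {user: [x, y]} after a fixed number of simulation steps.

    users:   list of user names.
    reverts: list of (user_a, user_b) pairs with a revert relationship.
    """
    n = len(users)
    # Initial layout: nodes evenly distributed on a unit circle.
    pos = {u: [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
           for i, u in enumerate(users)}
    for _ in range(steps):
        cx = sum(p[0] for p in pos.values()) / n
        cy = sum(p[1] for p in pos.values()) / n
        # Assumed attraction: a weak pull of every node toward the centroid.
        force = {u: [attraction * (cx - pos[u][0]),
                     attraction * (cy - pos[u][1])] for u in users}
        # Assumed repulsion: users in a revert relationship push apart.
        for a, b in reverts:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = max(math.hypot(dx, dy), 0.01)  # avoid division by ~0
            push = repulsion / dist
            force[a][0] += push * dx / dist
            force[a][1] += push * dy / dist
            force[b][0] -= push * dx / dist
            force[b][1] -= push * dy / dist
        for u in users:
            pos[u][0] += force[u][0]
            pos[u][1] += force[u][1]
    return pos


if __name__ == "__main__":
    users = ["k1", "k2", "j1", "j2"]
    reverts = [("k1", "j1"), ("k1", "j2"), ("k2", "j1"), ("k2", "j2")]
    for user, (x, y) in revert_layout(users, reverts).items():
        print(f"{user}: ({x:.2f}, {y:.2f})")
```

Under this assumed model, users who revert the same opponents receive similar pushes and no push from each other, which is one plausible way for opposing groups to separate.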

Us vs. Them: Conclusion

Us vs. Them: conflict patterns between groups of users in Wikipedia
CONCLUSION
+ helps identify important social patterns in Wikipedia
+ may be applicable to other online communities
- not every aspect of social dynamics in online collaboration systems was fully addressed
- the tool cannot detect conflicts between users who were not involved in reverts

Who's connected to Who: Motivation

Who's connected to Who on bibliographic collaboration networks
Mustafa Bilgic, Louis Licamele, Lise Getoor and Ben Shneiderman, University of Maryland, College Park, MD

The problem: the data may inadvertently contain several distinct references to the same underlying entity or actor.
    The visual display is then misleading: the number of nodes is incorrect, and the edges and paths are inaccurate.
    Calculating the standard social network measures would give inaccurate results.

The solution: entity resolution to identify potential duplicates (the process of reconciling, from the underlying data references, the actual real-world entities).
    automated entity resolution
    hand-cleaning entity resolution

Who's connected to Who: Conclusion

Who's connected to Who on bibliographic collaboration networks
D-Dupe: resolve ambiguities either by merging nodes or by marking them distinct.
Large networks are cleaned by focusing on a small subnetwork containing a potential duplicate pair.

Two of D-Dupe's novelties are:
1. Stable visual layout optimized for entity resolution
    Shows only the subnetwork relevant for the entity resolution task.
    Allows the visualization to scale to large networks.
    A stable substrate: the potential duplicates and other related entities always appear at the same location.
2. User control for combining entity resolution algorithms
    Numerous similarity measures can be used to determine potential duplicates.
    Allows users to flexibly apply and interleave different measures.
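To make the effect of "merging nodes" concrete, here is a minimal sketch on an undirected co-authorship edge list. The author names are invented for illustration and `merge_nodes` is a hypothetical helper, not part of D-Dupe.

```python
def merge_nodes(edges, keep, duplicate):
    """Merge `duplicate` into `keep` in an undirected edge list.

    Returns a set of frozenset pairs, so parallel edges created by the
    merge collapse into one and self-loops are dropped.
    """
    merged = set()
    for a, b in edges:
        a = keep if a == duplicate else a
        b = keep if b == duplicate else b
        if a != b:
            merged.add(frozenset((a, b)))
    return merged


def nodes_of(edge_set):
    """Collect all node names that appear in a set of edges."""
    return set().union(*edge_set) if edge_set else set()


# Invented example: "J. Smith" and "John Smith" refer to the same person.
EDGES = [
    ("J. Smith", "A. Lee"),
    ("John Smith", "A. Lee"),
    ("John Smith", "B. Kim"),
]

if __name__ == "__main__":
    cleaned = merge_nodes(EDGES, keep="John Smith", duplicate="J. Smith")
    print(len(nodes_of(cleaned)), "nodes,", len(cleaned), "edges")
```

Before the merge the network shows four nodes and three edges; afterwards three nodes and two edges, which is why node counts and path-based measures change once duplicates are resolved.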

Who's connected to Who: Conclusion

Who's connected to Who on bibliographic collaboration networks
Figure 1 gives an overview of the deduplication process on a small portion of the bibliographic dataset used for the InfoVis 2004 Contest.

Who's connected to Who: Conclusion

Who's connected to Who on bibliographic collaboration networks
D-Dupe: interface
1. Users begin by loading a dataset.
2. They choose from a number of possible entity resolution algorithms.
3. The entity resolution algorithm ranks pairs of nodes according to how likely they are to be duplicates.
4. Users select a potential duplicate pair for analysis.
5. They view the collaboration context network for the pair and apply filtering and highlighting.
6. Users decide whether the two nodes are duplicates or distinct nodes.

A video demonstration of D-Dupe: http://www.cs.umd.edu/linqs/ddupe/
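Step 3 can be sketched as a weighted combination of an attribute similarity and a relational similarity. This is an illustration under assumptions, not D-Dupe's algorithms: name similarity uses `difflib.SequenceMatcher`, relational similarity is the Jaccard overlap of co-author sets, the `weight` parameter stands in for user control over the combination, and the data is invented.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a, b):
    """Attribute similarity: string similarity of two names in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def coauthor_similarity(a, b, coauthors):
    """Relational similarity: Jaccard overlap of the co-author sets."""
    union = coauthors[a] | coauthors[b]
    return len(coauthors[a] & coauthors[b]) / len(union) if union else 0.0


def rank_pairs(coauthors, weight=0.5):
    """Rank all author pairs, most likely duplicates first.

    weight: share of the score given to name similarity; the rest goes
    to co-author overlap.
    """
    scored = []
    for a, b in combinations(sorted(coauthors), 2):
        score = (weight * name_similarity(a, b)
                 + (1 - weight) * coauthor_similarity(a, b, coauthors))
        scored.append((score, a, b))
    return sorted(scored, reverse=True)


# Invented co-authorship data: author -> set of co-authors.
COAUTHORS = {
    "John Smith": {"A. Lee", "B. Kim"},
    "J. Smith": {"A. Lee"},
    "A. Lee": {"John Smith", "J. Smith"},
    "B. Kim": {"John Smith"},
}

if __name__ == "__main__":
    for score, a, b in rank_pairs(COAUTHORS)[:3]:
        print(f"{score:.2f}  {a}  /  {b}")
```

Changing `weight` shifts the ranking between purely name-based and purely relational evidence, which is the kind of interleaving of measures the earlier slide describes.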

Who's connected to Who: Conclusion

Who's connected to Who on bibliographic collaboration networks
CONCLUSION
+ D-Dupe's layout and interaction principles can be used in other social networks
+ uses an interface that effectively combines visual and analytic information for data cleaning in an interactive tool
- the actors should have properties that can be used by the attribute similarity functions

Design consideration: Motivation

Design considerations for asynchronous collaboration
Jeffrey Heer, Maneesh Agrawala, University of California, Berkeley

Premise: to fully support sensemaking, interactive visualization should also support social interaction.

The problem: collaboration mechanisms for supporting social interaction are not immediately clear.
    How should collaboration be structured?
    What shared artifacts can be used to coordinate contributions?
    What are the most effective communication mechanisms?

Design consideration: key issues

Design considerations for asynchronous collaboration
A set of design considerations:
1. Division and allocation of work
2. Common ground and awareness
3. Reference and deixis
4. Incentives and engagement
5. Identity, trust, and reputation
6. Group dynamics
7. Consensus and decision making
    Consensus and discussion
    voting or ranking systems
    Information distribution and presentation
    discussion
    prediction markets: individuals can be given a limited amount of points or currency

Design consideration: Visualization Systems

Design considerations for asynchronous collaboration
Asynchronous Collaborative Visualization Systems
    Many Eyes
    Wikimapia

Design consideration: Conclusion

Design considerations for asynchronous collaboration
CONCLUSION
By partitioning work across both time and space, asynchronous collaboration offers greater scalability for group-oriented analysis.

Social Analysis & Interaction: Final

Final
Why do we need these tools?
Human existence depends on collaborative problem solving.