Who thinks who knows who? Socio-Cognitive Analysis of an Email Network

Nishith Pathak, Department of Computer Science, University of Minnesota, Minneapolis, MN, USA, npathak@cs.umn.edu
Sandeep Mane, Department of Computer Science, University of Minnesota, Minneapolis, MN, USA, smane@cs.umn.edu
Jaideep Srivastava, Department of Computer Science, University of Minnesota, Minneapolis, MN, USA, srivasta@cs.umn.edu

ABSTRACT

Interpersonal interaction plays an important role in organizational dynamics, and understanding these interaction networks is key for any organization, since they can be tapped to facilitate various organizational processes. The principal roadblock to studying organizational networks, however, has been the difficulty in collecting data about them. The approach of conducting surveys/interviews is fraught with issues of scalability, logistics, and reporting bias, especially since a survey of this nature can seem quite intrusive. Widespread use of computer networks for organizational communication provides a unique opportunity to eliminate all these difficulties and automatically map the organizational network to a high degree of accuracy and detail. This paper describes an approach to automatically build organizational networks by tapping into the e-mail server, which observes all organizational communication. Specifically, our approach focuses on studying how communication between actors is perceived by other actors in a social network. Networks of this type are formally referred to as socio-cognitive networks (i.e. a "who knows who knows who" network). The key issues addressed by this paper are the representation and construction of a socio-cognitive network from electronic communication data, as well as identifying and proposing techniques for the analysis of such networks. Each actor in the socio-cognitive network has a "set of beliefs" about the communication between other actors in the network. We propose a model for representing and constructing the socio-cognitive network, where communication between actors is represented as probability distributions.
Each actor attempts to estimate these distributions using Bayesian inference, based only on the communication he/she observes. The conflict in beliefs between different actors, and their divergence from reality, is analyzed by comparing the belief models of actors. Measures which quantify the notions of "misperception" (divergence of an actor's beliefs from reality), "agreement" (similarity in beliefs of different actors) and "consensus" (general agreement in a group of actors) are introduced. We present results for these on the Enron email corpus. Our work, in addition to providing novel computational schemes to sociologists, also highlights an important aspect of social network analysis research where one focuses on how actors' perceptions evolve over time.

Categories and Subject Descriptors
H.2.8 [Database Management]: Database Applications - Data Mining.

General Terms
Algorithms, Measurement

Keywords
Socio-cognitive network, misperception, agreement, consensus, Enron email data.

1. INTRODUCTION

Organization dynamics plays an important role in the functioning of an enterprise. Understanding the dynamics of organizational processes empowers managers and enables them to effectively manage an enterprise's resources. Informal social and socio-cognitive networks in an organization play an important role in such processes, and significant effort has been made to study such organizational networks. However, most research has relied on data collected manually (usually using surveys and observing communications between individuals in meetings) and hence is subject to various kinds of noise (e.g. biased opinions). The emergence of computer networks has enabled new ways of communication between individuals in an organization, e.g. e-mail and instant messaging, leading to new social networks being established. In addition to enabling geographically unrestricted communication between individuals, computer networks also enable the collection of gigabytes of unbiased data about communication between individuals.
This provides an unprecedented opportunity to study organizational social networks, and new computational techniques are required to do so. In this research, we study socio-cognitive networks based on email communication in an organization. Socio-cognitive network analysis involves understanding who knows who knows who in a social network (see Figure 1). In the case of email communication, an actor observes only those emails which are addressed to him/her (i.e. the actor is in the To, Cc or Bcc fields of those emails).

Figure 1. Illustration of a socio-cognitive network

For example, consider an e-mail sent by actor A to B, with Cc to C and Bcc to D. The analysis of the header reveals the following: B and C know that A and B communicated, and that all three of them know about this communication. However, neither B nor C knows that D was also sent this e-mail. A and D know everything, and both of them also know that B and C do not know of D's getting the e-mail. This analysis illustrates that a single e-mail can create different beliefs among different people, depending on whether and how they are included in it.

Based on the observed emails, an actor forms his/her beliefs about communication probabilities between different actors. An email communication network can be defined using the actors as the nodes, with edges between actors representing email communication between them. Thus, the communication probabilities form the weights for the edges. Modeling such a communication network is useful for its analysis, and hence we propose a model using probability distributions for communication probabilities. A Bayesian inference technique is used for updating the probabilities in the model. For our analysis, a closed world assumption is made, i.e. actors' beliefs about probabilities in a communication network are based only on the emails exchanged, and no other interaction occurs outside of the email network.

A characteristic of modeling email communication probabilities between actors is their subjective nature, since actors observe only the set of emails entering their respective mailboxes. Thus, there can be differences in perceptions about communications for different actors. We propose a new metric, a-closeness, for measuring the agreement in the perceptions of two actors. The modeling of the communication network separately for each actor is egocentric socio-cognitive analysis, since the communication is studied from the actor's (called ego) perspective. On the other hand, an email server observes all communication between actors and thus can model the actual communication between all actors.
Modeling the communication as seen by the email server is the socio-centric socio-cognitive network analysis. The discrepancy between the egocentric and socio-centric socio-cognitive network analysis provides a measure of the discrepancy between an actor's belief and reality. A novel measure, r-closeness, is proposed for measuring such discrepancies. Application of the proposed model and belief divergence analysis to the Enron email corpus is illustrated.

The main contributions of this paper are:
1) Proposing a model for representing and capturing perceptions in an email communication based social network.
2) Proposing types of analyses that can be done on a socio-cognitive network, namely agreement between actors' perceptions and closeness of actors' perceptions to reality.
3) Proposing novel measures (r-closeness and a-closeness) for such analyses.
4) Application of the proposed approach and measures for knowledge discovery on organizational communication data.

The rest of the paper is organized as follows: Section 2 provides background on related literature in sociology and computer science. Section 3 describes the model for constructing a socio-cognitive network from email data. Section 4 describes different analyses on a socio-cognitive network and proposes measures for the same. Section 5 discusses applications of the proposed measures. Section 6 presents experimental results on the Enron email dataset. Section 7 summarizes and concludes the paper.

2. BACKGROUND

Social network analysis has been an active field of study in sociology as well as anthropology. A social network is a social structure of people called actors, related (directly or indirectly) to each other through a common relation of interest [10]. A social network plays an important role in the dissemination of ideas, information or influences among the group. However, in any social network, it is not possible for everyone to be connected to everyone else, nor is it desirable [2]. Thus, the main motive of social network analysis is to study who knows who in a social network.
There are two types of analyses of social networks: (i) socio-centric (whole) network analysis, where the interactions between the entire well-defined set of people are studied; and (ii) egocentric (personal) network analysis, where one studies the interactions between an actor (called ego) and only those actors related (directly or indirectly) to the ego. Substantial research has illustrated the importance of such analyses in organizations. In an organization, informal networks are formed by relationships between employees across functions and/or divisions in order to accomplish tasks quickly [5]. Such informal networks can cut through formal reporting procedures to jump start stalled initiatives and meet extraordinary deadlines. Informal networks can just as easily sabotage companies' best laid plans by blocking communication and fomenting opposition to change, unless managers know how to identify and direct them. Social network analysis enables the understanding of which actors are perceived as "friends" or "adversaries" by others, and which actors are aware of the presence of which other actors.

Taking this a step further is socio-cognitive network analysis, which analyzes who knows who knows who in the social network. This analysis is useful as it affects the perceptions of an actor about other actors, and hence the behavior of actors towards other actors, which is of prime importance in an organization. The beliefs for each actor are translated into a weighted digraph corresponding to the social network that exists from that actor's perspective. Using these digraphs, one can determine who thinks who is influential in the organization. This information is highly valuable for a manager trying to understand the existing informal network in the organization.

One of the main reasons for computer networks (and the Internet) to come into existence was to foster collaborative work between geographically dispersed researchers. These computer networks have now turned into an infrastructure that supports social networks, connecting people, organizations and knowledge [9]. The widespread use of the Internet and the growing online community of users have enabled the formation of social networks based on different relations of interest. For example, Usenet, a widely used online newsgroup system, had more than 80,000 topic-oriented discussion groups (or social networks) in 2000. These discussion groups allow individuals to form geographically dispersed, loosely bound social networks. On the other hand, computer networks also allow an actor to participate in different social networks (communities), thus enabling the actor to know many more actors and increase his/her social capital.

In an organization, it is possible to map the online actor (e.g. email address) to a real-world actor (e.g. employee), and analysis of these interactions has the potential of providing unbiased measures of social relationships between real-world actors. An email server logs all the email exchanged between employees in an organization. Thus, it gets an unbiased view of all communication occurring between employees. Analysis of such email logs will provide an insight into the computer network based (in this case, email-based) social networks. However, analyzing the gigabytes of data about emails exchanged between employees (considering a medium scale organization) calls for new computational techniques. With the availability of the Enron email corpus, there has been a growing interest in applying computational techniques to analyze e-mail-based social networks. Initial research on analysis of such email data has concentrated on applying traditional social network techniques and/or graph-based measures [3, 6 and 8].
In this research, we take a step further by providing novel computational techniques for socio-cognitive analysis of email data.

3. CONSTRUCTING A SOCIO-COGNITIVE NETWORK

This section presents a novel approach for automatic construction of a socio-cognitive network from the analysis of an organization's email communication.

3.1 Basic concepts

Each actor participating in the communication network has his/her view of (a certain portion of) the entire network. Over a period of time, the actor develops certain beliefs regarding the communications in the social network around self, i.e. beliefs regarding who communicates with whom, based on what he/she has observed so far. Thus, in this framework, a socio-cognitive network is the set of such beliefs for every actor in the social network.

Consider an email communication network consisting of $N$ actors denoted by the set $\{A_k \mid 1 \le k \le N\}$. Let $P_k$ denote the probability that a given email in the network is from actor $A_k$. Thus,

$$P_k = \frac{\text{Number of emails sent by } A_k}{\text{Total number of emails exchanged in the network}}$$

Let $P_{y|x}$ denote the probability of $A_y$ being a recipient of an email, given that $A_x$ has sent that email. Thus,

$$P_{y|x} = \frac{\text{Number of emails sent by } A_x \text{ and received by } A_y}{\text{Total number of emails sent by } A_x}$$

Hence,

$$P(x,y) = P_x \, P_{y|x} = \frac{\text{Number of emails sent by } A_x \text{ and received by } A_y}{\text{Total number of emails exchanged in the network}}$$

Thus, $P(x,y)$ is the probability that actor $A_x$ sends an email to actor $A_y$. $P(x,y)$ represents the strength of actor $A_x$'s communication with actor $A_y$. Note that each email has a unique (single) sender, and thus events corresponding to an email being sent by different actors are mutually exclusive. Hence, the following condition must always hold:

$$\sum_{\text{actors } A_z} P_z = 1 \quad \text{and} \quad P_z \ge 0 \;\; \forall \text{ actors } A_z \qquad (1)$$

The events corresponding to different actors being recipients of an email are not mutually exclusive, as it is possible for an email to have multiple recipients. As a result, the events underlying $P(x,y)$, i.e. an actor $A_x$ being the sender and another actor $A_y$ being a recipient, are not mutually exclusive across all pairs of actors.
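The definitions of $P_k$, $P_{y|x}$ and $P(x,y)$ can be made concrete with a small sketch. The email log below and the function name `communication_probabilities` are hypothetical and chosen only for illustration; the sketch assumes each email is recorded as a sender plus a set of recipients.

```python
from collections import Counter

# Hypothetical toy log (an assumption for illustration):
# each email is (sender, set of recipients).
emails = [
    ("A", {"B", "C"}),
    ("A", {"B"}),
    ("B", {"A"}),
    ("C", {"A", "B"}),
]

def communication_probabilities(emails):
    """Estimate P_k, P_{y|x} and P(x,y) from a list of (sender, recipients)."""
    total = len(emails)
    # Number of emails sent by each actor.
    sent = Counter(sender for sender, _ in emails)
    # Number of emails sent by x and received by y, per ordered pair (x, y).
    pair = Counter((s, r) for s, recipients in emails for r in recipients)
    p_sender = {k: n / total for k, n in sent.items()}
    p_cond = {(x, y): n / sent[x] for (x, y), n in pair.items()}
    p_joint = {(x, y): n / total for (x, y), n in pair.items()}
    return p_sender, p_cond, p_joint

p_sender, p_cond, p_joint = communication_probabilities(emails)

# Sender events are mutually exclusive, so the P_k sum to 1.
print(sum(p_sender.values()))
# Recipient events are not, so the P(x,y) can sum to more than 1.
print(sum(p_joint.values()))
```

In this toy log the sender probabilities sum to 1, while the pairwise probabilities sum to more than 1 because two of the four emails have two recipients.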
For example, a single email can be an instance of the event from sender $A_x$ to recipient $A_z$ as well as of the event from sender $A_x$ to recipient $A_y$. But, since each email has one sender and at least one recipient, the following condition must always hold:

$$\sum_{A_i, A_j} P(i,j) \ge 1 \quad \text{and} \quad P(i,j) \ge 0 \;\; \forall A_i, A_j \qquad (2)$$

Note that the event of some actor $A_x$ being the sender and $A_y$ being a recipient is mutually exclusive to its complement, i.e. the event where either $A_x$ is not the sender or $A_y$ is not a recipient or both. The probabilities of these two events are $P(x,y)$ and $1 - P(x,y)$ respectively.

3.2 Socio-cognitive Network Modeling

We define a Bernoulli distribution over these two events, i.e. $L(x,y) = [P(x,y),\, 1 - P(x,y)]$. $P(x,y)$ is the parameter for the Bernoulli distribution $L(x,y)$. There will be $N(N-1)$ such distributions, one for every ordered pair of actors $(A_x, A_y)$ where $A_x \ne A_y$. Thus, the communication between actors is represented by Bernoulli distributions, where every email exchanged in the network is a Bernoulli trial, i.e. either a given email is from a given actor $A_x$ with another given actor $A_y$ being a recipient, or it is not (see figure 2).

Figure 2. Communication between actors expressed as Bernoulli distributions

Figure 3. Belief state of actor $A_k$, with beliefs as Beta distributions

Based on his/her observations, every actor maintains a distribution over all possible probabilities $P(x,y)$ for a given ordered pair $(A_x, A_y)$, i.e. a distribution over all possible values for the parameter of the Bernoulli distribution $L(x,y)$. Since there are $N(N-1)$ ordered pairs of actors, the actor will maintain $N(N-1)$ such distributions. For maintaining this distribution over all possible parameters of a Bernoulli distribution, a Beta distribution is used. Since the Beta distribution is the conjugate prior for the Bernoulli distribution, Bayesian inference on the parameters of a Beta distribution is the natural choice for maintaining actors' beliefs. We define the belief state of an actor to be a set of $N(N-1)$ Beta distributions, where each Beta distribution $J(x,y)$ is defined over the corresponding Bernoulli distribution $L(x,y)$. If we denote the belief state for a given actor $A_k$ by $B_k$, then

$$B_k = \{ J(i,j)_k \text{ for all ordered pairs } (A_i, A_j) \text{ such that } A_i \ne A_j \}$$

where $J(i,j)_k$, the Beta distribution over the parameter of $L(i,j)$, is defined as $A_k$'s belief regarding the communication from $A_i$ to $A_j$ (see figure 3). The probability density function for a Beta distribution with parameters $\alpha$ and $\beta$ is

$$\mathrm{Beta}(\alpha, \beta)(p) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, p^{\alpha - 1} (1 - p)^{\beta - 1}, \quad 0 \le p \le 1 \text{ and } \alpha, \beta > 0 \qquad (3)$$

where $\Gamma(y) = \int_0^\infty x^{y-1} e^{-x}\, dx$.

The above expression gives the probability density of a Bernoulli distribution that has probabilities $p$ and $1-p$ (or parameter $p$). Higher values of $\alpha$ relative to $\beta$ cause the Beta distribution to favor Bernoulli distributions with higher values of $p$, and vice versa. The expected value for the parameter $p$ of the Bernoulli distribution (based on the Beta distribution) is given by

$$E[p] = \frac{\alpha}{\alpha + \beta}$$

Thus, the expected Bernoulli distribution, i.e. $\{E[p],\, 1 - E[p]\}$, is obtained by simply normalizing the parameters of the Beta distribution. Normally the parameters $\alpha$ and $\beta$ are associated with particular outcomes of the Bernoulli trials over which the Beta distribution is defined.
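As a minimal sketch of the Beta density and of the expected Bernoulli distribution obtained by normalizing the parameters, the following uses only the standard library. The function names `beta_pdf` and `expected_bernoulli` are hypothetical, and the density is evaluated only for $0 < p < 1$ to avoid the boundary cases.

```python
import math

def beta_pdf(alpha, beta, p):
    """Density of Beta(alpha, beta) at p, for 0 < p < 1 and alpha, beta > 0."""
    norm = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    return norm * p ** (alpha - 1) * (1 - p) ** (beta - 1)

def expected_bernoulli(alpha, beta):
    """Expected Bernoulli distribution {E[p], 1 - E[p]}: normalized parameters."""
    total = alpha + beta
    return alpha / total, beta / total

# With alpha larger than beta, high values of p receive more density.
print(beta_pdf(5, 2, 0.8) > beta_pdf(5, 2, 0.2))
print(expected_bernoulli(3, 1))
```

The comparison in the usage lines reflects the remark that larger $\alpha$ relative to $\beta$ favors Bernoulli distributions with higher $p$.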
As an illustration, if the Bernoulli distribution is over the probability $p$ of obtaining heads on a coin toss, then a Beta distribution over the possible values of $p$ is defined as follows. The parameter $\alpha$ is taken as the number of times heads is observed and the parameter $\beta$ as the number of times tails is observed. For each observation of heads, $\alpha$ is incremented by 1, and on observing tails $\beta$ is incremented by 1. This is the same as Bayesian inference using a Beta distribution.

For each Beta distribution $J(x,y)_k$ in the belief state $B_k$ of actor $A_k$, there are two parameters $\alpha(x,y)$ and $\beta(x,y)$. Based on the communication $A_k$ observes, he/she updates the parameters for all $J(x,y)_k$ in $B_k$. Each email observed by the actor is a Bernoulli trial, where success for the Bernoulli distribution $L(x,y)$ is realized only if the email is from $A_x$ with $A_y$ being a recipient. We associate the parameter $\alpha(x,y)$ with the number of successes, i.e. the number of emails, observed by $A_k$, that have been sent by $A_x$ to $A_y$, and the parameter $\beta(x,y)$ with failures, i.e. the number of emails, observed by $A_k$, for which either $A_x$ is not the sender or $A_y$ is not a recipient or both. Over time, as $A_k$ observes communication in the network, he/she maintains the parameters of the various Beta distributions by incrementing the corresponding parameter at each observation. This is equivalent to Bayesian inference using a Beta distribution, based on the observed communication.

3.3 Priors Selection

This sub-section addresses the issue of selecting priors for the parameters of each of the distributions $J(x,y)_k$ in a given belief state $B_k$. We choose priors such that $\alpha(x,y) = \delta_x\, \varepsilon_{y|x}$ and $\beta(x,y) = 1 - \alpha(x,y)$, where $\delta_x$ is the prior probability for $A_x$ being the sender of an email and $\varepsilon_{y|x}$ is the prior probability for $A_y$ being a recipient given that $A_x$ has sent the email. Each probability in an actor's belief state is expressed as a fraction of the communication in the network. Hence, the sum of the expected probabilities for all communications must always be greater than or equal to 1 and less than or equal to $(N-1)$ (see appendix A).
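A minimal sketch of how a single belief can start from priors of the form $\alpha = \delta_x \varepsilon_{y|x}$, $\beta = 1 - \alpha$ and then be updated by counting successes and failures, as described in Section 3.2. The class `Belief`, the helper names, the prior values and the observed emails are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Belief:
    """Beta parameters for one ordered pair (A_x, A_y), as held by one observer."""
    alpha: float
    beta: float

    def observe(self, success):
        # Success: the observed email is from A_x with A_y as a recipient.
        if success:
            self.alpha += 1
        else:
            self.beta += 1

    def expected(self):
        # E[p] = alpha / (alpha + beta)
        return self.alpha / (self.alpha + self.beta)

def prior_belief(delta_x, eps_y_given_x):
    """Prior with alpha = delta_x * eps_{y|x} and beta = 1 - alpha."""
    alpha = delta_x * eps_y_given_x
    return Belief(alpha, 1.0 - alpha)

def update(belief, x, y, observed_emails):
    """Treat every observed (sender, recipients) email as one Bernoulli trial."""
    for sender, recipients in observed_emails:
        belief.observe(sender == x and y in recipients)
    return belief

# Illustrative values only: delta_x = 0.25, eps_{y|x} = 0.5.
belief = prior_belief(0.25, 0.5)
observed = [("A", {"B"}), ("A", {"B", "C"}), ("C", {"A"})]
update(belief, "A", "B", observed)
print(belief, belief.expected())
```

Here two of the three observed emails count as successes for the pair (A, B), so $\alpha$ grows by 2 and $\beta$ by 1.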
Since the events of different actors being senders are mutually exclusive, we must have $\sum_i \delta_i = 1$. A simple solution is to go with uniform priors where each $\delta_x = 1/N$, $N$ being the number of actors. For $\varepsilon_{y|x}$ we also choose uniform priors, i.e. the events of $A_y$ being a recipient or not, given that the email has been sent by $A_x$, are taken to be equally likely. In this case we assign $\varepsilon_{y|x} = 0.5$. Thus, a simple solution for priors is $\alpha(x,y) = 1/(2N)$ and $\beta(x,y) = 1 - 1/(2N)$. This choice of priors will not have much effect, as the number of observations is usually large relative to the prior magnitudes, which results in the washing out of priors. Thus small initial values of $\alpha(x,y)$ and $\beta(x,y)$ imply low confidence in the priors.

In modeling communication probabilities as Bernoulli distributions, we implicitly assume that these probabilities are constant over time. However, this may not hold, since as emails are sent out over time, these probabilities may change. This is illustrated in figure 4-a, where the curve corresponds to the actual communication probabilities at each instant from time 0 to $t$ and the straight line represents the same communication probability modeled as a Bernoulli distribution at time $t$.

Figure 4. Communication probability as a Bernoulli distribution compared with the actual probability (panels (a) and (b))

To address this issue, we choose to capture the dynamic nature of the communication probabilities using time slicing. Distinct time intervals of length $\delta$ each are defined, and the communication probabilities are assumed to be constant over each time interval. Hence, the Beta distribution parameters are maintained for the length of one time interval only, using the observations made in that time interval. For each new time interval $t$, one can either completely wash out the previous observations and start with fresh default prior values for the parameters, or one may choose to scale down the parameters obtained at the end of time interval $t-1$ and use them as priors for the next time interval. This is but an approximation towards capturing the dynamic nature of the communication probabilities, with the accuracy inversely dependent on the length of the time window (see figure 4-b). The optimum size of the time window will usually depend on the nature of the social network whose actors' beliefs are being modeled.

To model the temporally varying nature of beliefs, we denote the belief state of an actor $A_k$ at time $t$ as $B_{k,t}$. Formally, the belief state for the given actor $A_k$ at the given time $t$ is defined as

$$B_{k,t} = \{ J(i,j)_{k,t} \text{ for all ordered pairs } (A_i, A_j) \text{ such that } A_i \ne A_j \}$$

Here $J(i,j)_{k,t}$ is the Beta distribution, for a given ordered pair of actors $(A_i, A_j)$, maintained by the actor $A_k$ at time $t$. The belief state of a given actor at time $t$ reflects what the actor believes to be the probabilities of the possible strengths of different actor communications in the network at time $t$. A socio-cognitive network at a given time is the set of belief states of all actors at that time. We start with an initial belief state for each actor, i.e. $B_{k,0}$ for all $1 \le k \le N$, with parameters for all distributions having default prior values. As actors observe email communication, they update their belief states.

In a socio-cognitive network, we also introduce a super-actor, i.e. an actor who observes all the communication in the network. Such an actor is analogous to an entity such as the email server. As the super-actor observes all, its beliefs represent the complete and true knowledge.
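The uniform priors and the two options for starting a new time interval can be sketched as follows. The text does not say how the previous parameters are scaled down, so the multiplicative factor `scale` below is an assumption, as are the function names.

```python
def uniform_prior(n_actors):
    """Uniform priors: alpha = 1/(2N), beta = 1 - 1/(2N)."""
    alpha = 1.0 / (2 * n_actors)
    return alpha, 1.0 - alpha

def next_interval_prior(alpha, beta, n_actors, scale=None):
    """Prior for a new time interval.

    scale=None: wash out previous observations and restart from the default prior.
    0 < scale < 1: scale down the parameters from the end of the previous interval
    (assumed here to be a simple multiplication, which keeps E[p] unchanged).
    """
    if scale is None:
        return uniform_prior(n_actors)
    return alpha * scale, beta * scale

# Parameters at the end of interval t-1 (illustrative numbers only).
alpha_prev, beta_prev = 8.0, 24.0

fresh = next_interval_prior(alpha_prev, beta_prev, n_actors=4)
carried = next_interval_prior(alpha_prev, beta_prev, n_actors=4, scale=0.25)
print(fresh, carried)
```

Under this assumed scaling the expected probability $\alpha / (\alpha + \beta)$ carries over unchanged, while the smaller magnitudes let new observations move the belief faster.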
The super-actor therefore serves as a benchmark when we compare actors' beliefs with reality. This conforms to the closed world assumption, where all that has occurred is observed by the super-actor and anything that the super-actor has not observed does not occur. The super-actor is treated like any other actor with respect to assigning priors and time slicing.

The Beta distributions in the belief state of an actor are completely expressed through their parameters. Therefore, in order to maintain the belief states one needs to maintain a set of counters corresponding to the various parameters. For maintaining the $\alpha(x,y)$ parameters a set of $N(N-1)$ counters is used, one for each Beta distribution. In addition, a counter is maintained for the total number of emails observed by each actor. The $\beta(x,y)$ parameter can be obtained by subtracting the $\alpha(x,y)$ counter from the total-number-of-emails counter. Counters for each of the $\alpha(x,y)$ parameters are initialized with their corresponding priors, and the total-number-of-emails counter starts with an initial value of $\alpha(x,y) + \beta(x,y)$. All counters maintained by an actor keep track only of the communication observed by that actor. Thus, the maintenance of all actors' as well as the super-actor's belief states is implemented. Algorithm 1 shows the process for belief update.

3.4 Time Complexity Analysis

Consider an email sent in the communication network. Let the average number of recipients in the To, Cc and Bcc fields be $n_{to}$, $n_{cc}$ and $n_{bcc}$, respectively. The sender and the super-actor will be able to perceive all the recipients of the mail and so will update a total of $2(n_{to} + n_{cc} + n_{bcc})$ parameters (all ordered pairs of the form (sender, x) where x is an actor in the To, Cc and Bcc fields) in their belief states. Each Bcc recipient observes all recipients in the To and Cc fields as well as itself, hence it will update $n_{to} + n_{cc} + 1$ parameters. This makes a total of $n_{bcc}(n_{to} + n_{cc} + 1)$ updates for all Bcc recipients. The actors appearing in the To and Cc fields will not observe the Bcc field actors as recipients. Hence they will update $n_{to} + n_{cc}$ entries each in their belief states.
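The visibility rules described in this subsection can be sketched as a function that lists, for one email, which (sender, recipient) pairs each observer updates. The function name `observed_pairs` and the key `"SUPER"` for the super-actor are hypothetical; the sketch assumes the sender is not among the recipients and that no actor appears in more than one header field.

```python
def observed_pairs(sender, to, cc, bcc):
    """Map each observer of one email to the (sender, recipient) pairs it can see."""
    visible = set(to) | set(cc)        # recipients every observer can see
    everyone = visible | set(bcc)      # recipients seen by sender and super-actor
    views = {
        sender: {(sender, r) for r in everyone},
        "SUPER": {(sender, r) for r in everyone},
    }
    # To and Cc recipients do not see the Bcc recipients.
    for r in visible:
        views[r] = {(sender, v) for v in visible}
    # Each Bcc recipient sees the To and Cc recipients plus itself.
    for r in bcc:
        views[r] = {(sender, v) for v in visible | {r}}
    return views

# The example from the introduction: A sends to B, with Cc to C and Bcc to D.
views = observed_pairs("A", to=["B"], cc=["C"], bcc=["D"])
for observer, pairs in sorted(views.items()):
    print(observer, sorted(pairs))
print(sum(len(p) for p in views.values()))
```

Summing the sizes of the per-observer sets gives the number of $\alpha$ parameter updates caused by the email; each observer additionally updates its own total-emails counter.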
The To and Cc recipients together thus make a total of $(n_{cc} + n_{to})(n_{cc} + n_{to})$ updates. Also, each actor updates its counter of total emails observed, which results in another $n_{to} + n_{cc} + n_{bcc} + 2$ updates. Thus, the email exchange causes an aggregate of

$$2(n_{bcc} + n_{cc} + n_{to}) + (n_{cc} + n_{to})(n_{cc} + n_{to}) + n_{bcc}(n_{to} + n_{cc} + 1) + (n_{to} + n_{cc} + n_{bcc} + 2)$$

parameter updates. Thus, the average complexity of belief update for an email exchange is $O(n^2)$, where $n$ is the average number of recipients of an email.

4. ANALYSIS OF BELIEF CONFLICT AND DIVERGENCE FROM REALITY

What an actor perceives regarding the communication network around it can be quite different from what is actually happening. Also, what one actor perceives about the social network can be quite different from another actor's perception. In this section, we present two types of analyses one can perform using the socio-cognitive network constructed above. The first is the divergence of actors' beliefs from reality and the second is the conflict in different actors' beliefs. But first we define the concept of divergence or closeness between two belief states.

4.1 Divergence across Beliefs and Belief States

Consider two actors $A_x$ and $A_y$ with belief states $B_{x,t}$ and $B_{y,t}$ at time $t$. In computing the conflict between $B_{x,t}$ and $B_{y,t}$, we compare the beliefs in one set with the corresponding beliefs in the other set. For this, the KL-divergence across the expected Bernoulli distributions for the two respective beliefs is computed. The expected Bernoulli distribution for a belief is the expectation of the Beta distribution corresponding to that belief. If $J(a,b)_{k,t}$ is the Beta distribution, then the corresponding expected Bernoulli distribution can be denoted by $E[J(a,b)_{k,t}]$, which is obtained by normalizing the parameters of the Beta distribution $J(a,b)_{k,t}$:

$$E[J(a,b)_{k,t}] = \left[ \frac{\alpha(a,b)_{k,t}}{\alpha(a,b)_{k,t} + \beta(a,b)_{k,t}},\; \frac{\beta(a,b)_{k,t}}{\alpha(a,b)_{k,t} + \beta(a,b)_{k,t}} \right]$$

KL-divergence is an information-theoretic measure of the directed divergence between two probability distributions. The KL-divergence of a probability distribution $p$ from another distribution $q$, denoted as $KL(q \,\|\, p)$, is given as

$$KL(q \,\|\, p) = \sum_x q(x) \log \frac{q(x)}{p(x)}$$

It is an asymmetric measure and does not obey the triangle inequality. To remove the asymmetry, the symmetric KL-divergence $KL_{sym}(q \,\|\, p)$ is defined as

$$KL_{sym}(q \,\|\, p) = KL(q \,\|\, p) + KL(p \,\|\, q)$$

DEFINITION 1. The divergence of one belief, expressed by the Beta distribution $J(a,b)_{y,t}$, from another, expressed by $J(a,b)_{x,t}$, at a given time $t$, is defined as

$$KL\big(E[J(a,b)_{x,t}] \,\|\, E[J(a,b)_{y,t}]\big) = p \log \frac{p}{q} + (1 - p) \log \frac{1 - p}{1 - q} \qquad (4)$$

where

$$p = \frac{\alpha(a,b)_{x,t}}{\alpha(a,b)_{x,t} + \beta(a,b)_{x,t}} \quad \text{and} \quad q = \frac{\alpha(a,b)_{y,t}}{\alpha(a,b)_{y,t} + \beta(a,b)_{y,t}}$$

According to [1], Bregman divergences between corresponding features of two entities can be aggregated into a single divergence (which can be taken as the divergence between the two entities themselves) by taking a convex combination of the Bregman distances for each of the features. KL-divergence is a Bregman divergence, and different beliefs are analogous to features of a belief state.
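Definition 1 and the symmetric variant can be sketched directly from the Beta parameters. The function names are hypothetical, each belief is assumed to be given as an `(alpha, beta)` tuple, and the natural logarithm is assumed because the text does not fix the base.

```python
import math

def kl_bernoulli(p, q):
    """KL([p, 1-p] || [q, 1-q]) for 0 < p, q < 1, using the natural logarithm."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def belief_divergence(belief_x, belief_y):
    """Definition 1: KL(E[J_x] || E[J_y]) for beliefs given as (alpha, beta) tuples."""
    p = belief_x[0] / (belief_x[0] + belief_x[1])
    q = belief_y[0] / (belief_y[0] + belief_y[1])
    return kl_bernoulli(p, q)

def symmetric_belief_divergence(belief_x, belief_y):
    """KL_sym: sum of the two directed divergences."""
    return belief_divergence(belief_x, belief_y) + belief_divergence(belief_y, belief_x)

# Illustrative parameters: A_x expects p = 0.5, A_y expects q = 0.25.
jx, jy = (2.0, 2.0), (1.0, 3.0)
print(belief_divergence(jx, jy), belief_divergence(jy, jx))
print(symmetric_belief_divergence(jx, jy))
```

Because the priors keep both parameters strictly positive, the expected probabilities stay strictly between 0 and 1, so the logarithms are always defined in this setting.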
Thus, we determine the divergence across two belief states by taking a convex combination of the KL-divergences across corresponding beliefs.

DEFINITION 2. The divergence of a belief state $B_{y,t}$ from the belief state $B_{x,t}$, for two actors $A_y$ and $A_x$ respectively, at a given time $t$, is defined as

$$div(B_{x,t}, B_{y,t}) = \frac{\sum_{(a,b) \in (B_{x,t} \cap B_{y,t})} KL\left(E[J(a,b)_{x,t}] \,\|\, E[J(a,b)_{y,t}]\right)}{n(B_{x,t} \cap B_{y,t})} \qquad (5)$$

Note that the numerator in $div(B_{x,t}, B_{y,t})$ sums up KL-divergences for only those beliefs which belong to the intersection of the belief states $B_{x,t}$ and $B_{y,t}$. The denominator is a normalization factor which ensures that $div(B_{x,t}, B_{y,t})$ does not depend on the number of intersecting beliefs, since the number of intersecting beliefs may vary for different pairs of actors. This is equivalent to taking a convex combination of the KL-divergences obtained for different beliefs in the belief state, where beliefs that are not in the intersection are given a weight of 0 and all the other beliefs are given a uniform weight of $1 / n(B_{x,t} \cap B_{y,t})$.

The intersection of $B_{x,t}$ and $B_{y,t}$ contains the beliefs corresponding to only those communications for which both $A_x$ and $A_y$ have observed at least one instance. For example, if both $A_x$ and $A_y$ have observed at least one email from actor $A_i$ with $A_j$ being a recipient, then the belief corresponding to this communication is included in the intersection of $B_{x,t}$ and $B_{y,t}$. If either one of $A_x$ or $A_y$ has not observed even one instance of this communication, then that belief is not included in the intersection of $B_{x,t}$ and $B_{y,t}$.

The following are the reasons for considering only those beliefs which are in the intersection (of the belief states) rather than all of them. The communication network will normally be quite sparse, i.e. out of all possible ordered pairs of actors only a few will actually communicate. Therefore, the belief states of the actors being compared will be even sparser, and the beliefs associated with the majority of communications will be derived from the prior values for both the actors. In such a case, it is desirable to consider only those beliefs for which both the actors have observed at least one instance.
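Definition 2 can be sketched in a few lines of Python. The sketch assumes a belief state is a dictionary that maps an ordered pair `(sender, recipient)` to an `(alpha, beta)` pair of positive Beta parameters and holds only the communications the actor has observed at least once. This data layout and all names are illustrative assumptions. The empty-intersection case returns infinity, which is the convention the paper adopts for actors with no beliefs in common.

```python
import math


def kl_expected_bernoulli(belief_x, belief_y):
    """KL(E[J_x] || E[J_y]) for two (alpha, beta) pairs with positive parameters."""
    alpha_x, beta_x = belief_x
    alpha_y, beta_y = belief_y
    p = alpha_x / (alpha_x + beta_x)
    q = alpha_y / (alpha_y + beta_y)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def belief_state_divergence(state_x, state_y):
    """div(B_x, B_y): mean KL over the beliefs present in both states.

    Each state maps (sender, recipient) -> (alpha, beta) and is assumed to
    contain only communications that the actor has actually observed.
    """
    common = state_x.keys() & state_y.keys()
    if not common:
        return math.inf  # no beliefs in common
    total = sum(kl_expected_bernoulli(state_x[k], state_y[k]) for k in common)
    return total / len(common)


if __name__ == "__main__":
    state_x = {("A1", "A2"): (3.0, 1.0), ("A1", "A3"): (1.0, 1.0)}
    state_y = {("A1", "A2"): (1.0, 1.0), ("A2", "A3"): (2.0, 2.0)}
    # Only ("A1", "A2") is shared, so the result is the KL for that one belief.
    print(belief_state_divergence(state_x, state_y))
```

Dividing by the size of the intersection is what gives every shared belief the same weight in the convex combination.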
This is analogous to determining document similarity, where one computes similarity based only on those words that are present in both the documents. Another reason for using the intersection of beliefs is that, if the whole set of beliefs is considered while computing divergence across two belief states, one implicitly assumes that the two actors are aware of the presence of all actors in the social network (since they entertain beliefs regarding every possible pair of communication), which is quite unrealistic.

An interpretation of the proposed divergence measure is as follows. Consider three actors $A_x$, $A_y$ and $A_z$. Suppose we want to determine how divergent $A_y$'s and $A_z$'s belief states are from that of $A_x$. If $A_y$ and $A_x$ have few beliefs in common but low divergence for each of these few common beliefs, then their belief states are closer than those of $A_z$ and $A_x$, who have a relatively larger number of common beliefs with greater divergence across them. The concept of divergence across belief states of two actors is an open issue and multiple interpretations and arguments are possible, many of which can be intuitively appealing. However, not all of them can be captured with just one type of measure. If two actors have no beliefs in common, i.e. the intersection of their belief states is empty, then we take the divergence between their belief states to be infinity. Note that the divergence across two belief states is an asymmetric measure, i.e. $div(B_{x,t}, B_{y,t}) \neq div(B_{y,t}, B_{x,t})$.

4.2 a-closeness Measure

In addition to being asymmetric, the divergence of one belief state from another, $div(B_{x,t}, B_{y,t})$, ranges from 0 to infinity. This makes it inconvenient to work with. Therefore, using this divergence measure we define a closeness measure which we call a-closeness.

DEFINITION 3 (a-closeness). The a-closeness measure is defined as the level of agreement between two given actors $A_x$ and $A_y$ with belief states $B_{x,t}$ and $B_{y,t}$ respectively, at a given time $t$, and is given by

$$a\text{-}closeness(B_{x,t}, B_{y,t}) = \frac{1}{1 + div(B_{x,t}, B_{y,t}) + div(B_{y,t}, B_{x,t})} \qquad (6)$$

Note that a-closeness between two belief states is symmetric and ranges from 0 to 1, with 0 indicating minimum and 1 indicating maximum closeness between the two belief states. If two belief states have no beliefs in common then their a-closeness measure is 0 (corresponding to infinite divergence across the belief states). The mean a-closeness across all ordered pairs of all (or a group of) actors is an indication of the general consensus among them. Lower values of mean a-closeness indicate lower levels of consensus, and higher values indicate higher levels of consensus between the actors. The standard deviation of a-closeness across all actors is indicative of the variance in the agreement of actors in the network.

4.3 r-closeness Measure

The second type of analysis on a socio-cognitive network is to measure how close an actor's belief state is to reality (i.e. the belief state of the super-actor). For this purpose we define the r-closeness measure.

DEFINITION 4 (r-closeness). The r-closeness measure is defined as the closeness of the given actor $A_k$'s belief state $B_{k,t}$ to reality at a given time $t$, and it is given by

$$r\text{-}closeness(A_k) = \frac{1}{1 + div(B_{S,t}, B_{k,t})} \qquad (7)$$

where $B_{S,t}$ is the belief state of the super-actor $A_S$ at time $t$. As with a-closeness, for r-closeness we consider only those beliefs present in the intersection of the actor's and the super-actor's belief states, i.e. only those beliefs corresponding to the communications for which the actor has observed at least one email. Philosophically, the proposed r-closeness measure can be interpreted as follows: an actor who has accurate beliefs regarding only a few communications is closer to reality than some other actor who has a relatively larger number of less accurate beliefs. Thus, depth of knowledge is preferred over breadth of knowledge.
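Both closeness measures are simple transformations of belief-state divergences. The sketch below assumes the divergences have already been computed and are passed in as non-negative floats (possibly infinite); the function names and argument order are illustrative choices, not notation from the paper.

```python
import math


def a_closeness(div_xy, div_yx):
    """a-closeness from the two directed divergences div(B_x, B_y) and div(B_y, B_x)."""
    return 1.0 / (1.0 + div_xy + div_yx)


def r_closeness(div_super_actor):
    """r-closeness from div(B_S, B_k), the divergence of actor k from the super-actor."""
    return 1.0 / (1.0 + div_super_actor)


if __name__ == "__main__":
    print(a_closeness(0.0, 0.0))            # identical belief states
    print(a_closeness(math.inf, math.inf))  # no beliefs in common
    print(r_closeness(1.0))
```

Because Python floats follow IEEE 754 arithmetic, an infinite divergence maps to a closeness of exactly 0.0 without any special casing.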
Note that r-closeness($A_k$) uses the asymmetric divergence of $A_k$'s belief state from that of the super-actor. The measure r-closeness ranges from 0 to 1, where 1 implies maximum closeness and 0 implies maximum divergence from reality. The mean r-closeness across all actors provides an aggregate measure of the general knowledge or level of perception in the network. The higher the mean r-closeness, the more actors in the network know about other actors' communications, i.e. the communication is transparent. A lower mean value for r-closeness indicates that actors generally have misperceptions regarding other actors' communications. This will normally be observed for a large social network consisting of various diverse groups, where it is difficult for a single actor to capture all the communication in the network. The standard deviation of r-closeness across different actors indicates the variance in their levels of perception.

5. APPLICATIONS OF PROPOSED MEASURES

5.1 r-closeness

Krackhardt [4] has explained that an actor's perception of who communicates with whom is a function of his social position. In an organizational environment, it is believed that top actors in the formal organizational hierarchy have better knowledge about communication than lesser actors and hence better perceptions of the social network. In other words, executive management has a better perception of the social network than employees. Intuitively, it is also expected that the more communication an actor observes, the better his/her perceptions regarding the social interactions occurring in the organization will be. The proposed r-closeness measure can be used to test such hypotheses, where the actor's perception of reality needs to be quantified.

5.2 a-closeness

The a-closeness measures the similarity in social perceptions of two actors. Therefore, one potential application of the a-closeness measure is to construct a graph of who agrees with whom. In such a graph, actors are represented by nodes and an edge is present between two actors only if their a-closeness measure is more than a certain threshold.
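The construction of the "who agrees with whom" graph can be sketched as follows. The sketch assumes the pairwise a-closeness values are available in a dictionary keyed by unordered actor pairs (`frozenset`), and that a missing pair means the two actors share no beliefs and so have a-closeness 0. The actor names and scores are made up for illustration; the threshold of 0.95 is the one reported for the experiments in this paper.

```python
from itertools import combinations


def agreement_graph(actors, closeness, threshold):
    """Build an undirected agreement graph as a dict of adjacency sets.

    closeness maps frozenset({x, y}) -> a-closeness of x and y; pairs that
    are absent are treated as having a-closeness 0 (assumption).
    """
    graph = {actor: set() for actor in actors}
    for x, y in combinations(actors, 2):
        if closeness.get(frozenset((x, y)), 0.0) > threshold:
            graph[x].add(y)
            graph[y].add(x)
    return graph


if __name__ == "__main__":
    actors = ["A1", "A2", "A3"]
    scores = {  # hypothetical a-closeness values
        frozenset(("A1", "A2")): 0.97,
        frozenset(("A1", "A3")): 0.40,
        frozenset(("A2", "A3")): 0.96,
    }
    graph = agreement_graph(actors, scores, threshold=0.95)
    # The degree of a node counts the actors that share its perceptions.
    print({actor: len(neighbours) for actor, neighbours in graph.items()})
```

Storing each edge in both adjacency sets reflects the symmetry of a-closeness and makes degree counts a direct lookup.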
This "who agrees with whom" graph is undirected, and it has very useful applications. Classical social network analysis techniques developed for social networks (or the socio-centric graph) can be applied to the "who agrees with whom" graph, resulting in a whole new class of analysis techniques with different domain interpretations. For example, cliques are identified in order to determine strong communities or groups in a social network. Analogously, cliques in the agreement graph reveal clusters of actors with similar perceptions. The actors in each of these clusters may have similar roles or settings (spatial proximity, same project or department). Other popular structures to look for are the bow tie (signifying articulation points), star (central actors) and bridge. Degree centrality measures the importance of an actor based on its location in the social network. In the "who agrees with whom" graph, degree centrality can be used to determine the most important actor based on the number of other actors sharing its perceptions.

6. EXPERIMENTS AND RESULTS

6.1 Enron Email Corpus

The Enron email corpus is the set of emails belonging to about 151 users, mostly senior management of Enron, exchanged between mid-1998 and mid-2002 (approximately 4 years). The time period includes the Enron crisis, which broke out in October 2001. The cleaned email corpus (no email attachments and no integrity issues) is publicly available and can be downloaded from http://www.cs.cmu.edu/~enron/. For our experiments, we chose to use a further cleaned version, in which duplicate, erroneous and junk emails have been removed. It is made available by Shetty and Adibi [7]. It consists of 252,759 email messages for the set of 151 users. For experimental analysis, we chose to use the entire set of 151 users and only those emails which were exchanged between these 151 users. There were 20,311 such emails. To capture the dynamic nature of the communication probabilities, time was divided into

uniform intervals of 1 month. At the beginning of each month, all Beta distribution parameters were assigned the default prior values.

6.2 Experimental Results

Figure 5 shows a plot of the number of emails exchanged among the 151 actors for each month. The time values range from June 1999 to June 2002. For the time periods before June 1999 there were no emails exchanged among the 151 actors. It is observed that the email activity increased during the month of crisis, i.e. October 2001.

[Figure 5. Number of mails exchanged in every month. x-axis: time (months, 6_1999 to 6_2002); y-axis: number of mails (0 to 3500).]

6.2.1 r-closeness

The r-closeness across actors is examined for two different months. We choose to compare the month of crisis, October 2001, with a relatively normal month in the life of the organization, October 2000. For each of these months, users were ranked in decreasing order of r-closeness. Using the rankings for October 2000, we study two hypotheses (discussed in section 5.1) of interest to sociologists.

H1. As one moves higher up the organizational hierarchy, actors have a better perception of the social network.

From the r-closeness rankings, it is observed that the majority of the top positions were occupied by employees. In fact, out of the top 10 ranks only 2, at the 8th and 10th positions, are executive management users. The other ranks were occupied by 6 employees, 1 in-house lawyer and 1 user with an unidentified designation. Some of the higher level executives were communicatively inactive and therefore had very few perceptions.

H2. The more communication an actor observes, the better will be his/her perception regarding the social network.

It was observed that even though some actors observed a lot of communication, they were still ranked low in terms of r-closeness. A reason was that these actors focus more on certain communications and less on the rest.
As a result, their perceptions regarding the social network are skewed towards these favored communications. Executive management actors who observed a lot of communication showed a tendency for this skewed perception behavior. For October 2000, the actors can be roughly divided into three categories. The first category consists of actors who are communicatively active and observe a lot of diverse communications. These actors occupy the topmost positions in the rankings. They are followed by the second category of actors, who also observe a lot of communication; however, their observations are skewed, which in turn leads to skewed perceptions. The third category consists of actors who are communicatively inactive and hardly observe any of the communication. These actors have low r-closeness values and are at the bottom of the rankings table. Table 1 summarizes the percentages of various actors (according to their formal positions) in the different ranges of r-closeness rankings.

Table 1. Users in different rank ranges of r-closeness (Oct 2000)

Ranks     Not Available   Employees     Higher Management   Executive Management   Others
1-10      2.6% (1)        14.6% (6)     0% (0)              6.9% (2)               6.67% (1)
11-50     28.9% (11)      34.1% (14)    21.4% (6)           24.1% (7)              13.33% (2)
51-151    68.5% (26)      51.3% (21)    78.6% (22)          69% (20)               80% (12)
Total     (38)            (41)          (28)                (29)                   (15)

Table 2. Users in different rank ranges of r-closeness (Oct 2001)

Ranks     Not Available   Employees     Higher Management   Executive Management   Others
1-10      7.9% (3)        9.75% (4)     0% (0)              10.3% (3)              0% (0)
11-50     21.1% (8)       17.1% (7)     25% (7)             55.2% (16)             13.33% (2)
51-151    71% (27)        73.15% (30)   75% (21)            34.5% (10)             86.67% (13)
Total     (38)            (41)          (28)                (29)                   (15)

[Figure 6. Mean r-closeness against time. x-axis: time (months, 6_1999 to 6_2002); y-axis: mean r-closeness (0 to 1).]

The rankings for the crisis month Oct, 2001 are significantly different from those of Oct, 2000.
Note the increase in the percentage of senior executive management level actors in the top 50 ranks at the cost of some of the employees being pushed down. Thus, we observe a shift from the normal behavior, indicating that communication perceived by most executive management actors was more diverse and evenly distributed as

compared to the skewed or no perceptions in Oct 2000. A reason for this could be the one pointed out in [3], viz. during the crisis month emails were exchanged across different levels of formal hierarchy in the organization. Table 2 summarizes the statistics for r-closeness rankings for the month of Oct, 2001. Figure 6 is a plot of mean r-closeness of all actors over time. An interesting pattern is that the mean r-closeness peaks during the crisis month of October 2001, indicating a general increase in the perception of social interactions during the crisis period. After the crisis period, mean r-closeness drops down.

6.2.2 a-closeness

Using the a-closeness measure we constructed the agreement graph for the months of Oct, 2000 and Oct, 2001. An edge was drawn between two actors only if the a-closeness between them was more than a threshold of 0.95. A big organization like Enron has many intra-organizational groups. Normally, the intra-group communication is high and the inter-group communication is low. Therefore, one can expect the "who agrees with whom" graph to consist of many small, disjoint components of users. This is precisely what we observe for the Oct, 2000 graph (Figure 7).

[Figure 7. Agreement graph for Oct, 2000]

[Figure 8. Agreement graph for Oct, 2001]

Notice some of the structures, i.e. cliques, bowties and stars, which can be found in the graph. For Oct, 2001 the agreement graph (Figure 8) mainly consists of one large connected component, quite dense at the center and encompassing most of the actors. This indicates that the interaction between employees during the crisis month crossed all boundaries, which resulted in high overlap in their social perceptions. Figure 9 shows a plot of the mean a-closeness across all pairs of actors against time. Note the very low value for mean a-closeness in the month of Oct, 2000, as expected in a big organization like Enron having various intra-organization groups. There is a sharp increase in mean a-closeness for the months of Oct, 2001 and Feb, 2002.
Oct, 2001 was the crisis month and Feb, 2002 was the month when the investigations regarding the crisis occurred.

[Figure 9. Mean a-closeness across actors plotted against time. x-axis: time (months, 6_1999 to 6_2002); y-axis: mean a-closeness across actors (0.00 to 0.09).]

7. CONCLUSIONS AND FUTURE DIRECTIONS

The popularity of online social networks and the ability to collect gigabytes of unbiased social information about individuals provides a unique opportunity for computer scientists to develop new computational techniques for mining social network patterns. In this paper, we provide a computational model for modeling individuals' perceptions of communication between other individuals in an email-based social network. Two new measures are proposed, namely r-closeness, to capture the divergence of an individual's perceptions from reality, and a-closeness, to capture the difference of perceptions between different individuals. An agreement graph is proposed to capture relationships of similar perceptions between individuals. The use of these measures to find interesting patterns from a social network perspective is illustrated using the Enron email data. Such computational techniques for socio-cognitive analysis can have widespread impact on the understanding of social networks. A possible future research direction is to explore further analysis techniques (e.g. centrality measures) for the agreement graph. Another future research direction is to extend the proposed approach, e.g. by assigning different weights to a recipient based on whether he/she is in the To, Cc or Bcc field.

8. ACKNOWLEDGEMENTS

The authors would like to thank Dr. Lyle Ungar for his helpful comments on this research. Nishith Pathak's work was supported by the Army High Performance Computing Research

Center (AHPCRC) under the auspices of the Department of the Army, Army Research Laboratory (ARL) under Cooperative Agreement number DAAD19-01-2-0014. Sandeep Mane's research was supported by NSF grant No. IIS-0431141.

9. REFERENCES

[1] Banerjee, A., Merugu, S., Dhillon, I. and Ghosh, J. (2005). Clustering with Bregman Divergences. Journal of Machine Learning Research (JMLR), vol. 6, 1705-1749.
[2] Cross, R., Nohria, N. and Parker, A. (2002). Six Myths About Informal Networks and How To Overcome Them. Sloan Management Review, 43(3), pp. 67-76.
[3] Diesner, J. and Carley, K.M. (2005). Exploration of Communication Networks from the Enron Email Corpus. Proc. of Workshop on Link Analysis, Counterterrorism and Security, SIAM International Conference on Data Mining 2005, pp. 3-14. Newport Beach, CA, April 21-23, 2005.
[4] Krackhardt, D. (1990). Assessing the political landscape: structure, cognition, and power in organizations. Administrative Science Quarterly 35, 342-369.
[5] Krackhardt, D. and Hanson, J. (1993). Informal Networks: The Company behind the Chart. Harvard Business Review, July-August, 104-111.
[6] Klimt, B. and Yang, Y. (2004). Introducing the Enron corpus. First Conference on Email and Anti-Spam (CEAS).
[7] Shetty, J. and Adibi, J. (2004). The Enron email dataset database schema and brief statistical report. Technical Report, ISI, University of Southern California.
[8] Shetty, J. and Adibi, J. (2005). Discovering Important Nodes through Graph Entropy - The Case of Enron Email Database. In Proc. of LinkKDD, in conjunction with the 11th ACM SIGKDD.
[9] Wellman, B. (2001). Computer Networks as Social Networks. Science, 293(14).
[10] Wasserman, S. and Faust, K. (1994). Social Network Analysis: Methods and Applications. Cambridge University Press.

APPENDIX A. VALID BELIEF STATES

Consider the expected Bernoulli distribution $E[J(i,j)]$ using the Beta distribution $J(i,j)$ in an actor's belief state.
The parameter of $E[J(i,j)]$ is the expected communication probability $E[P(i,j)]$, according to the actor, given by

$$E[P(i,j)] = \frac{\alpha(i,j)}{\alpha(i,j) + \beta(i,j)}$$

where $\alpha(i,j)$ and $\beta(i,j)$ are the parameters of the Beta distribution $J(i,j)$. Since communication probabilities are defined as fractions of the total communication, we must have

$$\sum_{(i,j),\, i \neq j} E[P(i,j)] \geq 1$$

Recall that $P(i,j) = P_i \, P_{j|i}$. Each $P_{j|i}$ has a maximum value of 1, which gives $\sum_{j \neq i} P_{j|i} \leq (N-1)$. Since $\sum_i P_i = 1$, we must have

$$1 \leq \sum_{(i,j),\, i \neq j} E[P(i,j)] \leq (N-1)$$

where $N$ is the number of actors. If the expected communication probabilities in the belief state of an actor do not satisfy the above inequality, then we say that the actor's belief state is invalid (i.e. the particular set of expected communication probabilities inferred by the actor cannot actually exist).

PROPOSITION 1. If the prior probabilities of a belief state are such that the belief state is valid, then the posterior probabilities will also result in a valid belief state.

Let the communication probability $P(i,j)$ have prior probability $x_{ij}$. Suppose $\alpha(i,j) = x_{ij}$ and $\beta(i,j) = 1 - x_{ij}$. Then the expected communication probability will be

$$E[P(i,j)] = \frac{\alpha(i,j)}{\alpha(i,j) + \beta(i,j)} = x_{ij}$$

Also assume that the priors correspond to a valid belief state, i.e.

$$1 \leq \sum_{(i,j),\, i \neq j} x_{ij} \leq (N-1) \qquad (8)$$

Suppose that, in a time interval, an actor observes $M$ emails, out of which $k_{ij}$ are emails from actor $A_i$ to $A_j$. We have $M \leq \sum_{(i,j),\, i \neq j} k_{ij}$, and therefore, from (8),

$$M + 1 \leq \sum_{(i,j),\, i \neq j} (x_{ij} + k_{ij}) \qquad (9)$$

The sum $\sum_{(i,j),\, i \neq j} k_{ij}$ is maximum when every email, from some actor, is addressed to every other actor, so that $\sum_{(i,j),\, i \neq j} k_{ij} \leq M(N-1)$, and therefore, from (8),

$$\sum_{(i,j),\, i \neq j} (x_{ij} + k_{ij}) \leq M(N-1) + (N-1) \qquad (10)$$

From (9) and (10) we have

$$1 \leq \sum_{(i,j),\, i \neq j} \frac{x_{ij} + k_{ij}}{M+1} \leq (N-1)$$

But $\frac{x_{ij} + k_{ij}}{M+1} = x'_{ij}$, where $x'_{ij}$ is the expected posterior probability $E[P(i,j)]_{posterior}$. Hence

$$1 \leq \sum_{(i,j),\, i \neq j} x'_{ij} \leq (N-1)$$

The above proof also holds for the case when the priors for the Beta distribution parameters do not sum up to 1, i.e. $\alpha(i,j) = r x_{ij}$ and $\beta(i,j) = r - \alpha(i,j)$, where $r$ is some real-valued scaling factor indicating the confidence in the prior probability $x_{ij}$. The priors $x_{ij}$ can be expressed as a product $\delta_i \varepsilon_{ij}$, where $\delta_i$ is the prior for $P_i$ and $\varepsilon_{ij}$ is the prior for $P_{j|i}$. In some cases, instead of working directly with $x_{ij}$, it might be easier to fix $\delta_i$ and $\varepsilon_{ij}$ such that (8) is satisfied.
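Proposition 1 can be checked numerically on a toy example. The sketch below is not part of the proof. It assumes priors $\alpha = x_{ij}$ and $\beta = 1 - x_{ij}$, an actor who sees the full recipient list of every email it observes, and emails that each have at least one recipient other than the sender. The data layout, function names and the three-actor example are hypothetical.

```python
def posterior_expectations(priors, emails):
    """Posterior expected probabilities (x_ij + k_ij) / (M + 1).

    priors maps every ordered pair (i, j), i != j, to its prior x_ij.
    emails is a list of (sender, recipients) tuples observed by the actor.
    """
    m = len(emails)
    counts = {pair: 0 for pair in priors}
    for sender, recipients in emails:
        for recipient in set(recipients):
            if recipient != sender:
                counts[(sender, recipient)] += 1
    return {pair: (priors[pair] + counts[pair]) / (m + 1) for pair in priors}


def is_valid(expectations, n_actors, tol=1e-9):
    """Check 1 <= sum of expected probabilities <= N - 1, up to float tolerance."""
    total = sum(expectations.values())
    return 1.0 - tol <= total <= (n_actors - 1) + tol


if __name__ == "__main__":
    n = 3
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    priors = {pair: 1.0 / len(pairs) for pair in pairs}  # priors sum to 1
    emails = [(0, [1, 2]), (1, [0])]  # M = 2 observed emails
    posterior = posterior_expectations(priors, emails)
    print(is_valid(priors, n), is_valid(posterior, n))
```

In this example the posterior expectations sum to (1 + 3) / (2 + 1) = 4/3, which lies between 1 and N - 1 = 2, as the proposition requires.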