Graph Analytics

Modeling Chat Data Using a Graph Data Model

The Pink Flamingo graph model includes user, team, chat session, and chat item nodes. The relationships (edges) between them represent creating sessions, owning sessions, joining chats, leaving chats, creating chat items, being part of a chat session, mentioning users, and responding to users.

Creation of the Graph Database for Chats

These steps were taken to create the graph database. The following 6 files were loaded into the database:

ERD table: chat_create_team_chat
A line is added to this file when a player creates a new chat with their team.
Columns: userid, teamid, TeamChatSessionID, timestamp

ERD table: chat_item_team_chat
Creates nodes labeled ChatItem. Column 0 is the User id, column 1 is the TeamChatSession id, column 2 is the ChatItem id (i.e., the id property of the ChatItem node), and column 3 is the timestamp of the edge labeled "CreateChat". Also creates an edge labeled "PartOf" from the ChatItem node to the TeamChatSession node, with a timestamp property taken from column 3.
Columns: userid, TeamChatSessionID, ChatItem id, timestamp

ERD table: chat_join_team_chat
Creates an edge labeled "Joins" from User to TeamChatSession. The columns are the User id, the TeamChatSession id, and the timestamp of the Joins edge.
Columns: userid, TeamChatSessionID, timestamp

ERD table: chat_leave_team_chat
Creates an edge labeled "Leaves" from User to TeamChatSession. The columns are the User id, the TeamChatSession id, and the timestamp of the Leaves edge.
Columns: userid, chatid, timestamp

ERD table: chat_mention_team_chat
Creates an edge labeled "Mentioned". Column 0 is the id of the ChatItem, column 1 is the id of the User, and column 2 is the timestamp of the edge going from the ChatItem to the User.
Columns: ChatItem, userid, timestamp
ERD table: chat_respond_team_chat
A line is added to this file when a player responds to a chat post.
Columns: userid1, userid2

Data was loaded after uniqueness constraints were established for each node type. Here is a partial script:

    // One uniqueness constraint on the id property of each node label
    CREATE CONSTRAINT ON (u:user) ASSERT u.id IS UNIQUE;
    CREATE CONSTRAINT ON (t:team) ASSERT t.id IS UNIQUE;
    CREATE CONSTRAINT ON (c:teamchatsession) ASSERT c.id IS UNIQUE;
    CREATE CONSTRAINT ON (i:chatitem) ASSERT i.id IS UNIQUE;

    // Load chat_create_team_chat: user, team, and session nodes plus their edges
    LOAD CSV FROM "file:/chat_create_team_chat.csv" AS row
    MERGE (u:user {id: toint(row[0])})
    MERGE (t:team {id: toint(row[1])})
    MERGE (c:teamchatsession {id: toint(row[2])})
    MERGE (u)-[:createssession{timestamp: row[3]}]->(c)
    MERGE (c)-[:ownedby{timestamp: row[3]}]->(t)
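The other files can be loaded with the same pattern. As a minimal sketch (not the script actually used for this report), the block below loads chat_join_team_chat, assuming the three columns described for that file (User id, TeamChatSession id, timestamp). The relationship name `joins` and the file path are assumptions chosen to match the naming style of the partial script. On newer Neo4j versions `toint` would need to be replaced with `toInteger`.

```cypher
// Sketch only: relationship name "joins" and the file path are assumptions.
// row[0] = User id, row[1] = TeamChatSession id, row[2] = timestamp
LOAD CSV FROM "file:/chat_join_team_chat.csv" AS row
MERGE (u:user {id: toint(row[0])})
MERGE (c:teamchatsession {id: toint(row[1])})
MERGE (u)-[:joins {timestamp: row[2]}]->(c)
```

Using MERGE rather than CREATE for the nodes keeps the load consistent with the uniqueness constraints: a user or session that already exists is matched instead of duplicated.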
Below is a sample of the nodes and relationships in the database.

Finding the longest conversation chain and its participants

The longest conversation chain was found with the query below. The longest chain has a length of 11.

    match p=(a)-[:responseto*]-(c)
    return length(p) as length_p order by length_p desc limit 1

The participants in the longest chain were pulled with this query. It shows that 429 users participated in the longest chain.

    match p=(a)-[:responseto*]-(c)
    where length(p) = 11
    with p
    match (u)-[:createchat*]-(i)
    where i in nodes(p)
    return count(distinct u)

Analyzing the relationship between top 10 chattiest users and top 10 chattiest teams

The following scripts were used to discover the chattiest users and teams, and to determine whether there were any intersections between the chatty users and teams.

Chattiest Users

    Users    Number of Chats
    394      115
    2067     111
    1087     109

    match (u)-[r:createchat]-(i)
    return u,count(r) as u_chat_cnt
    order by u_chat_cnt desc limit 10

Chattiest Teams

    Teams    Number of Chats
    82       1,324
    185      1,036
    112      957

    match (i)-[:partof]-(c)-[:ownedby]-(t)
    return t,count(t) as t_chat_cnt
    order by t_chat_cnt desc limit 10

Were the chattiest users part of any of the chattiest teams?

The 7th chattiest user, #999, was part of the 7th chattiest team, #52. Otherwise, none of the top 10 chattiest users belonged to the top 10 chattiest teams.

    match (u)-[r:createchat]-(i)-[:partof]-(c)-[:ownedby]-(t)
    where u.id in [394,2067,1087,209,554,516,999,1627,461,668]
    and t.id in [82,185,112,18,194,129,52,136,146,81]
    return distinct u,t
    order by u.id, t.id

How Active Are Groups of Users?

To determine the most active chat groups, a cluster coefficient was used. For each of the top 10 chattiest users, the neighbors who interacted with that user were identified and then analyzed for interaction amongst themselves. The number of interacting neighbor pairs was divided by n * (n-1), where n is the number of interacting neighbors. Below are the chattiest users with the most active neighborhoods (the three highest coefficients).

Most Active Users (based on Cluster Coefficients)

    User ID      Coefficient
    394 & 461    1.00 (max)
    209 & 516    0.95
    554          0.90
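The calculation described above can be written as a formula. The symbols C(u) and k and the example numbers are illustrative notation introduced here, not values from the analysis: k stands for the number of interaction pairs counted among the n neighbors of user u.

```latex
C(u) = \frac{k}{n\,(n-1)}
\qquad\text{e.g. } n = 5,\; k = 18:\quad
C(u) = \frac{18}{5 \cdot 4} = \frac{18}{20} = 0.90
```

Because n(n-1) is the number of ordered pairs of distinct neighbors, a coefficient of 1.00 can only be reached when an interaction is counted in both directions for every pair of neighbors.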