Graph Analytics. Modeling Chat Data using a Graph Data Model. Creation of the Graph Database for Chats


Graph Analytics

Modeling Chat Data using a Graph Data Model

The Pink Flamingo graph model includes user, team, chat session, and chat item nodes. Its relationships (edges) represent creating sessions, owning sessions, joining chats, leaving chats, creating chat items, being part of a chat session, mentioning users, and responding to users.

Creation of the Graph Database for Chats

These steps were taken to create the graph database. The following 6 files were loaded into the database:

ERD table: chat_create_team_chat
A line is added to this file when a player creates a new chat with their team.
Columns: userid, teamid, TeamChatSessionID, timestamp

ERD table: chat_item_team_chat
Creates nodes labeled ChatItems. Column 0 is the User id, column 1 is the TeamChatSession id, column 2 is the ChatItem id (i.e., the id property of the ChatItem node), and column 3 is the timestamp for an edge labeled "CreateChat". Also creates an edge labeled "PartOf" from the ChatItem node to the TeamChatSession node. This edge also has a timestamp property using the value from column 3.
Columns: userid, TeamChatSessionID, ChatItemID, timestamp

ERD table: chat_join_team_chat
Creates an edge labeled "Joins" from User to TeamChatSession. The columns are the User id, the TeamChatSession id, and the timestamp of the Joins edge.
Columns: userid, TeamChatSessionID, timestamp

ERD table: chat_leave_team_chat
Creates an edge labeled "Leaves" from User to TeamChatSession. The columns are the User id, the TeamChatSession id, and the timestamp of the Leaves edge.
Columns: userid, chatid, timestamp

ERD table: chat_mention_team_chat
Creates an edge labeled "Mentioned". Column 0 is the id of the ChatItem, column 1 is the id of the User, and column 2 is the timestamp of the edge going from the ChatItem to the User.
Columns: ChatItem, userid, timestamp

ERD table: chat_respond_team_chat
A line is added to this file when a player responds to a chat post.
Columns: userid1, userid2

Data was loaded after uniqueness constraints were established for each node type. Here is a partial script:

    // One uniqueness constraint per node label
    CREATE CONSTRAINT ON (u:user) ASSERT u.id IS UNIQUE;
    CREATE CONSTRAINT ON (t:team) ASSERT t.id IS UNIQUE;
    CREATE CONSTRAINT ON (c:teamchatsession) ASSERT c.id IS UNIQUE;
    CREATE CONSTRAINT ON (i:chatitem) ASSERT i.id IS UNIQUE;

    // row[0] = user id, row[1] = team id, row[2] = chat session id, row[3] = timestamp
    LOAD CSV FROM "file:/chat_create_team_chat.csv" AS row
    MERGE (u:user {id: toint(row[0])})
    MERGE (t:team {id: toint(row[1])})
    MERGE (c:teamchatsession {id: toint(row[2])})
    MERGE (u)-[:createssession{timestamp: row[3]}]->(c)
    MERGE (c)-[:ownedby{timestamp: row[3]}]->(t)
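The effect of the MERGE-based load can be pictured in plain Python. This is a minimal sketch and not the Neo4j loader itself: it assumes the four-column layout used by the partial script (user id, team id, chat session id, timestamp), and the function name `load_create_team_chat` and the sample rows are hypothetical. Sets stand in for MERGE, so a repeated id does not create a duplicate node.

```python
import csv
import io


def load_create_team_chat(lines):
    """Build node and edge sets from chat_create_team_chat-style rows.

    Assumed column order (taken from the partial Cypher script):
    user id, team id, chat session id, timestamp.
    """
    nodes = {"user": set(), "team": set(), "teamchatsession": set()}
    # Each edge: (source label, source id, edge type, target label, target id, timestamp)
    edges = set()
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        user_id, team_id, session_id = (int(value) for value in row[:3])
        timestamp = row[3]
        # Adding to a set is idempotent, like MERGE on a node with a unique id.
        nodes["user"].add(user_id)
        nodes["team"].add(team_id)
        nodes["teamchatsession"].add(session_id)
        edges.add(("user", user_id, "createssession", "teamchatsession", session_id, timestamp))
        edges.add(("teamchatsession", session_id, "ownedby", "team", team_id, timestamp))
    return nodes, edges


# Hypothetical sample rows; the real file is not reproduced here.
SAMPLE = io.StringIO(
    "10,1,100,1464233999\n"
    "11,1,101,1464234000\n"
    "10,2,102,1464234001\n"
)
nodes, edges = load_create_team_chat(SAMPLE)
print(len(nodes["user"]), len(nodes["team"]), len(nodes["teamchatsession"]), len(edges))
```

User 10 appears in two rows but is stored once, which mirrors why the constraints are created before the data is loaded.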

Below is a sample of the nodes and relationships in the database.

[Figure: sample of nodes and relationships in the database]

Finding the longest conversation chain and its participants

The longest conversation chain was queried with the code below. The longest chain has a length of 11.

    match p=(a)-[:responseto*]-(c)
    return length(p) as length_p
    order by length_p desc
    limit 1

The participants in the longest chain were pulled with this query. It shows that 429 users participated in the longest chain.

    match p=(a)-[:responseto*]-(c)
    where length(p) = 11
    with p
    match (u)-[:createchat*]-(i)
    where i in nodes(p)
    return count(distinct u)

Analyzing the relationship between top 10 chattiest users and top 10 chattiest teams

The following scripts were used to discover the chattiest users and teams, and to determine whether the chatty users and the chatty teams intersect.

Chattiest Users

    Users    Number of Chats
    394      115
    2067     111
    1087     109

    match (u)-[r:createchat]-(i)
    return u,count(r) as u_chat_cnt
    order by u_chat_cnt desc
    limit 10

Chattiest Teams

    Teams    Number of Chats
    82       1,324
    185      1,036
    112      957

    match (i)-[:partof]-(c)-[:ownedby]-(t)
    return t,count(t) as t_chat_cnt
    order by t_chat_cnt desc
    limit 10

Were the chattiest users part of any of the chattiest teams?

The 7th chattiest user, #999, was part of the 7th chattiest team, #52. Otherwise, the top 10 chattiest users were not in the chattiest teams.

    match (u)-[r:createchat]-(i)-[:partof]-(c)-[:ownedby]-(t)
    where u.id in [394,2067,1087,209,554,516,999,1627,461,668]
    and t.id in [82,185,112,18,194,129,52,136,146,81]
    return distinct u,t
    order by u.id, t.id

How Active Are Groups of Users?

To determine the most active chat groups, a cluster coefficient was used. For each of the top 10 chattiest users, the neighbors who interacted with that user were identified, and the interactions among those neighbors were then counted. The number of interacting pairs was divided by n * (n-1), where n is the number of interacting neighbors. Below are the chattiest users with the three highest coefficients, that is, the most active neighborhoods.

Most Active Users (based on Cluster Coefficients)

    User ID      Coefficient
    394 & 461    1.00 (max)
    209 & 516    0.95
    554          0.90
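The coefficient described above can be sketched in Python. The report does not show the query it used, so this is only one reading of the description: it assumes interactions are undirected and counts each interacting pair of neighbors in both orders, which is consistent with the n * (n-1) denominator. The function name `cluster_coefficient` and the sample interactions are hypothetical.

```python
from itertools import permutations


def cluster_coefficient(user, interactions):
    """Local cluster coefficient of `user` in an undirected interaction graph.

    `interactions` is an iterable of (user_a, user_b) pairs.
    """
    # Store every interaction in both directions so lookups are symmetric.
    adjacency = {}
    for a, b in interactions:
        if a == b:
            continue  # ignore self-interactions
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    neighbors = adjacency.get(user, set())
    n = len(neighbors)
    if n < 2:
        return 0.0  # the ratio is undefined for fewer than two neighbors

    # Count ordered neighbor pairs (a, b) that interact, to match n * (n - 1).
    linked = sum(1 for a, b in permutations(neighbors, 2) if b in adjacency[a])
    return linked / (n * (n - 1))


# Hypothetical interactions among made-up user ids.
SAMPLE = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4)]
print(cluster_coefficient(1, SAMPLE))
```

In the sample, user 1 has three neighbors and two of the three possible neighbor pairs interact, so the coefficient is 4 / 6. A coefficient of 1.00, as reported for users 394 and 461, means every pair of neighbors interacted.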