Graph Analytics. Modeling Chat Data using a Graph Data Model. Creation of the Graph Database for Chats

Similar documents
Graph Analytics. Modeling Chat Data using a Graph Data Model. Creation of the Graph Database for Chats

I) write schema of the six files.

Acquiring, Exploring and Preparing the Data

Data Exploration. The table below lists each of the files available for analysis with a short description of what is found in each one.

Introduction to Scratch

Analytical reasoning task reveals limits of social learning in networks

Elliotte Rusty Harold August From XML to Flat Buffers: Markup in the Twenty-teens

Virtual Platform Checklist for Adobe Connect 9

THE IMPACT OF DMO WEBSITES. March 2017

CLIENT ONBOARDING PLAN & SCRIPT

CLIENT ONBOARDING PLAN & SCRIPT

This lab will introduce you to MySQL. Begin by logging into the class web server via SSH Secure Shell Client

A subquery is a nested query inserted inside a large query Generally occurs with select, from, where Also known as inner query or inner select,

Oracle 11g Invisible Indexes Inderpal S. Johal. Inderpal S. Johal, Data Softech Inc.

How To Use Request for Quote (RFQ)

Relational databases

THE IMPACT OF DMO WEBSITES A Look to the Future. March 2017

Dr. Chuck Cartledge. 5 Nov. 2015

HOSTING A WEBINAR BEST PRACTICE GUIDE

Ideas Gallery - Sai Kishore MV (Kishu)

Good afternoon, everyone. Thanks for joining us today. My name is Paloma Costa and I m the Program Manager of Outreach for the Rural Health Care

Report Studio Tips and Tricks. Presenter: Olivier Pringault - Delivery Consultant & Lead Report Developer for APAC

Formatting Documents (60min) Working with Tables (60min) Adding Headers & Footers (30min) Using Styles (60min) Table of Contents (30min)

SAP BEX ANALYZER AND QUERY DESIGNER

BAT Quick Guide 1 / 20

Audio is in normal text below. Timestamps are in bold to assist in finding specific topics.

Neo4j. Neo4j's Cypher. Samstag, 27. April 13

QUICK START GUIDE. How Do I Get Started? Step #1 - Your Account Setup Wizard. Step #2 - Meet Your Back Office Homepage

Beginner s Guide To Direct Messages On Twitter

Time Series Live 2017

The ICT4me Curriculum

The ICT4me Curriculum

OKC MySQL Users Group

TIM 50 - Business Information Systems

Software change. Software maintenance

NOSQL Databases and Neo4j

Understanding Impact of J2EE Applications On Relational Databases. Dennis Leung, VP Development Oracle9iAS TopLink Oracle Corporation

Takeaways. Takeaways on subsequent pages give teachers a quick visual reference of different features within an app.

Azon Master Class. By Ryan Stevenson Guidebook #7 Site Construction 2/3

0x1A Great Papers in Computer Security

. social? better than. 7 reasons why you should focus on . to GROW YOUR BUSINESS...

Pentaho Analytics for MongoDB

Digital Marketing Manager, Marketing Manager, Agency Owner. Bachelors in Marketing, Advertising, Communications, or equivalent experience

THE URBAN COWGIRL PRESENTS KEYWORD RESEARCH

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Database Systems: Fall 2017 Quiz I

Index A, B. bi-directional relationships, 58 Brewer s Theorem, 3

T-SQL Training: T-SQL for SQL Server for Developers

Contributing to a Community

Fitting It In Here s how this chapter fits in to the book as a whole...

ELEVATESEO. INTERNET TRAFFIC SALES TEAM PRODUCT INFOSHEETS. JUNE V1.0 WEBSITE RANKING STATS. Internet Traffic

Inbound Website. How to Build an. Track 1 SEO and SOCIAL

CMPUT 391 Database Management Systems. Query Processing: The Basics. Textbook: Chapter 10. (first edition: Chapter 13) University of Alberta 1

2.3 Algorithms Using Map-Reduce

Federated XDMoD Requirements

Adding content to your Blackboard 9.1 class

Lesson 2. Introducing Apps. In this lesson, you ll unlock the true power of your computer by learning to use apps!

balancer high-fidelity prototype dian hartono, grace jang, chris rovillos, catriona scott, brian yin

BUYING SERVER HARDWARE FOR A SCALABLE VIRTUAL INFRASTRUCTURE

In math, the rate of change is called the slope and is often described by the ratio rise

Facebook Basics. Agenda:

Top of Minds Report series Data Warehouse The six levels of integration

Online Evaluation Tool: Dashboards and Reports

+21% QUOTES GENERATED for the Spanish Group

Getting started. Create event content. Quick Start Guide. Quick start Adobe Connect for Webinars

CTI-TC Weekly Working Sessions

1. E-Procurement Overview

MARKETING VOL. 1

Chat Reference Assignment

Sound the Alarm: Fundraising Toolkit

Database Design Debts

Make $400 Daily. With Only. 5 Minutes Of Work

How to do an On-Page SEO Analysis Table of Contents

Query Processing: The Basics. External Sorting

ETC WEBCHAT USER GUIDE

How to Find Your Most Cost-Effective Keywords

CSCI-UA: Database Design & Web Implementation. Professor Evan Sandhaus Lecture #17: MySQL Gets Done

CS224W: Analysis of Networks Jure Leskovec, Stanford University

CS/INFO 4154: Analytics-driven Game Design

BigTable. Chubby. BigTable. Chubby. Why Chubby? How to do consensus as a service

NoSQL Databases Analysis

Table Partitioning. So you want to get Horizontal? Brian Bowman OpenEdge Product Management

Bases de Dades: introduction to SQL (part 1)

HIGH-IMPACT SEO DIY IN 5 MINUTES SEO OR LESS. Digital Marketer Increase Engagement Series

1

Academic research on graph processing: connecting recent findings to industrial technologies. Gábor Szárnyas opencypher NYC

12B. Laboratory. Databases. Objective. References. Write simple SQL queries using the Simple SQL applet.

LABORATORY. 16 Databases OBJECTIVE REFERENCES. Write simple SQL queries using the Simple SQL app.

Consolidate Onsite & Remote Education

PROGRAMMING GOOGLE APP ENGINE WITH PYTHON: BUILD AND RUN SCALABLE PYTHON APPS ON GOOGLE'S INFRASTRUCTURE BY DAN SANDERSON

What will I learn today?

Data Mapper Manual. Version 2.0. L i n k T e c h n i c a l S e r v i c e s

Useful Google Apps for Teaching and Learning

Helpdesk. Features. Module Configuration 1/49. On - October 14, 2015

Kathleen Durant PhD Northeastern University CS Indexes

Rancher Part 4: Using the Catalog Example with GlusterFS

Storage hierarchy. Textbook: chapters 11, 12, and 13

Educational Fusion. Implementing a Production Quality User Interface With JFC

This study is brought to you courtesy of.

Triangles, Squares, & Segregation

2019 BE LOUD Breakfast: Table Captain Registration Guide

Transcription:

Graph Analytics Modeling Chat Data using a Graph Data Model This we will be using a graph analytics approach to chat data from the Catch the Pink Flamingo game. Currently this chat data is purely numeric, no text. Analytically, it can still serve a useful purpose in revealing certain types of behaviors which can only be observed within a graph analytics context. This data captures team chat sessions that occur while users play the Catch the Pink Flamingo game. The database schema tracks various actions that occur such as creating a team session, joining a session, leaving a session, and responding to or mentioning items in a chat. Creation of the Graph Database for Chats Database for Schema Chat Data chat_create_team_chat.csv User INT Identifies a unique user Team INT Identifies the team a User belongs to TeamChatSession INT Identifies a chat session the User created Timestamp FLOAT Time that the chat session was created. chat_join_team_chat.csv User INT Identifies a unique user TeamChatSession INT Identifies a chat session the User created Timestamp FLOAT Time that the User joined the chat session chat_leave_team_chat.csv User INT Identifies a unique user TeamChatSession INT Identifies a chat session the

User created Timestamp FLOAT Time that the User left the chat session chat_item_team_chat.csv User INT Identifies a unique user TeamChatSession INT Identifies a chat session the User created ChatItem INT Identifier for a Chat Item. Timestamp FLOAT Time that the ChatItem was created. chat_mention_team_chat.csv ChatItem INT Identifier for a Chat Item. User INT Identifies a unique user Timestamp FLOAT Time that the User was mentioned in the ChatItem. chat_respond_team_chat.csv ChatItem INT Identifier for the responding Chat Item. ChatItem INT Identifier for a Chat Item that was responded to. Timestamp FLOAT Time that the ChatItem was responded to. Loading the Chat Data into Neo4J chat_create_team_chat.csv LOAD CSV FROM "file:///chat_create_team_chat.csv" AS row MERGE (u:user {id: toint(row[0])}) MERGE (t:team {id: toint(row[1])}) MERGE (c:teamchatsession {id: toint(row[2])}) MERGE (u)-[:createssession{timestamp: row[3]}]->(c) MERGE (c)-[:ownedby{timestamp: row[3]}]->(t)

chat_join_team_chat.csv LOAD CSV FROM "file:///chat_join_team_chat.csv" AS row MERGE (u:user {id: toint(row[0])}) MERGE (c:teamchatsession {id: toint(row[1])}) MERGE (u)-[:joins{timestamp: row[2]}]->(c) chat_leave_team_chat.csv LOAD CSV FROM "file:///chat_leave_team_chat.csv" AS row MERGE (u:user {id: toint(row[0])}) MERGE (c:teamchatsession {id: toint(row[1])}) MERGE (u)-[:leaves{timestamp: row[2]}]->(c) chat_item_team_chat.csv LOAD CSV FROM "file:///chat_item_team_chat.csv" AS row MERGE (u:user {id: toint(row[0])}) MERGE (c:teamchatsession {id: toint(row[1])}) MERGE (i:chatitem {id: toint(row[2])}) MERGE (u)-[:createchat{timestamp: row[3]}]->(i) MERGE (i)-[:partof{timestamp: row[3]}]->(c) chat_mention_team_chat.csv LOAD CSV FROM "file:///chat_mention_team_chat.csv" AS row MERGE (i:chatitem {id: toint(row[0])}) MERGE (u:user {id: toint(row[1])}) MERGE (i)-[:mentioned{timestamp: row[2]}]->(u) chat_respond_team_chat.csv LOAD CSV FROM "file:///chat_respond_team_chat.csv" AS row MERGE (i:chatitem {id: toint(row[0])})

MERGE (j:chatitem {id: toint(row[1])}) MERGE (i)-[:responseto{timestamp: row[2]}]->(j) Screen Shots of Graph Data This graph shows the edges between Chat Items. match (n:chatitem)-[r]->(m) return n, r, m LIMIT 50 This graph shows the all of the ResponseTo edges. match (n)-[r:responseto*]->(m) return n, r, m LIMIT 50

Finding the longest conversation chain and its participants Description Find the longest conversation chain in the chat data using the "ResponseTo" edge label. This question has two parts, 1) How many chats are involved in it?, and 2) How many users participated in this chain? Results Path length: 9 Number of Unique Users: 10 Unique Users: 52694, 16478, 13704, 11312, 9744, 9435, 8194, 8044, 7816, 7803 Query START n=node(*) MATCH p=(n)-[:responseto*]->(m) WITH COLLECT(p) AS paths, MAX(length(p)) AS maxlength RETURN FILTER(path IN paths WHERE length(path)= maxlength) AS longestpaths Screenshot

Relevance Long chats can be used to identify users that have a lot of influence. These Maven users can be targeted with incentives to help sway opinions of users to certain purchases, etc. Identifying highly influential users is far more cost effective then appealing to all users and arguably as effective. Analyzing the relationship between top 10 chattiest users and top 10 chattiest teams Description Find the longest conversation chain in the chat data using the "ResponseTo" edge label. This question has two parts, 1) How many chats are involved in it?, and 2) How many users participated in this chain? Results Chattiest Users User 394 115 2067 111 209 109 Number of Chats Chattiest Teams Teams 82 1324 185 1036 112 967 Number of Chats

Users / Teams User Team Top 10 Team? 394 54 No 2067 93 No 209 97 No 1087 135 No 554 32 No 516 95 No 1627 18 Yes 999 86 No 668 134 No 451 82 Yes The data show there is only about a 20% probability that a chatty user belongs to a chatty team. This could be due to the fact that teams with more communication will have a more equal distribution of chats (like the San Antonio Spurs), where the super chatty team member will drown out the others (think Kobe on the Lakers). Queries Chatty Users START n=node(*) MATCH (n)-[:createchat*]->(m) return n.id as User, count(m) as Total ORDER BY Total DESC LIMIT 10 Chatty Teams START n=node(*) MATCH ()-[:PartOf]-()-[:OwnedBy]-(n) return n.id as Team, count(n) as Total ORDER BY Total DESC LIMIT 10 Screenshot Chatty Users

Chatty Teams Relevance This data shows that chatty users are not necessarily popular or influential. When targeting groups of users this should be remembered. You might get better cost efficiency but targeting tightly coupled groups rather than individuals. How Active Are Groups of Users? Description In this question, we will compute an estimate of how dense the neighborhood of a node is. In the context of chat that translates to how mutually interactive a certain group of users are. If we can

identify these highly interactive neighborhoods, we can potentially target some members of the neighborhood for direct advertising. We will do this in a series of steps. The degree of connectedness is measured by a connectedness coefficient that is a ratio of the number of relationships a user has compared to the maximum relationships that are possible. Results Most Active Users (based on Cluster Coefficients) User ID Coefficient 554 0.11904761904761904 461 0.5 668 0.15 2067 0.11904761904761904 516 0.2 394 0.3333333333333333 209 0.16666666666666666 999 0.09722222222222222 1627 0.14285714285714285 1087 0.2 Queries Graph Creation We will construct the neighborhood of users. In this neighborhood, we will connect two users if: One mentioned another user in a chat: Match (u1:user)-[:createchat]->()-[:mentioned]->(u2:user) create (u1)-[:interactswith]->(u2) One created a chatitem in response to another user s chatitem Match (u1:user)-[:createchat]->()-[:responseto]->()-[:mentioned]- >(u2:user) create (u1)-[:interactswith]->(u2) Then we delete self loops. Match (u1)-[r:interactswith]->(u1) delete r And finally delete all duplicate relationships. MATCH (a)-[r:interactswith]->(b) WITH a, b, TAIL (COLLECT (r)) as rr FOREACH (r IN rr DELETE r) Connectedness Coefficient Calculation

After much struggling, this was the final query I came up with. This one did not come easy. I found Cypher quite difficult. I can see the power of it, but it was not a natural fit for me. I did learn a great deal though and feel much more comfortable. MATCH (n)-[:createchat*]->(m) with {u: n.id, t: count(m)} as f ORDER BY f.t DESC LIMIT 10 with collect (f.u) as chatty match (d)-[:interactswith]-(b) with {u: d.id, n: collect(distinct b.id)} as row, chatty with collect (row) as neighbors, chatty match (n)-[r:interactswith]->(m) UNWIND neighbors as row with row.u as id, SUM(CASE WHEN (n.id = row.u and m.id in row.n) or (n.id in row.n and m.id = row.id) THEN 1 ELSE 0 END) as numedges, length(row.n) as k, chatty with {u: id, n: numedges, k: k} as row, chatty with collect(row) as rows, chatty with FILTER(row in rows WHERE row.u in chatty) as frows UNWIND frows as frow with frow.u as id, frow.n as numedges, frow.k as k return id as userid, tofloat(numedges)/tofloat(k * (k - 1)) as coefficient Screenshot