Social Media Intelligence Text and Network Mining combined. Dr. Rosaria Silipo

Transcription:

Social Media Intelligence: Text and Network Mining combined
Dr. Rosaria Silipo, rosariasilipo@yahoo.com

Previously on PAW...
PAW San Francisco 2012

Social Media Analysis: Water, Water Everywhere, and not a drop to drink
Approaches and Challenges:
- In-House Text Mining: Sentiment but no Relevance
- In-House Network Mining: Relevance but no Sentiment
- In-House Scorecard: No Analytics
- Cloud-based Approach: No Access to Data

Our Goal in Social Media Analysis
- Text Mining for Sentiment
- Network Mining for Relevance
- Drill Down on special cases
- Analytics for Prediction

Case Study: Major European Telco
Very rich new data sources about customers!
Combine:
- Text Mining
- Network Analysis
- Classic Predictive Analytics (Modeling, Clustering, Time Series, etc.)
Combine with internal data (makes the text relevant):
- Include product names/categories, exclude staff members
- Include number of web hits per page...
- Include existing marketing positioning
- Include major campaign information

Case Study Example: Slashdot Data
"News for Nerds, Stuff that Matters"
Basic Facts:
- 24,532 users
- 491 threads with 15,843 responses from 12,507 users
- 113,505 posts (text mining on posts)
- 60 main topics

Combining Text and Network Mining
- Network Analysis: Hub and Authority Score per User
- Text Analysis: Attitude Level per User

Text Mining
[Workflow figure. Node labels: Remove anonymous users, group by PostID; Words Tagging (MPQA Corpus: positive words, negative words); BoW; Standard Named Entity Filter; Word Frequency; User Bins; Word cloud for selected users]

Slashdot Text Mining
- List of negative and positive words (MPQA Opinion Corpus)
- Tag positive and negative words
- Count words in posts
- Aggregate over users
- Negative + Positive User
Most positive user: dada21 (2838 positive / 1725 negative words)
Most negative user: pnutz (43 positive / 109 negative words)
16,016 positive users
7,107 negative users
Which topics have positive users in common? Government, People, Law/s, Money, Market, Parties
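
The counting steps on this slide can be sketched in a few lines of Python. This is only an illustration, not the KNIME workflow itself: the two word sets are tiny hypothetical stand-ins for the MPQA lists, and `attitude_per_user` is a made-up helper name. Taking the attitude as positive minus negative word count is an assumption, although it is consistent with the figures quoted in the deck (2838 - 1725 = 1113 and 43 - 109 = -66, the two ends of the attitude range).

```python
from collections import defaultdict

# Hypothetical mini-lexicons; the slides use the MPQA Opinion Corpus instead.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}

def attitude_per_user(posts):
    """posts: iterable of (user, text) pairs.

    Returns {user: (positive_count, negative_count, attitude)} where
    attitude = positive_count - negative_count (an assumption, see lead-in).
    """
    counts = defaultdict(lambda: [0, 0])
    for user, text in posts:
        entry = counts[user]  # make sure users without sentiment words appear too
        for word in text.lower().split():
            token = word.strip(".,!?;:\"'()")
            if token in POSITIVE:
                entry[0] += 1
            elif token in NEGATIVE:
                entry[1] += 1
    return {u: (p, n, p - n) for u, (p, n) in counts.items()}

# Toy posts, invented for the example.
posts = [
    ("alice", "I love this, great news!"),
    ("bob", "Awful idea. I hate it."),
    ("alice", "Bad timing though."),
]
scores = attitude_per_user(posts)
```

Here `scores["alice"]` aggregates both of her posts, which mirrors the "aggregate over users" step.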

Network Creation
[Figure: example network with nodes User1 to User6]

Topic Graphs

Topic Graph: NASA

Hubs & Authorities
- Hubs = Follower
- Authorities = Leader
Workflow steps:
- Users with hub and authority weights and other features
- Filtering anonymous users and creating network
- Centrality index to define hub weight and authority weight
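
The slides do not say how the hub and authority weights are computed. One common choice is the HITS iteration, sketched below in plain Python under two assumptions: that scores follow HITS at all, and that an edge points from the user who replies to the user being replied to. The function name `hits` and the toy edge list are hypothetical.

```python
def hits(edges, iterations=50):
    """edges: list of (source, target) pairs. Returns (hub, authority) dicts."""
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of the nodes pointing at each node.
        auth = {n: 0.0 for n in nodes}
        for source, target in edges:
            auth[target] += hub[source]
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub update: sum the authority scores of the nodes each node points at.
        new_hub = {n: 0.0 for n in nodes}
        for source, target in edges:
            new_hub[source] += auth[target]
        norm = sum(v * v for v in new_hub.values()) ** 0.5 or 1.0
        hub = {n: v / norm for n, v in new_hub.items()}
    return hub, auth

# Toy reply network: u1, u2 and u4 reply to u3; u1 also replies to u5.
edges = [("u1", "u3"), ("u2", "u3"), ("u4", "u3"), ("u1", "u5")]
hub, auth = hits(edges)
```

In this toy network `u3` collects the most replies and ends up as the strongest authority ("leader"), while `u1`, who replies to two users, ends up as the strongest hub ("follower").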

Hubs & Authorities
[Figure; labeled users: dada21, Carl Bialik from the WSJ, pnutz, Tube Steak, Doc Ruby, 99BottlesOfBeerInMyF]

Hubs, Authorities & Attitudes
[Figure; labeled users: dada21, Carl Bialik from the WSJ, Tube Steak, WebHosting Guy, Catbeller, 99BottlesOfBeerInMyF, Doc Ruby, pnutz]

What we have found...
- The positive leaders
- The neutral leaders
- The negative leaders
- The inactive users
What identifies each group?
How do I identify a new user?
How do I handle each user?

User Classification
[Figures: Authority Score Histogram, Hub Score Histogram]
How do I define leadership?

Attitude Level Histogram
Defining thresholds on attitude might be easier

Why Clustering?
- No a priori knowledge (not even on a subset of users)
- Prediction and interpretation capabilities required
k-means algorithm

Normalization
- (Authority score, Hub score) in [0,1] x [0,1]
- Attitude level in [-66, 1113]
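
The deck does not state which normalization was applied. A minimal sketch of one plausible option, min-max scaling of the attitude level into [0, 1] so that it is comparable with the two scores, is shown below. Both the choice of min-max scaling and the helper name `min_max` are assumptions.

```python
def min_max(values):
    """Linearly rescale a list of numbers into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case: all values identical, map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# The two extremes quoted on the slide plus a neutral user.
attitude = [-66, 0, 1113]
normalized = min_max(attitude)
```

Without some rescaling of this kind, the attitude axis (a span of 1179) would dominate any Euclidean distance computed together with scores that live in [0, 1].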

Authority after Normalization
Leadership is now a bit easier to obtain.

Hub Score after Normalization
Also the follower condition is more spread out.

Attitude after Normalization
Attitude is the only parameter that is now easier to identify.

Number of Clusters
Users with a negative attitude are hard to catch!
- K=30: 10 clusters with more than 1000 users; 2 clusters with clear negative attitude (< 0.4)
- K=20: 5 clusters with more than 1000 users; 2 clusters with negative attitude (< 0.4)
- K=10: 2 clusters with more than 5000 users and no cluster with a negative attitude anymore
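
For readers who want to experiment with different values of k, here is a minimal pure-Python k-means (Lloyd's algorithm) over (authority, hub, attitude) vectors. It is a sketch, not the KNIME node used in the talk: the function names, the random initialization, and the toy points are all assumptions.

```python
import random

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
    )

def kmeans(points, k, iterations=100, seed=0):
    """points: list of equal-length tuples. Returns (centroids, assignment)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as a start
    assignment = []
    for _ in range(iterations):
        assignment = [nearest(p, centroids) for p in points]
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                # New centroid = coordinate-wise mean of the cluster members.
                new_centroids.append(
                    tuple(sum(x) / len(members) for x in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, assignment

# Toy users: two inactive positive users and two active negative ones.
points = [(0.10, 0.10, 0.90), (0.12, 0.08, 0.95),
          (0.90, 0.80, 0.10), (0.88, 0.85, 0.05)]
centroids, assignment = kmeans(points, k=2)
```

Running the same function with several values of `k` and inspecting cluster sizes and mean attitude per cluster is one way to reproduce the kind of comparison listed on the slide.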

Re-sampling the Training Set
k = 10

The k-means Clusters

Additional Discoveries
- There are only very few real leaders!
- Authority and hub scores identify active participants rather than leaders.
- Superfans can be found in cluster_3.
- Negative and (sigh!) active users are collected in cluster_1.
- Neutral users are usually inactive (cluster_2, cluster_7, and cluster_8).
- Positive users with different degrees of activity are scattered across the remaining clusters.

The k-means Clusters
[Figure labels: Neutral users, Superfans, Negative users, Fans]

The operational Workflow
- Pre-processing
- Cluster Extraction
- Assignment of new data
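
The last stage, assigning new data to existing clusters, amounts to a nearest-centroid lookup once the cluster prototypes are known. The sketch below assumes squared Euclidean distance over (authority, hub, attitude) vectors; the centroid values, cluster names, and the helper `assign_to_cluster` are hypothetical and are not taken from the talk's results.

```python
def assign_to_cluster(user_vector, centroids):
    """Return the name of the centroid closest to user_vector."""
    return min(
        centroids,
        key=lambda name: sum(
            (a - b) ** 2 for a, b in zip(user_vector, centroids[name])
        ),
    )

# Hypothetical prototypes over (authority, hub, attitude), all scaled to [0, 1].
centroids = {
    "cluster_a": (0.80, 0.70, 0.90),
    "cluster_b": (0.60, 0.60, 0.10),
    "cluster_c": (0.05, 0.05, 0.50),
}

new_user = (0.10, 0.00, 0.55)  # low activity, neutral attitude
label = assign_to_cluster(new_user, centroids)
```

A new user has to be pre-processed and normalized with the same parameters as the training data before the lookup; otherwise the distances are not comparable.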

Summary and Conclusions
Full system to:
- Integrate text and network mining
- Find meaningful clusters in terms of attitude and activity
- Define appropriate actions for users in different clusters
- Assign new data to existing clusters

Next Steps
- Integrate topic information
- Integrate user demographic and behavioural information
- Discover [time series] patterns for early detection of negative users and superfans
- Try other techniques, maybe even on manually segmented data, to discover new user segments

Where do I find more?
Whitepaper: rosariasilipo@yahoo.com
Complete Workflows + Data (www.knime.com):
- text mining
- network mining
- combined analysis
(note: the three workflows above process large amounts of data and require 16 GB of memory)
- clustering
Open Source Software: KNIME, www.knime.com