Discovering Playing Patterns: Time Series Clustering of Free-To-Play Game Data

Alain Saas, Anna Guitart and África Periáñez (Silicon Studio)
IEEE CIG 2016, Santorini, 21 September 2016

About us
Who are we?
- Game studio and graphics middleware company based in Tokyo
- Research project to provide Game Data Science as a service
- Goals: predict player behavior, scale to big data, and intuitive result visualization
Which data?
- RPG free-to-play games
- Time series (TS) of two games
- TS of in-app purchases and activity behavioral data

Challenge
Unsupervised clustering of time series (TS) of player activity
Why?
- discover temporal player patterns
- evaluation of game events and business diagnosis
- assess common characteristics of players belonging to the same cluster
How?
1. representation techniques: reducing the high dimensionality of TS
2. similarity measures for free-to-play game data
3. hierarchical clustering
4. visual validation of the results

Representation methods
- Symbolic Aggregate Approximation (SAX)
- Discrete Wavelet Transform (DWT)
- Trend Extraction
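As an illustration of how a representation method reduces dimensionality, here is a minimal sketch of Symbolic Aggregate Approximation. It assumes the usual formulation (z-normalisation, piecewise means over w segments, then a letters chosen from equiprobable regions of the standard normal distribution). The function name `sax` and the handling of constant series are assumptions of this sketch, not details taken from the slides.

```python
from statistics import NormalDist, mean, pstdev


def sax(series, w, a):
    """Return the SAX word of `series` using w segments and an alphabet of size a.

    Assumes 1 <= w <= len(series) and 2 <= a <= 26.
    """
    mu, sigma = mean(series), pstdev(series)
    # z-normalise; a constant series is mapped to all zeros (assumption)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    n = len(z)
    # Piecewise Aggregate Approximation: mean of each of the w segments
    paa = [mean(z[k * n // w:(k + 1) * n // w]) for k in range(w)]
    # Breakpoints cut the standard normal into a equiprobable regions
    breakpoints = [NormalDist().inv_cdf(i / a) for i in range(1, a)]
    letters = "abcdefghijklmnopqrstuvwxyz"
    # Each segment mean is replaced by the letter of the region it falls in
    return "".join(letters[sum(v > b for b in breakpoints)] for v in paa)


# A 21-day series reduced to a 3-letter word (one letter per week)
word = sax([0, 0, 1, 0, 0, 2, 0, 5, 6, 5, 7, 6, 5, 6, 1, 0, 0, 1, 0, 0, 0], w=3, a=3)
```

With w = 3 a three-week series collapses to three symbols, which is the kind of dimensionality reduction the slide refers to; the parameters w and a control the resolution.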

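Similarity measures

Dynamic Time Warping (DTW):
$$\mathrm{DTW}(X, Y) = \min_{r} \left( \sum_{m=1}^{M} \left| x_{i_m} - y_{j_m} \right| \right)$$

Correlation-based measure (COR):
$$\mathrm{COR}(X, Y) = \frac{\sum_{n=1}^{N} (x_n - \bar{X})(y_n - \bar{Y})}{\sqrt{\sum_{n=1}^{N} (x_n - \bar{X})^2}\,\sqrt{\sum_{n=1}^{N} (y_n - \bar{Y})^2}}$$

Temporal Correlation and Raw Values Behaviors measure (CORT):
$$\mathrm{CORT}(X, Y) = \frac{\sum_{n=1}^{N-1} (x_{n+1} - x_n)(y_{n+1} - y_n)}{\sqrt{\sum_{n=1}^{N-1} (x_{n+1} - x_n)^2}\,\sqrt{\sum_{n=1}^{N-1} (y_{n+1} - y_n)^2}}$$

Complexity-Invariant Distance measure (CID):
$$\mathrm{CID}(X, Y) = \mathrm{dist}(X, Y) \cdot \mathrm{CF}(X, Y)$$
CF is the complexity correction factor:
$$\mathrm{CF}(X, Y) = \frac{\max(\mathrm{CE}(X), \mathrm{CE}(Y))}{\min(\mathrm{CE}(X), \mathrm{CE}(Y))}$$
CE is the complexity estimation:
$$\mathrm{CE}(X) = \sqrt{\sum_{n=1}^{N-1} (x_n - x_{n+1})^2}$$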
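The four measures on this slide can be written down compactly. The following is a minimal pure-Python sketch under some assumptions: `dist` in CID is taken to be the Euclidean distance, DTW uses the absolute difference as local cost with an unconstrained warping path, and constant series (zero variance or zero complexity) are not handled. All function names are chosen for illustration.

```python
from math import sqrt


def cor(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x)) * sqrt(sum((b - my) ** 2 for b in y))
    return num / den  # undefined (division by zero) for constant series


def cort(x, y):
    """Temporal correlation: correlation of the first differences."""
    dx = [x[i + 1] - x[i] for i in range(len(x) - 1)]
    dy = [y[i + 1] - y[i] for i in range(len(y) - 1)]
    num = sum(a * b for a, b in zip(dx, dy))
    den = sqrt(sum(a * a for a in dx)) * sqrt(sum(b * b for b in dy))
    return num / den


def complexity(x):
    """Complexity estimation CE: length of the series 'stretched out'."""
    return sqrt(sum((x[i] - x[i + 1]) ** 2 for i in range(len(x) - 1)))


def cid(x, y):
    """Complexity-invariant distance, assuming dist is Euclidean."""
    euclid = sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    ce_x, ce_y = complexity(x), complexity(y)
    return euclid * max(ce_x, ce_y) / min(ce_x, ce_y)


def dtw(x, y):
    """Dynamic Time Warping cost via the classic O(len(x) * len(y)) recursion."""
    inf = float("inf")
    n, m = len(x), len(y)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Note that COR and CORT are similarity coefficients in [-1, 1], while CID and DTW are dissimilarities, so the former need to be converted into a distance before clustering.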

Similarity measure comparison
- Euclidean vs. Correlation
- Correlation vs. Complexity-Invariant Distance
- Dynamic Time Warping vs. Correlation
- Correlation vs. Discrete Wavelet Transform

Comparison of clustering methods
DTW (Dynamic Time Warping)
- similar player profiles with a shift on the time axis
- different patterns but at different scale
DWT (Discrete Wavelet Transform)
- dimensionality reduction
- frequency of the series
SAX (Symbolic Aggregate Approximation)
- parameters w, a
COR (Correlation)
- similar geometric and synchronous profiles
- sensitive to noisy data and outliers
CORT (Temporal Correlation)
- similar to COR but with time consideration?
CID (Complexity-Invariant Distance)
- similar complexity patterns
- good for sparse time series
COR+trend (Correlation and trend extraction)
- addresses COR's sensitivity to noise
- does not work well with sparse time series

Hierarchical clustering
Agglomerative
Ward method: leads to a minimum increase of total within-cluster variance
Linkage methods compared:
- Single Linkage
- Complete Linkage
- Average Linkage
- Centroid Method
- Ward Method
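To make the Ward criterion concrete, here is a minimal sketch of agglomerative clustering that uses the Lance-Williams update for Ward's method. It assumes squared Euclidean distances between plain feature vectors as the starting dissimilarity; the slides apply Ward to other dissimilarities, which this sketch does not attempt to reproduce. The function name `ward_clusters` is illustrative.

```python
def ward_clusters(points, k):
    """Merge `points` (lists of floats) bottom-up until k clusters remain (k >= 1).

    Returns the clusters as sorted lists of point indices.
    """
    clusters = {i: [i] for i in range(len(points))}
    # Squared Euclidean distance between every pair of singleton clusters
    dist = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dist[(i, j)] = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))

    def d(p, q):
        return dist[(p, q) if p < q else (q, p)]

    next_id = len(points)
    while len(clusters) > k:
        # The closest pair is the merge with the smallest variance increase
        i, j = min(dist, key=dist.get)
        ni, nj = len(clusters[i]), len(clusters[j])
        updated = {}
        for m in clusters:
            if m in (i, j):
                continue
            nm = len(clusters[m])
            # Lance-Williams update for Ward's method
            updated[m] = ((ni + nm) * d(i, m) + (nj + nm) * d(j, m)
                          - nm * d(i, j)) / (ni + nj + nm)
        # Drop all distances involving the two merged clusters
        dist = {key: v for key, v in dist.items() if i not in key and j not in key}
        merged = clusters.pop(i) + clusters.pop(j)
        for m, value in updated.items():
            dist[(m, next_id)] = value  # m < next_id always holds
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(c) for c in clusters.values())


groups = ward_clusters([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], k=2)
```

With squared Euclidean input, the tracked dissimilarity between two clusters is proportional to the increase in within-cluster sum of squares that merging them would cause, so picking the minimum at each step is the "minimum increase" rule stated on the slide.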

Our data
Time series measured per user per day.
Game activity (behavioral data)
- Time: the amount of time spent in the game
- Sessions: the total number of playing sessions
- Actions: the total number of actions performed
In-app sales
- Purchase: the total amount of in-app purchases

Data selection, constraints
Time series: multi-dimensional data
Selection of period P in our data
- weekly game events
- period P of length 21 days
Played time
- active users
- min connections 6/7 days a week
Purchases
- paying users
- at least one purchase in period P
Players alive during period P
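The selection rules above could be expressed as simple filters over per-user daily series. This is a sketch under explicit assumptions: a "connection" is taken to mean a day with non-zero played time, the 6-of-7 threshold is applied to each week of the 21-day period separately, and the period length is a multiple of 7. The function names and the sample data are hypothetical.

```python
def is_active(played_time, min_days_per_week=6):
    """True if the user connected on at least `min_days_per_week` days in every week.

    `played_time` holds one value per day; a day counts as a connection
    when the played time is greater than zero (assumption).
    """
    weeks = [played_time[i:i + 7] for i in range(0, len(played_time), 7)]
    return all(sum(1 for t in week if t > 0) >= min_days_per_week for week in weeks)


def is_payer(purchases):
    """True if the user made at least one purchase in the period."""
    return any(p > 0 for p in purchases)


# Hypothetical 21-day series for two users
regular = ([30] * 6 + [0]) * 3                   # misses one day per week
irregular = [30] * 7 + [30] * 5 + [0, 0] + [30] * 7  # misses two days in week 2

selected = [name for name, ts in [("regular", regular), ("irregular", irregular)]
            if is_active(ts)]
```

Filtering before clustering keeps only series that are dense enough for correlation-based measures, while the purchase filter keeps sparse but non-empty series.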

Datasets and tests
Game            | Data              | Technique | Clusters | Date range
Age of Ishtaria | Daily played time | COR-trend | 8        | Oct 2014 - Jan 2016
Age of Ishtaria | Daily purchase    | CID       | 5        | Oct 2014 - Jan 2016
Grand Sphere    | Daily played time | COR-trend | 8        | Jun 2015 - Mar 2016

Clustering time series of time played
1. representation method: trend extraction
2. similarity measure: correlation
3. hierarchical clustering: Ward method
4. validation of results: visualization with heatmap (raw data)
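The first two steps of this pipeline can be sketched as follows. The slides do not say which trend extraction is used, so a centred moving average with a 7-day window is assumed here, and the correlation is turned into a dissimilarity with sqrt(2 * (1 - COR)), which is one common choice and also an assumption. All names are illustrative.

```python
from math import sqrt


def moving_average_trend(series, window=7):
    """Centred moving average; the window shrinks at the edges of the series."""
    half = window // 2
    trend = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        trend.append(sum(series[lo:hi]) / (hi - lo))
    return trend


def correlation_distance(x, y):
    """Dissimilarity sqrt(2 * (1 - COR)); 0 for perfectly correlated series.

    Not defined for constant series (zero variance).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x)) * sqrt(sum((b - my) ** 2 for b in y))
    # max() guards against tiny negative values caused by rounding
    return sqrt(max(0.0, 2 * (1 - num / den)))


def distance_matrix(series_list, window=7):
    """Pairwise correlation distances between the trends of all series."""
    trends = [moving_average_trend(s, window) for s in series_list]
    return [[correlation_distance(a, b) for b in trends] for a in trends]
```

The resulting matrix is what a hierarchical clustering step would take as input; smoothing first makes the correlation less sensitive to day-to-day noise.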

Extraction of players' characteristics

Clustering time series of time played
Also able to extract differentiated patterns, as in Age of Ishtaria

Clustering time series of purchases
1. similarity measure: complexity-invariant distance
2. hierarchical clustering: Ward method
3. validation of results: visualization with heatmap (raw data)

Summary and Next Steps
Summary
- Unsupervised clustering of time series data from two free-to-play games
- Evaluate several similarity measures and representation methods
- Extract meaningful behavioral patterns of players
- Assess impact of weekly game events
- Discover hidden playing dynamics regarding purchases and time played
Next steps
- Feature for churn prediction
- Event recommender
- Cluster level behaviour

Thank you!
http://www.siliconstudio.co.jp/rd/4front/