Discovering Playing Patterns: Time Series Clustering of Free-To-Play Game Data

Alain Saas, Anna Guitart and África Periáñez (Silicon Studio)
IEEE CIG 2016, Santorini, 21 September 2016

About us
Who are we?
- Game studio and graphics middleware company based in Tokyo
- Research project to provide game data science as a service
- Goals: predict player behavior, scale to big data, and visualize results intuitively
Which data?
- Free-to-play RPGs: time series (TS) from two games
- TS of in-app purchases and of activity behavioral data

Challenge
Unsupervised clustering of time series (TS) of player activity
Why?
- Discover temporal player patterns
- Evaluate game events and diagnose the business
- Assess common characteristics of players belonging to the same cluster
How?
1. Representation techniques: reduce the high dimensionality of TS
2. Similarity measures for free-to-play game data
3. Hierarchical clustering
4. Visual validation of the results

Representation methods
- Symbolic Aggregate Approximation (SAX)
- Discrete Wavelet Transform (DWT)
- Trend extraction
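As an illustration of how two of these representations shrink a daily series, here is a minimal plain-Python sketch: SAX (z-normalization, Piecewise Aggregate Approximation, then mapping segment means to letters via standard-normal breakpoints) and a Haar DWT that keeps only the approximation coefficients. The function names, the choice of the Haar wavelet, and the requirement that the length be a multiple of `w` (or of `2**levels`) are our assumptions for illustration, not the authors' implementation; trend extraction is not covered here.

```python
import math
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Rescale to zero mean and unit population standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:  # a flat series carries no shape; map it to all zeros
        return [0.0] * len(series)
    return [(v - mu) / sigma for v in series]


def paa(series, w):
    """Piecewise Aggregate Approximation: the mean of each of w equal segments.
    This sketch requires len(series) to be a multiple of w."""
    if len(series) % w != 0:
        raise ValueError("series length must be a multiple of w")
    step = len(series) // w
    return [mean(series[i:i + step]) for i in range(0, len(series), step)]


def sax(series, w, a):
    """SAX word of w letters over an alphabet of a symbols ('a', 'b', ...)."""
    # a - 1 breakpoints cut the standard normal into a equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(k / a) for k in range(1, a)]
    letters = []
    for value in paa(znormalize(series), w):
        region = sum(value > b for b in breakpoints)  # 0 .. a-1
        letters.append(chr(ord("a") + region))
    return "".join(letters)


def haar_dwt(series, levels):
    """Haar approximation coefficients after `levels` decomposition steps.
    Each step halves the length; detail coefficients are discarded."""
    coeffs = [float(v) for v in series]
    for _ in range(levels):
        if len(coeffs) % 2 != 0:
            raise ValueError("series length must be divisible by 2**levels")
        coeffs = [(coeffs[i] + coeffs[i + 1]) / math.sqrt(2)
                  for i in range(0, len(coeffs), 2)]
    return coeffs
```

For a 21-day series, for example, `sax(series, w=7, a=4)` would summarise each three-day block with one of four letters; `w` and `a` are the two SAX parameters listed on the comparison slide.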

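Similarity measures
Dynamic Time Warping (DTW), where r ranges over warping paths of length M that align x_{i_m} with y_{j_m}:
$$DTW(X, Y) = \min_{r} \left( \sum_{m=1}^{M} \lvert x_{i_m} - y_{j_m} \rvert \right)$$
Correlation-based measure:
$$COR(X, Y) = \frac{\sum_{n=1}^{N} (x_n - \bar{X})(y_n - \bar{Y})}{\sqrt{\sum_{n=1}^{N} (x_n - \bar{X})^2}\,\sqrt{\sum_{n=1}^{N} (y_n - \bar{Y})^2}}$$
Temporal Correlation and Raw Values Behaviors measure (CORT):
$$CORT(X, Y) = \frac{\sum_{n=1}^{N-1} (x_{n+1} - x_n)(y_{n+1} - y_n)}{\sqrt{\sum_{n=1}^{N-1} (x_{n+1} - x_n)^2}\,\sqrt{\sum_{n=1}^{N-1} (y_{n+1} - y_n)^2}}$$
Complexity-Invariant Distance (CID), where dist is the Euclidean distance and CF is the complexity correction factor:
$$CID(X, Y) = dist(X, Y) \cdot CF(X, Y), \qquad CF(X, Y) = \frac{\max(CE(X), CE(Y))}{\min(CE(X), CE(Y))}$$
CE is the complexity estimate:
$$CE(X) = \sqrt{\sum_{n=1}^{N-1} (x_n - x_{n+1})^2}$$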
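The four measures above translate almost line by line into code. The following is a minimal plain-Python sketch, not the authors' implementation: DTW is computed with the usual dynamic program over cumulative costs (which minimizes the path sum in the DTW formula), COR and CORT assume equal-length, non-constant series, and the handling of flat series in CID (where the CF ratio is undefined) is our own convention.

```python
import math
from statistics import mean


def dtw(x, y):
    """DTW with |x_i - y_j| as local cost, minimized over all warping paths."""
    n, m = len(x), len(y)
    # acc[i][j]: cheapest cumulative cost of aligning x[:i] with y[:j]
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
            acc[i][j] = abs(x[i - 1] - y[j - 1]) + step
    return acc[n][m]


def cor(x, y):
    """Pearson correlation COR(X, Y); assumes equal-length, non-constant series."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x)) * math.sqrt(sum((b - my) ** 2 for b in y))
    return num / den


def first_differences(x):
    return [x[n + 1] - x[n] for n in range(len(x) - 1)]


def cort(x, y):
    """CORT(X, Y): uncentered correlation of the first differences."""
    dx, dy = first_differences(x), first_differences(y)
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    return num / den


def complexity_estimate(x):
    """CE(X) = sqrt(sum_n (x_n - x_{n+1})^2)."""
    return math.sqrt(sum(d * d for d in first_differences(x)))


def cid(x, y):
    """CID(X, Y) = Euclidean distance * CF(X, Y)."""
    ed = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    ce_x, ce_y = complexity_estimate(x), complexity_estimate(y)
    if min(ce_x, ce_y) == 0:
        # Our own convention for flat series, where the CF ratio is undefined.
        cf = 1.0 if max(ce_x, ce_y) == 0 else math.inf
    else:
        cf = max(ce_x, ce_y) / min(ce_x, ce_y)
    return ed * cf
```

COR and CORT are similarities in [-1, 1]; to cluster with them they are usually turned into dissimilarities, for instance `math.sqrt(2 * (1 - cor(x, y)))`, which is one common choice rather than necessarily the one used in the study.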

Similarity measure comparison
[Figure panels: Euclidean vs. correlation; correlation vs. complexity-invariant distance; dynamic time warping vs. correlation; correlation vs. discrete wavelet transform]

Comparison of clustering methods
- DTW (Dynamic Time Warping): similar player profiles with a shift on the time axis; different patterns, but at different scales
- DWT (Discrete Wavelet Transform): dimensionality reduction; captures the frequency of the series
- SAX (Symbolic Aggregate Approximation): depends on the parameters w (number of segments) and a (alphabet size)
- COR (Correlation): similar geometric and synchronous profiles; sensitive to noisy data and outliers
- CORT (Temporal Correlation): similar to COR, but computed on consecutive differences, so it also takes temporal behavior into account
- CID (Complexity-Invariant Distance): similar complexity patterns; good for sparse time series
- COR+trend (correlation with trend extraction): addresses COR's sensitivity to noise; does not work well with sparse time series
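Two entries of this comparison can be checked on toy data. In the sketch below, the daily play-time profiles are hypothetical, and `sqrt(2 * (1 - COR))` is one common way of turning correlation into a dissimilarity (our assumption, not necessarily the study's choice). A copy of a profile at three times the volume is far away in Euclidean terms but at essentially zero correlation dissimilarity, and a copy shifted by one day is at DTW distance 0 even though its Euclidean distance is large.

```python
import math
from statistics import mean


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cor_dissimilarity(x, y):
    """sqrt(2 * (1 - COR)): 0 for perfectly (positively) correlated series."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    # max() guards against tiny negative values caused by rounding
    return math.sqrt(max(0.0, 2 * (1 - num / den)))


def dtw(x, y):
    """DTW with absolute-difference local cost."""
    n, m = len(x), len(y)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i][j] = abs(x[i - 1] - y[j - 1]) + min(
                acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


# Hypothetical daily play-time profiles (minutes), for illustration only.
base = [10, 10, 60, 60, 10, 10, 10]
scaled = [3 * v for v in base]          # same shape, three times the volume
shifted = [10, 10, 10, 60, 60, 10, 10]  # same shape, one day later

print("scaled :", euclidean(base, scaled), cor_dissimilarity(base, scaled))
print("shifted:", euclidean(base, shifted), dtw(base, shifted))
```

This is why COR groups "synchronous" profiles regardless of volume, while DTW tolerates a shift on the time axis.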

Hierarchical clustering
Agglomerative clustering with the Ward method: at each step, merge the pair of clusters that leads to the minimum increase in total within-cluster variance.
[Figure panels: single linkage, complete linkage, average linkage, centroid method, Ward method]
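Because the clustering starts from a matrix of pairwise dissimilarities (COR-based, CID, and so on) rather than from raw points, Ward's method is typically run through the Lance-Williams update, which recomputes cluster-to-cluster distances after each merge. Below is a minimal O(n^3) plain-Python sketch using the same update formula documented for `method="ward"` in SciPy's `scipy.cluster.hierarchy.linkage`; in practice one would likely call that function (or R's `hclust`) instead. Note that Ward's variance interpretation is exact only for Euclidean distances; with other dissimilarities it is a common heuristic.

```python
import math


def ward_clustering(dist, n_clusters):
    """Agglomerative clustering with Ward's method on a symmetric dissimilarity
    matrix `dist` (list of lists). Returns one label in 0..n_clusters-1 per item."""
    n = len(dist)
    if not 1 <= n_clusters <= n:
        raise ValueError("n_clusters must be between 1 and the number of items")
    members = {i: [i] for i in range(n)}  # active cluster id -> item indices
    # Distances between active clusters, keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(members) > n_clusters:
        # Merge the closest pair under the current (Ward-updated) distances.
        i, j = min(d, key=d.get)
        ni, nj, dij = len(members[i]), len(members[j]), d[(i, j)]
        for k in members:
            if k in (i, j):
                continue
            nk = len(members[k])
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            # Lance-Williams update for Ward's method.
            sq = ((ni + nk) * dik ** 2 + (nj + nk) * djk ** 2 - nk * dij ** 2) / (ni + nj + nk)
            d[(k, next_id)] = math.sqrt(max(0.0, sq))  # k < next_id always holds
        members[next_id] = members.pop(i) + members.pop(j)
        # Forget every distance that involves one of the two merged clusters.
        d = {key: v for key, v in d.items() if i not in key and j not in key}
        next_id += 1
    # Relabel clusters 0..n_clusters-1, ordered by their smallest member index.
    labels = [0] * n
    for label, items in enumerate(sorted(members.values(), key=min)):
        for idx in items:
            labels[idx] = label
    return labels
```

For example, with items at positions 0, 1, 10, 11 and 20 on a line and `dist[i][j] = abs(p_i - p_j)`, asking for three clusters groups {0, 1}, {10, 11} and {20}.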

Our data
Time series measured per user per day.
Game activity behavioral data:
- Time: the amount of time spent in the game
- Sessions: the total number of playing sessions
- Actions: the total number of actions performed
In-app sales:
- Purchase: the total amount of in-app purchases

Data selection and constraints
Time series: selection of a period P from our multi-dimensional data
- Weekly game events: period P of length 21 days
- Played time: active users, with connections on at least 6 of 7 days a week
- Purchases: paying users, with at least one purchase in period P
- Players alive during period P
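The selection rules can be expressed as simple per-user filters over a 21-day window. In this minimal sketch the record layout (a dict with `played`, `purchases`, `first_seen` and `last_seen` fields), the rule that a day counts as a connection when played time is positive, and the definition of "alive" (registered by the first day of P and still seen on or after its last day) are all hypothetical choices made for illustration; the slides do not specify them.

```python
PERIOD_DAYS = 21  # length of period P (three weeks of weekly events)


def is_active(played, min_days_per_week=6):
    """At least `min_days_per_week` connected days in every 7-day week of P.
    Hypothetical rule: a day counts as connected when played time > 0."""
    if len(played) % 7 != 0:
        raise ValueError("period must cover whole weeks")
    weeks = [played[i:i + 7] for i in range(0, len(played), 7)]
    return all(sum(1 for t in week if t > 0) >= min_days_per_week for week in weeks)


def is_paying(purchases):
    """At least one purchase in P."""
    return any(p > 0 for p in purchases)


def is_alive(first_seen, last_seen, p_start, p_end):
    """Hypothetical definition: registered by the first day of P and still
    seen on or after its last day (inclusive day indices)."""
    return first_seen <= p_start and last_seen >= p_end


def select_users(users, p_start):
    """Return (active user ids, paying user ids) for the period starting at p_start.
    `users` is a hypothetical dict: user id -> {'played', 'purchases',
    'first_seen', 'last_seen'}, with per-day lists indexed by day."""
    p_end = p_start + PERIOD_DAYS - 1
    active, paying = [], []
    for uid, rec in users.items():
        if not is_alive(rec["first_seen"], rec["last_seen"], p_start, p_end):
            continue
        window = slice(p_start, p_end + 1)
        if is_active(rec["played"][window]):
            active.append(uid)
        if is_paying(rec["purchases"][window]):
            paying.append(uid)
    return active, paying
```

Played-time clustering would then use only the active users, and purchase clustering only the paying users.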

Datasets and tests

| Game | Data | Technique | Clusters | Date range |
|---|---|---|---|---|
| Age of Ishtaria | Daily played time | COR-trend | 8 | Oct 2014 - Jan 2016 |
| Age of Ishtaria | Daily purchase | CID | 5 | Oct 2014 - Jan 2016 |
| Grand Sphere | Daily played time | COR-trend | 8 | Jun 2015 - Mar 2016 |

Clustering time series of time played
1. Representation method: trend extraction
2. Similarity measure: correlation
3. Hierarchical clustering: Ward method
4. Validation of results: visualization with a heatmap (raw data)
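The first two steps of this pipeline produce the dissimilarity matrix that the Ward clustering consumes. The slides do not say how the trend is extracted, so this sketch substitutes a centered moving average (window of 7 days, shrinking at the edges) as a stand-in; the correlation dissimilarity `sqrt(2 * (1 - COR))` is likewise one common choice, assumed here for illustration.

```python
import math
from statistics import mean


def moving_average_trend(series, window=7):
    """Centered moving average; near the edges it averages the points that exist.
    A stand-in for trend extraction, since the method is not specified."""
    half = window // 2
    n = len(series)
    return [mean(series[max(0, t - half):min(n, t + half + 1)]) for t in range(n)]


def cor_dissimilarity(x, y):
    """sqrt(2 * (1 - COR(X, Y))), a common correlation-based dissimilarity."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    if den == 0:
        raise ValueError("correlation is undefined for a constant series")
    return math.sqrt(max(0.0, 2 * (1 - num / den)))


def dissimilarity_matrix(all_series, window=7):
    """Pairwise COR dissimilarities between the trends of all players' series."""
    trends = [moving_average_trend(s, window) for s in all_series]
    n = len(trends)
    return [[0.0 if i == j else cor_dissimilarity(trends[i], trends[j])
             for j in range(n)] for i in range(n)]
```

The resulting matrix would then be passed to Ward clustering (8 clusters for daily played time, per the datasets table), and the raw series reordered by cluster label for the heatmap validation step. Smoothing before correlating is what addresses COR's sensitivity to noise, at the cost of blurring sparse series.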

Extraction of players' characteristics

Clustering time series of time played
The method is also able to extract differentiated patterns, as in Age of Ishtaria.

Clustering time series of purchases
1. Similarity measure: complexity-invariant distance
2. Hierarchical clustering: Ward method
3. Validation of results: visualization with a heatmap (raw data)
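Why CID suits sparse purchase series can be seen on three hypothetical six-day purchase histories (invented for illustration): two players who each make a single purchase on different days, and one player who makes frequent small purchases. Plain Euclidean distance puts the early single-purchase player closer to the frequent small buyer, while CID, whose correction factor penalizes a mismatch in complexity, puts the two single-purchase players together. This sketch assumes non-flat series, so that the correction factor is defined.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def complexity_estimate(x):
    """CE(X) = sqrt(sum_n (x_n - x_{n+1})^2)."""
    return math.sqrt(sum((x[n] - x[n + 1]) ** 2 for n in range(len(x) - 1)))


def cid(x, y):
    """Euclidean distance times CF = max(CE) / min(CE); assumes non-flat series."""
    ce_x, ce_y = complexity_estimate(x), complexity_estimate(y)
    return euclidean(x, y) * max(ce_x, ce_y) / min(ce_x, ce_y)


# Hypothetical daily purchase amounts over six days, for illustration only.
one_buy_early = [0, 0, 10, 0, 0, 0]
one_buy_late = [0, 0, 0, 0, 10, 0]
frequent_small = [0, 5, 0, 5, 0, 5]

for name, other in [("one_buy_late", one_buy_late), ("frequent_small", frequent_small)]:
    print(name, euclidean(one_buy_early, other), cid(one_buy_early, other))
```

The paying-user filter (at least one purchase in period P) also makes it unlikely that a purchase series is completely flat, which keeps the correction factor well defined in practice.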

Summary and next steps
Summary:
- Unsupervised clustering of time series data from two free-to-play games
- Evaluate several similarity measures and representation methods
- Extract meaningful behavioral patterns of players
- Assess the impact of weekly game events
- Discover hidden playing dynamics regarding purchases and time played
Next steps:
- Feature for churn prediction
- Event recommender
- Cluster-level behavior

Thank you!
http://www.siliconstudio.co.jp/rd/4front/