Graph mining assisted semi-supervised learning for fraudulent cash-out detection

Yuan Li, Yiheng Sun, Noshir Contractor. Aug 2, 2017.

Outline
- Introduction
- Method
- Experiments and Results
- Conclusion and Future Work

What is fraudulent cash-out? Broadly, a consumer colludes with a merchant to convert a credit line into cash through fictitious transactions. Figure: The schematic diagram of fraudulent cash-out.

Problem Setting Figure: The schematic diagram of the problem setting.

Approaches for fraud detection:
- Supervised learning methods, such as logistic regression, SVM, and neural networks
- Supervised learning hybrids
- Semi-supervised learning with clustering algorithms
- Graph mining
C. Phua, V. Lee, K. Smith, and R. Gayler. arXiv preprint arXiv:1009.6119, 2010.

If we could estimate user reputation, then: Figure: The schematic diagram of modeling the fraudulent cash-out detection problem with a hybrid supervised learning and graph mining approach. D. H. Chau et al. ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2010.

However, when a reputation score is not available, we need to:
- Model the edge potential more carefully
- Tune the parameters of the Markov random field
Our approach:

Problem statement
Given:
- An undirected bipartite graph G = (V_c, V_s, E), where V_c is the set of consumer nodes, V_s is the set of merchant nodes, and E is the edge set corresponding to the transactions between V_c and V_s.
- A binary variable X ∈ {-1, 1} observed over a subset V_s^l of V_s, and X = 1 observed over a subset V_c^l of V_c, where X = 1 corresponds to fraudulent status.
- The frequency of the transactions between i_c ∈ V_c and j_s ∈ V_s, and the amount associated with those transactions.
Output: P(X_{j_s} = 1) for j_s ∈ V_s, the probability that a shop is involved in fraudulent cash-out transactions.
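
To make these inputs concrete, here is a minimal sketch (in Python; the names, amount bins, and values are illustrative assumptions, not taken from the paper) of aggregating raw transactions into the per-edge, per-type counts that the edge potential will later consume.

```python
# Aggregate transactions into per-edge, per-type counts.
# edge_counts[(consumer, merchant)][k] = number of type-k transactions.
from collections import defaultdict

edge_counts = defaultdict(lambda: defaultdict(int))

def add_transaction(consumer_id, merchant_id, amount, bins=(100.0, 1000.0)):
    """Categorize a transaction by amount (hypothetical bins) and update counts."""
    k = sum(amount > b for b in bins)  # transaction type: 0, 1, or 2
    edge_counts[(consumer_id, merchant_id)][k] += 1

add_transaction("c1", "s1", 50.0)
add_transaction("c1", "s1", 5000.0)
print(dict(edge_counts[("c1", "s1")]))  # {0: 1, 2: 1}
```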

Modeling Markov random field: P{X} = 1 φ(x js ) φ(x ic ) ψ icj Z s (X ic, X js ) (1) j s V s i c V c i,j E Given node potential φ(x js ), φ(x ic ) and edge potential ψ icjs (X ic, X js ), the marginal probability P(X js = 1) for vertices j s can be calculated with Belief Propagation algorithm. 9 / 23
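
Below is a minimal sketch of loopy belief propagation for a pairwise MRF of the form of Eq. (1). The two-consumer/one-shop graph, the node potentials, and the homophilic psi are toy placeholders; the paper's edge potential is Eq. (2).

```python
# Loopy belief propagation on a tiny bipartite pairwise MRF over {-1, +1}.
states = (-1, +1)
edges = [("c1", "s1"), ("c2", "s1")]      # consumer-merchant edges
phi = {
    "c1": {-1: 0.3, +1: 0.7},             # consumer suspected fraudulent
    "c2": {-1: 0.9, +1: 0.1},             # consumer believed legitimate
    "s1": {-1: 0.5, +1: 0.5},             # unlabeled shop: uninformative prior
}

def psi(xi, xj):
    # Toy homophilic edge potential; the paper's model uses Eq. (2) instead.
    return 0.8 if xi == xj else 0.2

neighbors = {}
for i, j in edges:
    neighbors.setdefault(i, []).append(j)
    neighbors.setdefault(j, []).append(i)

# Messages msg[(i, j)][x_j], one per directed edge, initialized uniformly.
msg = {}
for i, j in edges:
    msg[(i, j)] = {s: 1.0 for s in states}
    msg[(j, i)] = {s: 1.0 for s in states}

for _ in range(20):                        # synchronous update sweeps
    new = {}
    for (i, j) in msg:
        out = {}
        for xj in states:
            total = 0.0
            for xi in states:
                p = phi[i][xi] * psi(xi, xj)
                for k in neighbors[i]:     # product of other incoming messages
                    if k != j:
                        p *= msg[(k, i)][xi]
                total += p
            out[xj] = total
        z = out[-1] + out[+1]              # normalize for numerical stability
        new[(i, j)] = {s: out[s] / z for s in states}
    msg = new

def belief(node):
    """Normalized marginal belief: b(x) proportional to phi(x) * incoming messages."""
    b = {s: phi[node][s] for s in states}
    for k in neighbors[node]:
        for s in states:
            b[s] *= msg[(k, node)][s]
    z = b[-1] + b[+1]
    return {s: b[s] / z for s in states}

print(belief("s1"))  # approximate P(X_s1 = -1), P(X_s1 = +1)
```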

Edge Potential
Transactions between consumers and shops are categorized into different types based on their amount. The edge potential is modeled as:

\psi_{i_c j_s}(X_{i_c}, X_{j_s}) = \prod_{k=1}^{p} \frac{1}{1 + e^{-\alpha_{k, X_{i_c} X_{j_s}} m_{k, i_c j_s}}}   (2)

p: the number of all possible types of transactions
m_{k, i_c j_s}: the number of k-th type transactions between vertices i_c and j_s
\alpha_{k, X_{i_c} X_{j_s}}: a parameter that indicates the homophilic relation between shops and consumers for the k-th type of transaction
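
A minimal sketch of Eq. (2) in Python. The indexing of alpha by transaction type and node-state pair, and of m by type alone, is our reading of the flattened subscripts; all numbers are illustrative.

```python
# Edge potential of Eq. (2): a product of sigmoids over transaction types.
import math

def edge_potential(xi, xj, m, alpha):
    """psi(xi, xj) = prod_k 1 / (1 + exp(-alpha[k][(xi, xj)] * m[k]))."""
    val = 1.0
    for k, m_k in m.items():
        a = alpha[k][(xi, xj)]
        val *= 1.0 / (1.0 + math.exp(-a * m_k))
    return val

m = {0: 3, 1: 1}  # counts of type-0 and type-1 transactions on this edge
alpha = {0: {(1, 1): 2.0, (1, -1): -0.5, (-1, 1): -0.5, (-1, -1): 2.0},
         1: {(1, 1): 1.0, (1, -1): -1.0, (-1, 1): -1.0, (-1, -1): 1.0}}
print(edge_potential(1, 1, m, alpha))  # high when the two labels agree (homophily)
```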

Node Potential
Consumer node potential:

\phi(i_c \in V_c^l) = \begin{cases} \beta_c^l, & \text{for } X_{i_c} = 1 \\ 1 - \beta_c^l, & \text{for } X_{i_c} = -1 \end{cases}   (3)

\phi(i_c \in V_c \setminus V_c^l) = \begin{cases} \beta_c^u, & \text{for } X_{i_c} = 1 \\ 1 - \beta_c^u, & \text{for } X_{i_c} = -1 \end{cases}   (4)

Shop node potential:
- Labeled shops are used to estimate the parameters.
- The potentials of both unlabeled and labeled shops are set to 0.5.

Parameter Estimation
(1) Given a set of parameters (\alpha_{k, X_{i_c} X_{j_s}}, \beta_c^u, \beta_c^l), apply BP to calculate the marginal probability of each shop j_s being fraudulent.
(2) Calculate the value of a loss function L defined over all labeled shops.
(3) Use Bayesian optimization to find the optimal solution to the following optimization problem (a sketch follows the equation):

(\alpha_{k, X_{i_c} X_{j_s}}, \beta_c^u, \beta_c^l) = \operatorname*{argmin}_{\alpha_{k, X_{i_c} X_{j_s}}, \beta_c^u, \beta_c^l} L(\{j_s : j_s \in V_s^l\})   (5)
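
A minimal sketch of this loop, assuming scikit-optimize (skopt) as the Bayesian-optimization backend. run_bp_and_score is a hypothetical stand-in that would run BP with the candidate parameters and return the loss L over labeled shops; its dummy body just keeps the example runnable.

```python
# Bayesian-optimization loop over (alpha, beta) using scikit-optimize.
from skopt import gp_minimize
from skopt.space import Real

def run_bp_and_score(params):
    """Hypothetical objective: run BP with these parameters, return loss L
    over the labeled shops. Dummy quadratic body for runnability."""
    alpha_same, alpha_diff, beta_u, beta_l = params
    return (alpha_same - 1.0) ** 2 + (beta_l - 0.9) ** 2

space = [
    Real(-5.0, 5.0, name="alpha_same"),   # alpha for agreeing labels
    Real(-5.0, 5.0, name="alpha_diff"),   # alpha for disagreeing labels
    Real(0.0, 1.0, name="beta_u"),        # unlabeled-consumer potential
    Real(0.0, 1.0, name="beta_l"),        # labeled-consumer potential
]

result = gp_minimize(run_bp_and_score, space, n_calls=30, random_state=0)
print(result.x, result.fun)               # best parameters and best loss
```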

Data
The performance of the model is evaluated with real-world data from JD Finance.

Table: Descriptive statistics of the experiment data

              Labeled    Unknown      Sum
Consumer      NA         NA           230,238
Merchant      7,582      193,707      201,289
Transaction   0          2,913,471    2,913,471

Number of nodes vs. number of transactions Figure: Node degree distribution (log-log).

Experiment Setup
- 10 random 4-fold cross-validations (see the sketch below)
- Multiple initial guesses for the parameters are generated to avoid locally optimal solutions.
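
A minimal sketch of this protocol, assuming scikit-learn; X and y are placeholders for the labeled shops.

```python
# 10 random 4-fold cross-validations with scikit-learn.
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

y = np.array([0] * 150 + [1] * 50)        # placeholder shop labels
X = np.arange(len(y)).reshape(-1, 1)      # placeholder per-shop ids/features

rskf = RepeatedStratifiedKFold(n_splits=4, n_repeats=10, random_state=0)
for train_idx, test_idx in rskf.split(X, y):
    # Estimate (alpha, beta) on the labeled shops in train_idx, then score
    # the held-out shops in test_idx with BP.
    pass
```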

Performance of our algorithm Figure: ROC curves for shops. The dark red line is the average ROC curve over the 10 experiments, and the light red lines are the ROC curves of the individual experiments.

Choice of loss function Figure: A comparison of different loss functions. Dark bars show the performance after running a sufficient number of Bayesian-optimization iterations, and light bars show the performance after 30 iterations. Performance is measured by deviance, TPR, and AUC.
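
For reference, a minimal sketch of the three measures in the figure, computed with scikit-learn on placeholder labels and scores. Taking deviance as the binomial deviance (2n times the average log loss) is an assumption about the paper's definition.

```python
# Deviance, TPR (at controlled FPR), and AUC on placeholder labels/scores.
import numpy as np
from sklearn.metrics import log_loss, roc_auc_score, roc_curve

y_true = np.array([0, 0, 1, 1, 1, 0])                # placeholder shop labels
y_score = np.array([0.1, 0.4, 0.8, 0.9, 0.6, 0.3])   # placeholder BP marginals

# log_loss returns the mean negative log-likelihood.
deviance = 2 * len(y_true) * log_loss(y_true, y_score)
auc = roc_auc_score(y_true, y_score)
fpr, tpr, _ = roc_curve(y_true, y_score)
tpr_at_5pct_fpr = tpr[fpr <= 0.05].max()  # TPR while controlling FPR at 5%
print(deviance, auc, tpr_at_5pct_fpr)
```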

Edge Potential Figure: ROC curves of the algorithm under different edge-potential models. The red line corresponds to our model; the dark blue and light blue lines correspond to two parsimonious models used in previous studies.

Node Potential
Table: Impact of the number of labeled nodes when shop potentials are set to 0.5

             P_m = 10%   P_m = 25%   P_m = 50%   P_m = 100%
P_c = 0%     0.9114      0.9033      0.8967      0.9127
P_c = 10%    0.9156      0.9036      0.8965      0.9099
P_c = 25%    0.9237      0.9116      0.9086      0.9288
P_c = 50%    0.9250      0.9123      0.9196      0.9148
P_c = 100%   0.9012      0.9008      0.9071      0.9248

Node Potential
Table: Impact of the number of labeled nodes when shop potentials are estimated

             P_m = 10%   P_m = 25%   P_m = 50%   P_m = 100%
P_c = 0%     0.7960      0.8815      0.9196      0.9306
P_c = 10%    0.8055      0.9195      0.9108      0.9206
P_c = 25%    0.9163      0.9227      0.9226      0.9271
P_c = 50%    0.8362      0.9047      0.9225      0.9305
P_c = 100%   0.8570      0.9092      0.9313      0.9348

Conclusion
- Our algorithm is efficient and scalable: we achieve a 92% TPR while controlling the FPR at the 5% level on the JD dataset.
- Our algorithm sheds light on regulating fraudulent merchants.
- Our algorithm is robust even when only a small number of nodes are labeled. In the real world, ground truth is hard to obtain; our algorithm provides an attractive way to use the limited observed labels.

Future work
- Including node degree in the model
- Allocating the budget for labeling nodes in a network
- Developing an ensemble approach

Thank you for your attention!