Credit Card Fraud Detection Using Historical Transaction Data

Similar documents
Clustering algorithms and autoencoders for anomaly detection

A Comparative Study of Selected Classification Algorithms of Data Mining

Outlier detection using autoencoders

2015 The MathWorks, Inc. 1

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani

Index. Umberto Michelucci 2018 U. Michelucci, Applied Deep Learning,

DETECTION OF INDIAN COUNTERFEIT BANKNOTES USING NEURAL NETWORK

Demystifying Deep Learning

Event: PASS SQL Saturday - DC 2018 Presenter: Jon Tupitza, CTO Architect

Code Mania Artificial Intelligence: a. Module - 1: Introduction to Artificial intelligence and Python:

Keras: Handwritten Digit Recognition using MNIST Dataset

NVIDIA DLI HANDS-ON TRAINING COURSE CATALOG

Bayesian Network & Anomaly Detection

Keras: Handwritten Digit Recognition using MNIST Dataset

Fraud Detection using Machine Learning

Deep Tensor: Eliciting New Insights from Graph Data that Express Relationships between People and Things

Demystifying Deep Learning

Tutorial on Keras CAP ADVANCED COMPUTER VISION SPRING 2018 KISHAN S ATHREY

Research Article International Journals of Advanced Research in Computer Science and Software Engineering ISSN: X (Volume-7, Issue-6)

NVIDIA DEEP LEARNING INSTITUTE

New Research Trends in Cybersecurity Privacy Attacks on Decentralized Deep Learning

K-Mean Clustering Algorithm Implemented To E-Banking

M. Sc. (Artificial Intelligence and Machine Learning)

Anomaly Detection System for Video Data Using Machine Learning

EE 589 INTRODUCTION TO ARTIFICIAL NETWORK REPORT OF THE TERM PROJECT REAL TIME ODOR RECOGNATION SYSTEM FATMA ÖZYURT SANCAR

Predictive SIM Box Fraud Detection Model for ethio telecom

XES Tensorflow Process Prediction using the Tensorflow Deep-Learning Framework

International Journal of Electronics and Computer Science Engineering 1765

Kaggle See Click Fix Model Description

INTRODUCTION... 2 FEATURES OF DARWIN... 4 SPECIAL FEATURES OF DARWIN LATEST FEATURES OF DARWIN STRENGTHS & LIMITATIONS OF DARWIN...

International Journal of Computer Engineering and Applications, Volume XI, Issue IX, August 17, ISSN

Optimization Model of K-Means Clustering Using Artificial Neural Networks to Handle Class Imbalance Problem

Frameworks in Python for Numeric Computation / ML

Deep Learning Approach to Network Intrusion Detection

IJSRD - International Journal for Scientific Research & Development Vol. 2, Issue 04, 2014 ISSN (online):

Fraud Mobility: Exploitation Patterns and Insights

Machine Learning. Deep Learning. Eric Xing (and Pengtao Xie) , Fall Lecture 8, October 6, Eric CMU,

An Introduction to Deep Learning with RapidMiner. Philipp Schlunder - RapidMiner Research

Deep Learning. Volker Tresp Summer 2014

THE DARK WEB AND HOW IT AFFECTS YOUR INDUSTRY

Credit card Fraud Detection using Predictive Modeling: a Review

Brainchip OCTOBER

A Layered Approach to Fraud Mitigation. Nick White Product Manager, FIS Payments Integrated Financial Services

Facial Keypoint Detection

Deep Neural Networks Optimization

Homework 01 : Deep learning Tutorial

Deep Learning for Computer Vision II

Introduction to Data Mining and Data Analytics

ISSN: (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies

Interpretable Machine Learning with Applications to Banking

CYBER SECURITY and Mobile Money: Challenges and Opportunities NOVEMBER 28 TH 2016 PRESENTER: CECIL WILLIAMS

We will divide the many telecom fraud schemes into three broad categories, based on who the fraudsters are targeting. These categories are:

Deep Learning with R. Francesca Lazzeri Data Scientist II - Microsoft, AI Research

A Data Classification Algorithm of Internet of Things Based on Neural Network

Question Bank. 4) It is the source of information later delivered to data marts.

Research on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a

International Journal of Computer Engineering and Applications, Volume XI, Issue IX, September 17, ISSN

ABSTRACT I. INTRODUCTION. Dr. J P Patra 1, Ajay Singh Thakur 2, Amit Jain 2. Professor, Department of CSE SSIPMT, CSVTU, Raipur, Chhattisgarh, India

Tutorial on Machine Learning Tools

A Bayesian Classification Model for Fraud Detection over ATM Platforms

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

Cyber Insurance: What is your bank doing to manage risk? presented by

Deep Learning with Tensorflow AlexNet

Machine Learning Software ROOT/TMVA

Why data science is the new frontier in software development

Application of Deep Learning Techniques in Satellite Telemetry Analysis.

TensorFlow and Keras-based Convolutional Neural Network in CAT Image Recognition Ang LI 1,*, Yi-xiang LI 2 and Xue-hui LI 3

Detecting SIM Box Fraud Using Neural Network

arxiv: v1 [cs.lg] 12 Jul 2018

Deep Learning. Practical introduction with Keras JORDI TORRES 27/05/2018. Chapter 3 JORDI TORRES

Large Scale Data Analysis Using Deep Learning

An Exploration of Computer Vision Techniques for Bird Species Classification

Jarek Szlichta

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Data Mining with R Programming Language for Optimizing Credit Scoring in Commercial Bank

Credit Union Cyber Crisis: Gaining Awareness and Combatting Cyber Threats Without Breaking the Bank

INTRODUCTION TO DATA MINING. Daniel Rodríguez, University of Alcalá

Hidden Markov Model for Credit Card Fraud Detection

A Detailed Analysis on NSL-KDD Dataset Using Various Machine Learning Techniques for Intrusion Detection

Review on Text Mining

DECISION SUPPORT SYSTEM USING TENSOR FLOW

RNNSecureNet: Recurrent neural networks for Cyber security use-cases. Mohammed Harun Babu R, Vinayakumar R, Soman KP

Dynamic Clustering of Data with Modified K-Means Algorithm

Object Detection Lecture Introduction to deep learning (CNN) Idar Dyrdal

Characterization and Benchmarking of Deep Learning. Natalia Vassilieva, PhD Sr. Research Manager

Classifying Twitter Data in Multiple Classes Based On Sentiment Class Labels

Python With Data Science

Exploring the Structure of Data at Scale. Rudy Agovic, PhD CEO & Chief Data Scientist at Reliancy January 16, 2019

Hand Written Digit Recognition Using Tensorflow and Python

DeepFace: Closing the Gap to Human-Level Performance in Face Verification

Deep Learning. Deep Learning provided breakthrough results in speech recognition and image classification. Why?

Reduce fraud losses and improve operational efficiency with advanced fraud detection technology

Deep Learning Basic Lecture - Complex Systems & Artificial Intelligence 2017/18 (VO) Asan Agibetov, PhD.

GENERATIVE ADVERSARIAL NETWORKS (GAN) Presented by Omer Stein and Moran Rubin

Introduction to Data Science. Introduction to Data Science with Python. Python Basics: Basic Syntax, Data Structures. Python Concepts (Core)

1 INTRODUCTION 2 RELATED WORK. Usha.B.P ¹, Sushmitha.J², Dr Prashanth C M³

Clustering and Unsupervised Anomaly Detection with l 2 Normalized Deep Auto-Encoder Representations

CLICK TO EDIT MASTER TITLE STYLE Fraud Overview and Mitigation Strategies

Ch.1 Introduction. Why Machine Learning (ML)?

Deep Learning. Visualizing and Understanding Convolutional Networks. Christopher Funk. Pennsylvania State University.

Transcription:

Credit Card Fraud Detection Using Historical Transaction Data 1. Problem Statement With the growth of e-commerce websites, people and financial companies rely on online services to carry out their transactions that have led to an exponential increase in the credit card frauds [1]. Fraudulent credit card transactions lead to a loss of huge amount of money. The design of an effective fraud detection system is necessary in order to reduce the losses incurred by the customers and financial companies [2]. Research has been done on many models and methods to prevent and detect credit card frauds. Some credit card fraud transaction datasets contain the problem of imbalance in datasets. A good fraud detection system should be able to identify the fraud transaction accurately and should make the detection possible in real-time transactions. Fraud detection can be divided into two groups: anomaly detection and misuse detection. Anomaly detection systems bring normal transaction to be trained and use techniques to determine novel frauds. Conversely, a misuse fraud detection system uses the labeled transaction as normal or fraud transaction to be trained in the database history. So, this misuse detection system entails a system of supervised learning and anomaly detection system a system of unsupervised learning [2]. Fraudsters masquerade the normal behavior of customers and the fraud patterns are changing rapidly so the fraud detection system needs to constantly learn and update. Credit card frauds can be broadly classified into three categories, that is, traditional card related frauds (application, stolen, account takeover, fake and counterfeit), merchant related frauds (merchant collusion and triangulation) and Internet frauds (site cloning, credit card generators and false merchant sites) [1]. 2. Background Timely information on fraudulent activities is strategic to the banking industry as banks have huge databases with variety. Valuable business information can be extracted from these data stores. Credit card frauds can be broadly classified into three categories, that is, traditional card related frauds (application, stolen, account takeover, fake and counterfeit), merchant related frauds (merchant collusion and triangulation) and Internet frauds (site cloning, credit card generators and false merchant sites) 3. Methodology

Basically, there are five basic steps for the data mining process which defines the problem. 1) preparing data 2) exploring the data 3) development of the model 4) exploration and validation of the models 5) deployment and updation in the models. In this project, Neural network is used as the data mining technique and it utilized above mentioned steps for accurate and reliable result. Moreover, Neural network was used as it has the capability of adaption and generalization. Moreover, H2O [3] is also a good option for the experiment purpose. H2O flow is a notebook style open source interface for H2O. It is an interactive web-based environment that allows persons to combine text, plot, mathematics, executable code in a single document, very similar to ipython notebooks. 3.1 Working with H2O: 3.1.1 Start of H2O cluster: We can work with data in two ways on H2O. Either we can use commands in R Studio or we can use H2O flow interface. After starting H2O cluster the dataset named df is loaded on H2O by the name h2odf. The input data loaded on to H2O is then split into training, testing and validation dataset. 60% of the data forms the training data, 20% forms the validation data and rest 20% forms the testing data. For the training of the data A deep neural network model is used. During the training deep learning model uses the following parameters: 1. Hidden - It specifies the number of hidden layers and number of neurons in each layer in the deep learning architecture. 2. Epochs - It represents the number of iterations to be done on the data set. 3. Activation - It represents the type of activation function to use. The major activation functions in H2O are Tanh, Rectifier, and Maxout. 4. Variable importance: It gives the importance of variables listed from greatest importance, to least importance. A pictorial representation of fraud detection model is depicted in following figure.

Figure 1: Fraud detection model [4] 4. Experimental Design For the experimental purpose, various datasets are available in web. The weblink of some of such data sets as given below: Dataset:

https://www.kaggle.com/datasets?sortby=relevance&group=featured&search=credit+c ard+fraud+detection https://data.world/raghu543/credit-card-fraud-data https://data.world/vlad/credit-card-fraud-detection Evaluation Measures: The performance of the model is evaluated by running the specific command: h2o.varimp_plot (model_dl_1) We obtain a table giving the importance of the different attributes to classify a transaction as genuine or legitimate. MSE: Mean Squared Error RMSE: Root Mean Squared Error MAE: Mean Absolute Errors RMSLE: Root Mean Squared Log Error 5. Software & Hardware Requirements: Python based Computer Vision and Deep Learning libraries will be exploited for the development and experimentation of the project. Tools such as Anaconda Python, and libraries such as OpenCV, Tensorflow, and Keras will be utilized for this process. Training will be conducted on NVIDIA GPUs for training the end-to-end version of CNN based object detection model. 6. References [1] Smith, Michael. The Federal Cyber Role: How Federal Cybersecurity Policy has Affected the Public and Private Sector. Diss. Utica College, 2017. [2] Seyedhossein, Leila, and Mahmoud Reza Hashemi. "Mining information from credit card time series for timelier fraud detection." Telecommunications (IST), 2010 5th International Symposium on. IEEE, 2010. [3] https://www.h2o.ai/h2o-old/h2o-flow/

[4] Pumsirirat, Apapan, and Liu Yan. "Credit Card Fraud Detection using Deep Learning based on Auto-Encoder and Restricted Boltzmann Machine." INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS 9.1 (2018): 18-25.