Dynamic update of binary logistic regression model for fraud detection in electronic credit card transactions
Fidel Beraldi 1,2 and Alair Pereira do Lago 2
1 First Data, Fraud Risk Department, São Paulo, Brazil
2 University of São Paulo, Department of Computer Science, São Paulo, Brazil

ABSTRACT
In this paper we develop Dynamic Model Averaging (DMA) models for electronic transactions from an e-commerce environment. These models incorporate the trends and characteristics of fraud in each period of analysis. We also develop logistic regression models in order to compare their performance in fraud detection. For the experiment, an e-commerce company in Brazil provided the dataset used to develop the models and compare their results.

1 Introduction
Technological and economic development has made communication easier and increased purchasing power, and credit card transactions have become the primary payment method at Brazilian and international retailers [1]. In this scenario, as the number of credit card transactions grows, fraudsters have more opportunities to devise new kinds of fraud, resulting in large losses for the financial system [2]. Fraud indicators show that e-commerce transactions are riskier than card-present transactions, since the former do not use secure and efficient processes to authenticate the cardholder, such as a personal identification number (PIN) [1].
2 Methods
Because fraudsters quickly adapt to fraud prevention measures, statistical models for fraud detection need to be flexible enough to change dynamically over time. Fraud scoring models can be updated sporadically or continuously, which raises the question of how to update the model parameters dynamically. Raftery et al (2012) developed a method called Dynamic Model Averaging (DMA) [3, 4, 5], which updates the model continuously over time. The DMA methodology combines several existing ideas: Bayesian Model Averaging (BMA), Markov chains and forgetting factors in state-space modeling. These characteristics make DMA models well suited to fraud scoring.

For the experiment, an e-commerce company provided data on the transactions processed by its payment system from July 2009 to January 2014. The data analysis was performed under a non-disclosure agreement, as recommended by the PCI Data Security Standard [6]. The data made it possible to compare the DMA methodology against the classical logistic regression model, which is often used in fraud detection.

The following steps were taken in the experiment:
Step 1 - Original dataset: the structure for accessing the tables and fields to be exported was created in the SQL database management system.
Step 2 - Tables and fields selection: all tables and fields identified together with the business team were selected and exported. In total, 28 tables were exported, containing 354 fields.
Step 3 - Data filters and structure: all credit card transactions approved from July 2009 to January 2014 were selected. For this period, the records reported as fraudulent, either by the cardholder or by the company's internal analysis, were flagged in the database. The complete selection of fields and table relationships was carried out.
Thus, the original content of the fields in the exported tables could be preserved.
Step 4 - Development of derived variables: based on the original variables, derived variables were created for the modeling process.
Step 5 - Final dataset for sampling: the final dataset of records and variables for sampling and modeling was assembled. Additionally, for validation purposes, some records were looked up in the company's internal system to verify that the collected variable values matched the originals.

The database for the experiment is composed of 7,716,09 records of credit card transactions, distributed from July 2009 to January 2014. For each extracted record, there are 52 independent variables (original and derived) and one dependent variable (fraud/non-fraud transaction). Denied transactions, regardless of the reason for denial, were excluded from the database, so only approved transactions are considered in the modeling process. Fraudulent transactions were identified in two ways: through chargeback information and through the analysis of the company's internal team. In either case, the transaction was labeled as fraud in the final database. Transactions not classified as fraud were labeled as non-fraud.

For the modeling process, we sampled 428,256 records, with 22,615 fraud and 405,641 non-fraud transactions. This implies a ratio of about 1 fraud record for every 18 non-fraud records. This ratio is very close to the values adopted for this kind of problem in previous experiments [7, 8, 9]. The sampled records were then split into 80% (342,605) for model development and 20% (85,651) for validation.
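To make the forgetting-factor idea behind DMA more concrete, the following minimal sketch shows one update of the model probabilities as described in the DMA literature [3, 4]: the previous weights are raised to the power α (the forgetting step) and then reweighted by how well each candidate model predicted the new observation. It is an illustration under stated assumptions, not the implementation used in the experiment; the function names and the toy numbers are hypothetical, and the parameter-level forgetting factor λ is not shown.

```python
import numpy as np


def dma_weight_update(prev_weights, predictive_likelihoods, alpha=0.95):
    """One DMA step for the model probabilities (hypothetical helper).

    prev_weights: posterior model probabilities after the previous period.
    predictive_likelihoods: likelihood each model assigned to the new outcome.
    alpha: forgetting factor for the model probabilities (0 < alpha <= 1).
    """
    prev_weights = np.asarray(prev_weights, dtype=float)
    predictive_likelihoods = np.asarray(predictive_likelihoods, dtype=float)

    # Prediction step: raising to alpha < 1 flattens the weights, so past
    # evidence is discounted and another model can take over quickly.
    flattened = prev_weights ** alpha
    predicted = flattened / flattened.sum()

    # Update step: Bayes' rule with each model's predictive likelihood.
    posterior = predicted * predictive_likelihoods
    posterior = posterior / posterior.sum()
    return predicted, posterior


def dma_fraud_probability(predicted_weights, model_fraud_probs):
    """Model-averaged fraud probability for the next transaction."""
    return float(np.dot(predicted_weights, model_fraud_probs))


# Toy example with two candidate models (numbers are illustrative only).
predicted, posterior = dma_weight_update([0.8, 0.2], [0.3, 0.9], alpha=0.95)
score = dma_fraud_probability(predicted, [0.10, 0.60])
```

With α = 1 the prediction step leaves the weights unchanged, which corresponds to ordinary model averaging without forgetting; smaller values of α let the weights react faster to changes in fraud behavior.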
3 Results
Evaluating the logistic regression and DMA models separately in Table 1, the Stepwise Modified and DMA 95 (λ = α = 0.95) models showed the best performance indicators in their respective categories. In general, however, the DMA models perform better than the logistic regression models on all indicators except the detection rate, where DMA 95 is about 10% lower than Stepwise in relative terms.

                              Logistic Regression        DMA Model
Indicators                    Stepwise    Stepwise       DMA 99        DMA 95
                                          Modified       (λ=α=0.99)    (λ=α=0.95)
Classification Performance
  Detection Rate              62.9%       66.2%*         50.8%         56.5%
  Specificity                 83.1%       83.9%          96.3%         97.9%*
  False Positive Rate         16.9%       16.1%          3.7%          2.1%*
  Ratio Non-Fraud/Fraud       4.8         4.4            1.3           0.7*
  Precision (Fraud)           17.2%       18.7%          43.2%         60.1%*
  Precision (Non-Fraud)       97.6%       97.8%*         97.2%         97.6%
Model Performance
  KS                          48.1%       52.1%          61.3%         70.4%*
  Accuracy                    82.1%       83.0%          93.9%         95.7%*
  Area under ROC curve        81.7%       83.9%          88.3%         92.9%*
  F-measure                   27.0%       29.2%          46.7%         58.2%*
Asterisks (bold in the original) mark the best result for each indicator.
Table 1. Adjusted Models Performance.

Figure 1 shows the AUC values for the four models developed. Taking Stepwise as the baseline, the Stepwise Modified, DMA 99 and DMA 95 models achieve relative AUC improvements of 3%, 8% and 14%, respectively.

Figure 1. ROC curves of adjusted models.

As we are interested in both the detection rate and the precision of the developed models, we can use the F-measure to compare their performance. Table 2 shows that the DMA models performed better, in particular DMA 95.

Model                         F-measure
DMA 95 (λ=α=0.95)             58%
DMA 99 (λ=α=0.99)             47%
Stepwise Modified             29%
Stepwise                      27%
Table 2. Adjusted models F-measures.
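The F-measures and relative AUC gains reported above can be checked directly from the Table 1 figures. The short sketch below assumes the F-measure is the usual harmonic mean of the fraud-class precision and the detection rate (recall), which is consistent with the reported values; the dictionary layout and function names are illustrative choices, not part of the original study.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


def relative_gain(value, baseline):
    """Relative improvement of value over baseline, as a fraction."""
    return value / baseline - 1


# (Precision (Fraud), Detection Rate, AUC) as reported in Table 1.
table1 = {
    "Stepwise": (0.172, 0.629, 0.817),
    "Stepwise Modified": (0.187, 0.662, 0.839),
    "DMA 99": (0.432, 0.508, 0.883),
    "DMA 95": (0.601, 0.565, 0.929),
}

baseline_auc = table1["Stepwise"][2]
for name, (precision, recall, auc) in table1.items():
    f1 = f_measure(precision, recall)
    gain = relative_gain(auc, baseline_auc)
    print(f"{name:18s} F-measure = {f1:.1%}  AUC gain vs Stepwise = {gain:.0%}")
```

Running this reproduces the F-measure row of Table 1 (27.0%, 29.2%, 46.7%, 58.2%) and the 3%, 8% and 14% AUC improvements over Stepwise.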
4 Conclusion
The experiment shows that DMA models achieve better results than logistic regression models with respect to the area under the ROC curve (AUC) and the F-measure. The best DMA model reached an F-measure of 58%, against 29% for the best logistic regression model. For the AUC, the DMA model reached 93% and the classical model 84%. Considering these results, we conclude that the ability of DMA models to update over time makes a large difference in the analysis of fraud data, which undergo continuous behavioral changes. Given all that, DMA has proved appropriate for detecting fraudulent transactions in the e-commerce environment.

References
1. Bolton, R. J. & Hand, D. J. Statistical fraud detection: A review. Statistical Science 17 (2002).
2. Chan, P. K., Fan, W., Prodromidis, A. L. & Stolfo, S. J. Distributed data mining in credit card fraud detection. IEEE Intelligent Systems 14 (1999).
3. Raftery, A. E., Karny, M. & Ettler, P. Online prediction under model uncertainty via dynamic model averaging: Application to a cold rolling mill. American Statistical Association 52 (2010).
4. McCormick, T. H., Raftery, A. E., Madigan, D. & Burd, R. S. Dynamic logistic regression and dynamic model averaging for binary classification. Biometrics (2012).
5. Madigan, D. & Raftery, A. E. Model selection and accounting for model uncertainty in graphical models using Occam's window. American Statistical Association 89 (1994). Washington.
6. PCI-DSS. Payment Card Industry (PCI) - Data Security Standard (2013). URL pcisecuritystandards.org/documents.
7. Chan, P. K. & Stolfo, S. J. Toward scalable learning with non-uniform class and cost distributions: A case study in credit card fraud detection. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (1998).
8. Gadi, M. F. A. Uma comparação de métodos de classificação aplicados à detecção de fraude em cartões de crédito. Master's dissertation, Institute of Mathematics and Statistics, University of São Paulo (2006).
9. Gadi, M. F. A., Wang, X. & do Lago, A. P. Comparison with parametric optimization in credit card fraud detection. Seventh International Conference on Machine Learning and Applications (2008).
Section 1: Assessment Information Instructions for Submission This document must be completed as a declaration of the results of the merchant s self-assessment with the Payment Card Industry Data Security
More informationPackage gbts. February 27, 2017
Type Package Package gbts February 27, 2017 Title Hyperparameter Search for Gradient Boosted Trees Version 1.2.0 Date 2017-02-26 Author Waley W. J. Liang Maintainer Waley W. J. Liang
More informationBoost your Analytics with Machine Learning for SQL Nerds. Julie mssqlgirl.com
Boost your Analytics with Machine Learning for SQL Nerds Julie Koesmarno @MsSQLGirl mssqlgirl.com 1. Y ML 2. Operationalizing ML 3. Tips & Tricks 4. Resources automation delighting customers Deepen Engagement
More informationSOCIAL MEDIA MINING. Data Mining Essentials
SOCIAL MEDIA MINING Data Mining Essentials Dear instructors/users of these slides: Please feel free to include these slides in your own material, or modify them as you see fit. If you decide to incorporate
More informationEfficient Scalable Multi-Level Classification Scheme for Credit Card Fraud Detection
IJCSNS International Journal of Computer Science and Network Security, VOL.10 No.8, August 2010 123 Efficient Scalable Multi-Level Classification Scheme for Credit Card Fraud Detection Dipti D.Patil, Sunita
More informationThe Iterative Bayesian Model Averaging Algorithm: an improved method for gene selection and classification using microarray data
The Iterative Bayesian Model Averaging Algorithm: an improved method for gene selection and classification using microarray data Ka Yee Yeung, Roger E. Bumgarner, and Adrian E. Raftery April 30, 2018 1
More informationThe Devil is in the Details: The Secrets to Complying with PCI Requirements. Michelle Kaiser Bray Faegre Baker Daniels
The Devil is in the Details: The Secrets to Complying with PCI Requirements Michelle Kaiser Bray Faegre Baker Daniels 1 PCI DSS: What? PCI DSS = Payment Card Industry Data Security Standard Payment card
More informationPrinciples of Machine Learning
Principles of Machine Learning Lab 3 Improving Machine Learning Models Overview In this lab you will explore techniques for improving and evaluating the performance of machine learning models. You will
More informationJune 30, Phyllis Schneider, AAP, Director, Network Rules ᅳ Rules Development & Technical Support
June 30, 2010 TO: FROM: ACH Rulebook Subscribers Phyllis Schneider, AAP, Director, Network Rules ᅳ Rules Development & Technical Support RE: 2010 ACH Rulebook ᅳ Supplement #1-2010 Rules Simplification
More informationPhishing Activity Trends Report August, 2006
Phishing Activity Trends Report, 26 Phishing is a form of online identity theft that employs both social engineering and technical subterfuge to steal consumers' personal identity data and financial account
More informationCSE Data Mining Concepts and Techniques STATISTICAL METHODS (REGRESSION) Professor- Anita Wasilewska. Team 13
CSE 634 - Data Mining Concepts and Techniques STATISTICAL METHODS Professor- Anita Wasilewska (REGRESSION) Team 13 Contents Linear Regression Logistic Regression Bias and Variance in Regression Model Fit
More informationPenalizied Logistic Regression for Classification
Penalizied Logistic Regression for Classification Gennady G. Pekhimenko Department of Computer Science University of Toronto Toronto, ON M5S3L1 pgen@cs.toronto.edu Abstract Investigation for using different
More informationPayment Card Industry (PCI) Data Security Standard
Payment Card Industry (PCI) Data Security Standard Attestation of Compliance for Onsite Assessments Service Providers Version 3.1 April 2015 Section 1: Assessment Information Instructions for Submission
More informationCS249: ADVANCED DATA MINING
CS249: ADVANCED DATA MINING Classification Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu April 24, 2017 Homework 2 out Announcements Due May 3 rd (11:59pm) Course project proposal
More informationPayment Card Industry (PCI) Data Security Standard
Payment Card Industry (PCI) Data Security Standard Attestation of Compliance for Onsite Assessments Service Providers Version 3.2 April 2016 Section 1: Assessment Information Instructions for Submission
More information