data-based banking customer analytics

1 icare: A framework for big data-based banking customer analytics
Authors: N. Sun, J. G. Morris, J. Xu, X. Zhu, M. Xie
Presented by: Hardik Sahi

2 Overview
1. Why Big Data?
2. Traditional versus new ways of handling data
3. Standard data mining techniques
4. Aspects of customer behavior analytics
5. Challenges in the Big Data era
6. icare: Intelligent Customer Analytics for Recognition and Exploration solution design
7. Example of icare analytical model
8. Case study of real-life usage of icare

3 Why Big Data?
1. Data is generated by systems, sensors and mobile devices.
2. Every day, 2.5 quintillion bytes of data are generated.
3. 90% of today's digital data has been produced within the past 2 years.

4 Traditional vs new ways of handling data [1]
Traditional:
- Banks worked with structured data that can be easily accessed and used to provide insights into customer behavior.
- Worked with hard information, which can be recorded as numbers and is easy to store and transmit in impersonal ways.
- Used a sample of internal data and produced periodic reports to make business decisions.
Latest:
- Banks now have to work with both structured and unstructured data to derive insights from the data they are collecting.
- Have to work with both hard and soft information, e.g. Tweets, Facebook comments etc.
- Use terabytes of data to make data-driven decisions.

5 Traditional vs new ways of handling data (contd.)
With large amounts of structured and unstructured data now available, banks can obtain a much more comprehensive enterprise view of the customer. Hence, integrating predictive analytics with automatic decision making can help banks make better decisions: understand customer preferences, identify customers with high spending potential, promote the right products to the right customers, and so on.

6 Standard data mining techniques
1. Classification - Identifying to which category, out of a set of categories, an unknown instance belongs. [4]
2. Clustering - Assigning observations to sets such that observations in the same cluster are similar in some sense.
3. Regression - Approximating a mapping function from input variables to continuous output variables. [4]
4. Sequence discovery - Finding statistically relevant patterns between data examples where values are provided in sequence. [5]
5. Association - Discovering the probability of co-occurrence of items in a collection. [6]
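As a rough illustration of the association technique listed above, the sketch below estimates how often two products co-occur across customers. The product names and baskets are made-up illustrative data, and `pair_support` is a hypothetical helper, not something taken from the paper.

```python
from collections import Counter
from itertools import combinations


def pair_support(transactions):
    """Return the fraction of transactions in which each item pair co-occurs."""
    counts = Counter()
    for basket in transactions:
        # Sort and de-duplicate so each pair is counted once per basket.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items()}


# Hypothetical product holdings per customer (illustrative data only).
baskets = [
    ["savings", "credit_card"],
    ["savings", "credit_card", "mortgage"],
    ["savings", "mortgage"],
    ["credit_card"],
]
support = pair_support(baskets)
```

Here `support[("credit_card", "savings")]` is the estimated probability that a customer holds both products.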

7 Aspects of customer behaviour analytics
- Identification. Purpose: customer segmentation and targeting. Techniques: classification and clustering.
- Attraction. Purpose: adopt specific strategies to attract target customers.
- Retention. Purpose: improve retention and identify the causes of attrition.
- Development. Purpose: customer lifetime value analysis, consistent expansion of transaction intensity, individual customer profitability. Techniques: association techniques to find relationships between different products bought by a customer over time.

8 Challenges in Big Data era
1. How to handle massive amounts of complex data in an efficient and cost-effective way?
- Traditional systems did not leverage the power of unstructured, soft information.
- They are unable to provide reasonable response times when handling expanding data volumes.
- Hence, new data analytical models are required to capture the value behind the increasing amount of unstructured, soft information.
2. How to effectively generate business value from the analytics and obtain competitive advantages for banks?
- Understanding of the problem should be combined with problem-solving techniques to improve decision making and bring real business value to a bank.

9 On a different note.. [2]

10 icare: Intelligent Customer Analytics for Recognition and Exploration
- icare is able to process both structured and unstructured data.
- Provides a unified customer view to yield new and deep insights into customer behavior.
- icare analytical models work on processed data and are customized to focus on a specific business problem.
- Deployed in a parallel computing manner to achieve high performance and low response time.
- The solution can be personalized to cater to a bank's specific business needs and data environment.
- Leverages IBM products: IBM SPSS Analytic Server and IBM InfoSphere BigInsights.

11 icare Solution Design
Four phases in the icare solution:
1. Data acquisition
2. Data preparation
3. Data modeling
4. Business applications
Architecture of icare solution.

12 Data Acquisition
Involves getting hold of the structured and unstructured data, converting it into an appropriate format and storing it on the BigInsights platform.
Structured data:
- Internal and external data sources, such as economic, geographic and demographic data.
- A standard input format is defined to ensure consistency and accuracy of the data.
- All tables are stored in the IBM InfoSphere BigInsights platform.
Unstructured data:
- Multiple internal and external sources, such as log files, social media data etc.
- Stored as files rather than database tables.
- Stored on the BigInsights platform, an Apache Hadoop based platform.

13 Data Preparation
Structured data:
- Purpose: data preparation to enhance the quality of data stored in BigInsights.
- Tool: Big SQL (Structured Query Language) provided by BigInsights.
- Examples: handle incomplete, incorrect or irrelevant data; reduce the impact of noise; detect outliers.
Unstructured data:
- Purpose: transform into a regularized or schematized form before modeling.
- Tool: IBM SPSS Analytic Server (AS).
- Examples: pull and perform queries on data stored in HDFS; normalize unstructured data.

14 Data preparation (contd.)
- Once the data is prepared and cleaned, data from multiple sources is merged on BigInsights.
- Merged data is stored in a data warehouse where relationships between tables are well defined.
- Data conflicts due to different sources are resolved.
- Based on the data warehouse, hundreds of attributes are associated with each customer.
- The icare analytical models are built on this integrated data.

15 Data Modeling
Based on the consolidated data, different analytical models can be built to cater to different business scenarios. Two advantages of using icare-based models:
1. All statistical and machine learning models in icare are already customized to suit different business needs. E.g. a domain-knowledge-driven interactive decision tree for customer retention.
2. A parallel computation facility is provided in icare owing to its use of IBM products. E.g. all machine learning algorithms implemented in icare are designed and developed to follow the MapReduce programming model.
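To make the MapReduce point concrete, here is a minimal single-machine sketch of how one clustering iteration can be split into a map step and a reduce step. It is only an illustration of the programming model, not the icare implementation: `map_phase` and `reduce_phase` are hypothetical names, and the reduce step uses a plain mean for simplicity.

```python
from collections import defaultdict


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def map_phase(points, centers):
    """Map: emit (index of nearest center, point) for every point."""
    for p in points:
        idx = min(range(len(centers)), key=lambda i: manhattan(p, centers[i]))
        yield idx, p


def reduce_phase(pairs, dim):
    """Reduce: average the points grouped under each center index."""
    sums = defaultdict(lambda: [0.0] * dim)
    counts = defaultdict(int)
    for idx, p in pairs:
        counts[idx] += 1
        for d in range(dim):
            sums[idx][d] += p[d]
    return {idx: tuple(s / counts[idx] for s in sums[idx]) for idx in sums}


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (12.0, 10.0)]
centers = [(0.0, 0.0), (10.0, 10.0)]
new_centers = reduce_phase(map_phase(points, centers), dim=2)
```

On a Hadoop cluster the map step would run independently on each data partition and only the per-cluster sums and counts would need to be shuffled to the reducers, which is what makes the approach parallelizable.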

16 Business Applications
GOAL: create a deeper understanding of customers and their behaviour to maximize their lifetime value to the bank.
Possible applications of customer analytics:
- customer marketing
- credit scoring and approval
- profitable credit card customer identification
- high-risk loan applicant identification
- payment default prediction
- fraud detection
- money laundering detection

17 Business Applications (contd.)
- Fine-grained segmentation of customers based on their preference for different sub-branches of the bank. Helps banks get deeper insights into customer characteristics and preferences, improve customer satisfaction and achieve precision marketing.
- Helps banks identify potential high-revenue or loyal customers who are likely to become profitable to the bank. Gives a better-curated list of potential customers, improves marketing efficiency and brings huge benefits to the bank.
- Via analysis of social media, banks can understand what products their customers like. Can lead to improvement in customer retention, cross-sell and up-sell.

18 Business Applications (contd.)
- Using demographic, economic and geographic data, the spatial distribution of both existing and potential customers is generated. Banks get a clear overview of the target customers' locations, which helps in customer marketing and exploration.
- Based on the bank's strategy and the spatial distribution of customer resources, this module optimizes the configuration (i.e., location, type) and operations of service channels, maximizing revenue and customer satisfaction.
- icare can incorporate many other use cases thanks to its ability to integrate and work with data from different sources.

19 icare analytical model - Customized and parallelized K-means clustering
Classic K-means algorithm: [3]
- Unsupervised machine learning algorithm.
- Divides n data points into K clusters (n >> K).
- The aim is to minimize the total distance of points to their cluster centers.
- Used to reduce complexity and obtain initial insight into the data. E.g. cluster customers based on their profile and transaction information.
Issues with the classic implementation on Big Data:
- The clustering result is sensitive to errors and outliers.
- Identifying tight clusters of closely related data points is more valuable than assigning every data point to a cluster.

20 icare analytical model - Customized & parallelized K-means clustering (contd.)
The classic K-means algorithm is customized to increase its robustness to outliers and get more meaningful results for banks.
Step 1: Select K data points as cluster centers.
- Choose Manhattan distance as the distance metric.
- Select the data point with the largest minimum distance from the already chosen cluster centers as the next cluster center (unlike classic K-means).
- Repeat until there are K cluster centers.
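Step 1 can be sketched as a farthest-point selection under Manhattan distance. The slide does not say how the very first center is picked, so this sketch simply assumes the first data point; `select_initial_centers` is a hypothetical name and the sample points are illustrative.

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def select_initial_centers(points, k):
    """Pick k centers, each maximizing its minimum distance to earlier picks."""
    if k < 1 or k > len(set(points)):
        raise ValueError("k must be between 1 and the number of distinct points")
    # Assumption: the first center is the first data point (not stated in the slides).
    centers = [points[0]]
    while len(centers) < k:
        candidates = [p for p in points if p not in centers]
        farthest = max(
            candidates,
            key=lambda p: min(manhattan(p, c) for c in centers),
        )
        centers.append(farthest)
    return centers


points = [(0, 0), (1, 0), (10, 0), (10, 9), (4, 4)]
centers = select_initial_centers(points, 3)
```

With these points the second pick is `(10, 9)`, the point farthest from `(0, 0)`, and the third is `(10, 0)`, whose nearest existing center is 9 units away.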

21 icare analytical model - Customized & parallelized K-means clustering (contd.)
Step 2: Assign each data point to the closest cluster, as in the standard K-means algorithm.
Step 3: Update the cluster centers such that each new cluster center is the weighted mean of all data points that belong to the cluster (unlike classic K-means).

22 icare analytical model - Customized & parallelized K-means clustering (contd.)
Step 4: Redistribute points to their closest center and drop any point that is far away from every cluster center.
Step 5: Repeat until convergence.
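Steps 2 to 5 can be put together in a short single-machine sketch. Several details are assumptions because the slides leave them open: the per-point `weights` are supplied by the caller, `drop_threshold` is a hypothetical cut-off for "far away", dropped points stay dropped, and convergence means that neither the centers nor the assignment changed.

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest(point, centers):
    """Return (index, distance) of the center closest to point."""
    dists = [manhattan(point, c) for c in centers]
    best = min(range(len(centers)), key=dists.__getitem__)
    return best, dists[best]


def update_centers(points, weights, assignment, centers):
    """Step 3: move each center to the weighted mean of its members."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [i for i, c in assignment.items() if c == j]
        total = sum(weights[i] for i in members)
        if not members or total == 0:
            new_centers.append(old)  # keep an empty cluster's center as is
            continue
        new_centers.append(tuple(
            sum(weights[i] * points[i][d] for i in members) / total
            for d in range(len(old))
        ))
    return new_centers


def customized_kmeans(points, weights, centers, drop_threshold, max_iter=100):
    centers = [tuple(c) for c in centers]
    # Step 2: assign every point to its closest center.
    assignment = {i: nearest(p, centers)[0] for i, p in enumerate(points)}
    for _ in range(max_iter):
        new_centers = update_centers(points, weights, assignment, centers)
        # Step 4: redistribute, dropping points too far from every center.
        new_assignment = {}
        for i in assignment:
            j, dist = nearest(points[i], new_centers)
            if dist <= drop_threshold:
                new_assignment[i] = j
        converged = new_centers == centers and new_assignment == assignment
        centers, assignment = new_centers, new_assignment
        if converged:  # Step 5: stop once nothing changes
            break
    return centers, assignment


# Illustrative data: point 4 is an outlier, point 1 carries extra weight.
points = [(0, 0), (2, 0), (10, 10), (12, 10), (20, 20)]
weights = [1, 3, 1, 1, 1]
final_centers, final_assignment = customized_kmeans(
    points, weights, centers=[(0, 0), (10, 10)], drop_threshold=8
)
```

In this run the outlier `(20, 20)` is dropped in the first redistribution, after which the second center settles on the mean of the two remaining nearby points.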

23 Advantages of customized model
1. Using Manhattan distance instead of Euclidean distance makes the clustering algorithm more robust to the presence of outliers.
2. Dropping data points that are not close to any cluster (with respect to a threshold value) helps to cut down noisy data points.
3. Using a parallelized MapReduce model speeds up running the model on big data.

24 Case study - icare used in a bank in China
Purpose: Move customers from the traditional retail service channel to online retail in order to reduce operational costs.
- The higher the online banking customer active index, the lower the pressure on conventional channel services.
- 20 TB of data was analyzed to generate insights for retaining active online banking customers and to identify the customers who were more likely to drop off, based on transactional behavior.
- Based on this information, personalized retention strategies would then be developed to maintain the customer active index.

25 Case study (contd.)

26 Case study - Data acquisition phase
Structured data, acquired from:
- Online banking system
- E-payment platform
- Enterprise Customer Information Facility (ECIF) system
- Core banking system
The structured data had ambiguous definitions, multiple incompatible formats etc.
Unstructured data, acquired from:
- Online/mobile banking log files
Structured information was extracted from the log files using SPSS AS. The data was loaded into the IBM BigInsights platform.

27 Case study - Data preparation phase
Big SQL was used to clean and prepare the data:
- Data imputation by statistical methods.
- Outlier detection.
Around 200 attributes were generated from different sources, such as:
- Personal information: age, gender.
- Account information: application date of the account, type of business opened and its opening date.
- Transaction information: frequency and recency of transactions.
Hence data from multiple sources was merged to provide a uniform view of a customer.
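The slides do not say which statistical methods were used for imputation and outlier detection, so the following is only one plausible sketch: median imputation for missing values and a median-absolute-deviation (MAD) rule for outliers. The helper names, the cut-off `k=3.0` and the sample ages are all hypothetical.

```python
import statistics


def impute_median(values):
    """Replace missing entries (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def mad_outliers(values, k=3.0):
    """Return values lying more than k median absolute deviations from the median."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread, so nothing can be flagged by this rule
    return [v for v in values if abs(v - med) > k * mad]


# Illustrative customer ages with one missing value and one suspicious entry.
ages = [25, 31, None, 40, 38, 29, 120]
filled = impute_median(ages)
outliers = mad_outliers(filled)
```

The missing age is filled with 34.5, the median of the observed values, and 120 is the only value flagged as an outlier.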

28 Case study - Data modeling phase
Models were built to identify customers who had a high probability of becoming inactive in the future. A customized decision tree was used.

29 Performance evaluation
Baseline: Customers likely to become inactive are chosen randomly from the data set.
Model: Significant improvement in the percentage of correct identifications. For a list of 30,000 customers, precision is 1.59 times the baseline result from random selection.
Performance of customized decision tree
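The comparison above is a precision ratio (often called lift). The short sketch below shows the arithmetic. Only the list size of 30,000 and the ratio of 1.59 come from the slide; the hit counts are hypothetical numbers chosen to reproduce that ratio.

```python
def precision(true_positives, selected):
    """Fraction of selected customers that really became inactive."""
    return true_positives / selected


def lift(model_precision, baseline_precision):
    """How many times more precise the model is than random selection."""
    return model_precision / baseline_precision


list_size = 30_000       # from the slide
baseline_hits = 3_000    # hypothetical: random selection finds 10% inactive
model_hits = 4_770       # hypothetical: the model finds 15.9% inactive

lift_value = lift(
    precision(model_hits, list_size),
    precision(baseline_hits, list_size),
)
```

Because random selection on average hits the overall share of inactive customers, the baseline precision equals that share, and the lift tells how much better than chance the model ranks customers.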

30 Performance evaluation
The model ran 12 times faster than on a single host for the 4 GB test data sample with 1,600 instances.
Comparison of computing time

31 Conclusion Discusses the usage of unstructured data along with traditional structured data. Results can be interpreted as business rules that can help in decision making. Described icare framework live in action.

32 References
al/our%20insights/big%20data%20the%20next%20frontier%20for%20innovation/mgi_big_data_full_report.ashx
J. B. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proc. 5th Berkeley Symp. Math. Statist. Probab., 1967, vol. 1, pp.

33 Discussion - Strengths and Weaknesses
Strengths:
- Well-structured paper with extensive background and motivation.
- Differentiates between traditional and modern ways of dealing with banking problems.
- Drives home the use of the icare framework in a real-life scenario.
Weaknesses:
- The paper only outlines preliminary work on the icare framework.
- Does not explain why precision goes down as the number of identified customers increases.
- Does not explain why the computing time is high for the Big Data platform as opposed to a single host.
- Does not present any novel approach to the problems faced when dealing with Big Data.

34 Discussion - Related Papers
w /nl-en/_acnmedia/pdf-20/ Accenture-Next-Generation-Financial.pdf
U. D. Prasad and S. Madhavi, "Prediction of churn behavior of bank customers using data mining tools," Business Intell. J., vol. 5, no. 1, pp., Jan.
_big_data_in_financial_services_mai_2013.pdf

35 Discussion - Future work
- The framework can be extended to include other modules in addition to the five existing modules.
- Since it is based on the MapReduce programming paradigm, many interesting projects can be implemented using the framework.

36 Discussion
- Which features were extracted from soft information (Tweets, Facebook etc.)?
- Why did the authors choose Manhattan distance and not any other distance?
- Which other analytical models can be customized to take advantage of the parallelization provided by Hadoop?
- To what extent can the icare platform be customized to cater to the needs of individual banks?

37 Thank you
