data-based banking customer analytics
- Dwain Booker
- 5 years ago
Transcription
1 icare: A framework for big data-based banking customer analytics. Authors: N. Sun, J. G. Morris, J. Xu, X. Zhu, M. Xie. Presented by: Hardik Sahi
2 Overview
1. Why Big Data?
2. Traditional versus new ways of handling data.
3. Standard data mining techniques.
4. Aspects of customer behavior analytics.
5. Challenges in the Big Data era.
6. icare: Intelligent Customer Analytics for Recognition and Exploration solution design.
7. Example of icare analytical model.
8. Case study of real-life usage of icare.
3 Why Big Data?
1. Data comes from systems, sensors and mobile devices.
2. Every day, 2.5 quintillion bytes of data are generated.
3. 90% of the digital data that exists today has been produced within the past 2 years.
4 Traditional vs new ways of handling data [1]
Traditional: Banks worked with structured data that could be easily accessed and used to provide insights into customer behavior.
Latest: Banks have to work with both structured and unstructured data to derive insights from the data they collect.
Traditional: Worked with hard information, which can be recorded as numbers and is easy to store and transmit in impersonal ways.
Latest: Have to work with both hard and soft information, e.g. tweets, Facebook comments.
Traditional: Used a sample of internal data and produced periodic reports to make business decisions.
Latest: Use terabytes of data to make data-driven decisions.
5 Traditional vs new ways of handling data (contd.)
Now, with large amounts of structured and unstructured data available, banks can obtain a much more comprehensive enterprise view of the customer. Hence, integrating predictive analytics with automatic decision making can help banks understand customer preferences, identify customers with high spending potential, promote the right products to customers, and so on.
6 Standard data mining techniques
1. Classification: identifying which category, out of a set of categories, an unknown instance belongs to. [4]
2. Clustering: assigning observations to groups such that observations in the same cluster are similar in some sense.
3. Regression: approximating a mapping function from input variables to continuous output variables. [4]
4. Sequence discovery: finding statistically relevant patterns between data examples where the values are provided in sequence. [5]
5. Association: discovering the probability of co-occurrence of items in a collection. [6]
7 Aspects of customer behaviour analytics
Aspect: Identification. Purpose: Customer segmentation and targeting. Techniques: Classification and clustering.
Aspect: Attraction. Purpose: Adopt specific strategies to attract target customers.
Aspect: Retention. Purpose: Improving retention and identifying the cause of attrition.
Aspect: Development. Purpose: Customer lifetime value analysis, consistent expansion of transaction intensity, individual customer profitability. Techniques: Association techniques to find relationships between different products bought by a customer over time.
8 Challenges in Big Data era
1. How to handle massive amounts of complex data in an efficient and cost-effective way? Traditional systems did not leverage the power of unstructured, soft information and could not provide reasonable response times as data volumes expanded. Hence, new data analytical models are required to capture the value behind the increasing amount of unstructured, soft information.
2. How to effectively generate business value from the analytics and obtain competitive advantages for banks? Understanding of the problem should be combined with problem-solving techniques to improve decision making and bring real business value to a bank.
9 On a different note.. [2]
10 icare: Intelligent Customer Analytics for Recognition and Exploration
- icare can process both structured and unstructured data.
- Provides a unified customer view to yield new and deep insights into customer behavior.
- icare analytical models work on processed data and are customized to focus on a specific business problem.
- Deployed in a parallel computing manner to achieve high performance and low response time.
- The solution can be personalized to cater to a bank's specific business needs and data environment.
- Leverages IBM products: IBM SPSS Analytic Server and IBM InfoSphere BigInsights.
11 icare Solution Design
Four phases in the icare solution: data acquisition, data preparation, data modeling, and business applications.
Figure: Architecture of icare solution.
12 Data Acquisition
Involves getting hold of the structured and unstructured data, converting it into an appropriate format and storing it on the BigInsights platform.
Structured data:
- Internal and external data sources like economic, geographic and demographic data.
- A standard input format is defined to ensure consistency and accuracy of the data.
- All tables are stored in the IBM InfoSphere BigInsights platform.
Unstructured data:
- Multiple internal and external sources like log files, social media data etc.
- Stored as files rather than database tables.
- Stored on the BigInsights platform, an Apache Hadoop based platform.
13 Data Preparation
Structured data:
- Purpose: Data preparation to enhance the quality of data stored in BigInsights.
- Tool: Big SQL (Structured Query Language) provided by BigInsights.
Unstructured data:
- Purpose: Transform into regularized or schematized form before modeling.
- Tool: IBM SPSS Analytic Server (AS).
Examples: Handle incomplete, incorrect or irrelevant data. Reduce the impact of noise. Detect outliers. Pull and perform queries on data stored in HDFS. Normalize unstructured data.
14 Data preparation (contd.)
- Once the data is prepared and cleaned, data from multiple sources is merged on BigInsights.
- Merged data is stored in a data warehouse where relationships between tables are well defined.
- Data conflicts due to different sources are resolved.
- Based on the data warehouse, hundreds of attributes are associated with each customer.
- The icare analytical models are built on this integrated data.
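The merge-and-resolve step above can be sketched in a few lines. This is only an illustration, not the icare implementation: the slides do not say how conflicts are resolved, so the rule used here (the first source in a priority-ordered list wins, and missing values are filled from later sources) is an assumption, as are the function name `merge_customer_views` and the field names.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def merge_customer_views(sources: List[List[Record]]) -> Dict[Any, Record]:
    """Merge per-source records into one view per customer.

    `sources` is ordered by priority, highest first (an assumed conflict
    rule): a field keeps the first non-missing value that is seen for it.
    """
    merged: Dict[Any, Record] = {}
    for records in sources:
        for record in records:
            view = merged.setdefault(record["customer_id"], {})
            for field, value in record.items():
                # Skip missing values; never overwrite a higher-priority value.
                if value is not None and field not in view:
                    view[field] = value
    return merged


# Hypothetical sample data: a core banking extract and an online banking extract.
core_banking = [{"customer_id": 1, "age": 34, "city": None}]
online_banking = [
    {"customer_id": 1, "age": 35, "city": "Beijing", "logins_30d": 12},
    {"customer_id": 2, "logins_30d": 3},
]
customer_view = merge_customer_views([core_banking, online_banking])
```

Here customer 1 keeps the age from the higher-priority source, while the missing city and the login count are filled in from the second source.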
15 Data Modeling
Based on the consolidated data, different analytical models can be built for different business scenarios. Two advantages of using icare-based models:
1. All statistical and machine learning models in icare are already customized to suit different business needs, e.g. a domain-knowledge-driven interactive decision tree for customer retention.
2. Parallel computation is provided in icare owing to its use of IBM products, e.g. all machine learning algorithms implemented in icare are designed and developed to follow the MapReduce programming model.
16 Business Applications
Goal: create a deeper understanding of customers and their behaviour to maximize their lifetime value to the bank.
Possible applications of customer analytics:
- customer marketing
- credit scoring and approval
- profitable credit card customer identification
- high-risk loan applicant identification
- payment default prediction
- fraud detection
- money laundering detection
17 Business Applications (contd.)
- Fine-grained segmentation of customers based on their preference for different sub-branches of the bank. Helps banks get deeper insights into customer characteristics and preferences, improve customer satisfaction and achieve precision marketing.
- Helps banks identify potential high-revenue or loyal customers who are likely to become profitable to the bank. Banks get a better curated list of potential customers, which improves marketing efficiency and brings huge benefits to the bank.
- Via analysis of social media, banks can understand what products their customers like. This can lead to improvement in customer retention, cross-sell and up-sell.
18 Business Applications (contd.)
- Using demographic, economic and geographic data, the spatial distribution of both existing and potential customers is generated. Banks get a clear overview of the target customers' locations, which helps in customer marketing and exploration.
- Based on the bank's strategy and the spatial distribution of customer resources, this module optimizes the configuration (i.e., location, type) and operations of service channels, maximizing revenue and customer satisfaction.
icare can incorporate many other use cases through its ability to integrate and work with data from different sources.
19 icare analytical model - Customized and parallelized K-means clustering
Classic K-means algorithm: [3]
- Unsupervised machine learning algorithm.
- Divides n data points into K clusters (n >> K).
- The aim is to minimize the total distance of points to their cluster centers.
- Used to reduce complexity and obtain initial insight into the data, e.g. clustering customers based on their profile and transaction information.
Issues with the classic implementation on Big Data:
- The clustering result is sensitive to errors and outliers.
- Identifying tight clusters of closely related data points is more valuable than assigning every data point to a cluster.
20 icare analytical model - Customized & parallelized K-means clustering (contd.)
The classic K-means algorithm is customized to increase its robustness to outliers and get more meaningful results for banks.
Step 1: Select K data points as cluster centers.
- Choose Manhattan distance as the distance metric.
- Select the data point with the largest minimum distance from the already defined cluster centers as the new cluster center (unlike classic K-means).
- Repeat until there are K cluster centers.
21 icare analytical model - Customized & parallelized K-means clustering (contd.)
Step 2: Assign each data point to the closest cluster using the standard K-means algorithm.
Step 3: Update the cluster centers such that each new cluster center is the weighted mean of all data points that belong to the cluster (unlike classic K-means).
22 icare analytical model - Customized & parallelized K-means clustering (contd.)
Step 4: Redistribute points to their closest center and drop any point which is far away from every cluster center.
Step 5: Repeat until convergence.
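Steps 1 to 5 can be sketched as a small single-machine program. This is a sketch under stated assumptions, not the authors' code: the slides do not define the weights used in Step 3 (here they are a caller-supplied list that defaults to uniform), the first center is simply the first data point, "far away" is modeled as a fixed `drop_threshold`, and convergence means the assignments stop changing. All function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def manhattan(a: Point, b: Point) -> float:
    """Manhattan (L1) distance between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def init_centers(points: List[Point], k: int) -> List[List[float]]:
    """Step 1: repeatedly add the point farthest from its nearest center."""
    centers = [list(points[0])]  # assumption: seed with the first point
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(manhattan(p, c) for c in centers))
        centers.append(list(farthest))
    return centers


def assign(points: List[Point], centers: List[List[float]],
           drop_threshold: float) -> List[Optional[int]]:
    """Steps 2 and 4: nearest center, or None if farther than the threshold."""
    labels: List[Optional[int]] = []
    for p in points:
        dists = [manhattan(p, c) for c in centers]
        best = min(range(len(centers)), key=dists.__getitem__)
        labels.append(best if dists[best] <= drop_threshold else None)
    return labels


def update_centers(points: List[Point], labels: List[Optional[int]],
                   centers: List[List[float]], weights: List[float]) -> List[List[float]]:
    """Step 3: weighted mean of members; an empty cluster keeps its center."""
    updated = []
    for j, old in enumerate(centers):
        members = [(p, w) for p, lab, w in zip(points, labels, weights) if lab == j]
        total = sum(w for _, w in members)
        if total == 0:
            updated.append(list(old))
        else:
            updated.append([sum(w * p[d] for p, w in members) / total
                            for d in range(len(old))])
    return updated


def robust_kmeans(points: List[Point], k: int, drop_threshold: float,
                  weights: Optional[List[float]] = None,
                  max_iter: int = 100) -> Tuple[List[List[float]], List[Optional[int]]]:
    weights = weights or [1.0] * len(points)
    centers = init_centers(points, k)
    labels = assign(points, centers, float("inf"))  # Step 2: nothing dropped yet
    for _ in range(max_iter):
        centers = update_centers(points, labels, centers, weights)
        new_labels = assign(points, centers, drop_threshold)  # Step 4
        if new_labels == labels:  # Step 5: assignments are stable
            break
        labels = new_labels
    return centers, labels
```

Calling `robust_kmeans(points, k=2, drop_threshold=4.0)` on two tight groups plus one stray point returns two centers and a label list in which the stray point is marked `None`, i.e. dropped rather than forced into a cluster.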
23 Advantages of customized model
1. Using Manhattan distance instead of Euclidean distance makes the clustering algorithm more robust to the presence of outliers.
2. Dropping data points that are not close to any cluster (with respect to a threshold value) helps to cut down noisy data points.
3. Using the parallelized MapReduce model speeds up execution of the model on big data.
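To illustrate why a clustering iteration fits the MapReduce model, the following sketch simulates one iteration in plain Python: the map phase emits a (nearest center, partial sum) pair per point, and the reduce phase adds the partial sums per center. It is a single-process simulation with hypothetical function names, it uses a plain (unweighted) mean for brevity, and it is not the icare or Hadoop code.

```python
from collections import defaultdict
from functools import reduce
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]
Partial = Tuple[List[float], int]  # (coordinate sums, point count)


def manhattan(a: Point, b: Point) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def map_point(point: Point, centers: List[List[float]]) -> Tuple[int, Partial]:
    """Map: emit (index of nearest center, (coordinates, 1))."""
    nearest = min(range(len(centers)), key=lambda j: manhattan(point, centers[j]))
    return nearest, (list(point), 1)


def combine(a: Partial, b: Partial) -> Partial:
    """Reduce: add partial sums and counts (associative, so order is free)."""
    return [x + y for x, y in zip(a[0], b[0])], a[1] + b[1]


def kmeans_iteration(partitions: List[List[Point]],
                     centers: List[List[float]]) -> List[List[float]]:
    grouped: Dict[int, List[Partial]] = defaultdict(list)
    for partition in partitions:  # on a cluster, one mapper per partition
        for point in partition:
            key, value = map_point(point, centers)
            grouped[key].append(value)  # stands in for the shuffle phase
    new_centers = [list(c) for c in centers]  # empty clusters keep their center
    for key, values in grouped.items():
        sums, count = reduce(combine, values)
        new_centers[key] = [s / count for s in sums]
    return new_centers
```

Because `combine` is associative, partial sums can be computed independently on each data partition and merged afterwards, which is what lets the work be spread across many nodes.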
24 Case study - icare used in a bank in China
Purpose: Move customers from a traditional retail service channel to online retail in order to reduce operational costs. The higher the online banking customer active index, the lower the pressure on conventional channel services.
20 TB of data was analyzed to generate insights for retaining active online banking customers and to identify the customers who were more likely to drop off, based on transactional behavior.
Based on this information, personalized retention strategies would then be developed to maintain the customer active index.
25 Case study (contd.)
26 Case study - Data acquisition phase
Structured data, acquired from:
- Online banking system
- E-payment platform
- Enterprise Customer Information Facility (ECIF) system
- Core banking system
The structured data had ambiguous definitions, multiple incompatible formats etc.
Unstructured data, acquired from:
- Online/mobile banking log files
Structured information was extracted from the log files using SPSS AS. The data was loaded into the IBM BigInsights platform.
27 Case study - Data preparation phase
Big SQL was used to clean and prepare the data:
- Data imputation by statistical methods.
- Detecting outliers.
Around 200 attributes were generated from different sources, such as:
- Personal information: age, gender.
- Account information: application date of the account, which types of business have been opened, and their opening dates.
- Transaction information: frequency and recency of transactions.
Data from multiple sources was thus merged to provide a uniform view of a customer.
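As an illustration of the two cleaning steps named above, here is a minimal sketch. The slides only say "statistical methods", so the specific choices are assumptions: median imputation for missing values and the 1.5 x IQR rule for outliers. The function names and the sample ages are hypothetical, and the case study performed this work with Big SQL rather than Python.

```python
import statistics
from typing import List, Optional


def impute_median(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the median of the known values."""
    known = [v for v in values if v is not None]
    median = statistics.median(known)
    return [median if v is None else v for v in values]


def iqr_outliers(values: List[float], k: float = 1.5) -> List[float]:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical attribute with one missing value and one implausible entry.
ages = [25, 31, None, 40, 38, 29, 250]
clean_ages = impute_median(ages)
suspicious = iqr_outliers(clean_ages)
```

The median is used for imputation here because, unlike the mean, it is not pulled toward the implausible entry that the outlier check later flags.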
28 Case study - Data modeling phase
Models were built to identify customers who had a high probability of becoming inactive in the future. A customized decision tree was used.
29 Performance evaluation
Baseline: Customers likely to become inactive are chosen randomly from the data set.
Model: Significant improvement in the percentage of correct identifications. Precision is 1.59 times higher than the baseline result from random selection when a list of 30,000 customers is selected.
Figure: Performance of customized decision tree.
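The "1.59 times higher" figure is a precision ratio (often called lift). The sketch below shows the arithmetic only; the slides report the ratio and the list size but not the underlying precisions, so the 20% baseline and the resulting counts are hypothetical numbers chosen to reproduce a ratio of 1.59.

```python
def precision(true_positives: int, selected: int) -> float:
    """Fraction of selected customers that really became inactive."""
    return true_positives / selected


def lift(model_precision: float, baseline_precision: float) -> float:
    """How many times more precise the model is than random selection."""
    return model_precision / baseline_precision


list_size = 30_000              # from the slide
baseline_hits = 6_000           # hypothetical: 20% found by random selection
model_hits = 9_540              # hypothetical: 31.8% found by the model

ratio = lift(precision(model_hits, list_size),
             precision(baseline_hits, list_size))
print(f"lift = {ratio:.2f}")
```

With these assumed counts, the model finds 3,540 more soon-to-be-inactive customers than random selection would in a list of the same size.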
30 Performance evaluation
The model ran 12 times faster than on a single host for the 4 GB test data sample with 1,600 instances.
Figure: Comparison of computing time.
31 Conclusion
- Discusses the use of unstructured data along with traditional structured data.
- Results can be interpreted as business rules that can help in decision making.
- Describes the icare framework in action.
32 References
al/our%20insights/big%20data%20the%20next%20frontier%20for%20innovation/mgi_big_data_full_report.ashx
J. B. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proc. 5th Berkeley Symp. Math. Statist. Probab., 1967, vol. 1, pp /
33 Discussion - Strengths and Weaknesses
Strengths:
- Well-structured paper with extensive background and motivation.
- Differentiates between traditional and modern ways of dealing with banking problems.
- Drives home the idea of using the icare framework in a real-life scenario.
Weaknesses:
- The paper only outlines preliminary work on the icare framework.
- Does not explain why precision goes down as the number of identified customers increases.
- Does not explain why the computing time is high for the Big Data platform as opposed to a single host.
- Does not present any novel approach to the problems faced when dealing with Big Data.
34 Discussion - Related Papers
w /nl-en/_acnmedia/pdf-20/ Accenture-Next-Generation-Financial.pdf
U. D. Prasad and S. Madhavi, "Prediction of churn behavior of bank customers using data mining tools," Business Intell. J., vol. 5, no. 1, pp , Jan
_big_data_in_financial_services_mai_2013.pdf
35 Discussion - Future work
- The framework can be extended to include other modules in addition to the five existing modules.
- Since it is based on the MapReduce programming paradigm, many interesting projects can be implemented using the framework.
36 Discussion
- Which features were extracted from soft information (tweets, Facebook, etc.)?
- Why did the authors choose Manhattan distance and not any other distance?
- Which other analytical models can be customized to take advantage of the parallelisation provided by Hadoop?
- To what extent can the icare platform be customized to cater to the needs of individual banks?
37 Thank you
ESG Lab Review InterSystems Data Platform: A Unified, Efficient Data Platform for Fast Business Insight Date: April 218 Author: Kerry Dolan, Senior IT Validation Analyst Abstract Enterprise Strategy Group
More information2. Data Preprocessing
2. Data Preprocessing Contents of this Chapter 2.1 Introduction 2.2 Data cleaning 2.3 Data integration 2.4 Data transformation 2.5 Data reduction Reference: [Han and Kamber 2006, Chapter 2] SFU, CMPT 459
More informationTopics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples
Hadoop Introduction 1 Topics Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples 2 Big Data Analytics What is Big Data?
More informationECLT 5810 Clustering
ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping
More informationData Mining at a Major Bank: Lessons from a Large Marketing Application
Data Mining at a Major Bank: Lessons from a Large Marketing Application Petra Hunziker, Andreas Maier, Alex Nippe, Markus Tresch, Douglas Weers, and Peter Zemp Credit Suisse P.O. Box, CH-8070 Zurich Switzerland
More informationIBM C Foundations of IBM Big Data & Analytics Architecture V1.
IBM C2030-136 Foundations of IBM Big Data & Analytics Architecture V1 http://killexams.com/exam-detail/c2030-136 A. Dynamic In-Memory processing, Parallel Vector processing, and Data Tiering B. Actionable
More informationData Platforms and Pattern Mining
Morteza Zihayat Data Platforms and Pattern Mining IBM Corporation About Myself IBM Software Group Big Data Scientist 4Platform Computing, IBM (2014 Now) PhD Candidate (2011 Now) 4Lassonde School of Engineering,
More informationIntroduction to Text Mining. Hongning Wang
Introduction to Text Mining Hongning Wang CS@UVa Who Am I? Hongning Wang Assistant professor in CS@UVa since August 2014 Research areas Information retrieval Data mining Machine learning CS@UVa CS6501:
More informationApplying big data analytics in practice
ARISTOTLE UNIVERSITY of THESSALONIKI Applying big data analytics in practice Anastasios Gounaris School of Informatics datalab.csd.auth.gr/~gounaris email: gounaria@csd.auth.gr New data every 1 min 2 What
More informationWKU-MIS-B10 Data Management: Warehousing, Analyzing, Mining, and Visualization. Management Information Systems
Management Information Systems Management Information Systems B10. Data Management: Warehousing, Analyzing, Mining, and Visualization Code: 166137-01+02 Course: Management Information Systems Period: Spring
More informationTECHNOLOGY BRIEF: CA ERWIN DATA PROFILER. Combining Data Profiling and Data Modeling for Better Data Quality
TECHNOLOGY BRIEF: CA ERWIN DATA PROFILER Combining Data Profiling and Data Modeling for Better Data Quality Table of Contents Executive Summary SECTION 1: CHALLENGE 2 Reducing the Cost and Risk of Data
More informationThe Establishment of Large Data Mining Platform Based on Cloud Computing. Wei CAI
2017 International Conference on Electronic, Control, Automation and Mechanical Engineering (ECAME 2017) ISBN: 978-1-60595-523-0 The Establishment of Large Data Mining Platform Based on Cloud Computing
More informationAPPLICATION OF MULTIPLE RANDOM CENTROID (MRC) BASED K-MEANS CLUSTERING ALGORITHM IN INSURANCE A REVIEW ARTICLE
APPLICATION OF MULTIPLE RANDOM CENTROID (MRC) BASED K-MEANS CLUSTERING ALGORITHM IN INSURANCE A REVIEW ARTICLE Sundari NallamReddy, Samarandra Behera, Sanjeev Karadagi, Dr. Anantha Desik ABSTRACT: Tata
More informationProcessing Unstructured Data. Dinesh Priyankara Founder/Principal Architect dinesql Pvt Ltd.
Processing Unstructured Data Dinesh Priyankara Founder/Principal Architect dinesql Pvt Ltd. http://dinesql.com / Dinesh Priyankara @dinesh_priya Founder/Principal Architect dinesql Pvt Ltd. Microsoft Most
More informationObtaining Rough Set Approximation using MapReduce Technique in Data Mining
Obtaining Rough Set Approximation using MapReduce Technique in Data Mining Varda Dhande 1, Dr. B. K. Sarkar 2 1 M.E II yr student, Dept of Computer Engg, P.V.P.I.T Collage of Engineering Pune, Maharashtra,
More informationCluster Analysis. CSE634 Data Mining
Cluster Analysis CSE634 Data Mining Agenda Introduction Clustering Requirements Data Representation Partitioning Methods K-Means Clustering K-Medoids Clustering Constrained K-Means clustering Introduction
More informationData Mining. Jeff M. Phillips. January 12, 2015 CS 5140 / CS 6140
Data Mining CS 5140 / CS 6140 Jeff M. Phillips January 12, 2015 Data Mining What is Data Mining? Finding structure in data? Machine learning on large data? Unsupervised learning? Large scale computational
More information<Insert Picture Here> Introduction to Big Data Technology
Introduction to Big Data Technology The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into
More informationMACHINE LEARNING Example: Google search
MACHINE LEARNING Lauri Ilison, PhD Data Scientist 20.11.2014 Example: Google search 1 27.11.14 Facebook: 350 million photo uploads every day The dream is to build full knowledge of the world and know everything
More informationAn improved MapReduce Design of Kmeans for clustering very large datasets
An improved MapReduce Design of Kmeans for clustering very large datasets Amira Boukhdhir Laboratoire SOlE Higher Institute of management Tunis Tunis, Tunisia Boukhdhir _ amira@yahoo.fr Oussama Lachiheb
More informationEvolving To The Big Data Warehouse
Evolving To The Big Data Warehouse Kevin Lancaster 1 Copyright Director, 2012, Oracle and/or its Engineered affiliates. All rights Insert Systems, Information Protection Policy Oracle Classification from
More informationShine a Light on Dark Data with Vertica Flex Tables
White Paper Analytics and Big Data Shine a Light on Dark Data with Vertica Flex Tables Hidden within the dark recesses of your enterprise lurks dark data, information that exists but is forgotten, unused,
More informationUnsupervised Data Mining: Clustering. Izabela Moise, Evangelos Pournaras, Dirk Helbing
Unsupervised Data Mining: Clustering Izabela Moise, Evangelos Pournaras, Dirk Helbing Izabela Moise, Evangelos Pournaras, Dirk Helbing 1 1. Supervised Data Mining Classification Regression Outlier detection
More informationAn Introduction to Data Mining in Institutional Research. Dr. Thulasi Kumar Director of Institutional Research University of Northern Iowa
An Introduction to Data Mining in Institutional Research Dr. Thulasi Kumar Director of Institutional Research University of Northern Iowa AIR/SPSS Professional Development Series Background Covering variety
More informationOverview of Data Services and Streaming Data Solution with Azure
Overview of Data Services and Streaming Data Solution with Azure Tara Mason Senior Consultant tmason@impactmakers.com Platform as a Service Offerings SQL Server On Premises vs. Azure SQL Server SQL Server
More informationCHAPTER 4: CLUSTER ANALYSIS
CHAPTER 4: CLUSTER ANALYSIS WHAT IS CLUSTER ANALYSIS? A cluster is a collection of data-objects similar to one another within the same group & dissimilar to the objects in other groups. Cluster analysis
More informationChapter 6 VIDEO CASES
Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:
More informationDynamic Data in terms of Data Mining Streams
International Journal of Computer Science and Software Engineering Volume 1, Number 1 (2015), pp. 25-31 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining
More informationEvent Detection through Differential Pattern Mining in Internet of Things
Event Detection through Differential Pattern Mining in Internet of Things Authors: Md Zakirul Alam Bhuiyan and Jie Wu IEEE MASS 2016 The 13th IEEE International Conference on Mobile Ad hoc and Sensor Systems
More informationWhat is Data Mining? Data Mining. Data Mining Architecture. Illustrative Applications. Pharmaceutical Industry. Pharmaceutical Industry
Data Mining Andrew Kusiak Intelligent Systems Laboratory 2139 Seamans Center The University of Iowa Iowa City, IA 52242-1527 andrew-kusiak@uiowa.edu http://www.icaen.uiowa.edu/~ankusiak Tel. 319-335 5934
More informationChee Kiam. to sieve through. and the next one. relevant. The advances in Big. (NLB) of Singapore.
Submitted on: May 31, 2013 Connecting library content using data mining and text analytics on structured and unstructured dataa Chee Kiam Lim Technology and Innovation, National Library Board, Singapore.
More informationWhat is Data Mining? Data Mining. Data Mining Architecture. Illustrative Applications. Pharmaceutical Industry. Pharmaceutical Industry
Data Mining Andrew Kusiak Intelligent Systems Laboratory 2139 Seamans Center The University it of Iowa Iowa City, IA 52242-1527 andrew-kusiak@uiowa.edu http://www.icaen.uiowa.edu/~ankusiak Tel. 319-335
More informationFrequent Item Set using Apriori and Map Reduce algorithm: An Application in Inventory Management
Frequent Item Set using Apriori and Map Reduce algorithm: An Application in Inventory Management Kranti Patil 1, Jayashree Fegade 2, Diksha Chiramade 3, Srujan Patil 4, Pradnya A. Vikhar 5 1,2,3,4,5 KCES
More information