An Effectual Approach to Swelling the Selling Methodology in Market Basket Analysis using FP Growth

P.Sathish kumar, T.Suvathi
K.S.Rangasamy College of Technology
suvathi007@gmail.com
Received: 03/01/2017, Revised: 04/02/2017 and Accepted: 04/04/2017

Abstract
Market Basket Analysis (MBA) is an essential component that provides retail organizations with information that helps to improve the profit rate of their marketing. Customer relationship management helps to predict the behavior of customers and to find frequent itemsets that enhance the decision-making processes for retaining valued customers. Online social networks such as Twitter, LinkedIn and Facebook have attracted large numbers of people. These various domains provide information about the customer, which is mined from the market basket database using efficient association rule mining. The FP-growth method is efficient and scalable and is about an order of magnitude faster than the Apriori algorithm. This paper aims to provide an efficient way to extract knowledge from various sources that helps to improve the effectiveness of the marketing system.

Keywords: Market Basket Analysis, Apriori, Association rule mining, FP-growth Algorithm.

1. Introduction
Data mining is defined as a process that uses Knowledge Discovery in Databases (KDD) techniques to extract and identify useful information and subsequently gain knowledge from large databases. Today, large amounts of data are maintained in databases in various fields, but not all of this information is useful to the user. Data mining provides a computational process for automatically searching large volumes of data and extracting knowledge from them in a human-understandable structure, which helps analysts to recognize relationships within the data.
Applying data mining techniques to marketing data is extremely useful for finding interesting, previously unknown, hidden patterns in massive datasets. Companies focus mainly on collecting customer data in order to extract important information from their vast customer and product feature databases and thereby gain competitive advantage. The knowledge obtained has strategic importance in terms of competition and the improvement of marketing and production, and it has changed the relationships between organizations and their customers. It can be used in organizations for decision making and forecasting, and predicting future customer behavior is one of the most common learning models in data mining.

Social media helps to mine opinions and interests. In today's world, where everyone posts their opinions online, there is a great opportunity to turn these opinions into useful information. People's opinions can be expressed linguistically in different forms, such as subjectivity, emotions, evaluations, beliefs, sentiments and speculations. Sentiment analysis can be performed either independently of or dependent on subjectivity detection. For example, in order to extract opinions, data is first selected and extracted from Twitter in the form of tweets. After the data set of tweets is selected, the tweets are cleaned of emoticons and unnecessary punctuation marks, and a database is created to store this data in a specific transformed structure. In this structure, each transformed tweet is divided into parts that are stored in specific fields. Organizations use such data to motivate customers to buy products.

2. Data Mining Techniques
2.1. Knowledge Discovery in Databases (KDD)
The KDD process is an iterative and interactive sequence of the following steps:
Selection- the main goal of selection is to create a target data set from the original data, i.e., selecting a subset of variables or data samples, on which discovery has to be performed.
Preprocessing- this step cleans the data by performing various operations, such as noise modeling and removal, defining proper strategies for handling missing data fields, and accounting for time-sequence information.
Transformation- this step is in charge of reducing and projecting the data, in order to derive a representation suitable for the specific task to be performed. It is typically accomplished by applying transformation techniques or methods that are able to find invariant representations of the data.
Data Mining- this step deals with extracting interesting patterns by choosing (i) a specific data mining method or task (e.g., summarization, classification, clustering, regression, and so on), (ii) proper algorithm(s) for performing the task at hand, and (iii) an appropriate representation of the output results.
Interpretation/Evaluation- this step is used to interpret and extract knowledge from the mined patterns. The interpretation is typically carried out by visualizing the patterns, the models, or the data given such models and, if necessary, iteratively looking back at the previous steps of the process.

2.2. Customer Relationship Management (CRM)
A CRM-data mining framework establishes close customer relationships and manages the relationship between organizations and customers in today's advanced business world. Data mining has gained popularity in various CRM applications in recent years, and the classification model is an important data mining technique in this field. It helps in building long-term and profitable relationships with valuable customers. The set of processes and supporting systems in CRM helps in developing a business strategy; this enterprise approach understands and influences customer behavior through meaningful communication so that customer acquisition, customer loyalty, customer retention and customer profitability are improved. The key factor in developing a competitive CRM strategy is understanding and analyzing customer behavior, which helps in acquiring and retaining potential customers so as to maximize customer value. It helps organizations to identify valuable customers and predict their future behavior.

3. Association Rule Mining (ARM)
Association rule mining is useful for discovering interesting relationships hidden in large data sets. It mines rules of the form X => Y, where X and Y are sets of data items. The following example uses some transactions taken from a shop.

Example of Market Basket Transactions
1. Butter, Bread, Burger
2. Milk, Bread, Butter
3. Butter, Milk

Interesting relationships can be represented in the form of association rules such as
Milk => Butter
The above rule shows that there is a strong relationship between milk and butter: many customers buy milk and butter together. These rules can help retailers understand the buying behavior of customers. One of the most popular data mining approaches is to find frequent itemsets in a transaction dataset and derive association rules from them. Other types of mining include classification and clustering. Two basic measures are used for association rules: support and confidence.
Support (X): the support of an item or itemset X is the number of transactions in the database in which X occurs.
Confidence: confidence is a measure associated with an association rule; it is defined mathematically as
Confidence (X => Y) = Support (X ∪ Y) / Support (X)
Score (X => Y): a value assigned to the attributes of an association on the basis of the confidence of that association rule.

Market basket analysis with data mining methods has been implemented based on the Six Sigma methodology, with the aim of improving the results and changing the sigma performance level of the process. The general rule induction (GRI) algorithm was used in that study to establish the association rules. The results were examined after applying an association rule mining technique, a rule induction technique and the Apriori algorithm. The results of these three techniques were then combined in an effort to understand the actual buying behavior of the customer.

Market basket analysis and association rule mining have also been used to extract knowledge from a supermarket dataset and to analyze the daily transactions of the market. The main purpose of that study was to arrange the products of the supermarket in such a way that its profit may increase.

3.1. ARM Algorithm
This algorithm tries to capture the changing trends of transactions in Market Basket Analysis.
It is based on the idea of combining an Association Rule Miner with a Changes in Association Rule Predictor to find strong relationships between the various attributes (i.e., the goods placed in the market). The main thrust is on finding the associations between the various items in transactions. We keep track of the items that are associated with high confidence (i.e., for X => Y, confidence = n(X ∪ Y)/n(X), where n(·) is the number of transactions containing the given items). The result of this algorithm is therefore two sets of association rules:
1. Association rules which are highly predictable for future windows.
2. Outliers (association rules which are least likely to appear in the next windows).
Input: Set of Transactions
Output: Predicted Association Rules, Outdated Association Rules
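
The confidence measure used above can be computed directly from transaction counts. The following is a minimal sketch, not the authors' implementation: the names `TRANSACTIONS`, `support_count` and `confidence` are hypothetical, and the data are the three example transactions from Section 3.

```python
from typing import FrozenSet, Iterable, List

# The three example market basket transactions from Section 3.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"butter", "bread", "burger"}),
    frozenset({"milk", "bread", "butter"}),
    frozenset({"butter", "milk"}),
]


def support_count(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> int:
    """Number of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def confidence(lhs: Iterable[str], rhs: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """confidence(X => Y) = n(X ∪ Y) / n(X); returns 0.0 if X never occurs."""
    n_lhs = support_count(lhs, transactions)
    if n_lhs == 0:
        return 0.0
    n_both = support_count(frozenset(lhs) | frozenset(rhs), transactions)
    return n_both / n_lhs


if __name__ == "__main__":
    # Milk occurs in two transactions, and both also contain butter.
    print(confidence({"milk"}, {"butter"}, TRANSACTIONS))  # 1.0
    # Butter occurs in all three transactions, two of which contain milk.
    print(confidence({"butter"}, {"milk"}, TRANSACTIONS))
```

On this data the rule Milk => Butter has the highest possible confidence, while the reverse rule is weaker because butter is also bought without milk.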
4. Apriori Algorithm
Apriori is an influential algorithm for mining frequent itemsets for Boolean association rule learning over transactional databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger itemsets as long as those itemsets appear sufficiently often in the database. The frequent itemsets determined by Apriori can be used to derive association rules that highlight general trends in the database, which is useful in market basket analysis. The algorithm is designed to operate on databases containing transactions (for example, collections of items bought by customers, or details of visits to a website). Each transaction is seen as a set of items. The Apriori algorithm identifies the itemsets that are subsets of at least a minimum number of transactions in the database (the support threshold). Apriori uses a bottom-up approach in which frequent subsets are extended one item at a time and groups of candidates are tested against the data. It uses breadth-first search and a hash tree structure to count candidate itemsets efficiently. Candidate itemsets of length k are generated from the frequent itemsets of length k-1, and candidates that have an infrequent subset are pruned. According to the downward closure lemma, the candidate set then contains all frequent itemsets of length k. After that, the algorithm scans the transaction database to determine the frequent itemsets among the candidates. The main advantage of the Apriori algorithm is that it finds many sets of frequent items. The algorithm terminates when no further successful extensions are found.

4.1. Performing Market Basket Analysis using Apriori Algorithm
The purpose of this analysis is to generate a set of rules that link two or more products together. Rules are statements of the form: if a visitor has the items in the itemset on the left hand side (LHS) of the rule, i.e. {i1, i2,...}, then it is likely that the visitor will be interested in the item on the right hand side (RHS), i.e. {ik}. For example, a rule could be:
{TV, Remote} => {Stabilizer}
The output of a market basket analysis is generally a set of rules that we can exploit to make business decisions.
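
The level-wise search described in Section 4 can be sketched as follows. This is a minimal illustration rather than the implementation used in this paper: the function name `apriori`, the `min_count` parameter and the small `SAMPLE` transaction list are assumptions made for the example, and plain set operations stand in for the hash tree.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset found in >= min_count transactions."""
    transactions = [frozenset(t) for t in transactions]

    def count_frequent(candidates):
        # One full scan of the transactions per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_count}

    # Level 1: frequent individual items.
    singles = {frozenset([item]) for t in transactions for item in t}
    level = count_frequent(singles)
    frequent = dict(level)

    k = 2
    while level:
        prev = set(level)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (downward closure): every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = count_frequent(candidates)
        frequent.update(level)
        k += 1  # the loop ends when no extension is frequent
    return frequent


# Hypothetical transactions, chosen only to match the {TV, Remote} => {Stabilizer} rule.
SAMPLE = [
    {"TV", "Remote", "Stabilizer"},
    {"TV", "Remote"},
    {"TV", "Remote", "Stabilizer"},
    {"TV", "Stabilizer"},
]

if __name__ == "__main__":
    for itemset, n in sorted(apriori(SAMPLE, 2).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), n)
```

Each pass of the `while` loop rescans all transactions, which is the repeated counting cost that the FP-growth approach is designed to avoid.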
Each of these rules should have a lift greater than one. Lift is the confidence of the rule divided by the fraction of transactions that contain the RHS; a lift greater than one means that the LHS and RHS occur together more often than would be expected if they were independent. In addition, we are interested in the support and confidence of those rules: higher-confidence rules are ones where there is a higher probability of the items on the RHS being part of the transaction given the presence of the items on the LHS. Recommendations based on these rules can, for example, drive a higher response rate. It is also better to action rules with higher support first, as these will be applicable to a wider range of instances. Such an analysis helps an online retailer, for example, to perform classic market basket analysis.

4.2. Disadvantages of Apriori
The candidate generation can be extremely slow. It can generate duplicates, depending on the implementation. The counting method iterates through all of the transactions each time. Constant items make the algorithm a lot heavier.

5. FP-Growth Algorithm
FP-growth (frequent pattern growth) is an improvement on Apriori designed to eliminate some of its heavy bottlenecks. It uses an extended prefix-tree (FP-tree) structure to store the database in a compressed form, so it works well with any distributed system. FP-growth uses a pattern fragment growth method to avoid the costly process of candidate generation and testing used by Apriori. It adopts a divide-and-conquer strategy to decompose both the mining tasks and the databases, as follows. First, it compresses the database representing frequent items into a frequent-pattern tree, or FP-tree, which retains the itemset association information. It then divides the compressed database into a set of conditional databases, each associated with one frequent item, and mines each such database separately.

5.1. Performing Market Basket Analysis using FP-Growth Algorithm
Here is a simple example of performing market basket analysis using the FP-Growth algorithm. A client is named jackey, and these are his transactions:
Tjackey = [[beer, bread, butter, milk], [beer, milk, butter], [beer, milk, cheese], [beer, butter, diapers, cheese], [beer, cheese, bread]]
1. First, we count all the items in all the transactions.
Tjackey = [beer: 5, bread: 2, butter: 3, milk: 3, cheese: 3, diapers: 1]
2. Next, we apply the threshold we had set previously. For this example, let's say we have a threshold of 30%; with five transactions, each item has to appear at least twice. Diapers appears only once, so it is removed.
Tjackey = [beer: 5, bread: 2, butter: 3, milk: 3, cheese: 3]
3. Now we sort the list according to the count of each item.
Tjackey Sorted = [beer: 5, butter: 3, milk: 3, cheese: 3, bread: 2]
Now we build the tree. We go through each of the transactions and add all of its frequent items in the order they appear in our sorted list.
Transaction 1 to add = [beer, butter, milk, bread]
Transaction 2 = [beer, butter, milk]
Transaction 3 = [beer, milk, cheese]
Transaction 4 = [beer, butter, cheese]
Transaction 5 = [beer, cheese, bread]
4. To get the associations, we now go through every branch of the tree and include in the association only the nodes whose count passed the threshold.
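
The four steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the names `FPNode`, `build_fp_tree` and `frequent_branches` are hypothetical, each transaction is assumed to contain no duplicate items, and step 4 follows the simplified branch walk described in this example rather than the full recursion over conditional databases used by complete FP-growth implementations.

```python
from collections import Counter


class FPNode:
    """One node of the FP-tree: an item, its count, and its child nodes."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    # Step 1: count every item (first pass over the data).
    counts = Counter(item for t in transactions for item in t)
    # Steps 2 and 3: drop infrequent items, then sort by descending count.
    # sorted() is stable, so ties keep their order of first appearance.
    order = [item for item, n in sorted(counts.items(), key=lambda kv: -kv[1])
             if n >= min_count]
    rank = {item: i for i, item in enumerate(order)}

    # Second pass: insert each transaction's frequent items in sorted order.
    root = FPNode(None)
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, order


def frequent_branches(node, min_count, prefix=()):
    """Step 4 (simplified): follow each branch while node counts pass the threshold."""
    kept = [c for c in node.children.values() if c.count >= min_count]
    if not kept:
        return [list(prefix)] if prefix else []
    branches = []
    for child in kept:
        branches.extend(frequent_branches(child, min_count, prefix + (child.item,)))
    return branches


T_JACKEY = [
    ["beer", "bread", "butter", "milk"],
    ["beer", "milk", "butter"],
    ["beer", "milk", "cheese"],
    ["beer", "butter", "diapers", "cheese"],
    ["beer", "cheese", "bread"],
]

if __name__ == "__main__":
    MIN_COUNT = 2  # 30% of 5 transactions, rounded up
    tree, item_order = build_fp_tree(T_JACKEY, MIN_COUNT)
    print(item_order)
    print(frequent_branches(tree, MIN_COUNT))
```

The data is read exactly twice, once to count items and once to build the tree, and shared prefixes such as beer, butter are stored only once with an incremented count.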
The above example shows how FP-Growth mines associations without generating candidates, which is why it provides better performance than Apriori in market basket analysis.

5.2. Advantages of FP-Growth
This algorithm only needs to read the file twice, as opposed to Apriori, which reads it once for every iteration. Because it uses the FP-Tree, it removes the need to calculate the pairs to be counted, which is very processing heavy. This makes it O(n), which is much faster than Apriori. The FP-Growth algorithm stores a compact version of the database in memory.

5.3. FP-Growth Bottlenecks
The biggest problem is the interdependency of data. The interdependency problem is that, when the algorithm is parallelized, some data still needs to be shared, which creates a bottleneck in the shared memory.

6. Conclusion
Market basket analysis plays a major role in marketing applications that try to find interesting patterns in databases. In order to obtain the association rules, the frequent itemsets must first be generated. The data are retrieved from various domains, and customer relationship management helps organizations identify customers and improve the profit rate of the market. The Apriori and FP-Growth algorithms are used to perform these actions on the data. FP-Growth beats Apriori by far: it uses less memory and has a shorter runtime. FP-Growth is also more scalable because of its linear running time. From the comparison presented, it can be concluded that the FP-growth algorithm behaves better than the Apriori algorithm as an effective way of improving the marketing system.