Research and Application of E-Commerce Recommendation System Based on Association Rules Algorithm

Qingting Zhu 1*, Haifeng Lu 2 and Xinliang Xu 3

1 School of Computer Science and Software Engineering, East China Normal University, Shanghai, China, 200241
2 College of Computer Science and Technology, Donghua University, Shanghai, China, 201620
3 College of Information Technology, Shanghai JianQiao University, Shanghai, China, 201306

* Corresponding author. Email: qtzhu@cc.ecnu.edu.cn

Abstract: With the advent of the information age, online shopping has become an integral part of people's lives, but helping users choose products quickly and effectively remains a major challenge. This paper presents the structure of an e-commerce recommendation system and combines it with association rule algorithms to reduce the time and space cost of such a system in practical applications. Its contribution is to exploit the characteristics of association rules, analyzing a user's historical data in order to recommend the products the user is likely to need. Finally, the paper compares the performance of the Apriori algorithm and the FP-Growth algorithm, with the aim of further improving the value of e-commerce recommendation systems.

Keywords: Association rule; Apriori algorithm; FP-Growth algorithm; E-commerce recommendation system

1. Introduction

With the rapid development of the information society and the network economy, e-commerce information is growing explosively, which makes it increasingly difficult for customers to choose goods. Personalized recommendation systems have therefore gradually become an important research topic in the field of e-commerce and have attracted wide attention [1]. An e-commerce recommendation system uses an e-commerce site to provide customers with product information and advice that help them decide what to buy; in effect, it simulates a sales assistant guiding the customer through the purchase process.
Its greatest advantage is that it analyzes users' interests and preferences and then proactively makes personalized recommendations, based on the users' collected browsing history and operation records [2]. Personalized recommendation analyzes a user's consumption habits and preferences from a large number of operation records in order to recommend products the user may be interested in, so it depends on two things: the recommendation algorithm and massive data. With the development of information technology and the widespread use of database technology, e-commerce data has become increasingly rich and complex, and data is no longer the key factor limiting the development of recommendation systems. To further improve the timeliness and accuracy of recommendation results, choosing a recommendation algorithm suited to the size of the data has therefore become a top priority. By comparing the performance of the Apriori algorithm and the FP-Growth algorithm, this paper discusses the application of association rules in e-commerce recommendation systems [3].

2. Materials and methods

Given a set of items and a set of records, association rule mining analyzes the records to derive the correlations between the items. A rule has the simple form "if the condition holds, then the result holds" and is written "X=>Y". The antecedent X can include one or more conditions, all of which must hold at the same time; the consequent Y must then hold with at least a given level of accuracy. The two most important concepts for association rules are the support degree and the confidence degree.
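To make the two measures concrete before they are defined formally, the following is a minimal Python sketch that computes them for one rule X=>Y. The transactions and item names are hypothetical and are not taken from the paper's data; the helper names `num`, `support` and `confidence` are assumptions chosen to mirror the Num() notation used in this section.

```python
# Hypothetical toy transactions; the item names are illustrative only.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"milk", "bread", "jam"},
]


def num(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(x, y, transactions):
    """Fraction of all transactions that contain both X and Y."""
    return num(x | y, transactions) / len(transactions)


def confidence(x, y, transactions):
    """Fraction of the transactions containing X that also contain Y."""
    return num(x | y, transactions) / num(x, transactions)


x, y = {"milk"}, {"bread"}
rule_support = support(x, y, transactions)        # 3 of the 5 transactions
rule_confidence = confidence(x, y, transactions)  # 3 of the 4 milk transactions
print(rule_support, rule_confidence)
```

In this toy data the rule milk=>bread would count as strong for any minimum support up to 0.6 and any minimum confidence up to 0.75.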
Support degree

For two itemsets A and B, the support degree of the rule A=>B is the probability that a transaction in the transaction set D contains the union of A and B. That is, $\mathrm{support}(A \Rightarrow B) = P(A \cup B) = \mathrm{Num}(A \cup B) / \mathrm{Num}(D)$, where D represents the total transaction set and Num() represents the number of transactions in the transaction set that contain the given items [4].

Confidence degree

The confidence degree is the probability that itemset B appears when itemset A appears in the transaction set D, that is, the proportion of the transactions containing A that also contain B [5]. It is expressed as $\mathrm{confidence}(A \Rightarrow B) = P(B \mid A) = \mathrm{Num}(A \cup B) / \mathrm{Num}(A)$, where Num() again represents the number of transactions in D that contain the given items.

3. Results

In general, an e-commerce recommendation system serves three main purposes: converting browsers of the e-commerce website into buyers; improving the cross-selling ability of the website; and improving customer loyalty to the website. A well-designed recommendation system therefore determines, to a certain extent, the success or failure of the e-commerce website.

3.1 System structure

The e-commerce recommendation system consists of three modules: an input module, a recommendation module and an output module [6].

Input module

The input module collects and updates business data, which generally come from two sources: commodity groups and individual customers. Commodity group data mainly include purchase history, overall evaluations and ratings, and other forms of collective data. Individual customer data mainly include implicit browsing input, explicit browsing input, customer evaluations, individual purchase history and other such data.

Journal of Residuals Science & Technology, Vol. 13, No. 5, 2016 122.1

Recommendation module

The recommendation module is the core of the personalized recommendation system and determines its performance, so an accurate and efficient algorithm has a great impact on the success of an e-commerce site.

Output module

The output module returns the recommendation results to customers. Its output takes various forms, mainly individual score output, related commodity output and message notification output.

3.2 Workflow

The workflow of the e-commerce recommendation system based on association rules is shown in Figure 1.

Figure 1. Workflow of the e-commerce recommendation system based on association rules

Data collection and preprocessing

Customers first register on the e-commerce site, which creates a personal information table. They can then shop or browse commodity information, and they can evaluate the goods after a successful purchase. During this process the system records the user's browsing history, purchase history and personal evaluations; these data must be preprocessed before being written to the database. Generally speaking, the data mined for association rules are transaction data, composed of transaction codes, commodity codes and so on. To improve analysis efficiency, the commodity description information corresponding to each commodity code also needs to be migrated accordingly. Preprocessing therefore has to complete the transformation of the data into the transaction set and the item set, so that the subsequent data meet the requirements of mining [7].

Analyzing and establishing customer files

A specific customer information table is formed from the customer's operation records. It should include basic information such as the customer's name, age, gender, occupation and income.
It should also include customer transaction information such as purchase amount, time, quantity and type of goods. The customer's behavior model is built on this basis and is used to analyze the customer's characteristics and buying habits.

The formation of association rules

A large volume of transaction data is analyzed together with the customer files to identify each customer's buying habits and patterns. Appropriate support and confidence thresholds are set in order to reduce the number of rules and to improve the efficiency and accuracy of the model. Finally, the generated association rules are merged to form the rule base.

Sales management

Different recommendation algorithms are designed according to the type and size of the e-commerce site and combined with the rules already formed, so that different recommendation schemes are developed automatically for different customers. On the one hand, related products are promoted to particular customers. On the other hand, the recommendation model is added to the sales knowledge base for analysis and processing, in order to develop marketing programs for similar customers and to improve the profitability and competitiveness of the e-commerce website [8].

4. Discussion

The typical association rule algorithms are the Apriori algorithm and the FP-Growth algorithm. Both mine the relationships in the data to generate a rule base, which is then used for product recommendation on the e-commerce website.

4.1 Apriori algorithm

Steps of Apriori algorithm

The Apriori algorithm finds frequent itemsets by progressively increasing the size of the itemsets; it is a recursive, level-by-level procedure based on the two-stage frequent set theory. The basic idea is first to find all frequent itemsets, whose support must be no lower than the predefined minimum support degree min_sup. The frequent itemsets then generate strong association rules, which must satisfy the minimum confidence min_conf. Finally the rules are filtered, and only those that meet both thresholds are considered effective. The specific steps of the algorithm are as follows [9]:

(1) Scan the transaction set D and compare the support of each item with the minimum support threshold min_sup to generate the frequent 1-itemsets.
(2) Generate candidate (k+1)-itemsets by joining the frequent k-itemsets with themselves.
(3) Calculate the support degree of each candidate. If it is not lower than min_sup, add the candidate to the frequent (k+1)-itemsets, which are initially empty.
(4) If the set of frequent (k+1)-itemsets is empty, the frequent k-itemsets are the maximal frequent itemsets; jump to step (5) to generate association rules. Otherwise return to step (2) and continue the loop.
(5) Generate rules from the frequent itemsets obtained in step (4): an association rule X=>Y is kept if support(X=>Y) ≥ min_sup and confidence(X=>Y) ≥ min_conf. The strong association rules screened out in this way can then be applied in practice.
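The five steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `apriori` and `rules`, the toy transactions and the thresholds are all hypothetical, and a rule is kept when its support and confidence are not lower than the thresholds. The candidate pruning line is a standard Apriori refinement that the step list does not spell out.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {itemset: support} for every itemset with support >= min_sup."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent_among(candidates):
        # One full pass over the transaction set per level (steps 1 and 3).
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: k / n for c, k in counts.items() if k / n >= min_sup}

    # Step 1: frequent 1-itemsets.
    level = frequent_among({frozenset([i]) for t in transactions for i in t})
    frequent, k = {}, 1
    while level:  # Step 4: stop once no frequent (k+1)-itemsets remain.
        frequent.update(level)
        prev = set(level)
        # Step 2: join the frequent k-itemsets with themselves.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        # Refinement: drop candidates that have an infrequent k-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k))}
        # Step 3: count support and keep the frequent candidates.
        level = frequent_among(candidates)
        k += 1
    return frequent


def rules(frequent, min_conf):
    """Step 5: list (X, Y, support, confidence) for each strong rule X => Y."""
    strong = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for x in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[x]  # every subset of a frequent set is frequent
                if conf >= min_conf:
                    strong.append((x, itemset - x, sup, conf))
    return strong


# Hypothetical toy data and thresholds.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"milk", "bread", "jam"},
]
frequent = apriori(transactions, min_sup=0.4)
strong_rules = rules(frequent, min_conf=0.7)
for x, y, sup, conf in strong_rules:
    print(sorted(x), "=>", sorted(y), round(sup, 2), round(conf, 2))
```

Note that `frequent_among` rescans every transaction at each level, which is exactly the repeated database scan discussed as a weakness of Apriori.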
Advantages and disadvantages of Apriori algorithm

As one of the most influential algorithms for mining Boolean association rules, Apriori is simple, easy to understand and undemanding of its input data. Its shortcomings are equally obvious, especially when dealing with massive data sets:

(1) Each step generates candidate itemsets through too many combinations, and no element is excluded from the combinations in advance.
(2) Every time the support degree of the itemsets is calculated, all records in the database D must be scanned. For a large database this greatly increases the I/O overhead of the computer system.

4.2 FP-Growth algorithm

Steps of FP-Growth algorithm

FP-Growth is a mining algorithm that does not generate candidate sets; it compresses the data heavily by constructing a data structure called the FP-tree. To generate association rules, FP-Growth scans the database twice. The first scan obtains the frequent 1-itemsets. The second scan uses the frequent 1-itemsets to filter the infrequent items out of the database and builds the FP-tree. Finally, the association rules are generated from the FP-tree. The specific steps of the algorithm are as follows [10]:

(1) Scan the transaction database D once to obtain the set F of frequent items and the support degree of each frequent item, and then construct the FP-tree from F, ordered by support degree.
(2) Mine the FP-tree. For each frequent pattern of length 1, construct its conditional pattern base, build its conditional FP-tree from that base, and mine that tree recursively. Pattern growth is achieved by concatenating the suffix pattern with the frequent patterns generated from the conditional FP-tree.
(3) Continue mining until the FP-tree is empty or contains only a single path; a single path yields all combinations of its sub-paths, and each combination is a frequent pattern.
(4) Once the frequent itemsets of the transactions in database D have been found, generate strong association rules from them.

Advantages and disadvantages of FP-Growth algorithm

FP-Growth overcomes the major defects of the Apriori algorithm and has the following advantages:

(1) It scans the database only twice, which greatly reduces the I/O overhead and makes it very efficient on large volumes of data.
(2) It does not generate candidate itemsets, so its computation is greatly reduced.
(3) It divides the original mining task into a set of smaller tasks, each of which searches for particular frequent patterns in a limited conditional database, thus reducing the search space.

FP-Growth also has its drawbacks:

(1) Because FP-Growth works on the data after compressing it into memory, it requires a large memory capacity.
(2) The FP-tree it generates may be irregular, so the tree can easily become short and bushy.
(3) Compared with the Apriori algorithm, FP-Growth is more complex, both as an algorithm and in its implementation.

5. Simulation experiment

The simulation experiment applies association rules to historical online shopping data in order to compare the FP-Growth algorithm with the Apriori algorithm, verify the principles of the two algorithms, and improve the timeliness and accuracy of the e-commerce recommendation system. The experiment is implemented in Python and runs on a Mac OS computer with 16 GB of memory and an Intel Core i7 processor. The experimental data were collected and collated by a web crawler.
Part of the data is shown in Figure 2.

Figure 2. Partial data graph
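Since the experiment is implemented in Python, the FP-Growth procedure of Section 4.2 can be illustrated in the same language. The sketch below is not the authors' code; the names `Node`, `build_tree` and `fp_growth`, the use of absolute counts (`min_count`) instead of a support fraction, and the toy transactions are all assumptions made for illustration. It recurses on conditional pattern bases and omits the single-path shortcut of step (3), which changes speed but not the result.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, its count, and links to parent and children."""

    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_tree(weighted, min_count):
    """Build an FP-tree from (items, weight) pairs in two passes over the data."""
    # First pass: count every item and keep only the frequent ones.
    counts = defaultdict(int)
    for items, w in weighted:
        for i in set(items):
            counts[i] += w
    counts = {i: c for i, c in counts.items() if c >= min_count}
    # Second pass: insert each filtered transaction, most frequent item first.
    root = Node(None, None)
    header = defaultdict(list)  # item -> all tree nodes that carry it
    for items, w in weighted:
        kept = sorted((i for i in set(items) if i in counts),
                      key=lambda i: (-counts[i], i))
        node = root
        for i in kept:
            child = node.children.get(i)
            if child is None:
                child = Node(i, node)
                node.children[i] = child
                header[i].append(child)
            child.count += w
            node = child
    return header, counts


def fp_growth(weighted, min_count, suffix=frozenset()):
    """Return {itemset: count} for all frequent itemsets, without candidates."""
    header, counts = build_tree(weighted, min_count)
    result = {}
    for item, count in counts.items():
        pattern = suffix | {item}
        result[pattern] = count
        # Conditional pattern base: the prefix path above each node of `item`.
        base = []
        for node in header[item]:
            path, p = [], node.parent
            while p is not None and p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, node.count))
        # Pattern growth: mine the conditional tree with the extended suffix.
        result.update(fp_growth(base, min_count, pattern))
    return result


# Hypothetical toy data; each transaction has weight 1.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"milk", "bread", "jam"},
]
patterns = fp_growth([(t, 1) for t in transactions], min_count=2)
for itemset, count in sorted(patterns.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), count)
```

The original transactions are read only inside the top-level `build_tree` call, once for counting and once for insertion; every recursive call works on the much smaller conditional pattern bases held in memory.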

To compare the running times of the two algorithms at different data sizes and different support thresholds, the experiment is divided into two parts. In the first part, the support threshold of both algorithms is 0.3 and the data size increases by orders of magnitude; the running times are summarized in Table 1. In the second part, the data size for both algorithms is 1 million records and the support threshold varies; the running times are summarized in Table 2. Time1 represents the time spent by the Apriori algorithm and Time2 the time spent by the FP-Growth algorithm.

Table 1 Running time of the two algorithms in the first part

Data size/ten thousand    Time1/s    Time2/s
0.01                      0.006      0.032
0.1                       0.038      0.047
1                         0.438      0.166
10                        4.528      1.123
100                       51.384     11.011

Table 2 Running time of the two algorithms in the second part

Support degree    Time1/s    Time2/s
0.3               51.384     11.011
0.4               34.650     10.846
0.5               16.367     9.728
0.6               14.064     9.459
0.7               12.280     9.409

To make the results clearer and more intuitive, the data of the two tables are plotted as line charts; the results are shown in Figure 3 and Figure 4.

Figure 3. Line chart of the running time of the two algorithms in the first part
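The ratios implied by the two tables can be worked out directly. The short Python sketch below copies the running times from Table 1 and Table 2 and computes the Apriori/FP-Growth time ratio for each row, plus the relative drop in running time across the support thresholds of Table 2; the helper names `speedup` and `relative_drop` are illustrative assumptions.

```python
# Running times in seconds, copied from Table 1 and Table 2.
# Each value is (Time1 for Apriori, Time2 for FP-Growth).
table1 = {  # key: data size in ten thousand records
    0.01: (0.006, 0.032),
    0.1: (0.038, 0.047),
    1: (0.438, 0.166),
    10: (4.528, 1.123),
    100: (51.384, 11.011),
}
table2 = {  # key: support threshold, at 1 million records
    0.3: (51.384, 11.011),
    0.4: (34.650, 10.846),
    0.5: (16.367, 9.728),
    0.6: (14.064, 9.459),
    0.7: (12.280, 9.409),
}


def speedup(times):
    """Apriori time divided by FP-Growth time (above 1: FP-Growth is faster)."""
    apriori_s, fp_growth_s = times
    return apriori_s / fp_growth_s


def relative_drop(table, column):
    """Fractional decrease in running time from the first row to the last."""
    rows = list(table.values())
    return 1 - rows[-1][column] / rows[0][column]


speedups_by_size = {size: speedup(t) for size, t in table1.items()}
speedups_by_support = {sup: speedup(t) for sup, t in table2.items()}
apriori_drop = relative_drop(table2, 0)
fp_growth_drop = relative_drop(table2, 1)

for size, s in speedups_by_size.items():
    print(f"size {size} x 10^4: Apriori/FP-Growth = {s:.2f}")
for sup, s in speedups_by_support.items():
    print(f"support {sup}: Apriori/FP-Growth = {s:.2f}")
print(f"drop from support 0.3 to 0.7: Apriori {apriori_drop:.0%}, "
      f"FP-Growth {fp_growth_drop:.0%}")
```

On these figures Apriori is faster on the two smallest data sets, while FP-Growth is roughly 4.7 times faster at 1 million records. Raising the support threshold from 0.3 to 0.7 cuts the Apriori time by about 76% but the FP-Growth time by only about 15%.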

Figure 4. Line chart of the running time of the two algorithms in the second part

The experimental results show that, as the data size increases by orders of magnitude in the first part, the running time of the FP-Growth algorithm grows much more slowly than that of the Apriori algorithm. This is consistent with the characteristics of the two algorithms: Apriori needs to generate a large number of candidate itemsets, so it handles large amounts of data poorly, whereas FP-Growth removes this defect and therefore has a great advantage on large amounts of data. In the second part, as the support threshold grows, the running time of Apriori falls far more than that of FP-Growth. A higher support threshold reduces the number of candidate itemsets in Apriori and therefore the time spent on disk I/O, whereas FP-Growth scans the database only twice whatever the threshold, so its running time changes little. According to the experimental results, as e-commerce data keep growing, FP-Growth has a significant advantage over Apriori in processing large amounts of data, so using it allows more timely and efficient recommendations to the user.

6. Conclusion

This paper first analyzes the framework and workflow of the e-commerce recommendation system. It then examines the processing steps of the Apriori and FP-Growth algorithms and compares their performance experimentally. The results lead to the conclusion that the FP-Growth algorithm is suitable for analyzing and processing the growing volume of e-commerce data.

References

[1] Wang Y, Ou H Y, Zhang J M. Design and Implementation of E-Commerce Recommendation System Based on Ontology Technology.
Advanced Materials Research, 2014, 978, pp. 244-247.
[2] Parikh V, Shah P. E-commerce Recommendation System using Association Rule Mining and Clustering. International Journal of Innovations & Advancement in Computer Science, 2015.
[3] Wu Z, Wang Q. Application Research in E-Commerce Recommendation System of Web Mining Technology. Atlantis Press, 2014.
[4] Almaolegi M, Arkok B. An Improved Apriori Algorithm for Association Rules. Eprint Arxiv, 2014, 3(1).
[5] Qi S, Wong C U I. An Application of Apriori Algorithm Association Rules Mining to Profiling the Heritage Visitors of Macau[M]// Information and Communication Technologies in Tourism 2015. Springer International Publishing, 2015, pp. 139-151.
[6] Reshamwala A. Improving Efficiency of Apriori Algorithms for Sequential Pattern Mining. Bonfring International Journal of Data Mining, 2014, 4(1), pp. 01-06.
[7] Zhang J H. Design and implementation of data mining based on distributed computing. Applied Mechanics & Materials, 2014, 644-650, pp. 1702-1705.
[8] Zeng Y, Yin S, Liu J, et al. Research of improved FP-Growth algorithm in association rules mining. Scientific Programming, 2015, 2015, pp. 1-6.
[9] Thabtah F, Hammoud S, AbdelJaber H. Parallel Associative Classification Data Mining Frameworks Based MapReduce. Parallel Processing Letters, 2015, 25(2).
[10] Yang X, Lian L. A New Data Mining Algorithm based on MapReduce and Hadoop[J]. International Journal of Signal Processing Image Processing & Pattern Recognition, 2014, 7, pp. 131-142.
[11] Patil S, Patil A, Chavan P. Earned value management for tracking project progress. Int. Eng. Res. Applic., 2012, 2, pp. 1026-1029.