Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques

Noraswaliza Abdullah, Yue Xu, Shlomo Geva, and Mark Looi

Discipline of Computer Science, Faculty of Science and Technology, Queensland University of Technology, 2 George Street, Brisbane QLD 4000
noraswaliza.abdullah@student.qut.edu.au, {yue.xu,shlomo.geva,m.looi}@qut.edu.au

M. Bramer (Ed.): IFIP AI 2010, IFIP AICT 331, pp. 57-66, 2010. © IFIP International Federation for Information Processing 2010

Abstract. Recommender Systems (RS) have emerged to help users make good decisions about which products to choose from the vast range of products available on the Internet. Many of the existing recommender systems are developed for simple and frequently purchased products using a collaborative filtering (CF) approach. This approach is not applicable for recommending infrequently purchased products, as no user ratings data or previous user purchase history is available. This paper proposes a new recommender system approach that uses knowledge extracted from user online reviews for recommending infrequently purchased products. Opinion mining and rough set association rule mining are applied to extract knowledge from user online reviews. The extracted knowledge is then used to expand a user's query to retrieve the products that most likely match the user's preferences. The result of the experiment shows that the proposed approach, the Query Expansion Matching-based Search (QEMS), improves the performance of the existing Standard Matching-based Search (SMS) by recommending more products that satisfy the user's needs.

Keywords: Recommender system, opinion mining, association rule mining, user review.

1 Introduction

The large amount of information that is available on the Internet leads to an information overload problem [1]. Recommender systems (RS) have emerged to help users deal with this problem by providing product suggestions according to their needs and requirements. Nowadays, recommender systems are widely applied by major e-commerce websites for recommending various products and serving millions of consumers [2].

However, many recommender systems are developed for recommending inexpensive and frequently purchased products like books, movies and music. Many of the systems that are currently available for searching infrequently purchased products like cars or houses only provide a standard matching-based search function, whereby the system retrieves products that match the user's query exactly. This query is normally short and does not fully reflect the user's requirements. In addition, many users do not have much knowledge about the products and thus cannot provide detailed requirements for the product attributes or features. Therefore, a recommender system that can predict users' preferences from the initial input given by the users is needed for recommending infrequently purchased products.

Many of the current recommendation systems are developed using a collaborative filtering (CF) approach [2][3][4]. The collaborative filtering approach utilizes a large amount of ratings data or users' previous purchase data to make meaningful recommendations. This approach is not suitable for recommending infrequently purchased products because there is no previous purchase history or explicit ratings data for the available products: the products are not often purchased by users during their lifetime, and users are not able to provide ratings for products they have never used.

Fortunately, with the popularity of e-commerce applications for selling products on the web, users are given more opportunity to express their opinions on products they previously owned via the online merchant websites and, as a result, more and more users share reviews concerning their experience with the products. These reviews provide valuable information that can be used by recommender systems for recommending infrequently purchased products.

This paper proposes a recommender system approach that utilizes knowledge extracted from user reviews for recommending infrequently purchased products. Opinion mining and rough set association rule mining are applied to extract knowledge from the user review data to predict a user's preferences. The knowledge about the user's preferences is used to expand the user's query to improve the recommendation result.

The remainder of this paper is organized as follows. First, the related work is briefly reviewed in Section 2. Then, the proposed approach is discussed in Section 3. The experimental results and evaluation are discussed in Section 4. Finally, the conclusion is given in Section 5.
2 Related Work

Recently, automatic review mining and summarization, which extracts product feature values from user reviews, has become a popular research topic [5][6][7]. Review mining and summarization, also called opinion mining, aims at extracting the product features on which the reviewers express their opinions and determining whether the opinions are positive or negative [7]. Hu and Liu [5] proposed a model of feature-based opinion mining and summarization, which uses a lexicon-based method to determine whether the opinion expressed on a product feature is positive or negative. The opinion lexicon, or the set of opinion words used in this method, is obtained through a bootstrapping process using WordNet. Ding et al. [6] then proposed a technique that performs better than the previous methods by using a holistic lexicon-based approach. This technique deals with context-dependent opinion words and with aggregating multiple opinion words in the same sentence, which are the two main problems of the existing techniques.

Despite the growth in the number of online reviews and the valuable information that they can provide, not much work has been done on utilizing online user reviews for creating recommendations [4]. Aciar et al. [8] employed text mining techniques to extract useful information from review comments and then mapped the review comments into the ontology's information structure, which is used by the recommender system to make recommendations. In their approach, users must input the features of the product that are most important to them, and the recommendations are generated based on the features provided by the users. In contrast, our approach aims to predict users' preferences about the product features from the initial input given by them, and to use this knowledge to recommend products to the users. The following section discusses the proposed approach in detail.

3 Proposed Approach

User reviews contain written comments expressed by previous users about a particular product. Each comment contains a user's opinion, or how the user feels about the product's features (e.g. good or bad). Opinion mining techniques are applied to user reviews to determine each user's sentimental orientation towards each feature, which indicates whether the user likes or dislikes the product in terms of this feature. The overall orientation of each review is also determined to summarize whether a user's opinion about the product is positive, negative or neutral. The users' opinions generated from the reviews reflect their viewpoints concerning the quality of the products. A review with a positive orientation indicates that the reviewer (i.e. the user) was satisfied with the product in some aspects. This means that at least some attributes of this product were attractive to the user. If we can identify these attractive attributes for each product, we can use them to determine the products that will be of most interest to the user. Based on this idea, we propose to apply association rule mining techniques to generate patterns and association rules from users' positive reviews. By using the extracted patterns and association rules for a target user, we can predict the user's preferred product attributes and, thus, recommend products that best match the user's preferences.
The proposed recommender system approach contains three main processes: i) opinion mining, to extract a user's sentimental orientations to the product features from the user online reviews, summarizing and presenting the reviews in a structured format; ii) rough set association rule mining, to generate association rules between the product attribute values; and iii) query expansion, to expand a user's query by using the association rules between product attribute values. The following sections provide the definitions of the concepts and entities involved and the specific problems of this research. In addition, they explain each process in detail.

3.1 Definitions

This section first defines the important concepts and entities used in this paper and then highlights the specific problems that we aim to solve.

Product

Products include any type of product or online service for which users can search for information or which they can purchase. This paper focuses particularly on infrequently purchased products such as cars or houses. A product $p$ can be represented by a two-tuple $(C, F)$, where $C = \{c_1, c_2, \ldots, c_n\}$ is a set of attributes representing the technical characteristics of the product defined by domain experts and $F = \{f_1, f_2, \ldots, f_m\}$ is a set of usage features representing the usage performance of the product defined by domain experts or the users of the product. The usage features are usually the aspects commented upon by the users of the product. In this paper, we assume that both the product attributes and usage features have been specified. For example, for the online car search domain on which we conducted our experiments, the following car characteristics and usage aspects were chosen as the car attributes and usage features:

C = {Make, Model, Series, Year, Engine Size, Fuel System, Fuel Consumption, Tank Capacity, Power, Torque, Body Type, Seating Capacity, Standard Transmission, Drive, Turning Circle, Kerb Weight, Dimension, Wheelbase}

F = {Comfort Practicality, Price Equipment, Under Bonnet, How Drives, Safety Security, Quality Reliability, Servicing Running Costs, Aesthetics Styling}

User Reviews

For a product, there is a set of written reviews about the product given by users. Each review consists of a set of sentences, each comprised of a sequence of words. In many e-commerce websites, the product features to be reviewed have been specified so that users can provide their comments and opinions on each particular feature. For reviews that are not classified according to any specific feature, opinion mining techniques can be used to identify the product features that are addressed by each sentence in a review [5]. In this paper, we assume that the sentences in each review have been divided into groups, each of which consists of the sentences that talk about one feature of the product. Let $R = \{R_1, R_2, \ldots, R_m\}$ be a review given by a user to a product, where $R_i$ is a set of sentences that are comments concerning feature $f_i$. By applying opinion mining techniques, which are discussed in Section 3.2, we can generate the user's sentimental orientation concerning each feature, denoted as $O = \{o_1, o_2, \ldots, o_m\}$, and an overall orientation of the review $O_{all}$, where $o_i, O_{all} \in \{positive, negative, neutral\}$.
Structured review

A structured review is a 4-tuple consisting of the sentimental orientations to a product generated from a review and the product's attributes and features, denoted as $sr = (C, F, O, O_{all})$, where $C$ and $F$ are the attributes and features of the product, and $O$ and $O_{all}$ are the sentimental orientations to the features and the overall orientation of the review, respectively. Let $SR = \{sr_1, sr_2, \ldots, sr_{|SR|}\}$ be a set of structured reviews.

Information System

An information system $I$ is a 2-tuple, denoted as $I = (U, A)$, where $U$ is a set of objects and $A$ is a set of attributes for each object. In this paper, $U$ is a set of structured reviews and $A$ consists of the product attributes, the features, the sentimental orientations to the features and the overall orientation of the review, i.e. $A = \{c_1, \ldots, c_n, f_1, \ldots, f_m, o_1, \ldots, o_m, O_{all}\}$.
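As an illustration only, the structured review and the information system defined above could be held in memory as sketched below. The class name `StructuredReview`, the helper names, and the `o:` column prefix are assumptions made for this sketch and do not come from the paper; the features $F$ are represented implicitly by the keys of the orientation mapping.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class StructuredReview:
    """Hypothetical container for sr = (C, F, O, O_all)."""
    attributes: Dict[str, str]    # C: attribute name -> attribute value
    orientations: Dict[str, str]  # O: feature name (from F) -> orientation
    overall: str                  # O_all: positive, negative or neutral


def to_row(sr: StructuredReview) -> Dict[str, str]:
    """Flatten one structured review into a single table row."""
    row = dict(sr.attributes)
    for feature, orientation in sr.orientations.items():
        row["o:" + feature] = orientation  # prefix is an assumption
    row["O_all"] = sr.overall
    return row


def build_information_system(
    reviews: List[StructuredReview],
) -> Tuple[List[Dict[str, str]], List[str]]:
    """Return (U, A): the rows and the column names in first-seen order."""
    rows = [to_row(sr) for sr in reviews]
    columns: List[str] = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    return rows, columns
```

Each row of `U` then corresponds to one object of the information system, and `A` lists its columns.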

The problems that we aim to solve are as follows:

i) Given a user review $R$ on a product $p$, the review has to be summarized and represented as a structured review $sr$. Then, from a set of structured reviews $SR$, an information system $I$ has to be generated using only the reviews $sr$ that have a positive or neutral overall orientation, $O_{all} \in \{positive, neutral\}$.

ii) From the information system $I$, a set of association rules between product attribute values $c_i$ has to be extracted using rough set association rule mining to represent users' preferences.

iii) A query expansion technique has to be developed that utilizes the association rules extracted from the information system $I$ to retrieve products that best meet the users' preferences.

3.2 Opinion Mining

We adopted the approach proposed by [5] to perform the opinion mining process. The first task in this process is to identify the sentimental orientations concerning the features. A user expresses a positive, negative or neutral opinion $o_i$ on each feature $f_i$ in a review $R_i$ using a set of opinion words $W = \{w_1, w_2, \ldots, w_n\}$. To find the opinion words used by the user (e.g. good, amazing, poor) that express his/her opinion on a product feature $f_i$, the adjectives used by the user in a review $R_i$ are extracted. The orientation of each opinion word, $ow_i \in \{negative, positive\}$, is then identified by utilizing the adjective synonym sets and antonym sets in WordNet [9]. In WordNet, adjectives share the same orientation as their synonyms and the opposite orientation to their antonyms. To predict the orientation $ow_i$ of a target adjective $w_i$, a set of common adjectives with known orientation $S = \{s_1, s_2, \ldots, s_n\}$, called seed adjectives, and WordNet are searched to find a synonym or antonym of the word with a known orientation. If a synonym of the word is found, the word's orientation is set to the same orientation as its synonym and the word is added to the seed list. Otherwise, if an antonym of the word is found, the word's orientation is set to the opposite of the antonym's orientation and the word is added to the seed list.
The process is repeated for the target words with unknown orientation, and the words' orientations are identified using the updated seed list. Finally, the sentimental orientation $o_i$ of each feature $f_i$ is identified by finding the dominant orientation of the opinion words in the sentences of $R_i$, by counting the number of positive opinion words ($ow_i = positive$) and negative opinion words ($ow_i = negative$). If the number of positive opinion words is greater than the number of negative opinion words, the orientation of the feature $f_i$ is positive ($o_i = positive$); if it is smaller, the orientation is negative ($o_i = negative$). If the number of positive opinion words equals the number of negative opinion words, the orientation of the feature $f_i$ is neutral ($o_i = neutral$).

Finally, opinion summarization is performed to determine the overall orientation $O_{all}$ of each review $R$ and to represent the review as a structured review $sr = (C, F, O, O_{all})$.
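The seed-list propagation and the dominant-orientation count described in this section can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `synonyms` and `antonyms` dictionaries are small hand-made stand-ins for WordNet lookups, and all function names are assumptions introduced here.

```python
OPPOSITE = {"positive": "negative", "negative": "positive"}


def propagate_orientations(words, seeds, synonyms, antonyms):
    """Assign orientations to adjectives using a growing seed list.

    `synonyms` and `antonyms` map a word to a list of related words;
    they stand in for WordNet in this sketch.
    """
    known = dict(seeds)  # seed list: word -> orientation
    pending = [w for w in words if w not in known]
    changed = True
    # Repeat while a pass still resolves at least one word.
    while pending and changed:
        changed = False
        for word in list(pending):
            orientation = None
            for syn in synonyms.get(word, ()):
                if syn in known:  # same orientation as a known synonym
                    orientation = known[syn]
                    break
            if orientation is None:
                for ant in antonyms.get(word, ()):
                    if ant in known:  # opposite of a known antonym
                        orientation = OPPOSITE[known[ant]]
                        break
            if orientation is not None:
                known[word] = orientation  # word joins the seed list
                pending.remove(word)
                changed = True
    return known


def feature_orientation(opinion_words, known):
    """Dominant orientation of the opinion words found for one feature."""
    positives = sum(1 for w in opinion_words if known.get(w) == "positive")
    negatives = sum(1 for w in opinion_words if known.get(w) == "negative")
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "neutral"
```

Words that never reach a known synonym or antonym simply stay unresolved and do not contribute to either count.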

$O_{all}$ is determined by counting the number of positive features ($o_i = positive$), neutral features ($o_i = neutral$) and negative features ($o_i = negative$) for the review. If the number of positive and neutral features together is greater than the number of negative features, the overall orientation of the review is positive ($O_{all} = positive$). If the number of positive and neutral features together is equal to the number of negative features, the overall orientation is neutral ($O_{all} = neutral$). Otherwise, the overall orientation is negative ($O_{all} = negative$).

3.3 Rough Set Association Rule Mining

Standard online product search engines perform a matching process to find products that satisfy a user's query, which usually consists of the product attributes or characteristics that the user is looking for. However, many users do not have sufficient knowledge about the product and may not know the exact product attributes. Therefore, the attributes in the query may not be the right attributes to query. Online user reviews are provided by users who have used the product, and the opinions about the product reflect the users' viewpoints based on their experience of using the product. Products that are positively reviewed must possess attractive attributes or characteristics that pleased their users. Based on this intuition, we propose to find the associations between the product attributes from the users' reviews that have a positive orientation. These associations can be used to predict users' preferences for product attributes.

In this paper, we utilize the rough set association rule mining approach [10] to find hidden patterns in data and generate sets of association rules from the data. We chose the rough set association rule mining technique because it allows us to easily select the condition and decision attributes of the rule. Rough set data analysis starts from a data set that is also called a decision table or an information system. In the table, each row represents an object, each column represents an attribute, and the entries of the table are attribute values.
An attribute can be a variable, an observation, a property, etc. As defined above, an information system is written as $I = (U, A)$; in this paper, $U$ is a set of structured reviews and $A$ consists of the product attributes, features, and the sentimental orientations, i.e. $A = \{c_1, \ldots, c_n, f_1, \ldots, f_m, o_1, \ldots, o_m, O_{all}\}$. The information system $I = (U, A)$ is created from the structured reviews with positive or neutral orientation. Let $sr \in SR$ be a structured review and $sr(a)$ be the value of attribute $a \in A$; then $U = \{sr \mid sr \in SR, sr(O_{all}) \in \{positive, neutral\}\}$ is the set of objects in the table. The information system thus contains attribute values for a set of products that have received good comments from the reviewers.

The next step in rough set association rule mining is to partition the attributes of the information system into two disjoint classes, called condition attributes $C$ and decision attributes $D$. The information system is then called a decision table $S = (U, C, D)$, where $C$ and $D$ are disjoint sets of condition and decision attributes, respectively. The condition and decision attributes are selected from the product attributes $C$ and features $F$ in $A$ in the information system $I$. The attributes chosen as the condition are the product attributes or features that are usually provided by a user as the initial input in a query, and the decision contains other attributes and features of the products. For example, for the online car search on which we conducted our experiments, the car make, model, price, etc. are chosen as the condition. Then, association rules are generated from the decision table by determining the decision attribute values based on the condition attribute values. The association rules between the product attribute values show the relationship between the initial product attribute values given by a user and other product attributes in which the user may be interested. Thus, these association rules can be used to represent the user's preferences and to retrieve products that will most likely fit the user's requirements.

3.4 Query Expansion

The query expansion process aims to improve the user's initial query in order to retrieve more products that might fit the user's requirements. A user's query $Q$ is represented by a set of terms $Q = \{q_1, q_2, \ldots, q_n\}$ that the user provides to the search engine. In the product search, the terms in the query are attribute values of the product that the user is looking for. The query is generally very short and lacks sufficient terms to represent the user's actual preferences or needs. Query expansion involves adding new attribute values $E = \{e_1, e_2, \ldots, e_n\}$ to the existing search terms $Q = \{q_1, q_2, \ldots, q_n\}$ to generate an expanded query $EQ = \{eq \mid eq \in (E \cup Q)\}$. The attribute values $eq_i \in EQ$ are used to retrieve products to recommend to the user. All products that have attribute values that match any attribute value $eq_i$ of the expanded query are selected as the candidate products $CP = \{cp_1, cp_2, \ldots, cp_n\}$. The similarity between each product $cp_i \in CP$ and the expanded query $EQ$ is calculated by matching each attribute value of the product $cp_i$ with the value of the same product attribute $eq_i$ in the expanded query. The similarity value $v_i$ for the product $cp_i$ with the expanded query $EQ$ is calculated as the total number of attribute values of the product that match the attribute values in the expanded query. Then, the products are ranked based on the similarity value. The top-N products are recommended to the user based on their ranking.
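The expansion and ranking steps just described can be illustrated with the sketch below. It assumes, for illustration only, that queries, products and rules are stored as attribute-to-value dictionaries and that a rule fires when all of its condition values appear in the query; the function names and the toy data in the test are hypothetical and are not taken from the paper's system.

```python
def expand_query(query, rules):
    """Build the expanded query EQ from the query Q and the rule decisions E.

    `rules` is a list of (condition, decision) pairs of dictionaries.
    """
    expanded = dict(query)
    for condition, decision in rules:
        # Assumed firing criterion: every condition value is in the query.
        if all(query.get(attr) == value for attr, value in condition.items()):
            for attr, value in decision.items():
                # Values supplied by the user take precedence.
                expanded.setdefault(attr, value)
    return expanded


def recommend(products, expanded, top_n):
    """Rank candidate products by the number of matching attribute values."""
    scored = []
    for product in products:
        similarity = sum(
            1 for attr, value in expanded.items() if product.get(attr) == value
        )
        if similarity > 0:  # candidates match at least one attribute value
            scored.append((similarity, product))
    # Stable sort: products with equal similarity keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [product for _, product in scored[:top_n]]
```

With an empty rule list the expanded query equals the original query, so `recommend` then ranks by the user's own terms alone.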
The proposed system may retrieve products that exactly match the user's input, as well as other products in which the user may be interested, by predicting the user's preferences from his/her initial input.

4 Experiment and Evaluation

4.1 Experiment Method

A case study was conducted for the cars domain. Data was collected from one of the car selling websites that contains reviews provided by users for cars previously owned by them. The dataset contains 5,504 reviews and 3,119 cars. Online reviews on cars from previous users were used to extract rules between attribute values. The opinion mining technique was first applied to generate structured reviews from the online reviews. Then ROSETTA [11], a rough set mining tool, was used for extracting rules from the information system generated from the structured reviews. Four processes were involved in extracting rules: i) data pre-processing and attribute selection, which included preparing a dataset and selecting important attributes to be included in the decision tables; ii) data categorization, which involved transforming the attribute values into categorical values to reduce the dimensionality of the data and to reduce the number of rules generated; iii) selection of decision and condition attributes; and iv) rule induction, which generated rules representing associations between query terms (e.g. car make and model) and other product attributes (e.g. series year, price new, engine size, fuel system) from the decision table. An example of a rule is shown below:

    CarMake(Toyota), CarModel(Camry) -> Year(>2000_and<=2005), Price(>30000_and<=50000), EngineSize(>1.6L_and<=3.0L), Seat(4_5), BodyType(SEDAN), Drive(FWD), FuelSystem(FUEL_INJECTED), FuelConsumption(>9.0L_and<=11.5L), TankCapacity(>51L_and<=70L), StandardTransmission(4A), Power(>82Kw_and<=146Kw), Torque(>150Nm_and<=284Nm), TurningCircle(>11.20m), Wheelbase(>2710mm), KerbWeight(>1126Kg_and<=1520Kg), Dimension(>4794)

Two search techniques were developed: the Standard Matching-based Search (SMS) and the Query Expansion Matching-based Search (QEMS). The SMS technique retrieves cars that match a user's query terms exactly. In contrast, the QEMS technique retrieves cars based on the expanded query, which reflects the user's preferences predicted from the rules generated by the rough set association rule mining process.

For evaluating the proposed approach, previous users' navigation history from the existing system log data was used as testing data. A sequence of cars viewed by each user was generated from the log data. The first car in each user's sequence was chosen as the input, and some of the attributes of the car, such as car make and car model, were considered as the query for that user. The other cars in the sequence were considered as the cars that the user is interested in, and they were used as the testing cars to test whether the two search engines recommend these cars to the user. For each query (i.e. the first car of a user), the cars recommended by both systems for different numbers of top retrieved results (N = 10, 20, 30, 40 and 50) were compared with the testing cars for that user. The recall and precision values for each user were calculated for both techniques by using the following formulas:

$$recall = \frac{NM}{NT}, \qquad precision = \frac{NM}{NR}$$

where $NM$ is the number of retrieved cars that match the testing cars, $NT$ is the number of testing cars, and $NR$ is the number of retrieved cars. Finally, the average recall and precision values over all users were calculated for both techniques.

4.2 Results and Discussion

The graphs in Figure 1 and Figure 2 show the evaluation results of the proposed approach. The evaluation results show that the Query Expansion Matching-based Search (QEMS) outperformed the Standard Matching-based Search (SMS), in that this approach can retrieve more car models that the users are interested in without requiring much effort from them. The expanded query can improve the retrieval performance of the Standard Matching-based Search as it provides more keywords to represent a user's requirements or preferences.
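As a worked illustration of the recall and precision definitions in Section 4.1, the sketch below computes both values for one user at a cut-off N and averages them over users. The function names and the toy identifiers in the test are assumptions for this example, and returning 0.0 when a denominator is zero is a choice made here, not something the paper specifies.

```python
def recall_precision_at_n(recommended, testing, n):
    """Return (recall, precision) for one user's top-n recommendations.

    recommended: ranked list of car identifiers returned by a search technique
    testing:     identifiers of the cars the user went on to view
    """
    retrieved = recommended[:n]
    testing_set = set(testing)
    nm = len(set(retrieved) & testing_set)  # NM: retrieved cars that match
    nt = len(testing_set)                   # NT: number of testing cars
    nr = len(retrieved)                     # NR: number of retrieved cars
    recall = nm / nt if nt else 0.0
    precision = nm / nr if nr else 0.0
    return recall, precision


def average_scores(per_user_scores):
    """Average (recall, precision) pairs over all users."""
    count = len(per_user_scores)
    if count == 0:
        return 0.0, 0.0
    mean_recall = sum(r for r, _ in per_user_scores) / count
    mean_precision = sum(p for _, p in per_user_scores) / count
    return mean_recall, mean_precision
```

Calling `recall_precision_at_n` once per user for each N in (10, 20, 30, 40, 50) and passing the results to `average_scores` gives one averaged point per technique and cut-off.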

Fig. 1. Precision for different numbers of top retrieved results of the SMS and QEMS

Fig. 2. Recall for different numbers of top retrieved results of the SMS and QEMS

5 Conclusion

We have proposed a recommender system approach for recommending infrequently purchased products by utilizing user review data. The evaluation result shows that our recommendation approach leads to recommendation novelty or serendipity, where more unexpected or different items that meet the users' interests are recommended to the users. This approach is able to predict a user's preferences and may suggest more products that fit the user's requirements; it may also help online vendors promote their products. In future work, we intend to utilize the sentimental orientations of the features to improve the product recommendations.

Acknowledgments. This paper was partially supported by the Smart Services CRC (Cooperative Research Centres, Australia).

References

1. Schafer, J.B., Konstan, J., Riedl, J.: E-commerce Recommendation Applications. Data Mining and Knowledge Discovery 5(1-2), 115-153 (2001)
2. Leavitt, N.: Recommendation Technology: Will It Boost E-Commerce? Computer Society 39(5), 13-16 (2006)
3. Sarwar, B., Karypis, G., Konstan, J., Riedl, J.: Analysis of Recommendation Algorithms for E-commerce. In: 2nd ACM Conference on Electronic Commerce, pp. 158-167. ACM, New York (2000)
4. Aciar, S., Zhang, D., Simoff, S., Debenham, J.: Recommender System Based on Consumer Product Reviews. In: 2006 IEEE/WIC/ACM International Conference on Web Intelligence, pp. 719-723. IEEE Computer Society, Washington (2006)
5. Hu, M., Liu, B.: Mining and Summarizing User Reviews. In: Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 168-177. ACM, New York (2004)
6. Ding, X., Liu, B., Yu, P.S.: A Holistic Lexicon-based Approach to Opinion Mining. In: International Conference on Web Search and Web Data Mining, pp. 231-240. ACM, New York (2008)
7. Zhuang, L., Jing, F., Zhu, X.Y.: Movie Review Mining and Summarization. In: 15th ACM International Conference on Information and Knowledge Management, pp. 43-50. ACM, New York (2006)
8. Aciar, S., Zhang, D., Simoff, S., Debenham, J.: Informed Recommender: Basing Recommendations on Consumer Product Reviews. IEEE Intelligent Systems 22(3), 39-47 (2007)
9. Miller, G., Beckwith, R., Fellbaum, C., Gross, D., Miller, K.: Introduction to WordNet: An Online Lexical Database. International Journal of Lexicography (Special Issue) 3(4), 235-312 (1990)
10. Pawlak, Z.: Rough Sets and Intelligent Data Analysis. Information Science 147(1-4), 1-12 (2002)
11. Øhrn, A.: ROSETTA Technical Reference Manual. Department of Computer and Information Science, Norwegian University of Science and Technology (NTNU), Trondheim, Norway (2000)