IN recent years, recommender systems, which help users discover

Size: px

Start display at page:

Download "IN recent years, recommender systems, which help users discover"

Brenda Welch
5 years ago
Views:

1 Heterogeneous Informaton Network Embeddng for Recommendaton Chuan Sh, Member, IEEE, Bnbn Hu, Wayne Xn Zhao Member, IEEE and Phlp S. Yu, Fellow, IEEE 1 arxv: v1 [cs.si] 29 Nov 2017 Abstract Due to the flexblty n modellng data heterogenety, heterogeneous nformaton network (HIN) has been adopted to characterze complex and heterogeneous auxlary data n recommender systems, called HIN based recommendaton. It s challengng to develop effectve methods for HIN based recommendaton n both extracton and explotaton of the nformaton from HINs. Most of HIN based recommendaton methods rely on path based smlarty, whch cannot fully mne latent structure features of users and tems. In ths paper, we propose a novel heterogeneous network embeddng based approach for HIN based recommendaton, called HERec. To embed HINs, we desgn a meta-path based random walk strategy to generate meanngful node sequences for network embeddng. The learned node embeddngs are frst transformed by a set of fuson functons, and subsequently ntegrated nto an extended matrx factorzaton (MF) model. The extended MF model together wth fuson functons are jontly optmzed for the ratng predcton task. Extensve experments on three real-world datasets demonstrate the effectveness of the HERec model. Moreover, we show the capablty of the HERec model for the cold-start problem, and reveal that the transformed embeddng nformaton from HINs can mprove the recommendaton performance. Index Terms Heterogeneous nformaton network, Network embeddng, Matrx factorzaton, Recommender system. 1 INTRODUCTION IN recent years, recommender systems, whch help users dscover tems of nterest from a large resource collecton, have been playng an ncreasngly mportant role n varous onlne servces [1], [2]. Tradtonal recommendaton methods (e.g., matrx factorzaton) manly am to learn an effectve predcton functon for characterzng user-tem nteracton records (e.g., user-tem ratng matrx). Wth the rapd development of web servces, varous knds of auxlary data (a.k.a., sde nformaton) become avalable n recommender systems. Although auxlary data s lkely to contan useful nformaton for recommendaton [3], t s dffcult to model and utlze these heterogeneous and complex nformaton n recommender systems. Furthermore, t s more challengng to develop a relatvely general approach to model these varyng data n dfferent systems or platforms. As a promsng drecton, heterogeneous nformaton network (HIN), consstng of multple types of nodes and lnks, has been proposed as a powerful nformaton modelng method [4] [6]. Due to ts flexlty n modelng data heterogenety, HIN has been adopted n recommender systems to characterze rch auxlary data. In Fg. 1(a), we present an example for move recommendaton characterzed by HINs. We can see that the HIN contans multple types of enttes connected by dfferent types of relatons. Under the HIN based representaton, the recommendaton problem can be consdered as a smlarty search task over the HIN [4]. Such a recommendaton settng s called as HIN based recommendaton [7]. HIN based recommendaton has receved much attenton n the lterature [7] [12]. The basc dea of most exstng HIN based recommendaton methods s to leverage path based semantc relatedness between users and tems over HINs, e.g., meta-path based smlartes, for recommendaton. Although HIN based methods have acheved performance mprovement to some extent, there are two major problems for these methods usng meta-path based smlartes. Frst, meta-path based smlartes rely on explct path reachablty, and may not be relable to be used for recommendaton when path connectons are sparse or nosy. It s lkely that some lnks n HINs are accdentally formed whch do not convey meanngful semantcs. Second, meta-path based smlartes manly characterze semantc relatons defned over HINs, and may not be drectly applcable to recommender systems. It s lkely that the derved path based smlartes have no explct mpact on the recommendaton performance n some cases. Exstng methods manly learn a lnear weghtng mechansm to combne the path based smlartes [11] or latent factors [7], whch cannot learn the complcated mappng mechansm of HIN nformaton for recommendaton. The two problems essentally reflect two fundamental ssues for HIN based recommendaton, namely effectve nformaton extracton and explotaton based on HINs for recommendaton. For the frst ssue, t s challengng to develop a way to effectvely extract and represent useful nformaton for HINs due to data heterogenety. Unlke prevous studes usng metapath based smlartes [4], [7], our dea s to learn effectve heterogeneous network representatons for summarzng mportant structural characterstcs and propertes of HINs. Followng [13], [14], we characterze nodes from HINs wth low-dmensonal vectors,.e., embeddngs. Instead of relyng on explct path connecton, we would lke to encode useful nformaton from HINs wth latent vectors. Compared wth meta-path based smlarty, the learned embeddngs are n a more compact form that s easy to use and ntegrate. Also, the network embeddng approach tself s more resstant to sparse and nosy data. However, most exstng network embeddng methods focus on homogeneous networks only consstng of a sngle type of nodes and lnks, and cannot drectly deal wth heterogeneous networks consstng of multple types of nodes and lnks. Hence, we propose a new heterogeneous network embeddng method. Consderng heterogeneous characterstcs and rch semantcs reflected by meta-paths, the proposed method frst uses a random walk strategy guded by

2... e (1) e (l)... e (V) + (b 1 ) Meta-path based random walk (b 2 ) Transform and fuse (a) An Example of HIN (b) HIN Embeddng (c) Recomendaton Fg. 1. The schematc llustraton of the proposed HERec approach.

2 2... e (1) e (l)... e (V) + (b 1 ) Meta-path based random walk (b 2 ) Transform and fuse (a) An Example of HIN (b) HIN Embeddng (c) Recomendaton Fg. 1. The schematc llustraton of the proposed HERec approach. meta-paths to generate node sequences. For each meta-path, we learn a unque embeddng representaton for a node by maxmzng ts co-occurrence probablty wth neghborng nodes n the sequences sampled accordng to the gven meta-path. We fuse the multple embeddngs w.r.t. dfferent meta-paths as the output of HIN embeddng. After obtanng the embeddngs from HINs, we study how to ntegrate and utlze such nformaton n recommender systems. We don t assume the learned embeddngs are naturally applcable n recommender systems. Instead, we propose and explore three fuson functons to ntegrate multple embeddngs of a node nto a sngle representaton for recommendaton, ncludng smple lnear fuson, personalzed lnear fuson and non-lnear fuson. These fuson functons provde flexble ways to transform HIN embeddngs nto useful nformaton for recommendaton. Specally, we emphasze that personalzaton and non-lnearty are two key ponts to consder for nformaton transformaton n our settng. Fnally, we extend the classc matrx factorzaton framework by ncorporatng the fused HIN embeddngs. The predcton model and the fuson functon are jontly optmzed for the ratng predcton task. By ntegratng the above two parts together, ths work presents a novel HIN embeddng based recommendaton approach, called HERec for short. HERec frst extracts useful HIN based nformaton usng the proposed HIN embeddng method, and then utlzes the extracted nformaton for recommendaton usng the extended matrx factorzaton model. We present the overall llustraton for the proposed approach n Fg. 1. Extensve experments on three real-world datasets demonstrate the effectveness of the proposed approach. We also verfy the ablty of HERec to allevate coldstart problem and examne the mpact of meta-paths on performance. The key contrbutons of ths paper can be summarzed as follows: We propose a heterogeneous network embeddng method guded by meta-paths to uncover the semantc and structural nformaton of heterogeneous nformaton networks. Moreover, we propose a general embeddng fuson approach to ntegerate dfferent embeddngs based on dfferent meta-paths nto a sngle representaton. We propose a novel heterogeneous nformaton network embeddng for recommendaton model, called HERec for short. HERec can effectvely ntegrate varous knds of embeddng nformaton n HIN to enhance the recommendaton performance. In addton, we desgn a set of three flexble fuson functons to effectvely transform HIN embeddngs nto useful nformaton for recommendaton. Extensve experments on three real-world datasets demonstrate the effectveness of the proposed model. Moreover, we show the capablty of the proposed model for the cold-start predcton problem, and reveal that the transformed embeddng nformaton from HINs can mprove the recommendaton performance. The remander of ths paper s organzed as follows. Secton 2 ntroduces the related works. Secton 3 descrbes notatons used n the paper and presents some prelmnary knowledge. Then, We propose the heterogeneous network embeddng method and the HERec model n Secton 4. Experments and detaled analyss are reported n Secton 5. Fnally, we conclude the paper n Secton 6. 2 RELATED WORK In ths secton, we wll revew the related studes n three aspects, namely recommender systems, heterogeneous nformaton networks and network embeddng. In the lterature of recommender systems, early works manly adopt collaboratve flterng (CF) methods to utlze hstorcal nteractons for recommendaton [3]. Partcularly, the matrx factorzaton approach [15], [16] has shown ts effectveness and effcency n many applcatons, whch factorzes user-tem ratng matrx nto two low rank user-specfc and tem-specfc matrces, and then utlzes the factorzed matrces to make further predctons [2]. Snce CF methods usually suffer from coldstart problem, many works [8], [17], [18] attempt to leverage addtonal nformaton to mprove recommendaton performance. For example, Ma et al. [19] ntegrate socal relatons nto matrx factorzaton n recommendaton. Lng et al. [20] consder the nformaton of both ratngs and revews and propose a unfed model to combne content based flterng wth collaboratve flterng for ratng predcton task. Ye et al. [21] ncorporate user preference, socal nfluence and geographcal nfluence n the recommendaton and propose a unfed POI recommendaton framework. More recently, Sedhan et al. [22] explan drawbacks of three popular cold-start models [23] [25] and further propose a learnng based approach for the cold-start problem to leverage socal data va randomsed SVD. And many works begn to utlze deep models (e.g., convolutonal neural network, auto encoder) to explot text nformaton [26], mage nformaton [27] and network structure nformaton [28] for better recommendaton. In addton, there are also some typcal frameworks focusng on ncorporatng auxlary nformaton for recommendaton. Chen et al. [29] propose a typcal SVDFeature framework to effcently solve the feature based matrx factorzaton. And Rendle [30] proposes factorzaton

3 3 Group Drector Author Complment Category User Move Type User Book Year User Busness Cty Actor Publsher (a) Douban Move (b) Douban Book (c) Yelp Fg. 2. Network schemas of heterogeneous nformaton networks for the used three datasets. In our task, users and tems are our major focus, denoted by large-szed crcles, whle the other attrbutes are denoted by small-szed crcles. machne, whch s a generc approach to combne the generalty of feature engneerng. As a newly emergng drecton, heterogeneous nformaton network [5] can naturally model complex objects and ther rch relatons n recommender systems, n whch objects are of dfferent types and lnks among objects represent dfferent relatons [31]. And several path based smlarty measures [4], [32], [33] are proposed to evaluate the smlarty of objects n heterogeneous nformaton network. Therefore, some researchers have began to be aware of the mportance of HIN based recommendaton. Wang et al. [8] propose the OptRank method to allevate the coldstart problem by utlzng heterogeneous nformaton contaned n socal taggng system. Furthermore, the concept of meta-path s ntroduced nto hybrd recommender systems [34]. Yu et al. [7] utlze meta-path based smlartes as regularzaton terms n the matrx factorzaton framework. Yu et al. [9] take advantage of dfferent types of entty relatonshps n heterogeneous nformaton network and propose a personalzed recommendaton framework for mplct feedback dataset. Luo et al. [35] propose a collaboratve flterng based socal recommendaton method usng heterogeneous relatons. More recently, Sh et al. [10] propose the concept of weghted heterogeneous nformaton network and desgn a meta-path based collaboratve flterng model to flexbly ntegrate heterogeneous nformaton for personalzed recommendaton. In [11], [12], [36], the smlartes of users and tems are both evaluated by path based smlarty measures under dfferent semantc meta-paths and a matrx factorzaton based on dual regularzaton framework s proposed for ratng predcton. Most of HIN based methods rely on the path based smlarty, whch may not fully mne latent features of users and tems on HINs for recommendaton. On the other hand, network embeddng has shown ts potental n structure feature extracton and has been successfully appled n many data mnng tasks [37], [38], such as classfcaton [39], clusterng [40], [41] and recommendaton [42]. Deepwalk [13] combnes random walk and skp-gram to learn network representatons. Furthermore, Grover and Leskovec [14] propose a more flexble network embeddng framework based on a based random walk procedure. In addton, LINE [43] and SDNE [44] characterze the second-order lnk proxmty, as well as neghbor relatons. Cao et al. [45] propose the GraRep model to capture hgher-order graph proxmty for network representatons. Besdes leanng network embeddng from only the topology, there are also many works [46] [48] leveragng node content nformaton and other avalable graph nformaton for the robust representatons. Unfortunately, most of network embeddng methods focus on homogeneous networks, and thus they cannot drectly be appled for heterogeneous networks. Recently, several works [49] [53] attempt to analyze heterogeneous networks va embeddng methods. Partcularly, Chang et al. [49] desgn a deep embeddng model to capture the complex nteracton between the heterogeneous data n the network. Xu et al. [51] propose a EOE method to encode the ntra-network and nter-network edges for the coupled heterogeneous network. Dong et al. [53] defne the neghbor of nodes va meta-path and learn the heterogeneous embeddng by skp-gram wth negatve samplng. Although these methods can learn network embeddngs n varous heterogeneous network, ther representatons of nodes and relatons may not be optmum for recommendaton. To our knowledge, t s the frst attempt whch adopts the network embeddng approach to extract useful nformaton from heterogeneous nformaton network and leverage such nformaton for ratng predcton. The proposed approach utlzes the flexblty of HIN for modelng complex heterogeneous context nformaton, and meanwhle borrows the capablty of network embeddng for learnng effectve nformaton representaton. The fnal ratng predcton component further ncorporates a transformaton mechansm mplemented by three flexble functons to utlze the learned nformaton from network embeddng. 3 PRELIMINARY A heterogeneous nformaton network s a specal knd of nformaton network, whch ether contans multple types of objects or multple types of lnks. Defnton 1. Heterogeneous nformaton network [54]. A HIN s denoted as G = {V, E} consstng of a object set V and a lnk set E. A HIN s also assocated wth an object type mappng functon φ : V A and a lnk type mappng functon ψ : E R. A and R denote the sets of predefned object and lnk types, where A + R > 2. The complexty of heterogeneous nformaton network drves us to provde the meta level (e.g., schema-level) descrpton for understandng the object types and lnk types better n the network. Hence, the concept of network schema s proposed to descrbe the meta structure of a network. Defnton 2. Network schema [31], [55]. The network schema s denoted as S = (A, R). It s a meta template for an nformaton

4 4 network G = {V, E} wth the object type mappng φ : V A and the lnk type mappng ψ : E R, whch s a drected graph defned over object types A, wth edges as relatons from R. Example 1. As shown n Fg. 1(a), we have represented the settng of move recommender systems by HINs. We further present ts correspondng network schema n Fg. 2(a), consstng of multple types of objects, ncludng User (U), Move (M), Drector (D). There exst dfferent types of lnks between objects to represent dfferent relatons. A user-user lnk ndcates the frendshp between two users, whle a user-move lnk ndcates the ratng relaton. Smlarly, we present the schematc network schemas for book and busness recommender systems n Fg. 2(b) and Fg. 2(c) respectvely. In HINs, two objects can be connected va dfferent semantc paths, whch are called meta-paths. Defnton 3. Meta-path [4]. A meta-path ρ s defned on a network schema S = (A, R) and s denoted as a path n R the form of A 1 R 1 2 R A2 l Al+1 (abbrevated as A 1 A 2 A l+1 ), whch descrbes a composte relaton R = R 1 R 2 R l between object A 1 and A l+1, where denotes the composton operator on relatons. Example 2. Takng Fg. 2(a) as an example, two objects can be connected va multple meta-paths, e.g., User - User (UU) and User - Move - User (UMU). Dfferent metapaths usually convey dfferent semantcs. For example, the U U path ndcates frendshp between two users, whle the UMU path ndcates the co-watch relaton between two users,.e., they have watched the same moves. As wll be seen later, the detaled meta-paths used n ths work s summarzed n Table 3. Recently, HIN has become a manstream approach to model varous complex nteracton systems [5]. Specally, t has been adopted n recommender systems for characterzng complex and heterogenous recommendaton settngs. Defnton 4. HIN based recommendaton. In a recommender system, varous knds of nformaton can be modeled by a HIN G = {V, E}. On recommendaton-orented HINs, two knds of enttes (.e., users and tems) together wth the relatons between them (.e., ratng relaton) are our focus. Let U V and I V denote the sets of users and tems respectvely, a trplet u,, r u, denotes a record that a user u assgns a ratng of r u, to an tem, and R = { u,, r u, } denotes the set of ratng records. We have U V, I V and R E. Gven the HIN G = {V, E}, the goal s to predct the ratng score r u, of u U to a non-rated tem I. Several efforts have been made for HIN based recommendaton. Most of these works manly leverage meta-path based smlartes to enhance the recommendaton performance [7], [9] [11]. Next, we wll present a new heterogeneous network embeddng based approach to ths task, whch s able to effectvely explot the nformaton reflected n HINs. The notatons we wll use throughout the artcle are summarzed n Table 1. Notaton TABLE 1 Notatons and explanatons. Explanaton G heterogeneous nformaton network V object set E lnk set S network schema A object type set R lnk type set U user set I tem set r u, predcted ratng user u gves to tem e v low-dmensonal representaton of node v N u neghborhood of node u ρ a meta-path P meta-path set, e (I) fnal representatons of user u, tem d dmenson of HIN embeddngs D dmenson of latent factors x u, y latent factors of user u, tem, γ (I) latent factors for parng HIN embeddng of user u, tem α, β parameters for ntegratng HIN embeddngs M (l) transformaton matrx w.r.t the l-th mete-path b (l) bas vector w.r.t the l-th mete-path e (U) u γ (U) u w (l) u Θ (U), Θ (I) λ η preference weght of user u over the l-th meta-path parameters of fuson functons for users, tems regularzaton parameter learnng rate 4 THE PROPOSED APPROACH In ths secton, we present a Heterogeneous network Embeddng based approach for Recemendaton, called HERec. For addressng the two ssues ntroduced n Secton 1, the proposed HERec approach conssts of two major components. Frst, we propose a new heterogeneous network embeddng method to learn the user/tem embeddngs from HINs. Then, we extend the classc matrx factorzaton framework by ncorporatng the learned embeddngs usng a flexble set of fuson functons. We present an overall schematc llustraton of the proposed approach n Fg. 1. After the constructon of the HINs (Fg. 1(a)), two major steps are presented, namely HIN embeddng (Fg. 1(b)) and recommendaton (Fg. 1(c)). Next, we wll present detaled llustraton of the proposed approach. 4.1 Heterogeneous Network Embeddng Inspred by the recent progress on network embeddng [13], [14], we adopt the representaton learnng method to extract and represent useful nformaton of HINs for recommendaton. Gven a HIN G = {V, E}, our goal s to learn a low-dmensonal representaton e v R d (a.k.a., embeddng) for each node v V. The learned embeddngs are expected to hghly summarze nformatve characterstcs, whch are lkely to be useful n recommender systems represented on HINs. Compared wth meta-path based smlarty [4], [10], t s much easer to use and ntegrate the learned representatons n subsequent procedures. However, most of the exstng network embeddng methods manly focus on homogeneous networks, whch are not able to effectvely model heterogeneous networks. For nstance, the poneerng study deepwalk [13] uses random walk to generate node sequences, whch cannot dscrmnate nodes and edges wth dfferent types. Hence, t requres a more prncpled way to traverse the HINs and generate meanngful node sequences.

5 5 User Move Drector u 1 u 2 u 3 u 4 u 5 m 1 m 2 m 3 d 1 d 2 d 3 UMU UMDMU MDM u 1 m 1 u 2 m 2 u 3 m 1 u 1 d 2 m 2 u 3... m 1 d 2 m 2 d 3 m m 2 u 4 u 1 u 2 u 3 u 4... Flterng u 1 u 3... m 1 m 2 m 3 Fg. 3. An llustratve example of the proposed meta-path based random walk. We frst perform random walks guded by some selected metapaths, and then flter the node sequences not wth the user type or tem type Meta-path based Random Walk To generate meanngful node sequences, the key s to desgn an effectve walkng strategy that s able to capture the complex semantcs reflected n HINs. In the lterature of HINs, meta-path s an mportant concept to characterze the semantc patterns for HINs [4]. Hence, we propose to use the meta-path based random walk method to generate node sequences. Gvng a heterogeneous R network G = {V, E} and a meta-path ρ : A 1 R 1 t At R A t+1 l Al+1, the walk path s generated accordng to the followng dstrbuton: = P (n t+1 = x n t = v, ρ) (1) { 1, (v, x) E and φ(x) = A N A t+1 (v) t+1; 0, otherwse, where n t s the t-th node n the walk, v has the type of A t, and N (At+1) (v) s the frst-order neghbor set for node v wth the type of A t+1. A walk wll follow the pattern of a meta-path repettvely untl t reaches the pre-defned length. Example 3. We stll take Fg. 1(a) as an example, whch represents the heterogeneous nformaton network of move recommender systems. Gven a meta-path UMU, we can generate two sample walks (.e., node sequences) by startng wth the user node of Tom: (1) Tom User The Termnator Move Mary User, and (2) Tom User Avater Move Bob User The Termnator Move Mary User. Smlarly, gven the metapath UMDMU, we can also generate another node sequence: Tom User The Termnator Move Cameron Drector Avater Move Mary User. It s ntutve to see that these meta-paths can lead to meanngful node sequences correspondng to dfferent semantc relatons Type Constrant and Flterng Snce our goal s to mprove the recommendaton performance, the man focus s to learn effectve representatons for users and tems, whle objects wth other types are of less nterest n our task. Hence, we only select meta-paths startng wth user type or tem type. Once a node sequence has been generated usng the above method, t s lkely to contan nodes wth dfferent types. We further remove the nodes wth a type dfferent from the startng type. In ths way, the fnal sequence wll only consst of nodes wth the startng type. Applyng type flterng to node sequences has two benefts. Frst, although node sequences are... constructed usng meta-paths wth heterogeneous types, the fnal representatons are learned usng homogeneous neghborhood. We embed the nodes wth the same type n the same space, whch relaxes the challengng goal of representng all the heterogeneous objects n a unfed space. Second, gven a fxed-length wndow, a node s able to utlze more homogeneous neghbors that are more lkely to be relevant than others wth dfferent types. Example 4. As shown n Fg. 3, n order to learn effectve representatons for users and tems, we only consder the meta-paths n whch the startng type s user type or tem type. In ths way, we can derve some meta-paths, such as UMU, UMDMU and MUM. Take the meta-path of UMU as an nstance. We can generate a sampled sequence u 1 m 1 u 2 m 2 u 3 m 2 u 4 accordng to Eq. 1. Once a sequence has been constructed, we further remove the nodes wth a dfferent type compared wth the startng node. In ths way, we fnally obtan a homogeneous node sequence u 1 u 2 u 3 u 4. The connectons between homogeneous nodes are essentally constructed va the heterogeneous neghborhood nodes. After ths step, our next focus wll be how to learn effectve representatons for homogeneous sequences Optmzaton Objectve Gven a meta-path, we can construct the neghborhood N u for node u based on co-occurrence n a fxed-length wndow. Followng node2vec [14], we can learn the representatons of nodes to optmze the followng objectve: max f log P r(n u f(u)), (2) u V where f : V R d s a functon (amng to learn) mappng each node to d-dmensonal feature space, and N u V represents the neghborhood of node u, w.r.t. a specfc meta-path. We can learn the embeddng mappng functon f( ) by applyng stochastc gradent descent (SGD) to optmze ths objectve. A major dfference between prevous methods and ours les n the constructon of N u. Our method selects homogeneous neghbors usng meta-path based random walks. The whole algorthm framework s shown n Algorthm Embeddng Fuson Heterogeneous network embeddng provdes a general way to extract useful nformaton from HINs. For our model, gven a node v V, we can obtan a set of representatons {e (l) v } P l=1, where P denotes the set of meta-paths, and e (l) v denotes the representaton of v w.r.t. the l-th meta-path. It requres a prncpled fuson way to transform node embeddngs nto a more sutable form that s useful to mprove recommendaton performance. Exstng studes usually adopt a lnear weghtng mechansm to combne the nformaton mned from HINs (e.g., meta-path based smlartes), whch may not be capable of dervng effectve nformaton representatons for recommendaton. Hence, we propose to use a general functon g( ), whch ams to fuse the learned node embeddngs for users and tems: e (U) u g({e (l) u }), (3) e (I) g({e (l) }),

6 Algorthm 1 HIN embeddng algorthm for a sngle meta-path. Input: the heterogeneous nformaton network G = {V, E}; the gven meta-path ρ; the target node type A t ; the dmenson of embeddng d; the walk length wl; the neghborhood sze ns; the number of walks per node r. Output: The embeddng of target node type w.r.t the sngle metapath, denoted by e 1: Intalze e by standard normal dstrbuton; 2: paths = []; 3: for each v V and φ(v) == A t do 4: for = 1 to r do 5: path = []; 6: whle wl > 0 do 7: walk to node x accordng to Eq. 1; 8: f φ(x) == A t then 9: append node x nto path; 10: wl wl 1; 11: end f 12: end whle 13: Add path to paths; 14: end for 15: end for 16: e = SGD(paths, d, ns); 17: return e. where e (U) u and e (I) are the fnal representatons for a user u and an tem respectvely, called HIN embeddngs. Snce users and tems are our focus, we only learn the embeddngs for users and tems. At the current stage, we do not specfy the form of functon g( ). Instead, we beleve that a good fuson functon should be learned accordng to the specfc task. Hence, we leave the formulaton and optmzaton of the fuson functon n our recommendaton model. 4.2 Integratng Matrx Factorzaton wth Fused HIN Embeddng for Recommendaton Prevously, we have studed how to extract and represent useful nformaton from HINs for recommendaton. Wth HIN embeddng, we can obtan user embeddngs {e (U) u } u U and tem embeddngs {e (I) } I, whch are further specfed by a functon g( ) that s to learn. Now we study how to utlze the learned embeddngs for recommendaton Ratng Predctor We buld our ratng predctor based on the classc matrx factorzaton (MF) model [56], whch factorzes the user-tem ratng matrx nto user-specfc and tem-specfc matrces. In MF, the ratng of a user u on an tem s smply defned as follows: r u, = x u y, (4) where x u R D and y R D denote the latent factors correspondng to user u and tem. Snce we have also obtaned the representatons for user u and tem, we further ncorporate them nto the ratng predctor as below r u, = x u y + α e (U) (I) u γ + β γ (U) (I) u e, (5) where e (U) u and e (I) are the fused embeddngs, γ (U) u and γ (I) are user-specfc and tem-specfc latent factors to par wth the HIN embeddngs e (U) u and e (I) respectvely, and α and β are the tunng parameters to ntegrate the three terms. For Eq. 5, we need to note two ponts. Frst, e (U) u and e (I) are the output of functon g( ) n Eq. 3. We assume that the derved embeddngs after transformaton by functon g( ) are applcable n MF. Second, we don t drectly par e (U) u wth e (I). Recall the proposed embeddng method ndeed characterzes the relatedness between objects wth the same type. We ncorporate new latent factors γ (U) u and γ (I) to relax the assumpton that e (U) u and e (I) have to be n the same space, whch ncreases the flexlty to the predcton model Settng the Fuson Functon Prevously, we assume the functon g( ) has been gven n a general form. Now, we study how to set the fuson functon, whch transforms HIN embeddngs nto a form that s useful n recommender systems. We only dscuss the functon for fusng user embeddngs, and t s smlar to fuse tem embeddngs. We propose to use three fuson functons to ntegrate embeddngs: Smple lnear fuson. We assume each user has the same preference for each meta-path, and therefore, we assgn each meta-path wth a unfed weght (.e., average value) for each user. Moreover, we lnearly transform embeddngs to target space. g({e (l) u }) = 1 P P l=1 6 (M (l) e (l) u + b (l) ), (6) where P s the set of consdered meta-paths, M (l) R D d and b (l) R D are the transformaton matrx and bas vector w.r.t. the l-th meta-path. Personalzed lnear fuson. The smple lnear fuson cannot model users personalzed preference over the meta-paths. So we further assgn each user wth a weght vector on meta-paths, representng user s personalzed preference for each meta-path. It s more reasonable that each user has hs/her personal nterest preference n many real applcatons. g({e (l) u }) = P l=1 w (l) u (M (l) e (l) u + b (l) ), (7) where w u (l) s the preference weght of user u over the l-th meta-path. Personalzed non-lnear fuson. Lnear fuson has lmted expressve power n modelng complex data relatons. Hence, we use non-lnear functon to enhance the fuson ablty. ( P g({e (l) u }) = σ w u (l) σ ( M (l) e (l) u + b (l))), (8) l=1 where σ( ) s a non-lnear functon,.e., sgmod functon n our work. Although we only use two non-lnear transformatons, t s flexble to extend to multple non-lnear layers, e.g., Mult-Layer Perceptrons Model Learnng We blend the fuson functon nto matrx factorzaton framework for learnng the parameters of the proposed model. The objectve can be formulated as follows:

7 = u,,r u, R (r u, r u, ) 2 + λ u ( x u 2 + y 2 + γ u (U) 2 + γ (I) 2 + Θ (U) 2 + Θ (I) 2 ), (9) where r u, s the predcted ratng usng Eq. 5 by the proposed model, λ s the regularzaton parameter, and Θ (U) and Θ (I) are the parameters of the functon g( ) for users and tems respectvely. We adopt SGD to effcently optmze the fnal objectve. The update of orgnal latent factors {x u } and {y } are the same as that of standard MF n Eq. 4. The parameters of the proposed model wll be updated as follows: Θ (U) u,l Θ (U) u,l η ( α(r u, r u,)γ (I) e (U) u Θ (U) u,l + λ ΘΘ (U) u,l ), (10) γ u (U) γ u (U) η ( β(r u, r u,)e (I) + λ γγ u (U) ), (11) Θ (I),l Θ (I),l η ( β(ru, ru,)γ(u) u e (I) Θ (I),l + λ ΘΘ (I),l ), (12) γ (I) γ (I) η ( α(r u, r u,)e (U) u + λ γγ (I) ), (13) where η s the learnng rate, λ Θ s the regularzaton for parameters Θ (U) and Θ (I), and λ γ s the regularzaton for parameters γ (U) and γ (I). In our work, we utlze the sgmod functon for non-lnear transformaton, and we can take advantage of the propertes of sgmod functon for ease of dervatve calculaton. It s worth notng that the symbol Θ denotes all the parameters n the e fuson functon, and the calculaton of Θ,l wll be dfferent for dfferent parameters n Θ. Next, we present the detaled dervaton of for personalzed non-lnear fuson functon e Θ,l e = (14) Θ,l w (l) σ(z s )σ(z f )(1 σ(z s ))(1 σ(z f ))e (l), Θ = M; w (l) σ(z s )σ(z f )(1 σ(z s ))(1 σ(z f )), Θ = b; σ(z s )σ(z f )(1 σ(z s )), Θ = w, where Z s = P l=1 w(l) σ ( M (l) e (l) + b (l)) and Z f = M (l) e (l) + b (l) e. The dervatons of Θ,l can be calculated n the above way for both users and tems. We omt the dervatons for lnear fuson functons, snce t s relatvely straghtforward. The whole algorthm framework s shown n Algorthm 2. In lnes 1-6, we perform HIN embeddng to obtan the representatons of users and tems. And n lnes 8-19, we adopt the SGD algorthm to optmze the parameters n the fuson functon and ratng functon Complexty Analyss HERec contans two major parts: (1) HIN embeddng. The complexty of deepwalk s O(d V ), where d s the embeddng dmenson and V s the number of nodes n the network. Therefore t takes O(d U ) and O(d I ) to learn users and tems embeddngs accordng to a sngle meta-path, respectvely. And the total complexty of HIN embeddng s O( P d ( U + I )) Algorthm 2 The overall learnng algorthm of HERec. Input: the ratng matrx R; the learnng rate η; the adjustable parameters α, β; the regularzaton parameter λ; the metapath sets for users and tems, P (U) and P (I). Output: the latent factors for users and tems, x and y; the latent factors to par HIN embeddng of users and tems, γ (U) and γ (I) ; the parameters of the fuson functon for users and tems, Θ (U) and Θ (I) 1: for l = 1 to P (U) do 2: Obtan users embeddngs {e (l) u } based on meta-path P (U) l accordng to Algorthm 1; 3: end for 4: for l = 1 to P (I) do 5: Obtan tems embeddngs {e (l) } based on the meta-path set P (I) l accordng to Algorthm 1; 6: end for 7: Intalze x, y, γ (U), γ (I), Θ (U), Θ (I) by standard normal dstrbuton; 8: whle not convergence do 9: Randomly select a trple u,, r u, R; 10: Update x u, y by typcal MF; 11: for l = 1 to P (U) do 12: Calculate e(u) u Θ (U) u,l by Eq. 14; 13: Update Θ (U) u,l by Eq. 10; 14: end for 15: Update γ u (U) by Eq. 11; 16: for l = 1 to P (I) do 17: Calculate e(i) by Eq. 14; Θ (I),l 18: Update Θ (I),l by Eq. 12; 19: end for 20: Update γ (I) by Eq. 13; 21: end whle 22: return x, y, γ (U), γ (I), Θ (U), Θ (I). snce the number of selected meta-paths s P. It s worth notng that HIN embeddng can be easly traned n parallel and we wll mplement t usng a mult-thread mode n order to mprove the effcency of model. (2) Matrx factorzaton. For each trplet u,, r u,, updatng x u, y, γ u (U), γ (I) takes O(D) tme, where D s the number of latent factors. And updatng Θ u (U), Θ (I) takes O( P D d) tme to learn the transformaton matrces M for all meta-paths. In the proposed approach, P s generally small, and d and D are at most several hundreds, whch makes the proposed method effcent n large datasets. Specally, SGD has very good practce performance, and we have found that t has a fast convergence rate on our datasets. 5 EXPERIMENTS In ths secton, we wll demonstrate the effectveness of HERec by performng experments on three real datasets compared to the state-of-the-art recommendaton methods. 5.1 Evaluaton Datasets We adopt three wdely used datasets from dfferent domans, consstng of Douban Move dataset 1 from move doman, Douban 1 7

8 8 TABLE 2 Statstcs of the three datasets. Dataset Relatons Number Number Number Ave. degrees Ave. degrees (Densty) (A-B) of A of B of (A-B) of A of B Douban Move (0.63%) Douban Book (0.27%) Yelp (0.08%) User-Move 13,367 12,677 1,068, User-User 2,440 2,294 4, User-Group 13,337 2, , Move-Drector 10,179 2,449 11, Move-Actor 11,718 6,311 33, Move-Type 12, , User-Book 13,024 22, , User-User 12,748 12, , Book-Author 21,907 10,805 21, Book-Publsher 21,773 1,815 21, Book-Year 21, , User-Busness 16,239 14, , User-User 10,580 10, , User-Complment 14, , Busness-Cty 14, , Busness-Category 14, , TABLE 3 The selected meta-paths for three datasets n our work. Dataset Douban Move Douban Book Yelp Meta-paths UMU, UMDMU, UMAMU, UMTMU MUM, MAM, MDM, MTM UBU, UBABU, UBPBU, UBYBU BUB, BPB, BYB UBU, UBCBU, UBCaBU BUB, BCB, BCaB Book dataset 2 from book doman, and Yelp dataset 3 from busness doman. Douban Move dataset ncludes 13,367 users and 12,677 moves wth 1,068,278 move ratngs rangng from 1 to 5. Moreover, the dataset also ncludes socal relatons and attrbute nformaton of users and moves. As for the Douban Book dataset, t ncludes 13,024 users and 22,347 books wth 792,026 ratngs rangng from 1 to 5. Yelp dataset records user ratngs on local busnesses and contans socal relatons and attrbute nformaton of busnesses, whch ncludes 16,239 users and 14,282 local busnesses wth 198,397 ratngs ragng from 1 to 5. The detaled descrptons of the three datasets are shown n Table 2, and the network schemas of the three datasets are plotted n Fg. 2. Besdes the doman varaton, these three datasets also have dfferent ratng sparsty degrees: the Yelp dataset s very sparse, whle the Douban Move dataset s much denser. 5.2 Metrcs We use the wdely used mean absolute error () and root mean square error () to measure the qualty of recommendaton of dfferent models. The metrcs and are defned as follows: = 1 r,j r,j, D test (,j) D test (15) = 1 (r,j r,j ) D test, (,j) D test (16) where r,j s the actual ratng that user assgns to tem j, r,j s the predcted ratng from a model, and D test denotes the test set of ratng records. From the defnton, we can see that a smaller or value ndcates a better performance. 5.3 Methods to Compare We consder the followng methods to compare: PMF [56]: It s the classc probablstc matrx factorzaton model by explctly factorzng the ratng matrx nto two lowdmensonal matrces. SoMF [19]: It ncorporates socal network nformaton nto the recommendaton model. The socal relatons are characterzed by a socal regularzaton term, and ntegrated nto the basc matrx factorzaton model. FM HIN : It s the context-aware factorzaton machne [57], whch s able to utlze varous knds of auxlary nformaton. In our experments, we extract heterogeneous nformaton as context features and ncorporate them nto the factorzaton machne for ratng predcton. We mplement the model wth the code from the authors 4. HeteMF [7]: It s a MF based recommendaton method, whch utlzes entty smlartes calculated over HINs usng metapath based algorthms. SemRec [10]: It s a collaboratve flterng method on weghted heterogeneous nformaton network, whch s constructed by connectng users and tems wth the same ratngs. It flexbly ntegrates heterogeneous nformaton for recommendaton by weghted meta-paths and a weght ensemble method. We mplement the model wth the code from the authors 5. DSR [12]: It s a MF based recommendaton method wth dual smlarty regularzaton, whch mposes the constrant on users and tems wth hgh and low smlartes smultaneously. HERec dw : It s a varant of HERec, whch adopts the homogeneous network embeddng method deepwalk [13] and gnores the heterogenety of nodes n HINs. We use the code of deepwalk provded by the authors

9 TABLE 4 Results of effectveness experments on three datasets. A smaller or value ndcates a better performance. For ease of readng the results, we also report the mprovement w.r.t. the PMF model for each other method. A larger mprovement rato ndcates a better performance. Dataset Tranng Metrcs PMF SoMF FM HIN HeteMF SemRec DSR HERec dw HERec mp HERec Douban Move Douban Book Yelp 80% 60% 80% 60% 90% 80% 70% 60% Improve -1.32% +8% -0.16% +0.80% +1.04% +0.66% +3.93% +3.86% Improve -0.07% +5.55% +1.53% +3.58% +5.85% +2.97% % Improve -2.11% +1.67% -0.46% +2.19% +0.61% +0.49% +4.36% +4.77% Improve -5% +0.62% +1.34% +4.30% +6.12% +2.80% +7.94% +9.41% Improve -4.11% % +2.18% -1.51% +0.08% +5.44% +6.23% Improve -1.89% +9.10% % +5.66% +3.17% +11% % Improve +3.69% % +4.84% % +9.14% +7.56% % % Improve -4.36% % +0.88% +8.91% % +3.86% % % Improve +0.31% +1.00% +0.59% +1.71% +0.59% -1.75% +3.17% +4.71% Improve +1.55% +2.94% +7% +1.81% +2.84% -0.44% +4.53% +8.17% Improve +2.67% +4.17% +3.99% +3.83% +4% -2.28% +6.58% +7.67% Improve +4.93% +7.45% +5.59% +5.10% +4.50% +0.04% +7.46% +9.93% Improve % % % -2.59% % % Improve % % % % % +1.97% % % Improve % % % % % +1.72% % % Improve % % % % % +8.39% % % Improve +3.04% % +8.88% % % +0.23% % % Improve +6.14% % % % % +4.81% +24% % Improve +3.87% % % % % +0.38% % % Improve +6.98% % % % +5.00% % % Improve +4.26% % +10% % % -0.23% % % Improve +7.71% % % % % +4.91% % % Improve +5.46% % % % % +4% % Improve +8.78% % % % % +6.09% % % 9 HERec mp : It can be vew as a varant of HERec, whch combnes the heterogeneous network embeddng of metapath2vec++ [53] and stll preserves the superorty of HINs. We use the code of metapath2vec++ provded by the authors 7. HERec: It s the proposed recommendaton model based on heterogeneous nformaton network embeddng. In the effectveness experments, we utlze the personalzed non-lnear fuson functon, whch s formulated as Eq. 8. And the performance of dfferent fuson functon wll be reported n the later secton. The selected baselnes have a comprehensve coverage of exstng ratng predcton methods. PMF and SoMF are classc MF based ratng predcton methods; FM HIN, HeteMF, SemRec and DSR are HIN based recommendaton methods usng metapath based smlartes, whch represent state-of-the-art HIN based methods. The proposed approach contans an HIN embeddng 7 component, whch can be replaced by other network embeddng methods. Hence, two varants HERec dw and HERec mp are adopted as baselnes. Among these methods, HIN based methods need to specfy the used meta-paths. We present the used meta-paths n Table 3. Followng [4], we only select short meta-paths of at most four steps, snce long meta-paths are lkely to ntroduce nosy semantcs. We set parameters for HERec as follows: the embeddng dmenson number d = 64 and the tunng coeffcents α = β = 1.0. Followng [13], the path length for random walk s set to 40. For HERec dw and HERec mp, we set the embeddng dmenson number d = 64 for farness. Followng [11], [12], we set the number of latent factors for all the MF based methods to 10. Other baselne parameters ether adopt the orgnal optmal settng or are optmzed by usng a valdaton set of 10% tranng data.

10 Effectveness Experments For each dataset, we splt the entre ratng records nto a tranng set and a test set. For Double Move and Book datasets, we set four tranng ratos as n {80%, 60%,, }; whle for Yelp dataset, followng [10], we set four larger tranng ratos as n {90%, 80%, 70%, 60%}, snce t s too sparse. For each rato, we randomly generate ten evaluaton sets. We average the results as the fnal performance. The results are shown n Table 4. The major fndngs from the expermental results are summarzed as follows: (1) Among these baselnes, HIN based methods (HeteMF, SemRec, FM HIN and DSR) perform better than tradtonal MF based methods (PMF and SoMF), whch ndcates the usefulness of the heterogeneous nformaton. It s worthwhle to note that the FM HIN model works very well among the three HIN based baselnes. An ntutve explanaton s that n our datasets, most of the orgnal features are the attrbute nformaton of users or tems, whch are lkely to contan useful evdence to mprove the recommendaton performance. (2) The proposed HERec method (correspondng to the last column) s consstently better than the baselnes, rangng from PMF to DSR. Compared wth other HIN based methods, HERec adopts a more prncpled way to leverage HINs for mprovng recommender systems, whch provdes better nformaton extracton (a new HIN embeddng model) and utlzaton (an extended MF model). Moreover, the superorty of the proposed HERec becomes more sgnfcant wth less tranng data. In partcular, the mprovement rato of HERec over the PMF s up to wth tranng data on the Douban Book dataset, whch ndcates a sgnfcant performance boost. As mentoned prevously, the Yelp dataset s very sparse. In ths case, even wth 60% tranng data, the HERec model mproves over PMF by about 26%. As a comparson, wth the same tranng rato (.e., 60%), the mprovement ratos over PMF are about 29% n terms of scores. These results ndcate the effectveness of the proposed approach, especally n a sparse dataset. (3) Consderng the two HERec varants HERec dw and HERec mp, t s easy to see HERec dw performs much worse than HERec mp. The major dfference les n the network embeddng component. HERec mp adopts the recently proposed HIN embeddng method metapath2vec [53], whle HERec dw gnores data heterogenety and casts HINs as homogeneous networks for learnng. It ndcates HIN embeddng methods are mportant to HIN based recommendaton. The major dfference between the proposed embeddng method and metapath2vec (adopted by HERec mp ) les n that we try to transform heterogeneous network nformaton nto homogeneous neghborhood, whle metapath2vec tres to map all types of enttes nto the same representaton space usng heterogeneous sequences. Indeed, metapath2vec s a general-purpose HIN embeddng method, whle the proposed embeddng method ams to mprove the recommendaton performance. Our focus s to learn the embeddngs for users and tems, whle objects of other types are only used as the brdge to construct the homogeneous neghborhood. Based on the results (HERec > HERec mp ), we argue that t s more effectve to perform task-specfc HIN embeddng for mprovng the recommendaton performance. 5.5 Detaled Analyss of The Proposed Approach In ths part, we perform a seres of detaled analyss for the proposed approach Selecton of Dfferent Fuson Functons HERec requres a prncpled fuson way to transform node embeddngs nto a more sutable form that s useful to enhance recommendaton performance. Therefore, we wll dscuss the mpact of dfferent fuson functons on the recommendaton performance. For convenence, we call the HERec varant wth the smple lnear fuson functon (Eq. 6) as HERec sl, the varant wth personalzed lnear fuson functon (Eq. 7) as HERec pl, and the varant wth personalzed non-lnear fuson functon (Eq. 8) as HERec pnl. We present the performance comparson of the three varants of HERec on the three datasets n Fg. 4. As shown n Fg. 4, we can fnd that overall performance rankng s as follows: HERec pnl > HERec pl > HERec sl. Among the three varants, the smple lnear fuson functon performs the worst, as t gnores the personalzaton and non-lnearty. Indeed, users are lkely to have varyng preferences over meta-paths [10], whch should be consdered n meta-path based methods. The personalzaton factor mproves the performance sgnfcantly. As a comparson, the performance mprovement of HERec pnl over HERec pl s relatvely small. A possble reason s that the ncorporaton of personalzed combnaton parameters ncreases the capablty of lnear models. Nonetheless, HERec pnl stll performs the best by consderng personalzaton and non-lnear transformaton. In a complcated recommender settng, HIN embeddngs may not be drectly applcable n recommender systems, where a non-lnear mappng functon s preferred. Snce HERec pnl s the best varant of the proposed model, n what follows, HERec wll use the personalzed non-lnear fuson functon,.e., HERec pnl by default Cold-start Predcton HINs are partcularly useful to mprove cold-start predcton, where there are fewer ratng records but heterogeneous context nformaton s avalable. We consder studyng the performance w.r.t. dfferent cold-start degrees,.e., the ratng sparsty. To test t, we frst categorze cold users nto three groups accordng to the numbers of ther ratng records,.e., (0, 5], (5, 15] and (15, 30]. It s easy to see that the case for the frst group s the most dffcult, snce users from ths group have fewest ratng records. Here, we only select the baselnes whch use HIN based nformaton for recommendaton, ncludng SoMF, HeteMF, SemRec, DSR and FM HIN. We present the performance comparson of dfferent methods n Fg. 5. For convenence, we report the mprovement ratos w.r.t. PMF. Overall, all the comparson methods are better than PMF (.e., a postve y-axs value). The proposed method performs the best among all the methods, and the mprovement over PMF becomes more sgnfcant for users wth fewer ratng records. The results ndcate that HIN based nformaton s effectve to mprove the recommendaton performance, and the proposed HERec method can effectvely utlze HIN nformaton n a more prncpled way Impact of Dfferent Meta-Paths As shown n Table 3, the proposed approach uses a selected set of meta-paths. To further analyze the mpact of dfferent meta-paths, we gradually ncorporate these meta-paths nto the proposed approach and check the performance change. In Fg. 6, we can observe that generally the performance mproves (.e., becomng smaller) wth the ncorporaton of more meta-paths. Both meta-paths startng wth the user type and tem type are

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department