Neural Collaborative Filtering
Hmm, I'm just summarizing and sharing my experience of reading this paper. Neither "original", "repost", nor "translation" seemed like quite the right tag, but reposts and translations require authorization, so I picked "original". If this infringes anyone's rights, I'll switch the post to private.
Addressed Problem
This work formalizes a neural network modelling approach for collaborative filtering. It focuses on implicit feedback, which indirectly reflects users' preference through behaviours like watching videos, purchasing products, and clicking items.
- explicit feedback (e.g., ratings and reviews)
- implicit feedback (e.g., watching videos, purchasing products, and clicking items), which reflects users' preference only indirectly
Implicit feedback can be tracked automatically and is thus much easier for content providers to collect.
Problem Formulation
$M$: number of users
$N$: number of items
$\mathbf{Y} \in \{0,1\}^{M \times N}$: user–item interaction matrix, with $y_{ui} = 1$ if an interaction between user $u$ and item $i$ is observed, and $y_{ui} = 0$ otherwise.
Here a value of 1 for $y_{ui}$ indicates that there is an interaction between user $u$ and item $i$; however, it does not mean $u$ actually likes $i$. Similarly, a value of 0 does not necessarily mean $u$ does not like $i$; it can be that the user is not aware of the item.
Notice: While observed entries at least reflect users' interest in items, the unobserved entries can be just missing data, and there is a natural scarcity of negative feedback.
The recommendation problem with implicit feedback is formulated as the problem of estimating the scores of unobserved entries in Y, which are used for ranking the items.
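As a concrete illustration, the interaction matrix $\mathbf{Y}$ can be built directly from raw implicit-feedback events; the log below is made-up toy data:

```python
import numpy as np

# Hypothetical interaction log of (user, item) pairs -- e.g. clicks or purchases.
logs = [(0, 1), (0, 3), (1, 0), (2, 2), (2, 3)]
M, N = 3, 4  # number of users, number of items

# Build the implicit-feedback matrix Y: Y[u, i] = 1 iff an interaction was observed.
# A 0 entry is NOT a negative label -- it may simply be missing data.
Y = np.zeros((M, N), dtype=int)
for u, i in logs:
    Y[u, i] = 1
```

The model's job is then to score the zero entries of `Y` and rank items by those scores.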
Goal
Learn $\hat{y}_{ui} = f(u, i \mid \Theta)$
$\hat{y}_{ui}$: the predicted score of interaction $y_{ui}$
$\Theta$: model parameters
$f$: the function that maps model parameters to the predicted score, termed the interaction function (here, a neural network).
Related Work
Two types of objective function:
- pointwise loss: a natural extension of the abundant work on explicit feedback; pointwise learning methods usually follow a regression framework, minimizing the squared loss between $\hat{y}_{ui}$ and its target value $y_{ui}$:
  $$L_{sqr} = \sum_{(u,i) \in \mathcal{Y} \cup \mathcal{Y}^-} w_{ui}\,(y_{ui} - \hat{y}_{ui})^2$$
  where $\mathcal{Y}$ denotes the set of observed interactions in $\mathbf{Y}$, $\mathcal{Y}^-$ denotes the set of negative instances, which can be all (or sampled from) unobserved interactions, and $w_{ui}$ is a hyperparameter denoting the weight of training instance $(u, i)$.
- pairwise loss: the idea is that observed entries should be ranked higher than unobserved ones. As such, instead of minimizing the loss between $\hat{y}_{ui}$ and $y_{ui}$, pairwise learning maximizes the margin between an observed entry $\hat{y}_{ui}$ and an unobserved entry $\hat{y}_{uj}$.
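A minimal sketch of a pairwise objective in the BPR style (one common instantiation; the scores below are arbitrary toy values): for each triple $(u, i, j)$ where $i$ was observed and $j$ was not, push the observed score above the unobserved one by maximizing $\ln \sigma(\hat{y}_{ui} - \hat{y}_{uj})$.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pairwise_loss(pos_scores, neg_scores):
    # Negative mean log-sigmoid of the margin: smaller when observed
    # entries are scored well above sampled unobserved entries.
    return -np.mean(np.log(sigmoid(pos_scores - neg_scores)))

pos = np.array([2.0, 1.5, 0.8])  # scores of observed entries (toy values)
neg = np.array([0.5, 0.2, 1.0])  # scores of sampled unobserved entries
loss = pairwise_loss(pos, neg)
```

The loss decreases as the margin between observed and unobserved scores grows, which is exactly the ranking behaviour the text describes.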
Proposed Loss
In what follows, we present a probabilistic approach for learning the pointwise NCF that pays special attention to the binary property of implicit data. Viewing $\hat{y}_{ui}$ as the probability that item $i$ is relevant to user $u$, we simply maximize the log-likelihood of the observed interactions and sampled negatives, which is equivalent to minimizing the binary cross-entropy (log loss):
$$L = -\sum_{(u,i) \in \mathcal{Y} \cup \mathcal{Y}^-} y_{ui} \log \hat{y}_{ui} + (1 - y_{ui}) \log(1 - \hat{y}_{ui})$$
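Maximizing that log-likelihood amounts to minimizing binary cross-entropy over observed (label 1) and sampled unobserved (label 0) interactions; a minimal sketch with made-up predicted probabilities:

```python
import numpy as np

def bce_loss(y_true, y_pred, eps=1e-12):
    # Clip to avoid log(0); predictions are probabilities in (0, 1).
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 1, 0, 0])          # observed vs. sampled negative instances
y_pred = np.array([0.9, 0.7, 0.2, 0.4])  # toy model outputs
loss = bce_loss(y_true, y_pred)
```

Treating the output as a probability is what lets the loss respect the binary nature of implicit data, instead of regressing to arbitrary rating values.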
Matrix Factorization
MF associates each user and item with a real-valued vector of latent features.
$\mathbf{p}_u$: latent vector for user $u$
$\mathbf{q}_i$: latent vector for item $i$
$K$: the dimension of the latent space
MF estimates an interaction as the inner product of the two latent vectors: $\hat{y}_{ui} = \mathbf{p}_u^T \mathbf{q}_i = \sum_{k=1}^{K} p_{uk} q_{ik}$.
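MF's inner-product prediction can be sketched with toy latent vectors (the values here are random and purely illustrative):

```python
import numpy as np

K = 4  # dimension of the latent space
rng = np.random.default_rng(0)
p_u = rng.normal(size=K)  # latent vector for user u
q_i = rng.normal(size=K)  # latent vector for item i

# MF estimates the interaction as the inner product of the latent vectors:
# y_hat_ui = p_u^T q_i = sum_k p_uk * q_ik
y_hat = p_u @ q_i
```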
Drawback:
MF can be deemed a linear model of latent factors, and the simple, fixed inner product can limit its expressiveness in a low-dimensional latent space.
To illustrate this, we use the Jaccard coefficient as the ground-truth similarity of two users that MF needs to recover: $s_{ij} = \frac{|\mathcal{R}_i \cap \mathcal{R}_j|}{|\mathcal{R}_i \cup \mathcal{R}_j|}$, where $\mathcal{R}_u$ denotes the set of items user $u$ has interacted with.
Let us first focus on the first three rows (users) in Figure 1a. It is easy to see that $s_{23} (0.66) > s_{12} (0.5) > s_{13} (0.4)$. As such, the geometric relations of $\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$ in the latent space can be plotted as in Figure 1b. Now, let us consider a new user $u_4$, whose input is given as the dashed line in Figure 1a. We have $s_{41} (0.6) > s_{43} (0.4) > s_{42} (0.2)$, meaning that $u_4$ is most similar to $u_1$, followed by $u_3$, and lastly $u_2$. However, if an MF model places $\mathbf{p}_4$ closest to $\mathbf{p}_1$ (the two options are shown in Figure 1b with dashed lines), it will end up with $\mathbf{p}_4$ closer to $\mathbf{p}_2$ than to $\mathbf{p}_3$, which unfortunately incurs a large ranking loss.
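The Jaccard similarity behind this example is straightforward to compute; a small sketch using hypothetical item sets (not the exact data of Figure 1a):

```python
def jaccard(a, b):
    # s_ij = |R_i ∩ R_j| / |R_i ∪ R_j| for two users' interacted-item sets.
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

R1 = {0, 1, 4}  # items interacted with by user 1 (hypothetical)
R2 = {0, 3, 4}  # items interacted with by user 2 (hypothetical)
s12 = jaccard(R1, R2)  # 2 shared items out of 4 total -> 0.5
```

Because the inner product must reproduce all such pairwise similarities simultaneously in a low-dimensional space, contradictions like the $u_4$ case can arise.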
NEURAL COLLABORATIVE FILTERING
- Since this work focuses on the pure collaborative filtering setting, we use only the identity of a user and an item as the input feature, transforming it to a binarized sparse vector with one-hot encoding. Note that with such a generic feature representation for inputs, our method can be easily adjusted to address the cold-start problem by using content features to represent users and items.
- Above the input layer is the embedding layer; it is a fully connected layer that projects the sparse representation to a dense vector.
- The user embedding and item embedding are then fed into a multi-layer neural architecture, which maps the latent vectors to the prediction score.
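The input/embedding layers above can be sketched with NumPy (toy sizes and random weights; a real implementation would use a framework's embedding layer). The point is that multiplying a one-hot vector by the fully connected layer's weight matrix is exactly a row lookup:

```python
import numpy as np

M, N, K = 5, 7, 3  # number of users, number of items, embedding size (toy)
rng = np.random.default_rng(0)
P = rng.normal(size=(M, K))  # user embedding table (FC embedding-layer weights)
Q = rng.normal(size=(N, K))  # item embedding table

u, i = 2, 4
one_hot_u = np.eye(M)[u]        # binarized sparse input vector for user u
p_u = one_hot_u @ P             # dense projection -- identical to the row P[u]
q_i = np.eye(N)[i] @ Q          # same lookup for item i
x = np.concatenate([p_u, q_i])  # vector fed into the layers above
```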
Generalized Matrix Factorization (GMF)
GMF (the left-hand side of the architecture): a generalization of MF in which the latent vectors come from the fully connected embedding layers. Its output is $\hat{y}_{ui} = a_{out}(\mathbf{h}^T(\mathbf{p}_u \odot \mathbf{q}_i))$, which reduces to vanilla MF when $a_{out}$ is the identity and $\mathbf{h}$ is a vector of ones.
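A minimal sketch of the GMF computation with toy vectors, showing the MF special case (sigmoid stands in for $a_{out}$, as the paper uses for implicit data):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

K = 4
rng = np.random.default_rng(1)
p_u = rng.normal(size=K)  # user latent vector from the embedding layer (toy)
q_i = rng.normal(size=K)  # item latent vector (toy)
h = np.ones(K)            # learnable output-layer edge weights, here all ones

# GMF: y_hat = a_out(h^T (p_u ⊙ q_i)), with ⊙ the element-wise product.
gmf_score = sigmoid(h @ (p_u * q_i))

# With identity a_out and h fixed to ones, this is exactly the MF inner product:
mf_score = p_u @ q_i
```

Letting `h` be learned from data weights the latent dimensions unequally, which is one way GMF generalizes plain MF.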
Pre-training
- initialization plays an important role for the convergence and performance of deep learning models.
- we propose to initialize NeuMF using the pretrained models of GMF and MLP.
- We first train GMF and MLP with random initializations until convergence. We then use their model parameters as the initialization for the corresponding parts of NeuMF’s parameters.
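For the shared output layer, the paper concatenates the two pretrained weight vectors with a trade-off hyperparameter $\alpha$ (it uses $\alpha = 0.5$); a sketch with made-up pretrained weights:

```python
import numpy as np

# Toy pretrained output-layer weights of GMF and MLP (arbitrary values).
h_gmf = np.array([0.2, -0.1, 0.4])
h_mlp = np.array([0.3, 0.5])

# NeuMF's output layer is initialized as [alpha * h_gmf ; (1 - alpha) * h_mlp],
# where alpha trades off the two pretrained models.
alpha = 0.5
h_neumf = np.concatenate([alpha * h_gmf, (1 - alpha) * h_mlp])
```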