Neural Collaborative Filtering
Hmm, I'm just summarizing and sharing my experience of reading this paper. Neither "original", "repost", nor "translation" seemed like quite the right tag, but reposts and translations require authorization, so I picked "original". If this infringes anyone's rights, I'll switch the post to private.
Addressed Problem
This work formalizes a neural network modelling approach for collaborative filtering. It focuses on implicit feedback, which indirectly reflects users' preference through behaviours like watching videos, purchasing products, and clicking items.
- explicit feedback (e.g., ratings and reviews)
- implicit feedback (e.g., watching videos, purchasing products, and clicking items), which reflects users' preference only indirectly
Implicit feedback can be tracked automatically and is thus much easier for content providers to collect.
Problem Formulation
$M$: number of users
$N$: number of items
$\mathbf{Y} \in \{0,1\}^{M \times N}$: user–item interaction matrix, with $y_{ui} = 1$ if an interaction between user $u$ and item $i$ is observed, and $y_{ui} = 0$ otherwise.
Here a value of 1 for $y_{ui}$ indicates that there is an interaction between user $u$ and item $i$; however, it does not mean $u$ actually likes $i$. Similarly, a value of 0 does not necessarily mean $u$ does not like $i$; it can be that the user is not aware of the item.
Notice: While observed entries at least reflect users' interest in items, the unobserved entries can be just missing data, and there is a natural scarcity of negative feedback.
The recommendation problem with implicit feedback is formulated as the problem of estimating the scores of unobserved entries in Y, which are used for ranking the items.
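As a concrete illustration, the interaction matrix $\mathbf{Y}$ can be built directly from raw implicit-feedback events; the log below is made-up toy data:

```python
import numpy as np

# Hypothetical interaction log of (user, item) pairs -- e.g. clicks or purchases.
logs = [(0, 1), (0, 3), (1, 0), (2, 2), (2, 3)]
M, N = 3, 4  # number of users, number of items

# Build the implicit-feedback matrix Y: Y[u, i] = 1 iff an interaction was observed.
# A 0 entry is NOT a negative label -- it may simply be missing data.
Y = np.zeros((M, N), dtype=int)
for u, i in logs:
    Y[u, i] = 1
```

The model's job is then to score the zero entries of `Y` and rank items by those scores.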
Goal
Learn $\hat{y}_{ui} = f(u, i \mid \Theta)$
$\hat{y}_{ui}$: the predicted score of interaction $y_{ui}$
$\Theta$: model parameters
$f$: the function that maps model parameters to the predicted score, termed the interaction function (here, a neural network).
Related Work
Two types of objective function:
- pointwise loss: a natural extension of the abundant work on explicit feedback; pointwise learning methods usually follow a regression framework, minimizing the squared loss between $\hat{y}_{ui}$ and its target value $y_{ui}$:
  $$L_{sqr} = \sum_{(u,i) \in \mathcal{Y} \cup \mathcal{Y}^-} w_{ui}\,(y_{ui} - \hat{y}_{ui})^2$$
  where $\mathcal{Y}$ denotes the set of observed interactions in $\mathbf{Y}$, $\mathcal{Y}^-$ denotes the set of negative instances, which can be all (or sampled from) unobserved interactions, and $w_{ui}$ is a hyperparameter denoting the weight of training instance $(u, i)$.
- pairwise loss: the idea is that observed entries should be ranked higher than unobserved ones. As such, instead of minimizing the loss between $\hat{y}_{ui}$ and $y_{ui}$, pairwise learning maximizes the margin between an observed entry $\hat{y}_{ui}$ and an unobserved entry $\hat{y}_{uj}$.
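A minimal sketch of a pairwise objective in the BPR style (one common instantiation; the scores below are arbitrary toy values): for each triple $(u, i, j)$ where $i$ was observed and $j$ was not, push the observed score above the unobserved one by maximizing $\ln \sigma(\hat{y}_{ui} - \hat{y}_{uj})$.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pairwise_loss(pos_scores, neg_scores):
    # Negative mean log-sigmoid of the margin: smaller when observed
    # entries are scored well above sampled unobserved entries.
    return -np.mean(np.log(sigmoid(pos_scores - neg_scores)))

pos = np.array([2.0, 1.5, 0.8])  # scores of observed entries (toy values)
neg = np.array([0.5, 0.2, 1.0])  # scores of sampled unobserved entries
loss = pairwise_loss(pos, neg)
```

The loss decreases as the margin between observed and unobserved scores grows, which is exactly the ranking behaviour the text describes.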
Proposed Loss
In what follows, we present a probabilistic approach for learning the pointwise NCF that pays special attention to the binary property of implicit data. Viewing $\hat{y}_{ui}$ as the probability that item $i$ is relevant to user $u$, we simply maximize the log-likelihood of the observed interactions and sampled negatives, which is equivalent to minimizing the binary cross-entropy (log loss):
$$L = -\sum_{(u,i) \in \mathcal{Y} \cup \mathcal{Y}^-} y_{ui} \log \hat{y}_{ui} + (1 - y_{ui}) \log(1 - \hat{y}_{ui})$$
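Maximizing that log-likelihood amounts to minimizing binary cross-entropy over observed (label 1) and sampled unobserved (label 0) interactions; a minimal sketch with made-up predicted probabilities:

```python
import numpy as np

def bce_loss(y_true, y_pred, eps=1e-12):
    # Clip to avoid log(0); predictions are probabilities in (0, 1).
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 1, 0, 0])          # observed vs. sampled negative instances
y_pred = np.array([0.9, 0.7, 0.2, 0.4])  # toy model outputs
loss = bce_loss(y_true, y_pred)
```

Treating the output as a probability is what lets the loss respect the binary nature of implicit data, instead of regressing to arbitrary rating values.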
Matrix Factorization
MF associates each user and item with a real-valued vector of latent features.
$\mathbf{p}_u$: latent vector for user $u$
$\mathbf{q}_i$: latent vector for item $i$
$K$: the dimension of the latent space
MF estimates an interaction as the inner product of the two latent vectors: $\hat{y}_{ui} = \mathbf{p}_u^T \mathbf{q}_i = \sum_{k=1}^{K} p_{uk} q_{ik}$.
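MF's inner-product prediction can be sketched with toy latent vectors (the values here are random and purely illustrative):

```python
import numpy as np

K = 4  # dimension of the latent space
rng = np.random.default_rng(0)
p_u = rng.normal(size=K)  # latent vector for user u
q_i = rng.normal(size=K)  # latent vector for item i

# MF estimates the interaction as the inner product of the latent vectors:
# y_hat_ui = p_u^T q_i = sum_k p_uk * q_ik
y_hat = p_u @ q_i
```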
Drawback:
MF can be deemed a linear model of latent factors, and the simple, fixed inner product can limit its expressiveness in a low-dimensional latent space.
To illustrate this, we use the Jaccard coefficient as the ground-truth similarity of two users that MF needs to recover: $s_{ij} = \frac{|\mathcal{R}_i \cap \mathcal{R}_j|}{|\mathcal{R}_i \cup \mathcal{R}_j|}$, where $\mathcal{R}_u$ denotes the set of items user $u$ has interacted with.
Let us first focus on the first three rows (users) in Figure 1a. It is easy to see that $s_{23} (0.66) > s_{12} (0.5) > s_{13} (0.4)$. As such, the geometric relations of $\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$ in the latent space can be plotted as in Figure 1b. Now, let us consider a new user $u_4$, whose input is given as the dashed line in Figure 1a. We have $s_{41} (0.6) > s_{43} (0.4) > s_{42} (0.2)$, meaning that $u_4$ is most similar to $u_1$, followed by $u_3$, and lastly $u_2$. However, if an MF model places $\mathbf{p}_4$ closest to $\mathbf{p}_1$ (the two options are shown in Figure 1b with dashed lines), it will end up with $\mathbf{p}_4$ closer to $\mathbf{p}_2$ than to $\mathbf{p}_3$, which unfortunately incurs a large ranking loss.
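The Jaccard similarity behind this example is straightforward to compute; a small sketch using hypothetical item sets (not the exact data of Figure 1a):

```python
def jaccard(a, b):
    # s_ij = |R_i ∩ R_j| / |R_i ∪ R_j| for two users' interacted-item sets.
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

R1 = {0, 1, 4}  # items interacted with by user 1 (hypothetical)
R2 = {0, 3, 4}  # items interacted with by user 2 (hypothetical)
s12 = jaccard(R1, R2)  # 2 shared items out of 4 total -> 0.5
```

Because the inner product must reproduce all such pairwise similarities simultaneously in a low-dimensional space, contradictions like the $u_4$ case can arise.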
NEURAL COLLABORATIVE FILTERING
- Since this work focuses on the pure collaborative filtering setting, we use only the identity of a user and an item as the input feature, transforming it to a binarized sparse vector with one-hot encoding. Note that with such a generic feature representation for inputs, our method can be easily adjusted to address the cold-start problem by using content features to represent users and items.
- Above the input layer is the embedding layer; it is a fully connected layer that projects the sparse representation to a dense vector.
- The user embedding and item embedding are then fed into a multi-layer neural architecture, which maps the latent vectors to the prediction score.
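The input/embedding layers above can be sketched with NumPy (toy sizes and random weights; a real implementation would use a framework's embedding layer). The point is that multiplying a one-hot vector by the fully connected layer's weight matrix is exactly a row lookup:

```python
import numpy as np

M, N, K = 5, 7, 3  # number of users, number of items, embedding size (toy)
rng = np.random.default_rng(0)
P = rng.normal(size=(M, K))  # user embedding table (FC embedding-layer weights)
Q = rng.normal(size=(N, K))  # item embedding table

u, i = 2, 4
one_hot_u = np.eye(M)[u]        # binarized sparse input vector for user u
p_u = one_hot_u @ P             # dense projection -- identical to the row P[u]
q_i = np.eye(N)[i] @ Q          # same lookup for item i
x = np.concatenate([p_u, q_i])  # vector fed into the layers above
```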
Generalized Matrix Factorization (GMF)
GMF (the left-hand side of the architecture): a generalization of MF in which the latent vectors come from the fully connected embedding layers. Its output is $\hat{y}_{ui} = a_{out}(\mathbf{h}^T(\mathbf{p}_u \odot \mathbf{q}_i))$, which reduces to vanilla MF when $a_{out}$ is the identity and $\mathbf{h}$ is a vector of ones.
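A minimal sketch of the GMF computation with toy vectors, showing the MF special case (sigmoid stands in for $a_{out}$, as the paper uses for implicit data):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

K = 4
rng = np.random.default_rng(1)
p_u = rng.normal(size=K)  # user latent vector from the embedding layer (toy)
q_i = rng.normal(size=K)  # item latent vector (toy)
h = np.ones(K)            # learnable output-layer edge weights, here all ones

# GMF: y_hat = a_out(h^T (p_u ⊙ q_i)), with ⊙ the element-wise product.
gmf_score = sigmoid(h @ (p_u * q_i))

# With identity a_out and h fixed to ones, this is exactly the MF inner product:
mf_score = p_u @ q_i
```

Letting `h` be learned from data weights the latent dimensions unequally, which is one way GMF generalizes plain MF.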
Pre-training
- initialization plays an important role for the convergence and performance of deep learning models.
- we propose to initialize NeuMF using the pretrained models of GMF and MLP.
- We first train GMF and MLP with random initializations until convergence. We then use their model parameters as the initialization for the corresponding parts of NeuMF’s parameters.
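For the shared output layer, the paper concatenates the two pretrained weight vectors with a trade-off hyperparameter $\alpha$ (it uses $\alpha = 0.5$); a sketch with made-up pretrained weights:

```python
import numpy as np

# Toy pretrained output-layer weights of GMF and MLP (arbitrary values).
h_gmf = np.array([0.2, -0.1, 0.4])
h_mlp = np.array([0.3, 0.5])

# NeuMF's output layer is initialized as [alpha * h_gmf ; (1 - alpha) * h_mlp],
# where alpha trades off the two pretrained models.
alpha = 0.5
h_neumf = np.concatenate([alpha * h_gmf, (1 - alpha) * h_mlp])
```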