Matrix Factorization-based algorithms
class surprise.prediction_algorithms.matrix_factorization.SVD
Bases: surprise.prediction_algorithms.algo_base.AlgoBase
The famous SVD algorithm, as popularized by Simon Funk during the Netflix Prize. When baselines are not used, this is equivalent to Probabilistic Matrix Factorization [SM08] (see note below).
The prediction \(\hat{r}_{ui}\) is set as:
\[\hat{r}_{ui} = \mu + b_u + b_i + q_i^Tp_u\]
If user \(u\) is unknown, then the bias \(b_u\) and the factors \(p_u\) are assumed to be zero. The same applies for item \(i\) with \(b_i\) and \(q_i\).
For details, see eq. (5) from [KBV09]. See also [RRSK10], section 5.3.1.
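In plain Python, the prediction rule above can be sketched as follows (a minimal illustration; the function name and argument values are hypothetical, not part of the library's API):

```python
import numpy as np

def predict(mu, b_u, b_i, p_u, q_i):
    """Biased SVD prediction: mu + b_u + b_i + q_i^T p_u."""
    return mu + b_u + b_i + np.dot(q_i, p_u)

# Known user and item:
r_hat = predict(mu=3.5, b_u=0.2, b_i=-0.1,
                p_u=np.array([0.1, 0.3]), q_i=np.array([0.2, 0.1]))

# Unknown user: b_u and p_u are taken to be zero, so only mu and b_i
# contribute (the dot product vanishes).
r_hat_unknown = predict(mu=3.5, b_u=0.0, b_i=-0.1,
                        p_u=np.zeros(2), q_i=np.array([0.2, 0.1]))
```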
To estimate all the unknowns, we minimize the following regularized squared error:
\[\sum_{r_{ui} \in R_{train}} \left(r_{ui} - \hat{r}_{ui} \right)^2 + \lambda\left(b_i^2 + b_u^2 + ||q_i||^2 + ||p_u||^2\right)\]
The minimization is performed by a very straightforward stochastic gradient descent:
\[\begin{split}b_u &\leftarrow b_u &+ \gamma (e_{ui} - \lambda b_u)\\ b_i &\leftarrow b_i &+ \gamma (e_{ui} - \lambda b_i)\\ p_u &\leftarrow p_u &+ \gamma (e_{ui} q_i - \lambda p_u)\\ q_i &\leftarrow q_i &+ \gamma (e_{ui} p_u - \lambda q_i)\end{split}\]
where \(e_{ui} = r_{ui} - \hat{r}_{ui}\). These steps are performed over all the ratings of the trainset and repeated n_epochs times. Baselines are initialized to 0. User and item factors are initialized to 0.1, as recommended by Funk.
You have control over the learning rate \(\gamma\) and the regularization term \(\lambda\). Both can be different for each kind of parameter (see below). By default, learning rates are set to 0.005 and regularization terms are set to 0.02.
Note
You can choose to use an unbiased version of this algorithm, simply predicting:
\[\hat{r}_{ui} = q_i^Tp_u\]
This is equivalent to Probabilistic Matrix Factorization ([SM08], section 2) and can be achieved by setting the biased parameter to False.
Parameters:
- n_factors – The number of factors. Default is 100.
- n_epochs – The number of iterations of the SGD procedure. Default is 20.
- biased (bool) – Whether to use baselines (or biases). See note above. Default is True.
- lr_all – The learning rate for all parameters. Default is 0.005.
- reg_all – The regularization term for all parameters. Default is 0.02.
- lr_bu – The learning rate for \(b_u\). Takes precedence over lr_all if set. Default is None.
- lr_bi – The learning rate for \(b_i\). Takes precedence over lr_all if set. Default is None.
- lr_pu – The learning rate for \(p_u\). Takes precedence over lr_all if set. Default is None.
- lr_qi – The learning rate for \(q_i\). Takes precedence over lr_all if set. Default is None.
- reg_bu – The regularization term for \(b_u\). Takes precedence over reg_all if set. Default is None.
- reg_bi – The regularization term for \(b_i\). Takes precedence over reg_all if set. Default is None.
- reg_pu – The regularization term for \(p_u\). Takes precedence over reg_all if set. Default is None.
- reg_qi – The regularization term for \(q_i\). Takes precedence over reg_all if set. Default is None.
- verbose – If True, prints the current epoch. Default is False.
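Putting the update rules together, a toy version of the fit loop might look like this (a simplified sketch with hypothetical names, not the library's actual Cython implementation; learning rate and regularization defaults mirror the parameters above):

```python
import numpy as np

def fit_svd(ratings, n_users, n_items, n_factors=100, n_epochs=20,
            lr=0.005, reg=0.02):
    """Toy biased-SVD fit by SGD; `ratings` is a list of (u, i, r) triples."""
    mu = np.mean([r for _, _, r in ratings])
    bu = np.zeros(n_users)                    # baselines initialized to 0
    bi = np.zeros(n_items)
    pu = np.full((n_users, n_factors), 0.1)   # factors initialized to 0.1
    qi = np.full((n_items, n_factors), 0.1)
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - (mu + bu[u] + bi[i] + qi[i] @ pu[u])
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            # update p_u and q_i simultaneously, from their old values
            pu[u], qi[i] = (pu[u] + lr * (err * qi[i] - reg * pu[u]),
                            qi[i] + lr * (err * pu[u] - reg * qi[i]))
    return mu, bu, bi, pu, qi

ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 1, 1.0)]
mu, bu, bi, pu, qi = fit_svd(ratings, n_users=2, n_items=2,
                             n_factors=2, n_epochs=50)
pred = mu + bu[0] + bi[0] + qi[0] @ pu[0]   # moves toward the observed 5.0
```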
class surprise.prediction_algorithms.matrix_factorization.SVDpp
Bases: surprise.prediction_algorithms.algo_base.AlgoBase
The SVD++ algorithm, an extension of SVD taking into account implicit ratings.
The prediction \(\hat{r}_{ui}\) is set as:
\[\hat{r}_{ui} = \mu + b_u + b_i + q_i^T\left(p_u + |I_u|^{-\frac{1}{2}} \sum_{j \in I_u}y_j\right)\]
where the \(y_j\) terms are a new set of item factors that capture implicit ratings. Here, an implicit rating describes the fact that a user \(u\) rated an item \(j\), regardless of the rating value.
If user \(u\) is unknown, then the bias \(b_u\) and the factors \(p_u\) are assumed to be zero. The same applies for item \(i\) with \(b_i\), \(q_i\) and \(y_i\).
For details, see section 4 of [Kor08]. See also [RRSK10], section 5.3.1.
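The implicit-feedback term can be illustrated in plain Python as follows (an illustrative sketch; the function name and the values used are hypothetical):

```python
import numpy as np

def predict_svdpp(mu, b_u, b_i, q_i, p_u, y, I_u):
    """SVD++ prediction: the user factor p_u is augmented by the
    |I_u|^(-1/2)-normalized sum of implicit item factors y_j over
    the items I_u the user has rated (whatever the rating values)."""
    implicit = sum(y[j] for j in I_u) / np.sqrt(len(I_u))
    return mu + b_u + b_i + q_i @ (p_u + implicit)

# Hypothetical implicit item factors for three rated items:
y = {0: np.array([0.1, 0.0]), 1: np.array([0.0, 0.2]), 2: np.array([0.1, 0.1])}
r_hat = predict_svdpp(mu=3.5, b_u=0.1, b_i=-0.2,
                      q_i=np.array([0.3, 0.1]),
                      p_u=np.array([0.2, 0.4]),
                      y=y, I_u=[0, 1, 2])
```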
Just as for SVD, the parameters are learned using SGD on the regularized squared error objective.
Baselines are initialized to 0. User and item factors are initialized to 0.1, as recommended by Funk.
You have control over the learning rate \(\gamma\) and the regularization term \(\lambda\). Both can be different for each kind of parameter (see below). By default, learning rates are set to 0.007 and regularization terms are set to 0.02.
Parameters:
- n_factors – The number of factors. Default is 20.
- n_epochs – The number of iterations of the SGD procedure. Default is 20.
- lr_all – The learning rate for all parameters. Default is 0.007.
- reg_all – The regularization term for all parameters. Default is 0.02.
- lr_bu – The learning rate for \(b_u\). Takes precedence over lr_all if set. Default is None.
- lr_bi – The learning rate for \(b_i\). Takes precedence over lr_all if set. Default is None.
- lr_pu – The learning rate for \(p_u\). Takes precedence over lr_all if set. Default is None.
- lr_qi – The learning rate for \(q_i\). Takes precedence over lr_all if set. Default is None.
- lr_yj – The learning rate for \(y_j\). Takes precedence over lr_all if set. Default is None.
- reg_bu – The regularization term for \(b_u\). Takes precedence over reg_all if set. Default is None.
- reg_bi – The regularization term for \(b_i\). Takes precedence over reg_all if set. Default is None.
- reg_pu – The regularization term for \(p_u\). Takes precedence over reg_all if set. Default is None.
- reg_qi – The regularization term for \(q_i\). Takes precedence over reg_all if set. Default is None.
- reg_yj – The regularization term for \(y_j\). Takes precedence over reg_all if set. Default is None.
- verbose – If True, prints the current epoch. Default is False.
class surprise.prediction_algorithms.matrix_factorization.NMF
Bases: surprise.prediction_algorithms.algo_base.AlgoBase
A collaborative filtering algorithm based on Non-negative Matrix Factorization.
This algorithm is very similar to SVD. The prediction \(\hat{r}_{ui}\) is set as:
\[\hat{r}_{ui} = q_i^Tp_u,\]
where user and item factors are kept positive. Our implementation follows that suggested in [LZXZ14], which is equivalent to [ZWFM96] in its unregularized form. Both are direct applications of NMF for dense matrices [LS01].
The optimization procedure is a (regularized) stochastic gradient descent with a specific choice of stepsize that ensures non-negativity of factors, provided that their initial values are also positive.
At each step of the SGD procedure, the factors \(f\) of user \(u\) and item \(i\) are updated as follows:
\[\begin{split}p_{uf} &\leftarrow p_{uf} &\cdot \frac{\sum_{i \in I_u} q_{if} \cdot r_{ui}}{\sum_{i \in I_u} q_{if} \cdot \hat{r}_{ui} + \lambda_u |I_u| p_{uf}}\\ q_{if} &\leftarrow q_{if} &\cdot \frac{\sum_{u \in U_i} p_{uf} \cdot r_{ui}}{\sum_{u \in U_i} p_{uf} \cdot \hat{r}_{ui} + \lambda_i |U_i| q_{if}}\\\end{split}\]
where \(\lambda_u\) and \(\lambda_i\) are regularization parameters.
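One epoch of this multiplicative update can be sketched as follows (an illustrative toy version with hypothetical names, not the library's implementation; since every term in the numerators and denominators is non-negative, positive factors remain positive):

```python
import numpy as np

def nmf_epoch(ratings, pu, qi, reg_pu=0.06, reg_qi=0.06):
    """One epoch of the multiplicative update for unbiased NMF.
    `ratings` is a list of (u, i, r); pu, qi hold positive factors."""
    user_num = np.zeros_like(pu)
    user_denom = np.zeros_like(pu)
    item_num = np.zeros_like(qi)
    item_denom = np.zeros_like(qi)
    n_rated_u = np.zeros(pu.shape[0])
    n_rated_i = np.zeros(qi.shape[0])
    for u, i, r in ratings:
        est = qi[i] @ pu[u]            # current estimate of r_ui
        user_num[u] += qi[i] * r
        user_denom[u] += qi[i] * est
        item_num[i] += pu[u] * r
        item_denom[i] += pu[u] * est
        n_rated_u[u] += 1
        n_rated_i[i] += 1
    # multiplicative step; the regularization term is lambda * |I_u| * p_uf
    pu *= user_num / (user_denom + reg_pu * n_rated_u[:, None] * pu)
    qi *= item_num / (item_denom + reg_qi * n_rated_i[:, None] * qi)
    return pu, qi

rng = np.random.default_rng(0)
pu = rng.uniform(0.0, 1.0, (2, 3))    # init_low=0, init_high=1 (the defaults)
qi = rng.uniform(0.0, 1.0, (2, 3))
ratings = [(0, 0, 4.0), (0, 1, 2.0), (1, 0, 5.0)]
est_before = qi[0] @ pu[0]
for _ in range(50):
    pu, qi = nmf_epoch(ratings, pu, qi)
est_after = qi[0] @ pu[0]             # should move toward the observed 4.0
```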
This algorithm is highly dependent on initial values. User and item factors are uniformly initialized between init_low and init_high. Change them at your own risk!
A biased version is available by setting the biased parameter to True. In this case, the prediction is set as
\[\hat{r}_{ui} = \mu + b_u + b_i + q_i^Tp_u,\]
still ensuring positive factors. Baselines are optimized in the same way as in the SVD algorithm. While yielding better accuracy, the biased version seems highly prone to overfitting, so you may want to reduce the number of factors (or increase regularization).
Parameters:
- n_factors – The number of factors. Default is 15.
- n_epochs – The number of iterations of the SGD procedure. Default is 50.
- biased (bool) – Whether to use baselines (or biases). Default is False.
- reg_pu – The regularization term for users \(\lambda_u\). Default is 0.06.
- reg_qi – The regularization term for items \(\lambda_i\). Default is 0.06.
- reg_bu – The regularization term for \(b_u\). Only relevant for the biased version. Default is 0.02.
- reg_bi – The regularization term for \(b_i\). Only relevant for the biased version. Default is 0.02.
- lr_bu – The learning rate for \(b_u\). Only relevant for the biased version. Default is 0.005.
- lr_bi – The learning rate for \(b_i\). Only relevant for the biased version. Default is 0.005.
- init_low – Lower bound for random initialization of factors. Must be greater than 0 to ensure non-negative factors. Default is 0.
- init_high – Higher bound for random initialization of factors. Default is 1.
- verbose – If True, prints the current epoch. Default is False.