Matrix Factorization-based algorithms

class surprise.prediction_algorithms.matrix_factorization.SVD

Bases: surprise.prediction_algorithms.algo_base.AlgoBase

The famous SVD algorithm, as popularized by Simon Funk during the Netflix Prize. When baselines are not used, this is equivalent to Probabilistic Matrix Factorization [SM08] (see note below).

The prediction \(\hat{r}_{ui}\) is set as:

\[\hat{r}_{ui} = \mu + b_u + b_i + q_i^Tp_u\]

If user \(u\) is unknown, then the bias \(b_u\) and the factors \(p_u\) are assumed to be zero. The same applies for item \(i\) with \(b_i\) and \(q_i\).

For details, see eq. (5) from [KBV09]. See also [RRSK10], section 5.3.1.

To estimate all the unknowns, we minimize the following regularized squared error:

\[\sum_{r_{ui} \in R_{train}} \left(r_{ui} - \hat{r}_{ui} \right)^2 + \lambda\left(b_i^2 + b_u^2 + ||q_i||^2 + ||p_u||^2\right)\]

The minimization is performed by a very straightforward stochastic gradient descent:

\[\begin{split}
b_u &\leftarrow b_u + \gamma (e_{ui} - \lambda b_u)\\
b_i &\leftarrow b_i + \gamma (e_{ui} - \lambda b_i)\\
p_u &\leftarrow p_u + \gamma (e_{ui} \cdot q_i - \lambda p_u)\\
q_i &\leftarrow q_i + \gamma (e_{ui} \cdot p_u - \lambda q_i)
\end{split}\]

where \(e_{ui} = r_{ui} - \hat{r}_{ui}\). These steps are performed over all the ratings of the trainset and repeated n_epochs times. Baselines are initialized to 0. User and item factors are initialized to 0.1, as recommended by Funk.
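
As a concrete illustration, here is a minimal NumPy sketch of one epoch of these updates, using the default lr_all and reg_all values listed below. It mirrors the math above, not Surprise's actual (Cython) implementation; the toy dimensions and random ratings are made up:

    import numpy as np

    n_users, n_items, n_factors = 50, 40, 10
    rng = np.random.default_rng(0)
    # Toy trainset: (user index, item index, rating) triples.
    ratings = [(rng.integers(n_users), rng.integers(n_items), rng.uniform(1, 5))
               for _ in range(500)]
    mu = np.mean([r for _, _, r in ratings])  # global mean

    bu = np.zeros(n_users)                    # user biases, initialized to 0
    bi = np.zeros(n_items)                    # item biases, initialized to 0
    pu = np.full((n_users, n_factors), 0.1)   # user factors, initialized to 0.1
    qi = np.full((n_items, n_factors), 0.1)   # item factors, initialized to 0.1

    gamma, lam = 0.005, 0.02                  # lr_all and reg_all defaults

    for u, i, r in ratings:                   # one epoch over all ratings
        err = r - (mu + bu[u] + bi[i] + qi[i] @ pu[u])
        bu[u] += gamma * (err - lam * bu[u])
        bi[i] += gamma * (err - lam * bi[i])
        # Update both factor vectors simultaneously, from their old values:
        pu[u], qi[i] = (pu[u] + gamma * (err * qi[i] - lam * pu[u]),
                        qi[i] + gamma * (err * pu[u] - lam * qi[i]))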

You have control over the learning rate \(\gamma\) and the regularization term \(\lambda\). Both can be different for each kind of parameter (see below). By default, learning rates are set to 0.005 and regularization terms are set to 0.02.

Note

You can choose to use an unbiased version of this algorithm, simply predicting:

\[\hat{r}_{ui} = q_i^Tp_u\]

This is equivalent to Probabilistic Matrix Factorization ([SM08], section 2) and can be achieved by setting the biased parameter to False.
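
For example:

    from surprise import SVD

    # Unbiased SVD, i.e. Probabilistic Matrix Factorization:
    algo = SVD(biased=False)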

Parameters:
  • n_factors – The number of factors. Default is 100.
  • n_epochs – The number of iterations of the SGD procedure. Default is 20.
  • biased (bool) – Whether to use baselines (or biases). See note above. Default is True.
  • lr_all – The learning rate for all parameters. Default is 0.005.
  • reg_all – The regularization term for all parameters. Default is 0.02.
  • lr_bu – The learning rate for \(b_u\). Takes precedence over lr_all if set. Default is None.
  • lr_bi – The learning rate for \(b_i\). Takes precedence over lr_all if set. Default is None.
  • lr_pu – The learning rate for \(p_u\). Takes precedence over lr_all if set. Default is None.
  • lr_qi – The learning rate for \(q_i\). Takes precedence over lr_all if set. Default is None.
  • reg_bu – The regularization term for \(b_u\). Takes precedence over reg_all if set. Default is None.
  • reg_bi – The regularization term for \(b_i\). Takes precedence over reg_all if set. Default is None.
  • reg_pu – The regularization term for \(p_u\). Takes precedence over reg_all if set. Default is None.
  • reg_qi – The regularization term for \(q_i\). Takes precedence over reg_all if set. Default is None.
  • verbose – If True, prints the current epoch. Default is False.
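
A minimal usage example, assuming a recent version of the library (where cross_validate lives in surprise.model_selection) and the built-in ml-100k dataset:

    from surprise import SVD, Dataset
    from surprise.model_selection import cross_validate

    # Load the built-in movielens-100k dataset (downloaded if needed).
    data = Dataset.load_builtin('ml-100k')

    # Set a few of the parameters above; the others keep their defaults.
    algo = SVD(n_factors=100, n_epochs=20, lr_all=0.005, reg_all=0.02)

    # 5-fold cross-validation, reporting RMSE and MAE.
    cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

    # Alternatively, fit on the full trainset and predict a single pair,
    # using raw (string) user and item ids:
    trainset = data.build_full_trainset()
    algo.fit(trainset)
    print(algo.predict(uid='196', iid='302').est)
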
class surprise.prediction_algorithms.matrix_factorization.SVDpp

Bases: surprise.prediction_algorithms.algo_base.AlgoBase

The SVD++ algorithm, an extension of SVD taking into account implicit ratings.

The prediction \(\hat{r}_{ui}\) is set as:

\[\hat{r}_{ui} = \mu + b_u + b_i + q_i^T\left(p_u + |I_u|^{-\frac{1}{2}} \sum_{j \in I_u}y_j\right)\]

where the \(y_j\) terms are a new set of item factors that capture implicit ratings. Here, an implicit rating describes the fact that a user \(u\) rated an item \(j\), regardless of the rating value.

If user \(u\) is unknown, then the bias \(b_u\) and the factors \(p_u\) are assumed to be zero. The same applies for item \(i\) with \(b_i\), \(q_i\) and \(y_i\).
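
As an illustration, the prediction above can be computed for a single \((u, i)\) pair with NumPy. All values and shapes below are made up for the sketch and do not reflect Surprise's internals:

    import numpy as np

    n_factors = 20
    rng = np.random.default_rng(0)
    mu, b_u, b_i = 3.5, 0.1, -0.2           # global mean and biases (made up)
    p_u = rng.normal(size=n_factors)        # explicit user factors
    q_i = rng.normal(size=n_factors)        # item factors
    y = rng.normal(size=(15, n_factors))    # implicit item factors y_j
    I_u = [3, 7, 12]                        # items rated by user u

    # |I_u|^(-1/2) * sum of y_j over the items u has rated:
    implicit = y[I_u].sum(axis=0) / np.sqrt(len(I_u))
    r_hat = mu + b_u + b_i + q_i @ (p_u + implicit)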

For details, see section 4 of [Kor08]. See also [RRSK10], section 5.3.1.

Just as for SVD, the parameters are learned using SGD on the regularized squared error objective.

Baselines are initialized to 0. User and item factors are initialized to 0.1, as recommended by Funk.

You have control over the learning rate \(\gamma\) and the regularization term \(\lambda\). Both can be different for each kind of parameter (see below). By default, learning rates are set to 0.007 and regularization terms are set to 0.02.

Parameters:
  • n_factors – The number of factors. Default is 20.
  • n_epochs – The number of iterations of the SGD procedure. Default is 20.
  • lr_all – The learning rate for all parameters. Default is 0.007.
  • reg_all – The regularization term for all parameters. Default is 0.02.
  • lr_bu – The learning rate for \(b_u\). Takes precedence over lr_all if set. Default is None.
  • lr_bi – The learning rate for \(b_i\). Takes precedence over lr_all if set. Default is None.
  • lr_pu – The learning rate for \(p_u\). Takes precedence over lr_all if set. Default is None.
  • lr_qi – The learning rate for \(q_i\). Takes precedence over lr_all if set. Default is None.
  • lr_yj – The learning rate for \(y_j\). Takes precedence over lr_all if set. Default is None.
  • reg_bu – The regularization term for \(b_u\). Takes precedence over reg_all if set. Default is None.
  • reg_bi – The regularization term for \(b_i\). Takes precedence over reg_all if set. Default is None.
  • reg_pu – The regularization term for \(p_u\). Takes precedence over reg_all if set. Default is None.
  • reg_qi – The regularization term for \(q_i\). Takes precedence over reg_all if set. Default is None.
  • reg_yj – The regularization term for \(y_j\). Takes precedence over reg_all if set. Default is None.
  • verbose – If True, prints the current epoch. Default is False.
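
A minimal usage example, analogous to the SVD one above:

    from surprise import SVDpp, Dataset
    from surprise.model_selection import cross_validate

    data = Dataset.load_builtin('ml-100k')

    # SVD++ is significantly slower to train than SVD, hence the smaller
    # default number of factors.
    algo = SVDpp(n_factors=20, lr_all=0.007, reg_all=0.02)
    cross_validate(algo, data, measures=['RMSE'], cv=3, verbose=True)
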
class surprise.prediction_algorithms.matrix_factorization.NMF

Bases: surprise.prediction_algorithms.algo_base.AlgoBase

A collaborative filtering algorithm based on Non-negative Matrix Factorization.

This algorithm is very similar to SVD. The prediction \(\hat{r}_{ui}\) is set as:

\[\hat{r}_{ui} = q_i^Tp_u,\]

where user and item factors are kept positive. Our implementation follows that suggested in [LZXZ14], which is equivalent to [ZWFM96] in its unregularized form. Both are direct applications of NMF for dense matrices [LS01].

The optimization procedure is a (regularized) stochastic gradient descent with a specific choice of stepsize that ensures non-negativity of factors, provided that their initial values are also positive.

At each step of the SGD procedure, the factors \(f\) of user \(u\) and item \(i\) are updated as follows:

\[\begin{split}
p_{uf} &\leftarrow p_{uf} \cdot \frac{\sum_{i \in I_u} q_{if} \cdot r_{ui}}{\sum_{i \in I_u} q_{if} \cdot \hat{r}_{ui} + \lambda_u |I_u| p_{uf}}\\
q_{if} &\leftarrow q_{if} \cdot \frac{\sum_{u \in U_i} p_{uf} \cdot r_{ui}}{\sum_{u \in U_i} p_{uf} \cdot \hat{r}_{ui} + \lambda_i |U_i| q_{if}}
\end{split}\]

where \(\lambda_u\) and \(\lambda_i\) are regularization parameters.
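
As an illustration, one epoch of these multiplicative updates can be written with NumPy on a small dense rating matrix. This sketch follows the formulas above, not Surprise's actual implementation, and all dimensions are made up:

    import numpy as np

    rng = np.random.default_rng(0)
    R = rng.integers(0, 6, size=(8, 6)).astype(float)  # 0 means "not rated"
    mask = R > 0                                       # observed entries
    # The sketch assumes every user and item has at least one rating.

    n_factors, lam_u, lam_i = 3, 0.06, 0.06
    P = rng.uniform(0, 1, (R.shape[0], n_factors))     # user factors, positive
    Q = rng.uniform(0, 1, (R.shape[1], n_factors))     # item factors, positive

    n_u = mask.sum(axis=1, keepdims=True)              # |I_u| per user
    n_i = mask.sum(axis=0, keepdims=True).T            # |U_i| per item

    # One epoch: multiplicative updates keep P and Q non-negative.
    R_hat = P @ Q.T
    P *= ((mask * R) @ Q) / ((mask * R_hat) @ Q + lam_u * n_u * P)
    R_hat = P @ Q.T
    Q *= ((mask * R).T @ P) / ((mask * R_hat).T @ P + lam_i * n_i * Q)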

This algorithm is highly dependent on initial values. User and item factors are uniformly initialized between init_low and init_high. Change them at your own risk!

A biased version is available by setting the biased parameter to True. In this case, the prediction is set as

\[\hat{r}_{ui} = \mu + b_u + b_i + q_i^Tp_u,\]

still ensuring positive factors. Baselines are optimized in the same way as in the SVD algorithm. While yielding better accuracy, the biased version seems highly prone to overfitting, so you may want to reduce the number of factors (or increase regularization).
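
For example:

    from surprise import NMF

    # Biased NMF; fewer factors and stronger regularization than the
    # defaults, to curb the overfitting mentioned above.
    algo = NMF(biased=True, n_factors=5, reg_pu=0.1, reg_qi=0.1)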

Parameters:
  • n_factors – The number of factors. Default is 15.
  • n_epochs – The number of iterations of the SGD procedure. Default is 50.
  • biased (bool) – Whether to use baselines (or biases). Default is False.
  • reg_pu – The regularization term for users \(\lambda_u\). Default is 0.06.
  • reg_qi – The regularization term for items \(\lambda_i\). Default is 0.06.
  • reg_bu – The regularization term for \(b_u\). Only relevant for biased version. Default is 0.02.
  • reg_bi – The regularization term for \(b_i\). Only relevant for biased version. Default is 0.02.
  • lr_bu – The learning rate for \(b_u\). Only relevant for biased version. Default is 0.005.
  • lr_bi – The learning rate for \(b_i\). Only relevant for biased version. Default is 0.005.
  • init_low – Lower bound for random initialization of factors. Must be greater than 0 to ensure non-negative factors. Default is 0.
  • init_high – Upper bound for random initialization of factors. Default is 1.
  • verbose – If True, prints the current epoch. Default is False.
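
A minimal usage example:

    from surprise import NMF, Dataset
    from surprise.model_selection import cross_validate

    data = Dataset.load_builtin('ml-100k')

    algo = NMF(n_factors=15, n_epochs=50)
    cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)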