Matrix Factorization-based algorithms¶

class surprise.prediction_algorithms.matrix_factorization.SVD

The famous SVD algorithm, as popularized by Simon Funk during the Netflix Prize. When baselines are not used, this is equivalent to Probabilistic Matrix Factorization [SM08] (see note below).

The prediction $$\hat{r}_{ui}$$ is set as:

$\hat{r}_{ui} = \mu + b_u + b_i + q_i^Tp_u$

If user $$u$$ is unknown, then the bias $$b_u$$ and the factors $$p_u$$ are assumed to be zero. The same applies for item $$i$$ with $$b_i$$ and $$q_i$$.

For details, see equation (5) from [KBV09]. See also [RRSK10], section 5.3.1.

To estimate all the unknown, we minimize the following regularized squared error:

$\sum_{r_{ui} \in R_{train}} \left(r_{ui} - \hat{r}_{ui} \right)^2 + \lambda\left(b_i^2 + b_u^2 + ||q_i||^2 + ||p_u||^2\right)$

The minimization is performed by a very straightforward stochastic gradient descent:

$\begin{split}b_u &\leftarrow b_u &+ \gamma (e_{ui} - \lambda b_u)\\ b_i &\leftarrow b_i &+ \gamma (e_{ui} - \lambda b_i)\\ p_u &\leftarrow p_u &+ \gamma (e_{ui} \cdot q_i - \lambda p_u)\\ q_i &\leftarrow q_i &+ \gamma (e_{ui} \cdot p_u - \lambda q_i)\end{split}$

where $$e_{ui} = r_{ui} - \hat{r}_{ui}$$. These steps are performed over all the ratings of the trainset and repeated n_epochs times. Baselines are initialized to 0. User and item factors are randomly initialized according to a normal distribution, which can be tuned using the init_mean and init_std_dev parameters.

You also have control over the learning rate $$\gamma$$ and the regularization term $$\lambda$$. Both can be different for each kind of parameter (see below). By default, learning rates are set to 0.005 and regularization terms are set to 0.02.

Note

You can choose to use an unbiased version of this algorithm, simply predicting:

$\hat{r}_{ui} = q_i^Tp_u$

This is equivalent to Probabilistic Matrix Factorization ([SM08], section 2) and can be achieved by setting the biased parameter to False.

Parameters: n_factors – The number of factors. Default is 100. n_epochs – The number of iteration of the SGD procedure. Default is 20. biased (bool) – Whether to use baselines (or biases). See note above. Default is True. init_mean – The mean of the normal distribution for factor vectors initialization. Default is 0. init_std_dev – The standard deviation of the normal distribution for factor vectors initialization. Default is 0.1. lr_all – The learning rate for all parameters. Default is 0.005. reg_all – The regularization term for all parameters. Default is 0.02. lr_bu – The learning rate for $$b_u$$. Takes precedence over lr_all if set. Default is None. lr_bi – The learning rate for $$b_i$$. Takes precedence over lr_all if set. Default is None. lr_pu – The learning rate for $$p_u$$. Takes precedence over lr_all if set. Default is None. lr_qi – The learning rate for $$q_i$$. Takes precedence over lr_all if set. Default is None. reg_bu – The regularization term for $$b_u$$. Takes precedence over reg_all if set. Default is None. reg_bi – The regularization term for $$b_i$$. Takes precedence over reg_all if set. Default is None. reg_pu – The regularization term for $$p_u$$. Takes precedence over reg_all if set. Default is None. reg_qi – The regularization term for $$q_i$$. Takes precedence over reg_all if set. Default is None. random_state (int, RandomState instance from numpy, or None) – Determines the RNG that will be used for initialization. If int, random_state will be used as a seed for a new RNG. This is useful to get the same initialization over multiple calls to fit(). If RandomState instance, this same instance is used as RNG. If None, the current RNG from numpy is used. Default is None. verbose – If True, prints the current epoch. Default is False.
pu

numpy array of size (n_users, n_factors) – The user factors (only exists if fit() has been called)

qi

numpy array of size (n_items, n_factors) – The item factors (only exists if fit() has been called)

bu

numpy array of size (n_users) – The user biases (only exists if fit() has been called)

bi

numpy array of size (n_items) – The item biases (only exists if fit() has been called)

class surprise.prediction_algorithms.matrix_factorization.SVDpp

The SVD++ algorithm, an extension of SVD taking into account implicit ratings.

The prediction $$\hat{r}_{ui}$$ is set as:

$\hat{r}_{ui} = \mu + b_u + b_i + q_i^T\left(p_u + |I_u|^{-\frac{1}{2}} \sum_{j \in I_u}y_j\right)$

Where the $$y_j$$ terms are a new set of item factors that capture implicit ratings. Here, an implicit rating describes the fact that a user $$u$$ rated an item $$j$$, regardless of the rating value.

If user $$u$$ is unknown, then the bias $$b_u$$ and the factors $$p_u$$ are assumed to be zero. The same applies for item $$i$$ with $$b_i$$, $$q_i$$ and $$y_i$$.

For details, see section 4 of [Kor08]. See also [RRSK10], section 5.3.1.

Just as for SVD, the parameters are learned using a SGD on the regularized squared error objective.

Baselines are initialized to 0. User and item factors are randomly initialized according to a normal distribution, which can be tuned using the init_mean and init_std_dev parameters.

You have control over the learning rate $$\gamma$$ and the regularization term $$\lambda$$. Both can be different for each kind of parameter (see below). By default, learning rates are set to 0.005 and regularization terms are set to 0.02.

Parameters: n_factors – The number of factors. Default is 20. n_epochs – The number of iteration of the SGD procedure. Default is 20. init_mean – The mean of the normal distribution for factor vectors initialization. Default is 0. init_std_dev – The standard deviation of the normal distribution for factor vectors initialization. Default is 0.1. lr_all – The learning rate for all parameters. Default is 0.007. reg_all – The regularization term for all parameters. Default is 0.02. lr_bu – The learning rate for $$b_u$$. Takes precedence over lr_all if set. Default is None. lr_bi – The learning rate for $$b_i$$. Takes precedence over lr_all if set. Default is None. lr_pu – The learning rate for $$p_u$$. Takes precedence over lr_all if set. Default is None. lr_qi – The learning rate for $$q_i$$. Takes precedence over lr_all if set. Default is None. lr_yj – The learning rate for $$y_j$$. Takes precedence over lr_all if set. Default is None. reg_bu – The regularization term for $$b_u$$. Takes precedence over reg_all if set. Default is None. reg_bi – The regularization term for $$b_i$$. Takes precedence over reg_all if set. Default is None. reg_pu – The regularization term for $$p_u$$. Takes precedence over reg_all if set. Default is None. reg_qi – The regularization term for $$q_i$$. Takes precedence over reg_all if set. Default is None. reg_yj – The regularization term for $$y_j$$. Takes precedence over reg_all if set. Default is None. random_state (int, RandomState instance from numpy, or None) – Determines the RNG that will be used for initialization. If int, random_state will be used as a seed for a new RNG. This is useful to get the same initialization over multiple calls to fit(). If RandomState instance, this same instance is used as RNG. If None, the current RNG from numpy is used. Default is None. verbose – If True, prints the current epoch. Default is False.
pu

numpy array of size (n_users, n_factors) – The user factors (only exists if fit() has been called)

qi

numpy array of size (n_items, n_factors) – The item factors (only exists if fit() has been called)

yj

numpy array of size (n_items, n_factors) – The (implicit) item factors (only exists if fit() has been called)

bu

numpy array of size (n_users) – The user biases (only exists if fit() has been called)

bi

numpy array of size (n_items) – The item biases (only exists if fit() has been called)

class surprise.prediction_algorithms.matrix_factorization.NMF

A collaborative filtering algorithm based on Non-negative Matrix Factorization.

This algorithm is very similar to SVD. The prediction $$\hat{r}_{ui}$$ is set as:

$\hat{r}_{ui} = q_i^Tp_u,$

where user and item factors are kept positive. Our implementation follows that suggested in [LZXZ14], which is equivalent to [ZWFM96] in its non-regularized form. Both are direct applications of NMF for dense matrices [LS01].

The optimization procedure is a (regularized) stochastic gradient descent with a specific choice of step size that ensures non-negativity of factors, provided that their initial values are also positive.

At each step of the SGD procedure, the factors $$f$$ or user $$u$$ and item $$i$$ are updated as follows:

$\begin{split}p_{uf} &\leftarrow p_{uf} &\cdot \frac{\sum_{i \in I_u} q_{if} \cdot r_{ui}}{\sum_{i \in I_u} q_{if} \cdot \hat{r_{ui}} + \lambda_u |I_u| p_{uf}}\\ q_{if} &\leftarrow q_{if} &\cdot \frac{\sum_{u \in U_i} p_{uf} \cdot r_{ui}}{\sum_{u \in U_i} p_{uf} \cdot \hat{r_{ui}} + \lambda_i |U_i| q_{if}}\\\end{split}$

where $$\lambda_u$$ and $$\lambda_i$$ are regularization parameters.

This algorithm is highly dependent on initial values. User and item factors are uniformly initialized between init_low and init_high. Change them at your own risks!

A biased version is available by setting the biased parameter to True. In this case, the prediction is set as

$\hat{r}_{ui} = \mu + b_u + b_i + q_i^Tp_u,$

still ensuring positive factors. Baselines are optimized in the same way as in the SVD algorithm. While yielding better accuracy, the biased version seems highly prone to overfitting so you may want to reduce the number of factors (or increase regularization).

Parameters: n_factors – The number of factors. Default is 15. n_epochs – The number of iteration of the SGD procedure. Default is 50. biased (bool) – Whether to use baselines (or biases). Default is False. reg_pu – The regularization term for users $$\lambda_u$$. Default is 0.06. reg_qi – The regularization term for items $$\lambda_i$$. Default is 0.06. reg_bu – The regularization term for $$b_u$$. Only relevant for biased version. Default is 0.02. reg_bi – The regularization term for $$b_i$$. Only relevant for biased version. Default is 0.02. lr_bu – The learning rate for $$b_u$$. Only relevant for biased version. Default is 0.005. lr_bi – The learning rate for $$b_i$$. Only relevant for biased version. Default is 0.005. init_low – Lower bound for random initialization of factors. Must be greater than 0 to ensure non-negative factors. Default is 0. init_high – Higher bound for random initialization of factors. Default is 1. random_state (int, RandomState instance from numpy, or None) – Determines the RNG that will be used for initialization. If int, random_state will be used as a seed for a new RNG. This is useful to get the same initialization over multiple calls to fit(). If RandomState instance, this same instance is used as RNG. If None, the current RNG from numpy is used. Default is None. verbose – If True, prints the current epoch. Default is False.
pu

numpy array of size (n_users, n_factors) – The user factors (only exists if fit() has been called)

qi

numpy array of size (n_items, n_factors) – The item factors (only exists if fit() has been called)

bu

numpy array of size (n_users) – The user biases (only exists if fit() has been called)

bi

numpy array of size (n_items) – The item biases (only exists if fit() has been called)