Matrix Factorization-based algorithms¶
- class surprise.prediction_algorithms.matrix_factorization.SVD(n_factors=100, n_epochs=20, biased=True, init_mean=0, init_std_dev=0.1, lr_all=0.005, reg_all=0.02, lr_bu=None, lr_bi=None, lr_pu=None, lr_qi=None, reg_bu=None, reg_bi=None, reg_pu=None, reg_qi=None, random_state=None, verbose=False)¶
Bases:
AlgoBase
The famous SVD algorithm, as popularized by Simon Funk during the Netflix Prize. When baselines are not used, this is equivalent to Probabilistic Matrix Factorization [SM08] (see note below).
The prediction \(\hat{r}_{ui}\) is set as:
\[\hat{r}_{ui} = \mu + b_u + b_i + q_i^Tp_u\]If user \(u\) is unknown, then the bias \(b_u\) and the factors \(p_u\) are assumed to be zero. The same applies for item \(i\) with \(b_i\) and \(q_i\).
For details, see equation (5) from [KBV09]. See also [RRSK10], section 5.3.1.
To estimate all the unknown, we minimize the following regularized squared error:
\[\sum_{r_{ui} \in R_{train}} \left(r_{ui} - \hat{r}_{ui} \right)^2 + \lambda\left(b_i^2 + b_u^2 + ||q_i||^2 + ||p_u||^2\right)\]The minimization is performed by a very straightforward stochastic gradient descent:
\[\begin{split}b_u &\leftarrow b_u &+ \gamma (e_{ui} - \lambda b_u)\\ b_i &\leftarrow b_i &+ \gamma (e_{ui} - \lambda b_i)\\ p_u &\leftarrow p_u &+ \gamma (e_{ui} \cdot q_i - \lambda p_u)\\ q_i &\leftarrow q_i &+ \gamma (e_{ui} \cdot p_u - \lambda q_i)\end{split}\]where \(e_{ui} = r_{ui} - \hat{r}_{ui}\). These steps are performed over all the ratings of the trainset and repeated
n_epochs
times. Baselines are initialized to0
. User and item factors are randomly initialized according to a normal distribution, which can be tuned using theinit_mean
andinit_std_dev
parameters.You also have control over the learning rate \(\gamma\) and the regularization term \(\lambda\). Both can be different for each kind of parameter (see below). By default, learning rates are set to
0.005
and regularization terms are set to0.02
.Note
You can choose to use an unbiased version of this algorithm, simply predicting:
\[\hat{r}_{ui} = q_i^Tp_u\]This is equivalent to Probabilistic Matrix Factorization ([SM08], section 2) and can be achieved by setting the
biased
parameter toFalse
.- Parameters
n_factors – The number of factors. Default is
100
.n_epochs – The number of iteration of the SGD procedure. Default is
20
.biased (bool) – Whether to use baselines (or biases). See note above. Default is
True
.init_mean – The mean of the normal distribution for factor vectors initialization. Default is
0
.init_std_dev – The standard deviation of the normal distribution for factor vectors initialization. Default is
0.1
.lr_all – The learning rate for all parameters. Default is
0.005
.reg_all – The regularization term for all parameters. Default is
0.02
.lr_bu – The learning rate for \(b_u\). Takes precedence over
lr_all
if set. Default isNone
.lr_bi – The learning rate for \(b_i\). Takes precedence over
lr_all
if set. Default isNone
.lr_pu – The learning rate for \(p_u\). Takes precedence over
lr_all
if set. Default isNone
.lr_qi – The learning rate for \(q_i\). Takes precedence over
lr_all
if set. Default isNone
.reg_bu – The regularization term for \(b_u\). Takes precedence over
reg_all
if set. Default isNone
.reg_bi – The regularization term for \(b_i\). Takes precedence over
reg_all
if set. Default isNone
.reg_pu – The regularization term for \(p_u\). Takes precedence over
reg_all
if set. Default isNone
.reg_qi – The regularization term for \(q_i\). Takes precedence over
reg_all
if set. Default isNone
.random_state (int, RandomState instance from numpy, or
None
) – Determines the RNG that will be used for initialization. If int,random_state
will be used as a seed for a new RNG. This is useful to get the same initialization over multiple calls tofit()
. If RandomState instance, this same instance is used as RNG. IfNone
, the current RNG from numpy is used. Default isNone
.verbose – If
True
, prints the current epoch. Default isFalse
.
- pu¶
The user factors (only exists if
fit()
has been called)- Type
numpy array of size (n_users, n_factors)
- qi¶
The item factors (only exists if
fit()
has been called)- Type
numpy array of size (n_items, n_factors)
- bu¶
The user biases (only exists if
fit()
has been called)- Type
numpy array of size (n_users)
- bi¶
The item biases (only exists if
fit()
has been called)- Type
numpy array of size (n_items)
- class surprise.prediction_algorithms.matrix_factorization.SVDpp(n_factors=20, n_epochs=20, init_mean=0, init_std_dev=0.1, lr_all=0.007, reg_all=0.02, lr_bu=None, lr_bi=None, lr_pu=None, lr_qi=None, lr_yj=None, reg_bu=None, reg_bi=None, reg_pu=None, reg_qi=None, reg_yj=None, random_state=None, verbose=False, cache_ratings=False)¶
Bases:
AlgoBase
The SVD++ algorithm, an extension of
SVD
taking into account implicit ratings.The prediction \(\hat{r}_{ui}\) is set as:
\[\hat{r}_{ui} = \mu + b_u + b_i + q_i^T\left(p_u + |I_u|^{-\frac{1}{2}} \sum_{j \in I_u}y_j\right)\]Where the \(y_j\) terms are a new set of item factors that capture implicit ratings. Here, an implicit rating describes the fact that a user \(u\) rated an item \(j\), regardless of the rating value.
If user \(u\) is unknown, then the bias \(b_u\) and the factors \(p_u\) are assumed to be zero. The same applies for item \(i\) with \(b_i\), \(q_i\) and \(y_i\).
For details, see section 4 of [Kor08]. See also [RRSK10], section 5.3.1.
Just as for
SVD
, the parameters are learned using a SGD on the regularized squared error objective.Baselines are initialized to
0
. User and item factors are randomly initialized according to a normal distribution, which can be tuned using theinit_mean
andinit_std_dev
parameters.You have control over the learning rate \(\gamma\) and the regularization term \(\lambda\). Both can be different for each kind of parameter (see below). By default, learning rates are set to
0.005
and regularization terms are set to0.02
.- Parameters
n_factors – The number of factors. Default is
20
.n_epochs – The number of iteration of the SGD procedure. Default is
20
.cache_ratings – Whether or not to cache ratings during fit(). This should speed-up the training, and has a higher memory footprint. Default is False.
init_mean – The mean of the normal distribution for factor vectors initialization. Default is
0
.init_std_dev – The standard deviation of the normal distribution for factor vectors initialization. Default is
0.1
.lr_all – The learning rate for all parameters. Default is
0.007
.reg_all – The regularization term for all parameters. Default is
0.02
.lr_bu – The learning rate for \(b_u\). Takes precedence over
lr_all
if set. Default isNone
.lr_bi – The learning rate for \(b_i\). Takes precedence over
lr_all
if set. Default isNone
.lr_pu – The learning rate for \(p_u\). Takes precedence over
lr_all
if set. Default isNone
.lr_qi – The learning rate for \(q_i\). Takes precedence over
lr_all
if set. Default isNone
.lr_yj – The learning rate for \(y_j\). Takes precedence over
lr_all
if set. Default isNone
.reg_bu – The regularization term for \(b_u\). Takes precedence over
reg_all
if set. Default isNone
.reg_bi – The regularization term for \(b_i\). Takes precedence over
reg_all
if set. Default isNone
.reg_pu – The regularization term for \(p_u\). Takes precedence over
reg_all
if set. Default isNone
.reg_qi – The regularization term for \(q_i\). Takes precedence over
reg_all
if set. Default isNone
.reg_yj – The regularization term for \(y_j\). Takes precedence over
reg_all
if set. Default isNone
.random_state (int, RandomState instance from numpy, or
None
) – Determines the RNG that will be used for initialization. If int,random_state
will be used as a seed for a new RNG. This is useful to get the same initialization over multiple calls tofit()
. If RandomState instance, this same instance is used as RNG. IfNone
, the current RNG from numpy is used. Default isNone
.verbose – If
True
, prints the current epoch. Default isFalse
.
- pu¶
The user factors (only exists if
fit()
has been called)- Type
numpy array of size (n_users, n_factors)
- qi¶
The item factors (only exists if
fit()
has been called)- Type
numpy array of size (n_items, n_factors)
- yj¶
The (implicit) item factors (only exists if
fit()
has been called)- Type
numpy array of size (n_items, n_factors)
- bu¶
The user biases (only exists if
fit()
has been called)- Type
numpy array of size (n_users)
- bi¶
The item biases (only exists if
fit()
has been called)- Type
numpy array of size (n_items)
- class surprise.prediction_algorithms.matrix_factorization.NMF(n_factors=15, n_epochs=50, biased=False, reg_pu=0.06, reg_qi=0.06, reg_bu=0.02, reg_bi=0.02, lr_bu=0.005, lr_bi=0.005, init_low=0, init_high=1, random_state=None, verbose=False)¶
Bases:
AlgoBase
A collaborative filtering algorithm based on Non-negative Matrix Factorization.
This algorithm is very similar to
SVD
. The prediction \(\hat{r}_{ui}\) is set as:\[\hat{r}_{ui} = q_i^Tp_u,\]where user and item factors are kept positive. Our implementation follows that suggested in [LZXZ14], which is equivalent to [ZWFM96] in its non-regularized form. Both are direct applications of NMF for dense matrices [LS01].
The optimization procedure is a (regularized) stochastic gradient descent with a specific choice of step size that ensures non-negativity of factors, provided that their initial values are also positive.
At each step of the SGD procedure, the factors \(f\) or user \(u\) and item \(i\) are updated as follows:
\[\begin{split}p_{uf} &\leftarrow p_{uf} &\cdot \frac{\sum_{i \in I_u} q_{if} \cdot r_{ui}}{\sum_{i \in I_u} q_{if} \cdot \hat{r_{ui}} + \lambda_u |I_u| p_{uf}}\\ q_{if} &\leftarrow q_{if} &\cdot \frac{\sum_{u \in U_i} p_{uf} \cdot r_{ui}}{\sum_{u \in U_i} p_{uf} \cdot \hat{r_{ui}} + \lambda_i |U_i| q_{if}}\\\end{split}\]where \(\lambda_u\) and \(\lambda_i\) are regularization parameters.
This algorithm is highly dependent on initial values. User and item factors are uniformly initialized between
init_low
andinit_high
. Change them at your own risks!A biased version is available by setting the
biased
parameter toTrue
. In this case, the prediction is set as\[\hat{r}_{ui} = \mu + b_u + b_i + q_i^Tp_u,\]still ensuring positive factors. Baselines are optimized in the same way as in the
SVD
algorithm. While yielding better accuracy, the biased version seems highly prone to overfitting so you may want to reduce the number of factors (or increase regularization).- Parameters
n_factors – The number of factors. Default is
15
.n_epochs – The number of iteration of the SGD procedure. Default is
50
.biased (bool) – Whether to use baselines (or biases). Default is
False
.reg_pu – The regularization term for users \(\lambda_u\). Default is
0.06
.reg_qi – The regularization term for items \(\lambda_i\). Default is
0.06
.reg_bu – The regularization term for \(b_u\). Only relevant for biased version. Default is
0.02
.reg_bi – The regularization term for \(b_i\). Only relevant for biased version. Default is
0.02
.lr_bu – The learning rate for \(b_u\). Only relevant for biased version. Default is
0.005
.lr_bi – The learning rate for \(b_i\). Only relevant for biased version. Default is
0.005
.init_low – Lower bound for random initialization of factors. Must be greater than
0
to ensure non-negative factors. Default is0
.init_high – Higher bound for random initialization of factors. Default is
1
.random_state (int, RandomState instance from numpy, or
None
) – Determines the RNG that will be used for initialization. If int,random_state
will be used as a seed for a new RNG. This is useful to get the same initialization over multiple calls tofit()
. If RandomState instance, this same instance is used as RNG. IfNone
, the current RNG from numpy is used. Default isNone
.verbose – If
True
, prints the current epoch. Default isFalse
.
- pu¶
The user factors (only exists if
fit()
has been called)- Type
numpy array of size (n_users, n_factors)
- qi¶
The item factors (only exists if
fit()
has been called)- Type
numpy array of size (n_items, n_factors)
- bu¶
The user biases (only exists if
fit()
has been called)- Type
numpy array of size (n_users)
- bi¶
The item biases (only exists if
fit()
has been called)- Type
numpy array of size (n_items)