Using prediction algorithms¶
Surprise provides a bunch of built-in algorithms. All algorithms derive from the AlgoBase base class, which implements some key methods (e.g. predict, fit and test). The list and details of the available prediction algorithms can be found in the prediction_algorithms package documentation.
Every algorithm is part of the global Surprise namespace, so you only need to import its name from the surprise package, for example:
from surprise import KNNBasic
algo = KNNBasic()
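The key methods mentioned above (fit, test and predict) are common to all algorithms. As a minimal sketch of the usual workflow (the built-in ml-100k dataset, the 75/25 split and the raw ids used below are illustrative choices, not requirements):
from surprise import Dataset, KNNBasic, accuracy
from surprise.model_selection import train_test_split

# Load the built-in ml-100k dataset (you will be prompted to download it on
# first use) and split it into a trainset and a testset.
data = Dataset.load_builtin("ml-100k")
trainset, testset = train_test_split(data, test_size=0.25)

algo = KNNBasic()
algo.fit(trainset)                # train the algorithm on the trainset
predictions = algo.test(testset)  # predict ratings for all pairs in the testset
accuracy.rmse(predictions)        # print and return the RMSE of these predictions

# predict() estimates a single rating from raw (string) user and item ids.
pred = algo.predict(uid="196", iid="302")
print(pred.est)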
Some of these algorithms may use baseline estimates, and some may use a similarity measure. We review here how to configure the way baselines and similarities are computed.
Baselines estimates configuration¶
Note
This section only applies to algorithms (or similarity measures) that try to minimize the following regularized squared error (or equivalent):

\[\sum_{r_{ui} \in R_{train}} \left(r_{ui} - (\mu + b_u + b_i)\right)^2 + \lambda \left(b_u^2 + b_i^2 \right)\]
For algorithms using baselines in another objective function (e.g. the
SVD
algorithm), the baseline configuration is done differently and is specific to
each algorithm. Please refer to their own documentation.
First of all, if you do not want to configure the way baselines are computed, you don’t have to: the default parameters will do just fine. If you do want to tweak them, well… this section is for you.
You may want to read section 2.1 of [Kor10] to get a good idea of what baseline estimates are.
Baselines can be estimated in two different ways:
Using Stochastic Gradient Descent (SGD).
Using Alternating Least Squares (ALS).
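For intuition, the ALS procedure described in section 2.1 of [Kor10] alternates between two closed-form updates, first for the item biases and then for the user biases. The sketch below shows a single pass, where \(\mu\) is the global mean rating, \(R(i)\) is the set of users who rated item \(i\), \(R(u)\) is the set of items rated by user \(u\), and \(\lambda_2\), \(\lambda_3\) are regularization terms (exposed as the 'reg_i' and 'reg_u' options described below):

\[b_i = \frac{\sum_{u \in R(i)} (r_{ui} - \mu)}{\lambda_2 + |R(i)|}, \qquad b_u = \frac{\sum_{i \in R(u)} (r_{ui} - \mu - b_i)}{\lambda_3 + |R(u)|}\]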
You can configure the way baselines are computed using the bsl_options parameter passed at the creation of an algorithm. This parameter is a dictionary for which the key 'method' indicates the method to use. Accepted values are 'als' (default) and 'sgd'. Depending on its value, other options may be set. For ALS:
'reg_i': The regularization parameter for items, corresponding to \(\lambda_2\) in [Kor10]. Default is 10.
'reg_u': The regularization parameter for users, corresponding to \(\lambda_3\) in [Kor10]. Default is 15.
'n_epochs': The number of iterations of the ALS procedure. Default is 10. Note that in [Kor10], what is described is a single-iteration ALS process.
examples/baselines_conf.py¶
bsl_options = {"method": "als", "n_epochs": 5, "reg_u": 12, "reg_i": 5}
algo = BaselineOnly(bsl_options=bsl_options)
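To get a feel for the effect of these options, one way (a sketch, assuming the data has been loaded as a Dataset object; the built-in ml-100k dataset and the 3-fold setup below are arbitrary choices) is to cross-validate the resulting BaselineOnly algorithm:
from surprise import BaselineOnly, Dataset
from surprise.model_selection import cross_validate

data = Dataset.load_builtin("ml-100k")

bsl_options = {"method": "als", "n_epochs": 5, "reg_u": 12, "reg_i": 5}
algo = BaselineOnly(bsl_options=bsl_options)

# 3-fold cross-validation, reporting the RMSE of each fold.
cross_validate(algo, data, measures=["RMSE"], cv=3, verbose=True)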
And for SGD:
'reg': The regularization parameter of the cost function that is optimized, corresponding to \(\lambda_1\) in [Kor10]. Default is 0.02.
'learning_rate': The learning rate of SGD, corresponding to \(\gamma\) in [Kor10]. Default is 0.005.
'n_epochs': The number of iterations of the SGD procedure. Default is 20.
examples/baselines_conf.py¶
bsl_options = {
"method": "sgd",
"learning_rate": 0.00005,
}
algo = BaselineOnly(bsl_options=bsl_options)
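Which method (and which hyperparameter values) works best depends on your data. As a sketch, GridSearchCV accepts a nested dictionary for bsl_options, so ALS and SGD can be compared in a single search (the grid values below are arbitrary examples):
from surprise import BaselineOnly, Dataset
from surprise.model_selection import GridSearchCV

data = Dataset.load_builtin("ml-100k")

param_grid = {
    "bsl_options": {
        "method": ["als", "sgd"],
        "n_epochs": [5, 10],
    }
}
gs = GridSearchCV(BaselineOnly, param_grid, measures=["rmse"], cv=3)
gs.fit(data)

print(gs.best_score["rmse"])   # best RMSE achieved over the grid
print(gs.best_params["rmse"])  # the bsl_options that achieved it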
Note
For both procedures (ALS and SGD), user and item biases (\(b_u\) and \(b_i\)) are initialized to zero.
Some similarity measures may use baselines, such as the pearson_baseline similarity. Configuration works just the same, whether the baselines are used in the actual prediction \(\hat{r}_{ui}\) or not:
examples/baselines_conf.py¶
bsl_options = {
"method": "als",
"n_epochs": 20,
}
sim_options = {"name": "pearson_baseline"}
algo = KNNBasic(bsl_options=bsl_options, sim_options=sim_options)
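For comparison, the KNNBaseline algorithm does use the baselines in its prediction \(\hat{r}_{ui}\), and it accepts the very same options (a sketch; the option values are the same placeholders as above):
from surprise import KNNBaseline

bsl_options = {"method": "als", "n_epochs": 20}
sim_options = {"name": "pearson_baseline"}

# With the pearson_baseline similarity, the baselines enter both the
# similarity computation and the prediction itself.
algo = KNNBaseline(bsl_options=bsl_options, sim_options=sim_options)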
This leads us to similarity measure configuration, which we will review right now.
Similarity measure configuration¶
Many algorithms use a similarity measure to estimate a rating. They can be configured in a similar fashion to baseline ratings: you just need to pass a sim_options argument at the creation of an algorithm. This argument is a dictionary with the following (all optional) keys:
'name': The name of the similarity to use, as defined in the similarities module. Default is 'MSD'.
'user_based': Whether similarities will be computed between users or between items. This has a huge impact on the performance of a prediction algorithm. Default is True.
'min_support': The minimum number of common items (when 'user_based' is True) or minimum number of common users (when 'user_based' is False) for the similarity not to be zero. Simply put, if \(|I_{uv}| < \text{min_support}\) then \(\text{sim}(u, v) = 0\). The same goes for items.
'shrinkage': Shrinkage parameter to apply (only relevant for pearson_baseline similarity). Default is 100.
Usage examples:
examples/similarity_conf.py¶
sim_options = {
"name": "cosine",
"user_based": False, # compute similarities between items
}
algo = KNNBasic(sim_options=sim_options)
examples/similarity_conf.py¶
sim_options = {"name": "pearson_baseline", "shrinkage": 0}  # no shrinkage
algo = KNNBasic(sim_options=sim_options)
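These keys can of course be combined. As a sketch (the values below are arbitrary), an item-based pearson_baseline similarity with a minimum support and no shrinkage would look like:
sim_options = {
    "name": "pearson_baseline",
    "user_based": False,  # compute similarities between items
    "min_support": 5,     # at least 5 common users, otherwise the similarity is 0
    "shrinkage": 0,       # no shrinkage
}
algo = KNNBasic(sim_options=sim_options)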
See also
The similarities module.