# Prediction algorithms

Surprise provides a number of built-in algorithms. You can find the details
of each of them in the `surprise.prediction_algorithms` package documentation.

Every algorithm is part of the global Surprise namespace, so you only need to import its name from the `surprise` package, for example:

```
from surprise import KNNBasic
algo = KNNBasic()
```

Some of these algorithms use baseline estimates, and some use a similarity measure. This page reviews how to configure the way baselines and similarities are computed.

## Baseline estimates configuration

Note

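This section only applies to algorithms (or similarity measures) that try to minimize the following regularized squared error (or equivalent):

\[\sum_{r_{ui} \in R_{train}} \left(r_{ui} - (\mu + b_u + b_i)\right)^2 + \lambda \left(b_u^2 + b_i^2\right)\]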

For algorithms using baselines in another objective function (e.g. the `SVD`
algorithm), the baseline configuration is done differently and is specific to
each algorithm. Please refer to their own documentation.
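To make this kind of objective concrete, here is a minimal sketch that evaluates a regularized squared error for a given set of baselines. It assumes the usual form from Koren's paper, where a rating is predicted as \(\mu + b_u + b_i\) and a single weight regularizes both biases. The function name, the argument names, and the toy data are illustrative and are not part of Surprise.

```python
def regularized_squared_error(ratings, mu, bu, bi, reg):
    """Sum of squared residuals plus an L2 penalty on the biases.

    ratings: iterable of (user, item, rating) tuples (hypothetical format).
    mu: global mean rating; bu / bi: dicts of user / item biases.
    """
    total = 0.0
    for u, i, r in ratings:
        # Residual of the baseline prediction mu + b_u + b_i.
        residual = r - (mu + bu[u] + bi[i])
        total += residual ** 2 + reg * (bu[u] ** 2 + bi[i] ** 2)
    return total


ratings = [("alice", "item1", 5.0)]
error = regularized_squared_error(
    ratings, mu=4.0, bu={"alice": 0.5}, bi={"item1": 0.25}, reg=0.1
)
print(error)
```

Both ALS and SGD, described below, are ways of searching for the biases that make this quantity small.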

First of all, if you do not want to configure the way baselines are computed, you don't have to: the default parameters will do just fine. If you do want to, this section is for you.

You may want to read section 2.1 of *Factor in the Neighbors: Scalable and Accurate Collaborative Filtering* by Yehuda Koren to get a good idea of what baseline estimates are.

Baselines can be estimated in two different ways:

- Using Stochastic Gradient Descent (SGD).
- Using Alternating Least Squares (ALS).

You can configure the way baselines are computed using the `bsl_options`
parameter passed at the creation of an algorithm. This parameter is a
dictionary for which the key `'method'` indicates the method to use. Accepted
values are `'als'` (default) and `'sgd'`. Depending on its value, other
options may be set. For ALS:

- `'reg_i'`: The regularization parameter for items, corresponding to \(\lambda_2\) in the paper. Default is 10.
- `'reg_u'`: The regularization parameter for users, corresponding to \(\lambda_3\) in the paper. Default is 15.
- `'n_epochs'`: The number of iterations of the ALS procedure. Default is 10. Note that the paper describes a **single**-iteration ALS process.
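The role of these three options can be illustrated with a minimal pure-Python sketch of ALS for baselines. It is not Surprise's implementation: the function name `baseline_als`, the `(user, item, rating)` tuple format, and the toy data are assumptions made for the example. Each epoch solves for the item biases with the user biases held fixed, then does the reverse.

```python
from collections import defaultdict


def baseline_als(ratings, n_epochs=10, reg_u=15, reg_i=10):
    """Estimate user and item biases with alternating least squares."""
    mu = sum(r for _, _, r in ratings) / len(ratings)
    by_user, by_item = defaultdict(list), defaultdict(list)
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))

    # Biases are initialized to zero.
    bu = {u: 0.0 for u in by_user}
    bi = {i: 0.0 for i in by_item}

    for _ in range(n_epochs):
        # Hold user biases fixed and solve for each item bias in closed form.
        for i, user_ratings in by_item.items():
            dev = sum(r - mu - bu[u] for u, r in user_ratings)
            bi[i] = dev / (reg_i + len(user_ratings))
        # Hold item biases fixed and solve for each user bias.
        for u, item_ratings in by_user.items():
            dev = sum(r - mu - bi[i] for i, r in item_ratings)
            bu[u] = dev / (reg_u + len(item_ratings))
    return mu, bu, bi


ratings = [("u1", "i1", 5.0), ("u1", "i2", 3.0), ("u2", "i1", 4.0)]
mu, bu, bi = baseline_als(ratings, n_epochs=1, reg_u=0, reg_i=0)
print(mu, bu, bi)
```

Larger values of `reg_u` and `reg_i` enlarge the denominators, which pulls the biases of users and items with few ratings toward zero.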

And for SGD:

- `'reg'`: The regularization parameter of the cost function that is optimized, corresponding to \(\lambda_1\) and then \(\lambda_5\) in the paper. Default is 0.02.
- `'learning_rate'`: The learning rate of SGD, corresponding to \(\gamma\) in the paper. Default is 0.005.
- `'n_epochs'`: The number of iterations of the SGD procedure. Default is 20.
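As a rough illustration of how these options interact, here is a minimal pure-Python sketch of SGD for baselines. It is not Surprise's implementation: the function name `baseline_sgd`, the `(user, item, rating)` tuple format, and the toy data are assumptions made for the example. Each rating nudges the two biases involved in the direction that reduces the prediction error, with `reg` pulling them back toward zero.

```python
def baseline_sgd(ratings, n_epochs=20, reg=0.02, learning_rate=0.005):
    """Estimate user and item biases with stochastic gradient descent."""
    mu = sum(r for _, _, r in ratings) / len(ratings)

    # Biases are initialized to zero.
    bu = {u: 0.0 for u, _, _ in ratings}
    bi = {i: 0.0 for _, i, _ in ratings}

    for _ in range(n_epochs):
        for u, i, r in ratings:
            # Error of the current baseline prediction for this rating.
            err = r - (mu + bu[u] + bi[i])
            # Gradient step on each bias, with L2 regularization.
            bu[u] += learning_rate * (err - reg * bu[u])
            bi[i] += learning_rate * (err - reg * bi[i])
    return mu, bu, bi


ratings = [("u1", "i1", 5.0), ("u2", "i1", 3.0)]
mu, bu, bi = baseline_sgd(ratings, n_epochs=1, reg=0.0, learning_rate=0.1)
print(mu, bu, bi)
```

A very small `learning_rate` leaves the biases close to zero unless `n_epochs` is increased accordingly.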

Note

For both procedures (ALS and SGD), user and item biases (\(b_u\) and \(b_i\)) are initialized to zero.

Usage examples:

```
from surprise import BaselineOnly

print('Using ALS')
bsl_options = {'method': 'als',
               'n_epochs': 5,
               'reg_u': 12,
               'reg_i': 5
               }
algo = BaselineOnly(bsl_options=bsl_options)
```

```
from surprise import BaselineOnly

print('Using SGD')
bsl_options = {'method': 'sgd',
               'learning_rate': .00005,
               }
algo = BaselineOnly(bsl_options=bsl_options)
```

Note that some similarity measures may use baselines, such as the
`pearson_baseline` similarity.
Configuration works just the same, whether the baselines are used in the actual
prediction \(\hat{r}_{ui}\) or not:

```
from surprise import KNNBasic

bsl_options = {'method': 'als',
               'n_epochs': 20,
               }
sim_options = {'name': 'pearson_baseline'}
algo = KNNBasic(bsl_options=bsl_options, sim_options=sim_options)
```

This leads us to similarity measure configuration, which we will review right now.

## Similarity measure configuration

Many algorithms use a similarity measure to estimate a rating. Similarity
measures are configured in much the same way as baseline estimates: you just
need to pass a `sim_options` argument at the creation of an algorithm. This
argument is a dictionary with the following (all optional) keys:

- `'name'`: The name of the similarity to use, as defined in the `similarities` module. Default is `'MSD'`.
- `'user_based'`: Whether similarities will be computed between users or between items. This has a **huge** impact on the performance of a prediction algorithm. Default is `True`.
- `'min_support'`: The minimum number of common items (when `'user_based'` is `True`) or minimum number of common users (when `'user_based'` is `False`) for the similarity not to be zero. Simply put, if \(|I_{uv}| < \text{min_support}\) then \(\text{sim}(u, v) = 0\). The same goes for items.
- `'shrinkage'`: Shrinkage parameter to apply (only relevant for the `pearson_baseline` similarity). Default is 100.
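The effect of `'min_support'` can be sketched in a few lines of pure Python. The example below assumes the user-based MSD similarity, taken here to be \(1 / (\text{msd}(u, v) + 1)\), where msd is the mean squared difference over the common items. The function name `msd_similarity` and the dictionaries of ratings are illustrative and are not part of Surprise.

```python
def msd_similarity(ratings_u, ratings_v, min_support=1):
    """MSD similarity between two users, given as {item: rating} dicts."""
    common = ratings_u.keys() & ratings_v.keys()
    # Too few common items: the similarity is forced to zero.
    if not common or len(common) < min_support:
        return 0.0
    msd = sum((ratings_u[i] - ratings_v[i]) ** 2 for i in common) / len(common)
    return 1.0 / (msd + 1.0)


u = {"a": 5.0, "b": 3.0}
v = {"a": 4.0, "b": 1.0, "c": 2.0}

sim_default = msd_similarity(u, v)                # two common items are enough
sim_strict = msd_similarity(u, v, min_support=3)  # two common items are too few
print(sim_default, sim_strict)
```

Raising `min_support` discards similarities that rest on very little evidence, at the cost of leaving some users or items with fewer usable neighbors.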

Usage examples:

```
from surprise import KNNBasic

sim_options = {'name': 'cosine',
               'user_based': False  # compute similarities between items
               }
algo = KNNBasic(sim_options=sim_options)
```

```
from surprise import KNNBasic

sim_options = {'name': 'pearson_baseline',
               'shrinkage': 0  # no shrinkage
               }
algo = KNNBasic(sim_options=sim_options)
```
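To see why a shrinkage of 0 means "no shrinkage", here is a small sketch of the shrinkage step. It assumes that the shrunk similarity is obtained by multiplying the raw correlation by \((|I_{uv}| - 1) / (|I_{uv}| - 1 + \text{shrinkage})\), which you should check against the `similarities` module documentation. The function name `shrink` and the numbers are illustrative only.

```python
def shrink(similarity, n_common, shrinkage=100):
    """Shrink a similarity toward zero when it rests on few common ratings."""
    denominator = n_common - 1 + shrinkage
    if denominator <= 0:
        # Guard added for this sketch: nothing to shrink from.
        return 0.0
    return (n_common - 1) / denominator * similarity


raw = 0.8
unshrunk = shrink(raw, n_common=5, shrinkage=0)    # factor is 4 / 4
shrunk = shrink(raw, n_common=5, shrinkage=100)    # factor is 4 / 104
print(unshrunk, shrunk)
```

With the default value of 100, a pair needs many common ratings before its similarity approaches the raw correlation.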

See also

The `similarities` module.