k-NN inspired algorithms¶
These are algorithms that are directly derived from a basic nearest neighbors approach.
Note
For each of these algorithms, the actual number of neighbors that are
aggregated to compute an estimation is necessarily less than or equal to
\(k\). First, there might just not exist enough neighbors and second, the
sets \(N_i^k(u)\) and \(N_u^k(i)\) only include neighbors for which
the similarity measure is positive. It would make no sense to aggregate
ratings from users (or items) that are negatively correlated. For a given
prediction, the actual number of neighbors can be retrieved in the
'actual_k'
field of the details
dictionary of the prediction
.
You may want to read the User Guide
on how to configure the sim_options
parameter.
- class surprise.prediction_algorithms.knns.KNNBasic(k=40, min_k=1, sim_options={}, verbose=True, **kwargs)[source]¶
Bases:
SymmetricAlgo
A basic collaborative filtering algorithm.
The prediction \(\hat{r}_{ui}\) is set as:
\[\hat{r}_{ui} = \frac{ \sum\limits_{v \in N^k_i(u)} \text{sim}(u, v) \cdot r_{vi}} {\sum\limits_{v \in N^k_i(u)} \text{sim}(u, v)}\]or
\[\hat{r}_{ui} = \frac{ \sum\limits_{j \in N^k_u(i)} \text{sim}(i, j) \cdot r_{uj}} {\sum\limits_{j \in N^k_u(i)} \text{sim}(i, j)}\]depending on the
user_based
field of thesim_options
parameter.- Parameters
k (int) – The (max) number of neighbors to take into account for aggregation (see this note). Default is
40
.min_k (int) – The minimum number of neighbors to take into account for aggregation. If there are not enough neighbors, the prediction is set to the global mean of all ratings. Default is
1
.sim_options (dict) – A dictionary of options for the similarity measure. See Similarity measure configuration for accepted options.
verbose (bool) – Whether to print trace messages of bias estimation, similarity, etc. Default is True.
- class surprise.prediction_algorithms.knns.KNNWithMeans(k=40, min_k=1, sim_options={}, verbose=True, **kwargs)[source]¶
Bases:
SymmetricAlgo
A basic collaborative filtering algorithm, taking into account the mean ratings of each user.
The prediction \(\hat{r}_{ui}\) is set as:
\[\hat{r}_{ui} = \mu_u + \frac{ \sum\limits_{v \in N^k_i(u)} \text{sim}(u, v) \cdot (r_{vi} - \mu_v)} {\sum\limits_{v \in N^k_i(u)} \text{sim}(u, v)}\]or
\[\hat{r}_{ui} = \mu_i + \frac{ \sum\limits_{j \in N^k_u(i)} \text{sim}(i, j) \cdot (r_{uj} - \mu_j)} {\sum\limits_{j \in N^k_u(i)} \text{sim}(i, j)}\]depending on the
user_based
field of thesim_options
parameter.- Parameters
k (int) – The (max) number of neighbors to take into account for aggregation (see this note). Default is
40
.min_k (int) – The minimum number of neighbors to take into account for aggregation. If there are not enough neighbors, the neighbor aggregation is set to zero (so the prediction ends up being equivalent to the mean \(\mu_u\) or \(\mu_i\)). Default is
1
.sim_options (dict) – A dictionary of options for the similarity measure. See Similarity measure configuration for accepted options.
verbose (bool) – Whether to print trace messages of bias estimation, similarity, etc. Default is True.
- class surprise.prediction_algorithms.knns.KNNWithZScore(k=40, min_k=1, sim_options={}, verbose=True, **kwargs)[source]¶
Bases:
SymmetricAlgo
A basic collaborative filtering algorithm, taking into account the z-score normalization of each user.
The prediction \(\hat{r}_{ui}\) is set as:
\[\hat{r}_{ui} = \mu_u + \sigma_u \frac{ \sum\limits_{v \in N^k_i(u)} \text{sim}(u, v) \cdot (r_{vi} - \mu_v) / \sigma_v} {\sum\limits_{v \in N^k_i(u)} \text{sim}(u, v)}\]or
\[\hat{r}_{ui} = \mu_i + \sigma_i \frac{ \sum\limits_{j \in N^k_u(i)} \text{sim}(i, j) \cdot (r_{uj} - \mu_j) / \sigma_j} {\sum\limits_{j \in N^k_u(i)} \text{sim}(i, j)}\]depending on the
user_based
field of thesim_options
parameter.If \(\sigma\) is 0, than the overall sigma is used in that case.
- Parameters
k (int) – The (max) number of neighbors to take into account for aggregation (see this note). Default is
40
.min_k (int) – The minimum number of neighbors to take into account for aggregation. If there are not enough neighbors, the neighbor aggregation is set to zero (so the prediction ends up being equivalent to the mean \(\mu_u\) or \(\mu_i\)). Default is
1
.sim_options (dict) – A dictionary of options for the similarity measure. See Similarity measure configuration for accepted options.
verbose (bool) – Whether to print trace messages of bias estimation, similarity, etc. Default is True.
- class surprise.prediction_algorithms.knns.KNNBaseline(k=40, min_k=1, sim_options={}, bsl_options={}, verbose=True, **kwargs)[source]¶
Bases:
SymmetricAlgo
A basic collaborative filtering algorithm taking into account a baseline rating.
The prediction \(\hat{r}_{ui}\) is set as:
\[\hat{r}_{ui} = b_{ui} + \frac{ \sum\limits_{v \in N^k_i(u)} \text{sim}(u, v) \cdot (r_{vi} - b_{vi})} {\sum\limits_{v \in N^k_i(u)} \text{sim}(u, v)}\]or
\[\hat{r}_{ui} = b_{ui} + \frac{ \sum\limits_{j \in N^k_u(i)} \text{sim}(i, j) \cdot (r_{uj} - b_{uj})} {\sum\limits_{j \in N^k_u(i)} \text{sim}(i, j)}\]depending on the
user_based
field of thesim_options
parameter. For the best predictions, use thepearson_baseline
similarity measure.This algorithm corresponds to formula (3), section 2.2 of [Kor10].
- Parameters
k (int) – The (max) number of neighbors to take into account for aggregation (see this note). Default is
40
.min_k (int) – The minimum number of neighbors to take into account for aggregation. If there are not enough neighbors, the neighbor aggregation is set to zero (so the prediction ends up being equivalent to the baseline). Default is
1
.sim_options (dict) – A dictionary of options for the similarity measure. See Similarity measure configuration for accepted options. It is recommended to use the
pearson_baseline
similarity measure.bsl_options (dict) – A dictionary of options for the baseline estimates computation. See Baselines estimates configuration for accepted options.
verbose (bool) – Whether to print trace messages of bias estimation, similarity, etc. Default is True.