k-NN inspired algorithms

These algorithms are directly derived from a basic nearest-neighbors approach.

Note

For each of these algorithms, the actual number of neighbors aggregated to compute an estimate is at most \(k\), for two reasons. First, there may simply not be enough neighbors. Second, the sets \(N_i^k(u)\) and \(N_u^k(i)\) only include neighbors whose similarity is positive, since it would make no sense to aggregate ratings from users (or items) that are negatively correlated. For a given prediction, the actual number of neighbors can be retrieved from the 'actual_k' field of the prediction's details dictionary.
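The neighbor selection described in this note can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper `select_neighbors` and the sample data are hypothetical, and each candidate is assumed to be a (similarity, rating) pair.

```python
import heapq

def select_neighbors(candidates, k):
    """Return at most k (similarity, rating) pairs with positive similarity.

    `candidates` holds one (similarity, rating) pair per user (or item)
    that could serve as a neighbor. Hypothetical helper for illustration.
    """
    # Keep the k candidates with the largest similarity...
    top_k = heapq.nlargest(k, candidates, key=lambda pair: pair[0])
    # ...then drop those whose similarity is not strictly positive.
    return [(sim, rating) for sim, rating in top_k if sim > 0]

# Four candidates, only two of which have a positive similarity.
candidates = [(0.9, 4.0), (-0.3, 1.0), (0.5, 3.0), (0.0, 5.0)]
neighbors = select_neighbors(candidates, k=3)
actual_k = len(neighbors)  # smaller than k: only two neighbors qualify
```

Here `actual_k` ends up below the requested `k=3` because one of the three nearest candidates has a similarity of zero.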

You may want to read the User Guide on how to configure the sim_options parameter.

class surprise.prediction_algorithms.knns.KNNBasic(k=40, min_k=1, sim_options={}, verbose=True, **kwargs)

Bases: SymmetricAlgo

A basic collaborative filtering algorithm.

The prediction \(\hat{r}_{ui}\) is set as:

\[\hat{r}_{ui} = \frac{ \sum\limits_{v \in N^k_i(u)} \text{sim}(u, v) \cdot r_{vi}} {\sum\limits_{v \in N^k_i(u)} \text{sim}(u, v)}\]

or

\[\hat{r}_{ui} = \frac{ \sum\limits_{j \in N^k_u(i)} \text{sim}(i, j) \cdot r_{uj}} {\sum\limits_{j \in N^k_u(i)} \text{sim}(i, j)}\]

depending on the user_based field of the sim_options parameter.

Parameters:
  • k (int) – The maximum number of neighbors to take into account for aggregation (see the note above). Default is 40.

  • min_k (int) – The minimum number of neighbors to take into account for aggregation. If there are not enough neighbors, the prediction is set to the global mean of all ratings. Default is 1.

  • sim_options (dict) – A dictionary of options for the similarity measure. See Similarity measure configuration for accepted options.

  • verbose (bool) – Whether to print trace messages of bias estimation, similarity, etc. Default is True.
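The KNNBasic formula and the min_k fallback can be sketched as follows. This is an illustrative sketch under stated assumptions, not the library's code: the function `knn_basic_estimate` is hypothetical, and `neighbors` is assumed to already contain only the k nearest neighbors with positive similarity, as (similarity, rating) pairs.

```python
def knn_basic_estimate(neighbors, min_k=1, global_mean=3.0):
    """Similarity-weighted average of the neighbors' ratings.

    `neighbors` is a list of (similarity, rating) pairs. Hypothetical
    helper; `global_mean` stands in for the mean of all ratings.
    """
    if not neighbors or len(neighbors) < min_k:
        # Not enough neighbors: fall back to the global mean.
        return global_mean
    weighted_sum = sum(sim * rating for sim, rating in neighbors)
    sim_total = sum(sim for sim, _ in neighbors)
    return weighted_sum / sim_total

# (0.75 * 4.0 + 0.25 * 2.0) / (0.75 + 0.25) = 3.5
estimate = knn_basic_estimate([(0.75, 4.0), (0.25, 2.0)])
```

The same function serves both the user-based and the item-based formula; only the meaning of the neighbors changes.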

class surprise.prediction_algorithms.knns.KNNWithMeans(k=40, min_k=1, sim_options={}, verbose=True, **kwargs)

Bases: SymmetricAlgo

A basic collaborative filtering algorithm, taking into account the mean ratings of each user.

The prediction \(\hat{r}_{ui}\) is set as:

\[\hat{r}_{ui} = \mu_u + \frac{ \sum\limits_{v \in N^k_i(u)} \text{sim}(u, v) \cdot (r_{vi} - \mu_v)} {\sum\limits_{v \in N^k_i(u)} \text{sim}(u, v)}\]

or

\[\hat{r}_{ui} = \mu_i + \frac{ \sum\limits_{j \in N^k_u(i)} \text{sim}(i, j) \cdot (r_{uj} - \mu_j)} {\sum\limits_{j \in N^k_u(i)} \text{sim}(i, j)}\]

depending on the user_based field of the sim_options parameter.

Parameters:
  • k (int) – The maximum number of neighbors to take into account for aggregation (see the note above). Default is 40.

  • min_k (int) – The minimum number of neighbors to take into account for aggregation. If there are not enough neighbors, the neighbor aggregation is set to zero (so the prediction ends up being equivalent to the mean \(\mu_u\) or \(\mu_i\)). Default is 1.

  • sim_options (dict) – A dictionary of options for the similarity measure. See Similarity measure configuration for accepted options.

  • verbose (bool) – Whether to print trace messages of bias estimation, similarity, etc. Default is True.
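As a rough sketch of the KNNWithMeans formula, the prediction is the target's own mean plus a similarity-weighted average of the neighbors' deviations from their means. The function `knn_with_means_estimate` and the (similarity, rating, neighbor mean) triples are assumptions made for illustration only.

```python
def knn_with_means_estimate(own_mean, neighbors, min_k=1):
    """Own mean plus the weighted average of neighbor deviations.

    `neighbors` is a list of (similarity, rating, neighbor_mean) triples,
    already limited to positive similarities. Hypothetical helper.
    """
    if not neighbors or len(neighbors) < min_k:
        # The aggregation is set to zero, so only the mean remains.
        return own_mean
    weighted_dev = sum(sim * (rating - mean) for sim, rating, mean in neighbors)
    sim_total = sum(sim for sim, _, _ in neighbors)
    return own_mean + weighted_dev / sim_total

# 3.0 + (0.75 * (4.0 - 3.0) + 0.25 * (2.0 - 4.0)) / 1.0 = 3.25
estimate = knn_with_means_estimate(3.0, [(0.75, 4.0, 3.0), (0.25, 2.0, 4.0)])
```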

class surprise.prediction_algorithms.knns.KNNWithZScore(k=40, min_k=1, sim_options={}, verbose=True, **kwargs)

Bases: SymmetricAlgo

A basic collaborative filtering algorithm, taking into account the z-score normalization of each user.

The prediction \(\hat{r}_{ui}\) is set as:

\[\hat{r}_{ui} = \mu_u + \sigma_u \frac{ \sum\limits_{v \in N^k_i(u)} \text{sim}(u, v) \cdot (r_{vi} - \mu_v) / \sigma_v} {\sum\limits_{v \in N^k_i(u)} \text{sim}(u, v)}\]

or

\[\hat{r}_{ui} = \mu_i + \sigma_i \frac{ \sum\limits_{j \in N^k_u(i)} \text{sim}(i, j) \cdot (r_{uj} - \mu_j) / \sigma_j} {\sum\limits_{j \in N^k_u(i)} \text{sim}(i, j)}\]

depending on the user_based field of the sim_options parameter.

If \(\sigma\) is 0, then the overall sigma is used instead.

Parameters:
  • k (int) – The maximum number of neighbors to take into account for aggregation (see the note above). Default is 40.

  • min_k (int) – The minimum number of neighbors to take into account for aggregation. If there are not enough neighbors, the neighbor aggregation is set to zero (so the prediction ends up being equivalent to the mean \(\mu_u\) or \(\mu_i\)). Default is 1.

  • sim_options (dict) – A dictionary of options for the similarity measure. See Similarity measure configuration for accepted options.

  • verbose (bool) – Whether to print trace messages of bias estimation, similarity, etc. Default is True.
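The KNNWithZScore formula, including the fallback for a zero \(\sigma\), might be sketched like this. The function `knn_with_zscore_estimate`, its `overall_sigma` argument, and the 4-tuples describing each neighbor are hypothetical names chosen for the example, not part of the library.

```python
def knn_with_zscore_estimate(own_mean, own_sigma, neighbors,
                             min_k=1, overall_sigma=1.0):
    """Own mean plus own sigma times the weighted average of z-scores.

    `neighbors` is a list of (similarity, rating, mean, sigma) tuples,
    already limited to positive similarities. Hypothetical helper.
    """
    def safe(sigma):
        # A zero sigma is replaced by the overall sigma.
        return overall_sigma if sigma == 0 else sigma

    if not neighbors or len(neighbors) < min_k:
        # The aggregation is set to zero, so only the mean remains.
        return own_mean
    weighted_z = sum(sim * (rating - mean) / safe(sigma)
                     for sim, rating, mean, sigma in neighbors)
    sim_total = sum(sim for sim, _, _, _ in neighbors)
    return own_mean + safe(own_sigma) * weighted_z / sim_total

# Second neighbor has sigma 0, so overall_sigma (1.0) is used for it:
# 3.0 + 2.0 * (0.75 * 1.0 / 0.5 + 0.25 * (-2.0) / 1.0) / 1.0 = 5.0
estimate = knn_with_zscore_estimate(
    3.0, 2.0, [(0.75, 4.0, 3.0, 0.5), (0.25, 2.0, 4.0, 0.0)]
)
```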

class surprise.prediction_algorithms.knns.KNNBaseline(k=40, min_k=1, sim_options={}, bsl_options={}, verbose=True, **kwargs)

Bases: SymmetricAlgo

A basic collaborative filtering algorithm taking into account a baseline rating.

The prediction \(\hat{r}_{ui}\) is set as:

\[\hat{r}_{ui} = b_{ui} + \frac{ \sum\limits_{v \in N^k_i(u)} \text{sim}(u, v) \cdot (r_{vi} - b_{vi})} {\sum\limits_{v \in N^k_i(u)} \text{sim}(u, v)}\]

or

\[\hat{r}_{ui} = b_{ui} + \frac{ \sum\limits_{j \in N^k_u(i)} \text{sim}(i, j) \cdot (r_{uj} - b_{uj})} {\sum\limits_{j \in N^k_u(i)} \text{sim}(i, j)}\]

depending on the user_based field of the sim_options parameter. For the best predictions, use the pearson_baseline similarity measure.

This algorithm corresponds to formula (3), section 2.2 of [Kor10].

Parameters:
  • k (int) – The maximum number of neighbors to take into account for aggregation (see the note above). Default is 40.

  • min_k (int) – The minimum number of neighbors to take into account for aggregation. If there are not enough neighbors, the neighbor aggregation is set to zero (so the prediction ends up being equivalent to the baseline). Default is 1.

  • sim_options (dict) – A dictionary of options for the similarity measure. See Similarity measure configuration for accepted options. It is recommended to use the pearson_baseline similarity measure.

  • bsl_options (dict) – A dictionary of options for the baseline estimates computation. See Baselines estimates configuration for accepted options.

  • verbose (bool) – Whether to print trace messages of bias estimation, similarity, etc. Default is True.
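A possible configuration of KNNBaseline with the recommended pearson_baseline similarity is sketched below. The option values are illustrative, and the bsl_options keys ("method", "n_epochs") are assumed to match the accepted options listed in the baseline estimates configuration; check that section before relying on them. The helper `build_knn_baseline` is hypothetical and imports the library lazily, so the dictionaries can be read on their own.

```python
# Illustrative values; key names are assumed to match the User Guide.
sim_options = {
    "name": "pearson_baseline",  # similarity measure recommended above
    "user_based": True,          # True: user-user, False: item-item
}
bsl_options = {
    "method": "als",  # assumed key: how baseline estimates are fitted
    "n_epochs": 10,   # assumed key: number of fitting iterations
}

def build_knn_baseline():
    """Build a KNNBaseline instance (requires the scikit-surprise package)."""
    # Imported lazily so the configuration above stands alone.
    from surprise import KNNBaseline
    return KNNBaseline(k=40, min_k=1, sim_options=sim_options,
                       bsl_options=bsl_options, verbose=False)
```

Setting "user_based" to False would select the item-based form of the prediction formula instead.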