Co-clustering

class surprise.prediction_algorithms.co_clustering.CoClustering(n_cltr_u=3, n_cltr_i=3, n_epochs=20, random_state=None, verbose=False)

Bases: AlgoBase

A collaborative filtering algorithm based on co-clustering.

This is a straightforward implementation of [GM05].

Users and items are each assigned to a cluster, \(C_u\) and \(C_i\) respectively, and each pair of a user cluster and an item cluster defines a co-cluster \(C_{ui}\).

The prediction \(\hat{r}_{ui}\) is set as:

\[\hat{r}_{ui} = \overline{C_{ui}} + (\mu_u - \overline{C_u}) + (\mu_i - \overline{C_i}),\]

where \(\overline{C_{ui}}\) is the average rating of co-cluster \(C_{ui}\), \(\overline{C_u}\) is the average rating of \(u\)’s cluster, \(\overline{C_i}\) is the average rating of \(i\)’s cluster, \(\mu_u\) and \(\mu_i\) are the mean ratings of user \(u\) and item \(i\), and \(\mu\) is the mean of all ratings. If the user is unknown, the prediction is \(\hat{r}_{ui} = \mu_i\). If the item is unknown, the prediction is \(\hat{r}_{ui} = \mu_u\). If both the user and the item are unknown, the prediction is \(\hat{r}_{ui} = \mu\).
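The prediction rule and its fallbacks can be written out as a small, library-independent sketch. The function name `co_cluster_estimate` and its argument names are hypothetical and chosen only for illustration; this is not the library's internal code. An unknown user or item is represented here by passing `None` for its mean.

```python
def co_cluster_estimate(global_mean, user_mean=None, item_mean=None,
                        user_cluster_avg=None, item_cluster_avg=None,
                        co_cluster_avg=None):
    """Illustrative version of the prediction rule stated above.

    user_mean=None stands for an unknown user, item_mean=None for an
    unknown item (a convention assumed for this sketch only).
    """
    if user_mean is None and item_mean is None:
        return global_mean   # both unknown: fall back to mu
    if user_mean is None:
        return item_mean     # unknown user: fall back to mu_i
    if item_mean is None:
        return user_mean     # unknown item: fall back to mu_u
    # Known user and item: co-cluster average plus the user's and the
    # item's deviations from their own cluster averages.
    return (co_cluster_avg
            + (user_mean - user_cluster_avg)
            + (item_mean - item_cluster_avg))


# Made-up values: 3.5 + (4.5 - 3.75) + (3.0 - 3.25) = 4.0
full = co_cluster_estimate(3.4, user_mean=4.5, item_mean=3.0,
                           user_cluster_avg=3.75, item_cluster_avg=3.25,
                           co_cluster_avg=3.5)
print(full)
```

With these made-up values, the user rates 0.75 above their cluster's average and the item sits 0.25 below its cluster's average, so the co-cluster average of 3.5 is shifted to 4.0.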

Clusters are assigned using a straightforward optimization method, much like k-means.

Parameters
  • n_cltr_u (int) – Number of user clusters. Default is 3.

  • n_cltr_i (int) – Number of item clusters. Default is 3.

  • n_epochs (int) – Number of iterations of the optimization loop. Default is 20.

  • random_state (int, RandomState instance from numpy, or None) – Determines the RNG that will be used for initialization. If int, random_state will be used as a seed for a new RNG. This is useful to get the same initialization over multiple calls to fit(). If RandomState instance, this same instance is used as RNG. If None, the current RNG from numpy is used. Default is None.

  • verbose (bool) – If True, the current epoch will be printed. Default is False.