# How to build your own prediction algorithm

This page describes how to build a custom prediction algorithm using Surprise.

## The basics

Want to get your hands dirty? Cool.

Creating your own prediction algorithm is pretty simple: an algorithm is
nothing but a class derived from `AlgoBase` that has an `estimate` method.
This is the method that is called by the `predict()` method. It takes in an
**inner** user id and an **inner** item id (see the note on raw and inner ids),
and returns the estimated rating \(\hat{r}_{ui}\):

```
from surprise import AlgoBase
from surprise import Dataset
from surprise import evaluate


class MyOwnAlgorithm(AlgoBase):

    def __init__(self):

        # Always call base method before doing anything.
        AlgoBase.__init__(self)

    def estimate(self, u, i):

        return 3


data = Dataset.load_builtin('ml-100k')
algo = MyOwnAlgorithm()

evaluate(algo, data)
```

This algorithm is the dumbest we could have thought of: it just predicts a rating of 3, regardless of users and items.
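To see how `predict()` and `estimate()` fit together, here is a plain-Python sketch that does not depend on Surprise. `ToyAlgo`, `raw2inner_user` and `raw2inner_item` are hypothetical names: the point is only that the caller works with raw ids (as they appear in the data file), while `estimate()` receives the inner ids they map to. Unlike Surprise, this toy simply raises `KeyError` for an id it has never seen.

```python
class ToyAlgo:
    """Hypothetical stand-in illustrating the predict() -> estimate() call."""

    def __init__(self, raw2inner_user, raw2inner_item):
        # Maps from raw ids (strings from the data file) to dense inner ids.
        self.raw2inner_user = raw2inner_user
        self.raw2inner_item = raw2inner_item

    def estimate(self, u, i):
        # u and i are inner ids here, as in MyOwnAlgorithm.estimate().
        return 3

    def predict(self, raw_uid, raw_iid):
        # Translate raw ids into inner ids, then delegate to estimate().
        u = self.raw2inner_user[raw_uid]
        i = self.raw2inner_item[raw_iid]
        return self.estimate(u, i)


algo = ToyAlgo(raw2inner_user={'196': 0, '186': 1},
               raw2inner_item={'242': 0, '302': 1})
print(algo.predict('196', '302'))  # 3
```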

If you want to store additional information about the prediction, you can also return a dictionary of details alongside the estimate:

```
def estimate(self, u, i):

    details = {'info1': 'That was',
               'info2': 'easy stuff :)'}
    return 3, details
```

This dictionary will be stored in the `prediction` as the `details` field and
can be used for later analysis.
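How such a tuple might be unpacked can be sketched as follows. This is a hypothetical helper, not Surprise's code: `ToyPrediction` keeps only the fields needed here (Surprise's own `Prediction` carries more), and an estimate returned without details gets an empty dictionary.

```python
from collections import namedtuple

# Hypothetical, reduced prediction record; not Surprise's Prediction class.
ToyPrediction = namedtuple('ToyPrediction', ['uid', 'iid', 'est', 'details'])


def estimate(u, i):
    details = {'info1': 'That was', 'info2': 'easy stuff :)'}
    return 3, details


def wrap_estimate(uid, iid, est_fn):
    """Call est_fn and split an optional (estimate, details) tuple."""
    result = est_fn(uid, iid)
    if isinstance(result, tuple):
        est, details = result
    else:
        # A bare number: no details were provided.
        est, details = result, {}
    return ToyPrediction(uid, iid, est, details)


pred = wrap_estimate(0, 1, estimate)
print(pred.est, pred.details['info2'])  # 3 easy stuff :)
```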

## The `train` method

Now, let’s make a slightly cleverer algorithm that predicts the average of all
the ratings of the trainset. As this is a constant value that does not depend
on the current user or item, we would rather compute it once and for all. This
can be done by defining the `train` method:

```
import numpy as np
from surprise import AlgoBase


class MyOwnAlgorithm(AlgoBase):

    def __init__(self):

        # Always call base method before doing anything.
        AlgoBase.__init__(self)

    def train(self, trainset):

        # Here again: call base method before doing anything.
        AlgoBase.train(self, trainset)

        # Compute the average rating. We might as well use the
        # trainset.global_mean attribute ;)
        self.the_mean = np.mean([r for (_, _, r) in
                                 self.trainset.all_ratings()])

    def estimate(self, u, i):

        return self.the_mean
```

The `train` method is called by the `evaluate` function at each fold of a
cross-validation process (but you can also call it yourself). Before doing
anything, you should call the base class `train()` method.
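A rough sketch of what "called at each fold" means, written without Surprise: `MeanAlgo` and `cross_validate_rmse` are hypothetical names, the trainset is reduced to a plain list of `(user, item, rating)` triples, and RMSE is used as the per-fold score. The key point is that `train()` runs once per fold, on that fold's training ratings only, before any estimate is made on the held-out ratings.

```python
import random


class MeanAlgo:
    """Hypothetical algorithm predicting the mean training rating."""

    def train(self, trainset):
        # trainset: list of (user, item, rating) triples for this fold.
        self.the_mean = sum(r for (_, _, r) in trainset) / len(trainset)

    def estimate(self, u, i):
        return self.the_mean


def cross_validate_rmse(algo, ratings, n_folds=3, seed=0):
    ratings = ratings[:]
    random.Random(seed).shuffle(ratings)
    folds = [ratings[k::n_folds] for k in range(n_folds)]
    scores = []
    for k in range(n_folds):
        testset = folds[k]
        trainset = [r for j, fold in enumerate(folds) if j != k for r in fold]
        algo.train(trainset)  # retrained from scratch on every fold
        sq_err = [(algo.estimate(u, i) - r) ** 2 for (u, i, r) in testset]
        scores.append((sum(sq_err) / len(sq_err)) ** 0.5)
    return scores


ratings = [(u, i, float((u + i) % 5 + 1)) for u in range(10) for i in range(6)]
print(cross_validate_rmse(MeanAlgo(), ratings))
```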

## The `trainset` attribute

Once the base class `train()` method has returned, all the info you need about
the current training set (rating values, etc.) is stored in the `self.trainset`
attribute. This is a `Trainset` object that has many attributes and methods of
interest for prediction.

To illustrate its usage, let’s make an algorithm that predicts the average of the mean of all ratings, the mean rating of the user, and the mean rating of the item:

```
def estimate(self, u, i):

    sum_means = self.trainset.global_mean
    div = 1

    if self.trainset.knows_user(u):
        sum_means += np.mean([r for (_, r) in self.trainset.ur[u]])
        div += 1

    if self.trainset.knows_item(i):
        sum_means += np.mean([r for (_, r) in self.trainset.ir[i]])
        div += 1

    return sum_means / div
```
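To try this `estimate()` without loading a dataset, the sketch below uses `ToyTrainset`, a hypothetical stand-in exposing only the members used here (`ur`, `ir`, `global_mean`, `knows_user`, `knows_item`), with the layout the snippet assumes: `ur[u]` is a list of `(item, rating)` pairs and `ir[i]` a list of `(user, rating)` pairs. The method is rewritten as a function that takes the trainset explicitly.

```python
import numpy as np


class ToyTrainset:
    """Hypothetical stand-in exposing only the Trainset members used above."""

    def __init__(self, ratings):
        # ratings: list of (inner_uid, inner_iid, rating) triples.
        self.ur = {}  # inner user id -> list of (inner item id, rating)
        self.ir = {}  # inner item id -> list of (inner user id, rating)
        for u, i, r in ratings:
            self.ur.setdefault(u, []).append((i, r))
            self.ir.setdefault(i, []).append((u, r))
        self.global_mean = np.mean([r for (_, _, r) in ratings])

    def knows_user(self, u):
        return u in self.ur

    def knows_item(self, i):
        return i in self.ir


def estimate(trainset, u, i):
    # Same logic as the estimate() method above.
    sum_means = trainset.global_mean
    div = 1
    if trainset.knows_user(u):
        sum_means += np.mean([r for (_, r) in trainset.ur[u]])
        div += 1
    if trainset.knows_item(i):
        sum_means += np.mean([r for (_, r) in trainset.ir[i]])
        div += 1
    return sum_means / div


ts = ToyTrainset([(0, 0, 4.0), (0, 1, 5.0), (1, 0, 2.0), (1, 1, 3.0)])
print(estimate(ts, 0, 1))  # (3.5 + 4.5 + 4.0) / 3 = 4.0
print(estimate(ts, 7, 1))  # unknown user: (3.5 + 4.0) / 2 = 3.75
```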

Note that it would have been a better idea to compute all the user and item
means once in the `train` method, rather than repeating the same computations
at every call to `estimate`.
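One way to follow that advice is sketched below: the per-user and per-item means are computed once in `train()` and only looked up in `estimate()`. `PrecomputedMeans` is a hypothetical class that does not derive from `AlgoBase`, so that it runs on its own; in a real algorithm you would still call `AlgoBase.train(self, trainset)` first. The tiny trainset is a `SimpleNamespace` with the assumed `ur`/`ir`/`global_mean` layout.

```python
from types import SimpleNamespace

import numpy as np


class PrecomputedMeans:
    """Hypothetical sketch: per-user and per-item means computed once."""

    def train(self, trainset):
        # A real AlgoBase subclass would call AlgoBase.train(self, trainset) here.
        self.trainset = trainset
        self.user_means = {u: np.mean([r for (_, r) in pairs])
                           for u, pairs in trainset.ur.items()}
        self.item_means = {i: np.mean([r for (_, r) in pairs])
                           for i, pairs in trainset.ir.items()}

    def estimate(self, u, i):
        # Average the global mean with whichever user/item means exist.
        means = [self.trainset.global_mean]
        if u in self.user_means:
            means.append(self.user_means[u])
        if i in self.item_means:
            means.append(self.item_means[i])
        return sum(means) / len(means)


ts = SimpleNamespace(
    ur={0: [(0, 4.0), (1, 5.0)], 1: [(0, 2.0), (1, 3.0)]},
    ir={0: [(0, 4.0), (1, 2.0)], 1: [(0, 5.0), (1, 3.0)]},
    global_mean=3.5,
)
algo = PrecomputedMeans()
algo.train(ts)
print(algo.estimate(0, 1))  # 4.0
```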

## When the prediction is impossible

It’s up to your algorithm to decide whether it can yield a prediction. If the
prediction is impossible, you can raise the `PredictionImpossible` exception.
You’ll need to import it first:

```
from surprise import PredictionImpossible
```

This exception will be caught by the `predict()` method, and the estimation
\(\hat{r}_{ui}\) will be set to the global mean of all ratings \(\mu\).
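The fallback behaviour described above can be mimicked in a few lines. This sketch defines its own `PredictionImpossible` class so that it runs without Surprise; `OnlyKnownUsers` is a hypothetical algorithm, and the returned `was_impossible` flag is included only to make the fallback visible.

```python
class PredictionImpossible(Exception):
    """Local stand-in for surprise.PredictionImpossible."""


class OnlyKnownUsers:
    """Hypothetical algorithm that refuses to predict for unknown users."""

    def __init__(self, known_users, global_mean):
        self.known_users = known_users
        self.global_mean = global_mean

    def estimate(self, u, i):
        if u not in self.known_users:
            raise PredictionImpossible('User is unknown.')
        return 4.0

    def predict(self, u, i):
        # On PredictionImpossible, fall back to the global mean mu.
        try:
            est = self.estimate(u, i)
            was_impossible = False
        except PredictionImpossible:
            est = self.global_mean
            was_impossible = True
        return est, was_impossible


algo = OnlyKnownUsers(known_users={0, 1}, global_mean=3.53)
print(algo.predict(0, 5))   # (4.0, False)
print(algo.predict(42, 5))  # (3.53, True)
```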

## Using similarities and baselines

Should your algorithm use a similarity measure or baseline estimates, you’ll
need to accept `bsl_options` and `sim_options` as parameters to the `__init__`
method, and pass them along to the base class. See how to use these parameters
in the Using prediction algorithms section.

The `compute_baselines()` and `compute_similarities()` methods can be called in
the `train` method (or anywhere else).
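For intuition about what a similarity matrix such as `self.sim` holds, here is a hand-rolled cosine similarity between users, computed over co-rated items only. It is a NumPy sketch, not a copy of `compute_similarities()`: `cosine_user_sim` is a hypothetical helper that assumes the `trainset.ur` layout (inner user id mapped to a list of `(item, rating)` pairs) and returns an array indexed as `sim[u, v]`, which is how the example below reads it.

```python
import numpy as np


def cosine_user_sim(ur, n_users):
    """Cosine similarity between users over co-rated items only.

    ur: dict mapping inner user id -> list of (inner item id, rating).
    Returns an (n_users, n_users) array indexed as sim[u, v].
    """
    ratings = [dict(ur.get(u, [])) for u in range(n_users)]
    sim = np.zeros((n_users, n_users))
    for u in range(n_users):
        for v in range(n_users):
            common = ratings[u].keys() & ratings[v].keys()
            if not common:
                continue  # no co-rated items: similarity stays at 0
            x = np.array([ratings[u][i] for i in common])
            y = np.array([ratings[v][i] for i in common])
            # Ratings are assumed positive, so both norms are nonzero.
            sim[u, v] = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
    return sim


ur = {0: [(0, 5.0), (1, 3.0)], 1: [(0, 5.0), (1, 3.0)], 2: [(0, 1.0), (1, 5.0)]}
sim = cosine_user_sim(ur, n_users=3)
print(np.round(sim, 2))
```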

```
from surprise import AlgoBase
from surprise import PredictionImpossible


class MyOwnAlgorithm(AlgoBase):

    def __init__(self, sim_options={}, bsl_options={}):

        AlgoBase.__init__(self, sim_options=sim_options,
                          bsl_options=bsl_options)

    def train(self, trainset):

        AlgoBase.train(self, trainset)

        # Compute baselines and similarities
        self.bu, self.bi = self.compute_baselines()
        self.sim = self.compute_similarities()

    def estimate(self, u, i):

        if not (self.trainset.knows_user(u) and self.trainset.knows_item(i)):
            raise PredictionImpossible('User and/or item is unknown.')

        # Compute similarities between u and v, where v describes all other
        # users that have also rated item i.
        neighbors = [(v, self.sim[u, v]) for (v, r) in self.trainset.ir[i]]
        # Sort these neighbors by similarity
        neighbors = sorted(neighbors, key=lambda x: x[1], reverse=True)

        print('The 3 nearest neighbors of user', str(u), 'are:')
        for v, sim_uv in neighbors[:3]:
            print('user {0:} with sim {1:1.2f}'.format(v, sim_uv))

        # ... Aaaaand return the baseline estimate anyway ;)
        bsl = self.trainset.global_mean + self.bu[u] + self.bi[i]
        return bsl
```
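The value returned at the end is the baseline estimate: the global mean shifted by a user bias and an item bias, where \(\mu\) is `self.trainset.global_mean`, \(b_u\) is `self.bu[u]` and \(b_i\) is `self.bi[i]`:

\[
b_{ui} = \mu + b_u + b_i
\]

With made-up numbers for illustration, \(\mu = 3.5\), \(b_u = 0.3\) and \(b_i = -0.2\) give \(b_{ui} = 3.5 + 0.3 - 0.2 = 3.6\).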

Feel free to explore the prediction_algorithms package source to get an idea of what can be done.