Trainset class

class surprise.Trainset(ur, ir, n_users, n_items, n_ratings, rating_scale, raw2inner_id_users, raw2inner_id_items)

A trainset contains all useful data that constitute a training set.

It is used by the fit() method of every prediction algorithm. You should not build such an object yourself; instead, obtain one from the Dataset.folds() method or the DatasetAutoFolds.build_full_trainset() method.

Trainsets are different from Datasets. You can think of a Dataset as the raw data, and of Trainsets as higher-level data on which useful methods are defined. A single Dataset can also yield multiple Trainsets (e.g. when doing cross-validation).

ur

The users' ratings. This is a dictionary containing lists of tuples of the form (item_inner_id, rating). The keys are user inner ids.

Type:

defaultdict of list

ir

The items' ratings. This is a dictionary containing lists of tuples of the form (user_inner_id, rating). The keys are item inner ids.

Type:

defaultdict of list
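To make the shapes of ur and ir concrete, the following sketch builds both from a few raw (user, item, rating) triples in plain Python. It is not Surprise's own loading code: the toy data is hypothetical, the names raw2inner_id_users and raw2inner_id_items are simply borrowed from the constructor signature, and assigning inner ids as consecutive integers in order of first appearance is an assumption.

```python
from collections import defaultdict

# Hypothetical raw ratings: (raw user id, raw item id, rating).
raw_ratings = [
    ("alice", "matrix", 5.0),
    ("alice", "up", 3.0),
    ("bob", "matrix", 4.0),
]

raw2inner_id_users = {}  # raw user id -> inner user id
raw2inner_id_items = {}  # raw item id -> inner item id
ur = defaultdict(list)   # inner user id -> [(inner item id, rating), ...]
ir = defaultdict(list)   # inner item id -> [(inner user id, rating), ...]

for ruid, riid, r in raw_ratings:
    # Assumption: a new raw id gets the next free integer (0, 1, 2, ...).
    uid = raw2inner_id_users.setdefault(ruid, len(raw2inner_id_users))
    iid = raw2inner_id_items.setdefault(riid, len(raw2inner_id_items))
    ur[uid].append((iid, r))
    ir[iid].append((uid, r))

print(dict(ur))  # {0: [(0, 5.0), (1, 3.0)], 1: [(0, 4.0)]}
print(dict(ir))  # {0: [(0, 5.0), (1, 4.0)], 1: [(0, 3.0)]}
```

Each rating is stored twice, once per direction, so user-based and item-based algorithms can both iterate over neighbours without a scan of the whole rating list.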

n_users

Total number of users \(|U|\).

n_items

Total number of items \(|I|\).

n_ratings

Total number of ratings \(|R_{train}|\).

rating_scale

The minimum and maximum ratings of the rating scale.

Type:

tuple

global_mean

The mean of all ratings \(\mu\).
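The counts above and \(\mu = \frac{1}{|R_{train}|} \sum_{r_{ui} \in R_{train}} r_{ui}\) can all be derived from ur alone. The sketch below does this on a hand-written, hypothetical ur; it illustrates the definitions rather than how the library computes them.

```python
# Hypothetical ur: inner user id -> [(inner item id, rating), ...]
ur = {
    0: [(0, 5.0), (1, 3.0)],
    1: [(0, 4.0)],
}

# Every user in a trainset has at least one rating, so each is a key of ur.
n_users = len(ur)
# Distinct items seen in any user's ratings (ir would give the same count).
n_items = len({iid for ratings in ur.values() for iid, _ in ratings})
# |R_train|: total number of (user, item, rating) entries.
n_ratings = sum(len(ratings) for ratings in ur.values())
# mu: mean over every rating in R_train.
global_mean = sum(r for ratings in ur.values() for _, r in ratings) / n_ratings

print(n_users, n_items, n_ratings, global_mean)  # 2 2 3 4.0
```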

all_items()

Generator function to iterate over all items.

Yields:

Inner id of items.

all_ratings()

Generator function to iterate over all ratings.

Yields:

A tuple (uid, iid, rating) where ids are inner ids (see the note on raw and inner ids).

all_users()

Generator function to iterate over all users.

Yields:

Inner id of users.
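The three generators can be pictured as follows. This sketch works on a hypothetical ur dictionary and assumes inner ids are the consecutive integers 0 .. n_users - 1 and 0 .. n_items - 1; the functions are module-level stand-ins, not the Trainset methods themselves.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical ur: inner user id -> [(inner item id, rating), ...]
ur: Dict[int, List[Tuple[int, float]]] = {
    0: [(0, 5.0), (1, 3.0)],
    1: [(0, 4.0)],
}
n_users, n_items = 2, 2


def all_users() -> Iterator[int]:
    # Assumes inner user ids are exactly 0 .. n_users - 1.
    yield from range(n_users)


def all_items() -> Iterator[int]:
    # Same assumption for items: 0 .. n_items - 1.
    yield from range(n_items)


def all_ratings() -> Iterator[Tuple[int, int, float]]:
    # Flatten ur into (uid, iid, rating) triples; all ids stay inner ids.
    for uid, user_ratings in ur.items():
        for iid, r in user_ratings:
            yield uid, iid, r


print(list(all_users()))    # [0, 1]
print(list(all_items()))    # [0, 1]
print(list(all_ratings()))  # [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0)]
```

Because these are generators, iterating over a large trainset does not materialize a second copy of the ratings.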

build_anti_testset(fill=None)

Return a list of ratings that can be used as a testset in the test() method.

The ratings are all the ratings that are not in the trainset, i.e. all the ratings \(r_{ui}\) where the user \(u\) is known and the item \(i\) is known, but the rating \(r_{ui}\) is not in the trainset. Since \(r_{ui}\) is unknown, it is replaced by the fill value or, if fill is None, by the mean of all ratings, global_mean.

Parameters:

fill (float) – The value used to fill unknown ratings. If None, the global mean of all ratings, global_mean, is used.

Returns:

A list of tuples (uid, iid, fill) where ids are raw ids.
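The logic described above can be sketched in plain Python: for every known user, emit each known item that user has not rated, paired with the fill value. The toy dictionaries raw_users and raw_items (inner id to raw id) are hypothetical stand-ins for to_raw_uid() and to_raw_iid().

```python
from typing import List, Optional, Tuple

# Hypothetical toy data, keyed by inner ids.
ur = {0: [(0, 5.0), (1, 3.0)], 1: [(0, 4.0)]}  # user -> [(item, rating)]
raw_users = {0: "alice", 1: "bob"}             # inner user id -> raw id
raw_items = {0: "matrix", 1: "up"}             # inner item id -> raw id
global_mean = 4.0                              # (5.0 + 3.0 + 4.0) / 3


def build_anti_testset(fill: Optional[float] = None) -> List[Tuple[str, str, float]]:
    # Fall back to the global mean only when no fill value is given.
    fill = global_mean if fill is None else float(fill)
    anti_testset = []
    for uid, user_ratings in ur.items():
        rated = {iid for iid, _ in user_ratings}
        for iid in raw_items:  # every known item
            if iid not in rated:
                anti_testset.append((raw_users[uid], raw_items[iid], fill))
    return anti_testset


print(build_anti_testset())     # [('bob', 'up', 4.0)]
print(build_anti_testset(0.0))  # [('bob', 'up', 0.0)]
```

Testing `fill is None` rather than the truthiness of fill keeps an explicit fill of 0 from being replaced by the global mean. Note that the anti-testset has \(|U| \cdot |I| - |R_{train}|\) entries, which can be very large on real data.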

build_testset()

Return a list of ratings that can be used as a testset in the test() method.

The ratings are all the ratings that are in the trainset, i.e. all the ratings returned by the all_ratings() generator. This is useful when you want to test your algorithm on the trainset itself.
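A matching sketch for build_testset(), on the same kind of toy data. It assumes, by analogy with build_anti_testset(), that the returned tuples carry raw ids together with the true rating; raw_users and raw_items are hypothetical inner-to-raw lookups.

```python
from typing import List, Tuple

# Hypothetical toy data, keyed by inner ids.
ur = {0: [(0, 5.0), (1, 3.0)], 1: [(0, 4.0)]}
raw_users = {0: "alice", 1: "bob"}
raw_items = {0: "matrix", 1: "up"}


def build_testset() -> List[Tuple[str, str, float]]:
    # Every known rating, converted back to raw ids, so that a test() call
    # sees the same (raw user id, raw item id, rating) shape as any testset.
    return [
        (raw_users[uid], raw_items[iid], r)
        for uid, user_ratings in ur.items()
        for iid, r in user_ratings
    ]


print(build_testset())
# [('alice', 'matrix', 5.0), ('alice', 'up', 3.0), ('bob', 'matrix', 4.0)]
```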

knows_item(iid)

Indicate if the item is part of the trainset.

An item is part of the trainset if the item was rated at least once.

Parameters:

iid (int) – The (inner) item id. See the note on raw and inner ids.

Returns:

True if item is part of the trainset, else False.

knows_user(uid)

Indicate if the user is part of the trainset.

A user is part of the trainset if the user has at least one rating.

Parameters:

uid (int) – The (inner) user id. See the note on raw and inner ids.

Returns:

True if user is part of the trainset, else False.
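Both membership tests reduce to a key lookup. The sketch below assumes knows_user() and knows_item() check the keys of ur and ir (the toy dictionaries are hypothetical). Since those are defaultdicts, using `in` matters: indexing with ur[uid] would silently insert an empty list for an unknown id.

```python
from collections import defaultdict

# Hypothetical ur / ir, built as defaultdict(list) like the attributes above.
ur = defaultdict(list, {0: [(0, 5.0)], 1: [(0, 4.0)]})
ir = defaultdict(list, {0: [(0, 5.0), (1, 4.0)]})


def knows_user(uid: int) -> bool:
    # `in` checks the keys without triggering defaultdict's default factory.
    return uid in ur


def knows_item(iid: int) -> bool:
    return iid in ir


print(knows_user(1), knows_user(7))  # True False
print(knows_item(0), knows_item(3))  # True False
print(len(ur), len(ir))              # 2 1 (no entries were added)
```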

to_inner_iid(riid)

Convert an item raw id to an inner id.

See the note on raw and inner ids.

Parameters:

riid (str) – The item raw id.

Returns:

The item inner id.

Return type:

int

Raises:

ValueError – When item is not part of the trainset.

to_inner_uid(ruid)

Convert a user raw id to an inner id.

See the note on raw and inner ids.

Parameters:

ruid (str) – The user raw id.

Returns:

The user inner id.

Return type:

int

Raises:

ValueError – When user is not part of the trainset.
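The raw-to-inner conversion is a dictionary lookup that turns a missing key into the documented ValueError. In this sketch the mappings reuse the names of the constructor arguments raw2inner_id_users and raw2inner_id_items with toy data; the error messages are illustrative, not the library's exact wording.

```python
# Hypothetical mappings, shaped like the constructor's raw2inner_id_* args.
raw2inner_id_users = {"alice": 0, "bob": 1}
raw2inner_id_items = {"matrix": 0, "up": 1}


def to_inner_uid(ruid: str) -> int:
    try:
        return raw2inner_id_users[ruid]
    except KeyError:
        # Re-raise as ValueError, as the docs above specify.
        raise ValueError(f"User {ruid} is not part of the trainset.") from None


def to_inner_iid(riid: str) -> int:
    try:
        return raw2inner_id_items[riid]
    except KeyError:
        raise ValueError(f"Item {riid} is not part of the trainset.") from None


print(to_inner_uid("bob"))  # 1
try:
    to_inner_iid("heat")
except ValueError as exc:
    print(exc)  # Item heat is not part of the trainset.
```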

to_raw_iid(iiid)

Convert an item inner id to a raw id.

See the note on raw and inner ids.

Parameters:

iiid (int) – The item inner id.

Returns:

The item raw id.

Return type:

str

Raises:

ValueError – When iiid is not an inner id.

to_raw_uid(iuid)

Convert a user inner id to a raw id.

See the note on raw and inner ids.

Parameters:

iuid (int) – The user inner id.

Returns:

The user raw id.

Return type:

str

Raises:

ValueError – When iuid is not an inner id.
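Going the other way needs the inverse mappings. This sketch inverts the raw-to-inner dictionaries once up front (inner ids are unique, so nothing is lost); the names inner2raw_id_users and inner2raw_id_items are hypothetical, and whether the library builds them eagerly or lazily is not specified here.

```python
# Hypothetical raw -> inner mappings with toy data.
raw2inner_id_users = {"alice": 0, "bob": 1}
raw2inner_id_items = {"matrix": 0, "up": 1}

# Invert each mapping; inner ids are unique, so the inversion is lossless.
inner2raw_id_users = {inner: raw for raw, inner in raw2inner_id_users.items()}
inner2raw_id_items = {inner: raw for raw, inner in raw2inner_id_items.items()}


def to_raw_uid(iuid: int) -> str:
    try:
        return inner2raw_id_users[iuid]
    except KeyError:
        raise ValueError(f"{iuid} is not a valid inner id.") from None


def to_raw_iid(iiid: int) -> str:
    try:
        return inner2raw_id_items[iiid]
    except KeyError:
        raise ValueError(f"{iiid} is not a valid inner id.") from None


print(to_raw_uid(0), to_raw_iid(1))  # alice up
```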