Trainset class

class surprise.Trainset(ur, ir, n_users, n_items, n_ratings, rating_scale, raw2inner_id_users, raw2inner_id_items)

A trainset contains all useful data that constitute a training set.

It is used by the fit() method of every prediction algorithm. You should not try to build such an object on your own but rather use the Dataset.folds() method or the DatasetAutoFolds.build_full_trainset() method.

Trainsets are different from Datasets. You can think of a Dataset as the raw data, and of a Trainset as higher-level data on which useful methods are defined. A Dataset may also be split into multiple Trainsets (e.g. one per fold when doing cross-validation).

ur

The users' ratings. This is a dictionary containing lists of tuples of the form (item_inner_id, rating). The keys are user inner ids.

Type

defaultdict of list

ir

The items' ratings. This is a dictionary containing lists of tuples of the form (user_inner_id, rating). The keys are item inner ids.

Type

defaultdict of list
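To make the relationship between ur and ir concrete, here is a minimal plain-Python sketch (it does not import Surprise) that builds both dictionaries from raw (user, item, rating) triples. The helper name build_ur_ir and the toy data are hypothetical, and assigning inner ids in order of first appearance is an assumption made for illustration.

```python
from collections import defaultdict


def build_ur_ir(raw_ratings):
    """Hypothetical helper: build ur/ir from (raw_uid, raw_iid, rating) triples.

    Inner ids are assigned in order of first appearance (an assumption).
    """
    raw2inner_users, raw2inner_items = {}, {}
    ur, ir = defaultdict(list), defaultdict(list)
    for ruid, riid, rating in raw_ratings:
        # setdefault hands out the next free inner id the first time a raw id is seen
        uid = raw2inner_users.setdefault(ruid, len(raw2inner_users))
        iid = raw2inner_items.setdefault(riid, len(raw2inner_items))
        ur[uid].append((iid, rating))  # user inner id -> [(item inner id, rating)]
        ir[iid].append((uid, rating))  # item inner id -> [(user inner id, rating)]
    return ur, ir, raw2inner_users, raw2inner_items


raw = [("alice", "matrix", 5.0), ("alice", "up", 3.0), ("bob", "up", 4.0)]
ur, ir, u_map, i_map = build_ur_ir(raw)
print(dict(ur))  # {0: [(0, 5.0), (1, 3.0)], 1: [(1, 4.0)]}
print(dict(ir))  # {0: [(0, 5.0)], 1: [(0, 3.0), (1, 4.0)]}
```

Every rating appears exactly once in ur and once in ir, so the two dictionaries are the same data indexed by user and by item respectively.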

n_users

Total number of users \(|U|\).

n_items

Total number of items \(|I|\).

n_ratings

Total number of ratings \(|R_{train}|\).

rating_scale

The minimum and maximum ratings of the rating scale.

Type

tuple

global_mean

The mean of all ratings \(\mu\).
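As a rough illustration of how these attributes relate, the sketch below derives n_users, n_items, n_ratings and \(\mu = \frac{1}{|R_{train}|} \sum_{r_{ui} \in R_{train}} r_{ui}\) from a toy ur dictionary. It is plain Python with hypothetical data; counting users and items from ur assumes, as knows_user() and knows_item() state, that every user and item in the trainset has at least one rating.

```python
from collections import defaultdict

# Toy ur dictionary (user inner id -> [(item inner id, rating)]); hypothetical data
ur = defaultdict(list, {0: [(0, 5.0), (1, 3.0)], 1: [(1, 4.0)]})

n_users = len(ur)
# Distinct item inner ids that occur in at least one rating
n_items = len({iid for ratings in ur.values() for (iid, _) in ratings})
n_ratings = sum(len(ratings) for ratings in ur.values())
# mu: arithmetic mean over every rating in the trainset
global_mean = sum(r for ratings in ur.values() for (_, r) in ratings) / n_ratings

print(n_users, n_items, n_ratings, global_mean)  # 2 2 3 4.0
```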

all_items()

Generator function to iterate over all items.

Yields

Inner id of items.

all_ratings()

Generator function to iterate over all ratings.

Yields

A tuple (uid, iid, rating) where ids are inner ids (see this note).

all_users()

Generator function to iterate over all users.

Yields

Inner id of users.
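The three generators can be sketched in plain Python as follows. This is an illustration, not Surprise's code: the toy data is hypothetical, and all_users()/all_items() assume that inner ids form the contiguous range 0 to n_users - 1 (respectively n_items - 1).

```python
from collections import defaultdict

# Toy trainset state; hypothetical data for illustration
ur = defaultdict(list, {0: [(0, 5.0), (1, 3.0)], 1: [(1, 4.0)]})
n_users, n_items = 2, 2


def all_users():
    # Assumes user inner ids are 0 .. n_users - 1
    yield from range(n_users)


def all_items():
    # Assumes item inner ids are 0 .. n_items - 1
    yield from range(n_items)


def all_ratings():
    # Flatten ur into (user inner id, item inner id, rating) triples
    for u, u_ratings in ur.items():
        for i, r in u_ratings:
            yield u, i, r


print(list(all_users()))    # [0, 1]
print(list(all_ratings()))  # [(0, 0, 5.0), (0, 1, 3.0), (1, 1, 4.0)]
```

Being generators, they avoid materializing a full list of ratings, which matters for large trainsets.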

build_anti_testset(fill=None)

Return a list of ratings that can be used as a testset in the test() method.

The ratings are all the ratings that are not in the trainset, i.e. all the ratings \(r_{ui}\) where the user \(u\) is known, the item \(i\) is known, but the rating \(r_{ui}\) is not in the trainset. As \(r_{ui}\) is unknown, it is either replaced by the fill value or assumed to be equal to the mean of all ratings global_mean.

Parameters

fill (float) – The value used to fill unknown ratings. If None, the global mean of all ratings (global_mean) is used.

Returns

A list of tuples (uid, iid, fill) where ids are raw ids.
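The construction described above can be sketched in plain Python: for every known user, pair them with every known item they have not rated, and attach the fill value. All names and data below are hypothetical stand-ins for the trainset's internal state; the inner-to-raw dictionaries in particular are assumptions made so that the output uses raw ids, as documented.

```python
from collections import defaultdict

# Toy trainset state; all names and data here are hypothetical
ur = defaultdict(list, {0: [(0, 5.0), (1, 3.0)], 1: [(1, 4.0)]})
n_users, n_items = 2, 2
inner2raw_users = {0: "alice", 1: "bob"}
inner2raw_items = {0: "matrix", 1: "up"}
global_mean = 4.0


def build_anti_testset(fill=None):
    # Fall back to the global mean when no fill value is given
    fill = global_mean if fill is None else float(fill)
    anti_testset = []
    for u in range(n_users):
        rated = {i for (i, _) in ur[u]}  # items this user has already rated
        anti_testset += [
            (inner2raw_users[u], inner2raw_items[i], fill)
            for i in range(n_items)
            if i not in rated
        ]
    return anti_testset


print(build_anti_testset())     # [('bob', 'matrix', 4.0)]
print(build_anti_testset(2.5))  # [('bob', 'matrix', 2.5)]
```

The result has up to n_users × n_items − n_ratings entries, so on a large trainset it can be far bigger than the trainset itself.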

build_testset()

Return a list of ratings that can be used as a testset in the test() method.

The ratings are all the ratings that are in the trainset, i.e. all the ratings returned by the all_ratings() generator. This is useful in cases where you want to test your algorithm on the trainset.
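A minimal plain-Python sketch of the idea: walk every rating in ur and map the inner ids back to raw ids. Returning raw ids here is an assumption (it mirrors build_anti_testset(), whose return value is documented as using raw ids), and the toy data and dictionary names are hypothetical.

```python
from collections import defaultdict

# Toy trainset state; hypothetical data for illustration
ur = defaultdict(list, {0: [(0, 5.0), (1, 3.0)], 1: [(1, 4.0)]})
inner2raw_users = {0: "alice", 1: "bob"}
inner2raw_items = {0: "matrix", 1: "up"}


def build_testset():
    # Every rating already in the trainset, with inner ids mapped back to raw ids
    return [
        (inner2raw_users[u], inner2raw_items[i], r)
        for u, u_ratings in ur.items()
        for i, r in u_ratings
    ]


print(build_testset())
# [('alice', 'matrix', 5.0), ('alice', 'up', 3.0), ('bob', 'up', 4.0)]
```

Errors measured on such a testset are training errors, so they typically understate how well the algorithm generalizes.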

knows_item(iid)

Indicate whether the item is part of the trainset.

An item is part of the trainset if the item was rated at least once.

Parameters

iid (int) – The (inner) item id. See this note.

Returns

True if the item is part of the trainset, else False.

knows_user(uid)

Indicate whether the user is part of the trainset.

A user is part of the trainset if the user has at least one rating.

Parameters

uid (int) – The (inner) user id. See this note.

Returns

True if the user is part of the trainset, else False.
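Because ur and ir are defaultdicts, how membership is tested matters. The plain-Python sketch below (hypothetical data, not Surprise's code) uses the `in` operator, which never inserts keys; indexing with ur[uid] would instead silently create an empty entry for an unknown user.

```python
from collections import defaultdict

# Toy ur/ir dictionaries; hypothetical data for illustration
ur = defaultdict(list, {0: [(0, 5.0), (1, 3.0)], 1: [(1, 4.0)]})
ir = defaultdict(list, {0: [(0, 5.0)], 1: [(0, 3.0), (1, 4.0)]})


def knows_user(uid):
    # `in` only checks membership; ur[uid] would create an empty list for uid
    return uid in ur


def knows_item(iid):
    return iid in ir


print(knows_user(1), knows_user(7))  # True False
print(knows_item(0), knows_item(9))  # True False
print(7 in ur)                       # False: the lookup created no entry
```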

to_inner_iid(riid)

Convert an item raw id to an inner id.

See this note.

Parameters

riid (str) – The item raw id.

Returns

The item inner id.

Return type

int

Raises

ValueError – When the item is not part of the trainset.

to_inner_uid(ruid)

Convert a user raw id to an inner id.

See this note.

Parameters

ruid (str) – The user raw id.

Returns

The user inner id.

Return type

int

Raises

ValueError – When the user is not part of the trainset.
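A minimal sketch of the raw-to-inner lookup, assuming a plain dictionary like the raw2inner_id_users constructor argument; the ids and the error message are hypothetical. It also shows a common pitfall: raw ids are documented as str, so an int that merely looks like the same id is a different key.

```python
# Hypothetical raw -> inner mapping, with raw ids stored as strings
raw2inner_id_users = {"196": 0, "186": 1}


def to_inner_uid(ruid):
    try:
        return raw2inner_id_users[ruid]
    except KeyError:
        # Translate the dict's KeyError into the documented ValueError
        raise ValueError(f"User {ruid!r} is not part of the trainset.") from None


print(to_inner_uid("196"))  # 0
try:
    to_inner_uid(196)  # the int 196 is not the same key as the string "196"
except ValueError as err:
    print(err)  # User 196 is not part of the trainset.
```

to_inner_iid() follows the same pattern with the item mapping.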

to_raw_iid(iiid)

Convert an item inner id to a raw id.

See this note.

Parameters

iiid (int) – The item inner id.

Returns

The item raw id.

Return type

str

Raises

ValueError – When iiid is not an inner id.

to_raw_uid(iuid)

Convert a user inner id to a raw id.

See this note.

Parameters

iuid (int) – The user inner id.

Returns

The user raw id.

Return type

str

Raises

ValueError – When iuid is not an inner id.
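Going from inner ids back to raw ids only requires inverting the raw-to-inner dictionary. The sketch below uses a hypothetical helper class, ItemIdMapper, that builds the inverse lazily on the first call, so code that only ever needs raw-to-inner lookups never pays for it. This is one plausible design under stated assumptions, not necessarily how Surprise implements it.

```python
class ItemIdMapper:
    """Hypothetical helper mirroring the to_raw_iid() contract."""

    def __init__(self, raw2inner):
        self._raw2inner = dict(raw2inner)
        self._inner2raw = None  # inverse mapping, built on first use

    def to_raw_iid(self, iiid):
        if self._inner2raw is None:
            # Inner ids are unique, so inverting the dict loses nothing
            self._inner2raw = {inner: raw for raw, inner in self._raw2inner.items()}
        try:
            return self._inner2raw[iiid]
        except KeyError:
            raise ValueError(f"{iiid!r} is not a valid inner id.") from None


mapper = ItemIdMapper({"242": 0, "302": 1})
print(mapper.to_raw_iid(1))  # 302
```

The user-side conversion, to_raw_uid(), works the same way on the user mapping.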