Trainset class

class surprise.Trainset(ur, ir, n_users, n_items, n_ratings, rating_scale, raw2inner_id_users, raw2inner_id_items)

A trainset contains all useful data that constitutes a training set.

It is used by the fit() method of every prediction algorithm. You should not try to build such an object on your own but rather use the Dataset.folds() method or the DatasetAutoFolds.build_full_trainset() method.

Trainsets are different from Datasets. You can think of a Dataset as the raw data, and of a Trainset as higher-level data on which useful methods are defined. Also, a Dataset may comprise multiple Trainsets (e.g. when doing cross-validation).

ur

defaultdict of list – The users' ratings. This is a dictionary containing lists of tuples of the form (item_inner_id, rating). The keys are user inner ids.

ir

defaultdict of list – The items' ratings. This is a dictionary containing lists of tuples of the form (user_inner_id, rating). The keys are item inner ids.

n_users

Total number of users \(|U|\).

n_items

Total number of items \(|I|\).

n_ratings

Total number of ratings \(|R_{train}|\).

rating_scale

tuple – The minimum and maximum ratings of the rating scale.

global_mean

The mean of all ratings \(\mu\).
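
To make the attributes above concrete, here is a minimal sketch in plain Python of how such structures could be derived from a list of rating triples. It does not use the library and is not its implementation; the toy `ratings` list is a hypothetical example.

```python
from collections import defaultdict

# Hypothetical toy data: (user_inner_id, item_inner_id, rating) triples.
ratings = [(0, 0, 4.0), (0, 1, 3.0), (1, 0, 5.0)]

ur = defaultdict(list)  # user inner id -> [(item_inner_id, rating), ...]
ir = defaultdict(list)  # item inner id -> [(user_inner_id, rating), ...]

for uid, iid, r in ratings:
    ur[uid].append((iid, r))
    ir[iid].append((uid, r))

# Counts and mean follow directly from the two dictionaries.
n_users = len(ur)
n_items = len(ir)
n_ratings = sum(len(user_ratings) for user_ratings in ur.values())
global_mean = sum(r for _, _, r in ratings) / n_ratings
```

Note that each rating appears twice, once under its user in `ur` and once under its item in `ir`, which makes lookups by user and by item equally cheap.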

all_items()

Generator function to iterate over all items.

Yields: Inner id of items.

all_ratings()

Generator function to iterate over all ratings.

Yields: A tuple (uid, iid, rating) where ids are inner ids (see the note on raw and inner ids).
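
As an illustration, a generator with this behaviour could be sketched over a ur-style dictionary as follows. This is a standalone sketch under the assumption that ratings are stored per user as described for ur; the function here is a free function, not the library method.

```python
def all_ratings(ur):
    """Yield (uid, iid, rating) triples; all ids are inner ids."""
    for uid, user_ratings in ur.items():
        for iid, rating in user_ratings:
            yield uid, iid, rating


# Hypothetical toy data in the same shape as the ur attribute.
ur = {0: [(0, 4.0), (1, 3.0)], 1: [(0, 5.0)]}
triples = list(all_ratings(ur))
```

Because it is a generator, the triples are produced one at a time and no full list is built unless the caller asks for one.
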
all_users()

Generator function to iterate over all users.

Yields: Inner id of users.

build_anti_testset(fill=None)

Return a list of ratings that can be used as a testset in the test() method.

The ratings are all the ratings that are not in the trainset, i.e. all the ratings \(r_{ui}\) where the user \(u\) is known, the item \(i\) is known, but the rating \(r_{ui}\) is not in the trainset. As \(r_{ui}\) is unknown, it is replaced either by the fill value or, by default, by the mean of all ratings, global_mean.

Parameters: fill (float) – The value to fill unknown ratings. If None, the global mean of all ratings (global_mean) is used.
Returns: A list of tuples (uid, iid, fill) where ids are raw ids.
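
The logic described above can be sketched in plain Python as follows. This is only an illustration, not the library's code: the dictionaries `raw_uids` and `raw_iids` and the toy data are hypothetical stand-ins for the trainset's internal state.

```python
# Hypothetical toy state: ratings per user (inner ids) and inner -> raw id maps.
ur = {0: [(0, 4.0), (1, 3.0)], 1: [(0, 5.0)]}
raw_uids = {0: "alice", 1: "bob"}
raw_iids = {0: "book", 1: "film"}
global_mean = 4.0


def build_anti_testset(fill=None):
    """Return (raw_uid, raw_iid, fill) for every known pair without a rating."""
    fill = global_mean if fill is None else float(fill)
    anti_testset = []
    for u, user_ratings in ur.items():
        rated = {i for i, _ in user_ratings}  # items this user already rated
        for i in raw_iids:  # every known item
            if i not in rated:
                anti_testset.append((raw_uids[u], raw_iids[i], fill))
    return anti_testset


default_fill = build_anti_testset()
zero_fill = build_anti_testset(fill=0)
```

The result grows with the number of users times the number of items, so on large datasets the anti-testset can be far bigger than the trainset itself.
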
build_testset()

Return a list of ratings that can be used as a testset in the test() method.

The ratings are all the ratings that are in the trainset, i.e. all the ratings returned by the all_ratings() generator. This is useful in cases where you want to test your algorithm on the trainset.
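
A rough sketch of the idea, assuming (as with build_anti_testset()) that the returned ids are raw ids. The `raw_uids` and `raw_iids` mappings and the toy data are hypothetical and not taken from the library.

```python
# Hypothetical toy state: ratings per user (inner ids) and inner -> raw id maps.
ur = {0: [(0, 4.0), (1, 3.0)], 1: [(0, 5.0)]}
raw_uids = {0: "alice", 1: "bob"}
raw_iids = {0: "book", 1: "film"}

# Every rating in the trainset, re-expressed with raw ids.
testset = [
    (raw_uids[u], raw_iids[i], r)
    for u, user_ratings in ur.items()
    for i, r in user_ratings
]
```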

global_mean

Return the mean of all ratings.

It’s only computed once.
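
One common way to get this "computed once" behaviour is a property backed by a cached value. The sketch below shows that pattern on a hypothetical `TinyTrainset` class; it is an assumption about how such caching can be done, not a copy of the library's code. The `mean_computations` counter exists only to make the caching observable.

```python
class TinyTrainset:
    """Hypothetical stand-in for a trainset; only models global_mean."""

    def __init__(self, ur):
        self.ur = ur  # user inner id -> [(item_inner_id, rating), ...]
        self._global_mean = None  # cache, filled on first access
        self.mean_computations = 0  # demo-only counter

    @property
    def global_mean(self):
        if self._global_mean is None:
            self.mean_computations += 1
            all_r = [r for user_ratings in self.ur.values() for _, r in user_ratings]
            self._global_mean = sum(all_r) / len(all_r)
        return self._global_mean


ts = TinyTrainset({0: [(0, 4.0), (1, 3.0)], 1: [(0, 5.0)]})
first = ts.global_mean   # triggers the computation
second = ts.global_mean  # served from the cache
```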

knows_item(iid)

Indicate if the item is part of the trainset.

An item is part of the trainset if the item was rated at least once.

Parameters: iid (int) – The (inner) item id. See the note on raw and inner ids.
Returns: True if item is part of the trainset, else False.

knows_user(uid)

Indicate if the user is part of the trainset.

A user is part of the trainset if the user has at least one rating.

Parameters: uid (int) – The (inner) user id. See the note on raw and inner ids.
Returns: True if user is part of the trainset, else False.
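
Given the ur and ir dictionaries described earlier, both membership checks can be sketched as simple key lookups. This is an illustrative sketch on hypothetical toy data, written as free functions rather than methods.

```python
from collections import defaultdict

# Hypothetical toy data in the shape of the ur and ir attributes.
ur = defaultdict(list, {0: [(0, 4.0)], 1: [(0, 5.0)]})
ir = defaultdict(list, {0: [(0, 4.0), (1, 5.0)]})


def knows_user(uid):
    # `in` tests for the key without creating it, unlike ur[uid].
    return uid in ur


def knows_item(iid):
    return iid in ir


known = knows_user(1)
unknown = knows_user(7)
```

Using `in` rather than indexing matters with a defaultdict: `ur[7]` would silently insert an empty list for user 7, after which the user would appear to be known.
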
to_inner_iid(riid)

Convert an item raw id to an inner id.

See the note on raw and inner ids.

Parameters: riid (str) – The item raw id.
Returns: The item inner id.
Return type: int
Raises: ValueError – When item is not part of the trainset.

to_inner_uid(ruid)

Convert a user raw id to an inner id.

See the note on raw and inner ids.

Parameters: ruid (str) – The user raw id.
Returns: The user inner id.
Return type: int
Raises: ValueError – When user is not part of the trainset.

to_raw_iid(iiid)

Convert an item inner id to a raw id.

See the note on raw and inner ids.

Parameters: iiid (int) – The item inner id.
Returns: The item raw id.
Return type: str
Raises: ValueError – When iiid is not an inner id.

to_raw_uid(iuid)

Convert a user inner id to a raw id.

See the note on raw and inner ids.

Parameters: iuid (int) – The user inner id.
Returns: The user raw id.
Return type: str
Raises: ValueError – When iuid is not an inner id.
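
The four conversion methods share one pattern: a raw-to-inner dictionary (compare the raw2inner_id_users and raw2inner_id_items constructor arguments) plus its inverse, with a ValueError for unknown ids. The sketch below shows that pattern with a hypothetical `IdMapper` helper; the class, its method names, and the error messages are illustrative assumptions, not part of the library.

```python
class IdMapper:
    """Hypothetical helper mapping raw ids to inner ids and back."""

    def __init__(self, raw2inner):
        self._raw2inner = dict(raw2inner)
        self._inner2raw = None  # inverse map, built on first use

    def to_inner(self, raw_id):
        try:
            return self._raw2inner[raw_id]
        except KeyError:
            raise ValueError(f"{raw_id!r} is not part of the trainset") from None

    def to_raw(self, inner_id):
        if self._inner2raw is None:
            self._inner2raw = {inner: raw for raw, inner in self._raw2inner.items()}
        try:
            return self._inner2raw[inner_id]
        except KeyError:
            raise ValueError(f"{inner_id!r} is not a valid inner id") from None


users = IdMapper({"alice": 0, "bob": 1})
inner = users.to_inner("bob")
raw = users.to_raw(0)
```

Catching the ValueError is the natural way to handle a user or item that appears at prediction time but was absent from the training data.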