Trainset class
- class surprise.Trainset(ur, ir, n_users, n_items, n_ratings, rating_scale, raw2inner_id_users, raw2inner_id_items)
A trainset contains all useful data that constitute a training set.
It is used by the fit() method of every prediction algorithm. You should not try to build such an object on your own but rather use the Dataset.folds() method or the DatasetAutoFolds.build_full_trainset() method.
Trainsets are different from Datasets. You can think of a Dataset as the raw data, and Trainsets as higher-level data where useful methods are defined. Also, a Dataset may be comprised of multiple Trainsets (e.g. when doing cross-validation).
- ur
The users' ratings. This is a dictionary containing lists of tuples of the form (item_inner_id, rating). The keys are user inner ids.
- Type:
defaultdict of list
- ir
The items' ratings. This is a dictionary containing lists of tuples of the form (user_inner_id, rating). The keys are item inner ids.
- Type:
defaultdict of list
- n_users
Total number of users \(|U|\).
- n_items
Total number of items \(|I|\).
- n_ratings
Total number of ratings \(|R_{train}|\).
- rating_scale
The minimum and maximum ratings of the rating scale.
- Type:
tuple
- global_mean
The mean of all ratings \(\mu\).
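As a rough illustration of how these attributes relate to each other, the sketch below builds ur, ir and the summary values from a handful of made-up ratings using only the standard library. The toy_ratings list is hypothetical, and this is not how Surprise constructs a Trainset internally; it only mirrors the documented shapes.

```python
from collections import defaultdict

# Hypothetical toy data: (user_inner_id, item_inner_id, rating) triples.
toy_ratings = [(0, 0, 4.0), (0, 1, 2.0), (1, 0, 5.0), (2, 1, 3.0)]

ur = defaultdict(list)  # user inner id -> [(item_inner_id, rating), ...]
ir = defaultdict(list)  # item inner id -> [(user_inner_id, rating), ...]
for u, i, r in toy_ratings:
    ur[u].append((i, r))
    ir[i].append((u, r))

n_users = len(ur)            # |U|
n_items = len(ir)            # |I|
n_ratings = len(toy_ratings) # |R_train|
global_mean = sum(r for _, _, r in toy_ratings) / n_ratings  # mu
```

Note that ur and ir hold the same ratings indexed two ways, so each rating appears once in each dictionary.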
- all_ratings()
Generator function to iterate over all ratings.
- Yields:
A tuple (uid, iid, rating) where ids are inner ids (see this note).
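A generator with this behaviour could be sketched as follows, assuming a ur dictionary shaped as described above. The function name iter_all_ratings and the sample data are hypothetical, and the iteration order shown here (user by user) is an assumption rather than something the documentation guarantees.

```python
from collections import defaultdict


def iter_all_ratings(ur):
    """Yield (uid, iid, rating) triples with inner ids, one user at a time."""
    for uid, user_ratings in ur.items():
        for iid, rating in user_ratings:
            yield uid, iid, rating


# Hypothetical user ratings: user inner id -> [(item_inner_id, rating), ...]
ur = defaultdict(list, {0: [(0, 4.0), (1, 2.0)], 1: [(0, 5.0)]})
triples = list(iter_all_ratings(ur))
```

Because it is a generator, the ratings are produced lazily; wrapping the call in list() as above materialises them all at once.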
- build_anti_testset(fill=None)
Return a list of ratings that can be used as a testset in the test() method.
The ratings are all the ratings that are not in the trainset, i.e. all the ratings \(r_{ui}\) where the user \(u\) is known, the item \(i\) is known, but the rating \(r_{ui}\) is not in the trainset. As \(r_{ui}\) is unknown, it is either replaced by the fill value or assumed to be equal to the mean of all ratings, global_mean.
- Parameters:
fill (float) – The value to fill unknown ratings. If None, the global mean of all ratings (global_mean) will be used.
- Returns:
A list of tuples (uid, iid, fill) where ids are raw ids.
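The logic described above can be sketched in plain Python. The function build_anti_testset_sketch, the raw_uid and raw_iid lookup dictionaries, and the sample data are all hypothetical stand-ins; the sketch only assumes ur and ir shaped as documented and does not reproduce the library's code.

```python
def build_anti_testset_sketch(ur, ir, raw_uid, raw_iid, global_mean, fill=None):
    """Return (raw_uid, raw_iid, fill) for every known user/item pair without a rating."""
    fill = global_mean if fill is None else float(fill)
    anti_testset = []
    for u, user_ratings in ur.items():
        rated = {i for i, _ in user_ratings}  # items this user already rated
        anti_testset += [
            (raw_uid[u], raw_iid[i], fill) for i in ir if i not in rated
        ]
    return anti_testset


# Hypothetical toy trainset with two users and two items.
ur = {0: [(0, 4.0)], 1: [(0, 5.0), (1, 3.0)]}
ir = {0: [(0, 4.0), (1, 5.0)], 1: [(1, 3.0)]}
raw_uid = {0: "alice", 1: "bob"}        # inner id -> raw id (assumed mapping)
raw_iid = {0: "item_a", 1: "item_b"}
global_mean = 4.0                        # mean of 4.0, 5.0 and 3.0

with_mean = build_anti_testset_sketch(ur, ir, raw_uid, raw_iid, global_mean)
with_zero = build_anti_testset_sketch(ur, ir, raw_uid, raw_iid, global_mean, fill=0)
```

The result contains one entry per known user/item pair that has no rating, i.e. n_users * n_items - n_ratings entries, so it can become very large on sparse data.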
- build_testset()
Return a list of ratings that can be used as a testset in the test() method.
The ratings are all the ratings that are in the trainset, i.e. all the ratings returned by the all_ratings() generator. This is useful in cases where you want to test your algorithm on the trainset.
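A minimal sketch of the same idea, assuming that the resulting testset uses raw ids in the same way as build_anti_testset(). The function trainset_as_testset and the lookup dictionaries raw_uid and raw_iid are hypothetical names introduced for illustration.

```python
def trainset_as_testset(ur, raw_uid, raw_iid):
    """Turn every training rating into a (raw_uid, raw_iid, rating) test triple."""
    return [
        (raw_uid[u], raw_iid[i], r)
        for u, user_ratings in ur.items()
        for i, r in user_ratings
    ]


# Hypothetical toy data.
ur = {0: [(0, 4.0)], 1: [(0, 5.0), (1, 3.0)]}
raw_uid = {0: "alice", 1: "bob"}        # inner id -> raw id (assumed mapping)
raw_iid = {0: "item_a", 1: "item_b"}

testset = trainset_as_testset(ur, raw_uid, raw_iid)
```

Errors measured on such a testset are training errors, so they tend to be lower than errors on held-out ratings.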
- knows_item(iid)
Indicate if the item is part of the trainset.
An item is part of the trainset if the item was rated at least once.
- Parameters:
iid (int) – The (inner) item id. See this note.
- Returns:
True if item is part of the trainset, else False.
- knows_user(uid)
Indicate if the user is part of the trainset.
A user is part of the trainset if the user has at least one rating.
- Parameters:
uid (int) – The (inner) user id. See this note.
- Returns:
True if user is part of the trainset, else False.
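Both checks amount to asking whether an inner id has at least one rating. One way to sketch this, assuming ur and ir are the defaultdicts described earlier, is a plain membership test. The helper names and sample data below are hypothetical and this is not presented as the library's implementation.

```python
from collections import defaultdict

# Hypothetical toy data: two users who both rated item 0.
ur = defaultdict(list, {0: [(0, 4.0)], 1: [(0, 5.0)]})
ir = defaultdict(list, {0: [(0, 4.0), (1, 5.0)]})


def knows_user_sketch(uid):
    """True if the inner user id has at least one rating."""
    return uid in ur


def knows_item_sketch(iid):
    """True if the inner item id was rated at least once."""
    return iid in ir


known = knows_user_sketch(0)
unknown = knows_user_sketch(7)
```

The membership test matters with a defaultdict: writing ur[7] would silently create an empty entry for user 7, whereas 7 in ur leaves the dictionary unchanged.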
- to_inner_iid(riid)
Convert an item raw id to an inner id.
See this note.
- Parameters:
riid (str) – The item raw id.
- Returns:
The item inner id.
- Return type:
int
- Raises:
ValueError – When item is not part of the trainset.
- to_inner_uid(ruid)
Convert a user raw id to an inner id.
See this note.
- Parameters:
ruid (str) – The user raw id.
- Returns:
The user inner id.
- Return type:
int
- Raises:
ValueError – When user is not part of the trainset.
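The raw-to-inner conversions can be pictured as dictionary lookups that raise ValueError for unknown ids. The sketch below assumes a raw2inner_id_users mapping like the constructor argument of the same name; the sample ids, the function name and the error message are hypothetical.

```python
# Hypothetical mapping from raw user ids to inner ids.
raw2inner_id_users = {"alice": 0, "bob": 1}


def to_inner_uid_sketch(ruid):
    """Convert a raw user id to its inner id, or raise ValueError if unknown."""
    try:
        return raw2inner_id_users[ruid]
    except KeyError:
        raise ValueError(f"User {ruid} is not part of the trainset.") from None


inner = to_inner_uid_sketch("bob")

try:
    to_inner_uid_sketch("carol")
    raised = False
except ValueError:
    raised = True
```

The same pattern applies to items with a raw2inner_id_items mapping. Callers that may receive unseen ids should be ready to catch the ValueError.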