Trainset class
- class surprise.Trainset(ur, ir, n_users, n_items, n_ratings, rating_scale, raw2inner_id_users, raw2inner_id_items)
A trainset contains all useful data that constitute a training set.
It is used by the fit() method of every prediction algorithm. You should not try to build such an object on your own but rather use the Dataset.folds() method or the DatasetAutoFolds.build_full_trainset() method.
Trainsets are different from Datasets. You can think of a Dataset as the raw data, and Trainsets as higher-level data where useful methods are defined. Also, a Dataset may be comprised of multiple Trainsets (e.g. when doing cross-validation).
- ur
The users' ratings. This is a dictionary containing lists of tuples of the form (item_inner_id, rating). The keys are user inner ids.
- Type:
defaultdict of list
- ir
The items' ratings. This is a dictionary containing lists of tuples of the form (user_inner_id, rating). The keys are item inner ids.
- Type:
defaultdict of list
- n_users
Total number of users \(|U|\).
- n_items
Total number of items \(|I|\).
- n_ratings
Total number of ratings \(|R_{train}|\).
- rating_scale
The minimum and maximum ratings of the rating scale.
- Type:
tuple
- global_mean
The mean of all ratings \(\mu\).
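To make the attributes above concrete, here is a minimal pure-Python sketch of how structures shaped like ur, ir and global_mean could be derived from raw ratings. It is an illustration only, not Surprise's own construction code: the raw_ratings data is hypothetical, and numbering inner ids in order of first appearance is an assumption.

```python
from collections import defaultdict

# Hypothetical raw ratings: (raw user id, raw item id, rating).
raw_ratings = [
    ("alice", "book", 4.0),
    ("alice", "film", 2.0),
    ("bob", "book", 5.0),
]

raw2inner_id_users = {}
raw2inner_id_items = {}
ur = defaultdict(list)  # user inner id -> [(item inner id, rating), ...]
ir = defaultdict(list)  # item inner id -> [(user inner id, rating), ...]

for ruid, riid, rating in raw_ratings:
    # Assumption: a raw id gets the next free integer the first time it is seen.
    uid = raw2inner_id_users.setdefault(ruid, len(raw2inner_id_users))
    iid = raw2inner_id_items.setdefault(riid, len(raw2inner_id_items))
    ur[uid].append((iid, rating))
    ir[iid].append((uid, rating))

n_users = len(ur)
n_items = len(ir)
n_ratings = len(raw_ratings)
global_mean = sum(rating for _, _, rating in raw_ratings) / n_ratings
```

Every rating appears twice, once under its user in ur and once under its item in ir, so either side can be looked up without scanning the whole set.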
- all_ratings()
Generator function to iterate over all ratings.
- Yields:
A tuple (uid, iid, rating) where ids are inner ids (see this note).
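As a rough illustration of what such a generator does, the sketch below walks a dictionary shaped like ur and yields one tuple per rating. The standalone all_ratings function and the sample ur dictionary are assumptions made for this example; in the library the method is called on a Trainset instance.

```python
def all_ratings(ur):
    """Yield (uid, iid, rating) for every rating stored in a ur-like dict."""
    for uid, user_ratings in ur.items():
        for iid, rating in user_ratings:
            yield uid, iid, rating


# Hypothetical data: user inner id -> [(item inner id, rating), ...].
ur = {0: [(0, 4.0), (1, 2.0)], 1: [(0, 5.0)]}
ratings = list(all_ratings(ur))
```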
- build_anti_testset(fill=None)
Return a list of ratings that can be used as a testset in the test() method.
The ratings are all the ratings that are not in the trainset, i.e. all the ratings \(r_{ui}\) where the user \(u\) is known, the item \(i\) is known, but the rating \(r_{ui}\) is not in the trainset. As \(r_{ui}\) is unknown, it is either replaced by the fill value or assumed to be equal to the mean of all ratings, global_mean.
- Parameters:
fill (float) – The value to fill unknown ratings. If None, the global mean of all ratings, global_mean, will be used.
- Returns:
A list of tuples (uid, iid, fill) where ids are raw ids.
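The anti-testset logic described above can be sketched in plain Python as follows. This is a simplified stand-in, not the library's code: the standalone function, the inner2raw_users and inner2raw_items mappings, and the sample data are all assumptions introduced for the example.

```python
def build_anti_testset(ur, inner2raw_users, inner2raw_items, global_mean, fill=None):
    """Return (raw uid, raw iid, fill) for every known user/item pair with no rating."""
    # Fall back to the global mean when no fill value is given.
    fill = global_mean if fill is None else float(fill)
    anti_testset = []
    for uid in inner2raw_users:
        rated = {iid for iid, _ in ur.get(uid, [])}
        for iid in inner2raw_items:
            if iid not in rated:
                anti_testset.append((inner2raw_users[uid], inner2raw_items[iid], fill))
    return anti_testset


# Hypothetical trainset contents.
ur = {0: [(0, 4.0), (1, 2.0)], 1: [(0, 5.0)]}
inner2raw_users = {0: "alice", 1: "bob"}
inner2raw_items = {0: "book", 1: "film"}
global_mean = (4.0 + 2.0 + 5.0) / 3

default_fill = build_anti_testset(ur, inner2raw_users, inner2raw_items, global_mean)
zero_fill = build_anti_testset(ur, inner2raw_users, inner2raw_items, global_mean, fill=0)
```

Here only the pair (bob, film) is missing from the trainset, so each result holds a single tuple. In general the list has n_users × n_items − n_ratings entries, which can be large for sparse data.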
- build_testset()
Return a list of ratings that can be used as a testset in the test() method.
The ratings are all the ratings that are in the trainset, i.e. all the ratings returned by the all_ratings() generator. This is useful in cases where you want to test your algorithm on the trainset.
- knows_item(iid)
Indicate if the item is part of the trainset.
An item is part of the trainset if the item was rated at least once.
- Parameters:
iid (int) – The (inner) item id. See this note.
- Returns:
True if item is part of the trainset, else False.
- knows_user(uid)
Indicate if the user is part of the trainset.
A user is part of the trainset if the user has at least one rating.
- Parameters:
uid (int) – The (inner) user id. See this note.
- Returns:
True if user is part of the trainset, else False.
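One plausible way to express these two checks is a membership test on dictionaries shaped like ur and ir. The sketch below assumes that approach; the standalone functions and the sample data are illustrative, not taken from the library.

```python
from collections import defaultdict

# Hypothetical data in the shape of ur and ir.
ur = defaultdict(list, {0: [(0, 4.0), (1, 2.0)], 1: [(0, 5.0)]})
ir = defaultdict(list, {0: [(0, 4.0), (1, 5.0)], 1: [(0, 2.0)]})


def knows_user(ur, uid):
    """True if the user inner id has at least one rating."""
    return uid in ur


def knows_item(ir, iid):
    """True if the item inner id was rated at least once."""
    return iid in ir


known_user = knows_user(ur, 1)
unknown_user = knows_user(ur, 7)
known_item = knows_item(ir, 0)
unknown_item = knows_item(ir, 7)
```

A membership test is used rather than indexing because indexing a defaultdict with a missing key silently creates an empty entry, after which the id would look known.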
- to_inner_iid(riid)
Convert an item raw id to an inner id.
See this note.
- Parameters:
riid (str) – The item raw id.
- Returns:
The item inner id.
- Return type:
int
- Raises:
ValueError – When item is not part of the trainset.
- to_inner_uid(ruid)
Convert a user raw id to an inner id.
See this note.
- Parameters:
ruid (str) – The user raw id.
- Returns:
The user inner id.
- Return type:
int
- Raises:
ValueError – When user is not part of the trainset.
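The raw-to-inner conversions documented above amount to a dictionary lookup that raises ValueError for unknown ids. The following sketch shows that behaviour for users. The standalone function, the error message wording and the sample mapping are assumptions for illustration, not the library's implementation.

```python
def to_inner_uid(raw2inner_id_users, ruid):
    """Map a raw user id to its inner id, or raise ValueError if unknown."""
    try:
        return raw2inner_id_users[ruid]
    except KeyError:
        # Report an unknown id as ValueError, matching the documented contract.
        raise ValueError(f"User {ruid} is not part of the trainset.") from None


# Hypothetical mapping of raw user ids to inner ids.
raw2inner_id_users = {"alice": 0, "bob": 1}

alice_inner = to_inner_uid(raw2inner_id_users, "alice")

try:
    to_inner_uid(raw2inner_id_users, "carol")
    carol_known = True
except ValueError:
    carol_known = False
```

Catching ValueError like this is a simple way to handle users or items that only appear in a test set and were never seen during training.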