Reader class

class surprise.reader.Reader(name=None, line_format=u'user item rating', sep=None, rating_scale=(1, 5), skip_lines=0)

The Reader class is used to parse a file containing ratings.

Such a file is assumed to specify only one rating per line, and each line needs to respect the following structure:

user ; item ; rating ; [timestamp]

where the order of the fields and the separator (here ‘;’) can be chosen freely (see below). Brackets indicate that the timestamp field is optional.
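As a rough illustration of the line structure described above, the following plain-Python sketch splits one line according to a space-separated format string and a separator. The function `parse_rating_line` is hypothetical and written only for this example; it is not Surprise's implementation and may differ from it in details such as error handling.

```python
def parse_rating_line(line, line_format="user item rating", sep=None):
    """Hypothetical helper: split one ratings line into its fields.

    line_format lists the field names in file order, separated by spaces.
    sep is the separator used in the file (None means any whitespace).
    """
    field_names = line_format.split()
    values = [value.strip() for value in line.strip().split(sep)]
    if len(values) != len(field_names):
        raise ValueError(
            "expected %d fields, got %d" % (len(field_names), len(values))
        )
    record = dict(zip(field_names, values))
    # The timestamp is optional, so fall back to None when it is absent.
    return (
        record["user"],
        record["item"],
        float(record["rating"]),
        record.get("timestamp"),
    )


with_timestamp = parse_rating_line(
    "196 ; 242 ; 3 ; 881250949", "user item rating timestamp", ";"
)
reordered = parse_rating_line("242,196,3", "item user rating", ",")
```

In the second call the file lists the item before the user, and the format string is what maps each column back to the right field.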

For each built-in dataset, Surprise also provides a predefined reader. These are useful if you want to use a custom dataset that has the same format as a built-in one (see the name parameter).

  • name (string, optional) – If specified, a Reader for one of the built-in datasets is returned and any other parameter is ignored. Accepted values are ‘ml-100k’, ‘ml-1m’, and ‘jester’. Default is None.
  • line_format (string) – The field names, in the order in which they appear on a line. Note that line_format itself is always space-separated, whatever separator the file uses; the file's separator is given by the sep parameter. Default is 'user item rating'.
  • sep (char) – The separator between fields in the file. Example: ';'. Default is None.
  • rating_scale (tuple, optional) – The rating scale used for every rating. Default is (1, 5).

    Using the rating_scale parameter in a Reader object is deprecated and will not be supported in future versions. rating_scale should now be specified when creating the dataset, e.g. using load_from_folds, load_from_file, or load_from_df.

  • skip_lines (int, optional) – Number of lines to skip at the beginning of the file. Default is 0.