
Surprise

A Python scikit for building and analyzing recommender systems

Overview

Surprise is a Python scikit for building and analyzing recommender systems that deal with explicit rating data.

Surprise was designed with the following purposes in mind:

The name SurPRISE (roughly :) ) stands for Simple Python RecommendatIon System Engine.

Please note that Surprise does not support implicit ratings or content-based information.

Getting started, example

Here is a simple example showing how you can (down)load a dataset, split it for 5-fold cross-validation, and compute the MAE and RMSE of the SVD algorithm.

from surprise import SVD
from surprise import Dataset
from surprise.model_selection import cross_validate

# Load the movielens-100k dataset (download it if needed).
data = Dataset.load_builtin('ml-100k')

# Use the famous SVD algorithm.
algo = SVD()

# Run 5-fold cross-validation and print results.
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Output:

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9367  0.9355  0.9378  0.9377  0.9300  0.9355  0.0029  
MAE (testset)     0.7387  0.7371  0.7393  0.7397  0.7325  0.7375  0.0026  
Fit time          0.62    0.63    0.63    0.65    0.63    0.63    0.01    
Test time         0.11    0.11    0.14    0.14    0.14    0.13    0.02    
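The Mean and Std columns summarize the per-fold scores. As a quick sanity check, the sketch below recomputes them from the fold values printed above using only the standard library. It assumes that Std is the population standard deviation (dividing by the number of folds), which is what the printed values are consistent with.

```python
from statistics import fmean, pstdev

# Per-fold scores copied from the output table above.
fold_scores = {
    "RMSE": [0.9367, 0.9355, 0.9378, 0.9377, 0.9300],
    "MAE": [0.7387, 0.7371, 0.7393, 0.7397, 0.7325],
}

summary = {}
for name, scores in fold_scores.items():
    # Assumption: Std is the population standard deviation (divide by n).
    summary[name] = (round(fmean(scores), 4), round(pstdev(scores), 4))
    print(f"{name}: mean={summary[name][0]:.4f} std={summary[name][1]:.4f}")
```

This reproduces 0.9355 / 0.0029 for RMSE and 0.7375 / 0.0026 for MAE, matching the table.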

Surprise can do much more (e.g., GridSearchCV)! You'll find more usage examples in the documentation.

Benchmarks

Here are the average RMSE, MAE and total execution time of various algorithms (with their default parameters) on a 5-fold cross-validation procedure. The datasets are the Movielens 100k and 1M datasets. The folds are the same for all the algorithms. All experiments were run on a laptop with an Intel i5 11th Gen 2.60GHz processor. The code for generating these tables can be found in the benchmark example.
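To illustrate why sharing folds matters, here is a small standard-library sketch of the procedure described above: the folds are built once from a fixed seed, and every predictor is then scored on exactly the same splits. This is not the benchmark code. The ratings are synthetic, and the two "algorithms" (predicting the training mean or the training median) are hypothetical stand-ins chosen only to keep the example short.

```python
import math
import random
from statistics import fmean, median


def make_folds(n_ratings, n_splits=5, seed=0):
    """Shuffle the rating indices once, then deal them into n_splits test folds."""
    indices = list(range(n_ratings))
    random.Random(seed).shuffle(indices)
    return [indices[k::n_splits] for k in range(n_splits)]


def rmse(errors):
    """Root mean squared error."""
    return math.sqrt(fmean([e * e for e in errors]))


def mae(errors):
    """Mean absolute error."""
    return fmean([abs(e) for e in errors])


def evaluate(fit, ratings, folds):
    """Average RMSE and MAE over the folds for one toy predictor.

    `fit` maps the training ratings to a single constant prediction.
    """
    fold_rmse, fold_mae = [], []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [r for i, r in enumerate(ratings) if i not in held_out]
        prediction = fit(train)
        errors = [ratings[i] - prediction for i in test_idx]
        fold_rmse.append(rmse(errors))
        fold_mae.append(mae(errors))
    return {"RMSE": fmean(fold_rmse), "MAE": fmean(fold_mae)}


# Hypothetical synthetic ratings on a 1-5 scale.
rng = random.Random(42)
ratings = [rng.randint(1, 5) for _ in range(100)]

# Build the folds once and reuse them for every predictor.
folds = make_folds(len(ratings), n_splits=5, seed=0)

results = {
    "train mean": evaluate(fmean, ratings, folds),
    "train median": evaluate(median, ratings, folds),
}
for name, scores in results.items():
    print(f"{name:<13} RMSE={scores['RMSE']:.3f} MAE={scores['MAE']:.3f}")
```

Because both predictors see identical train/test splits, any difference in their scores comes from the predictors themselves rather than from a lucky or unlucky split.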

| Movielens 100k | RMSE | MAE | Time |
|:----------------------------|------:|------:|:--------|
| SVD | 0.934 | 0.737 | 0:00:06 |
| SVD++ (cache_ratings=False) | 0.919 | 0.721 | 0:01:39 |
| SVD++ (cache_ratings=True) | 0.919 | 0.721 | 0:01:22 |
| NMF | 0.963 | 0.758 | 0:00:06 |
| Slope One | 0.946 | 0.743 | 0:00:09 |
| k-NN | 0.98 | 0.774 | 0:00:08 |
| Centered k-NN | 0.951 | 0.749 | 0:00:09 |
| k-NN Baseline | 0.931 | 0.733 | 0:00:13 |
| Co-Clustering | 0.963 | 0.753 | 0:00:06 |
| Baseline | 0.944 | 0.748 | 0:00:02 |
| Random | 1.518 | 1.219 | 0:00:01 |

| Movielens 1M | RMSE | MAE | Time |
|:----------------------------|------:|------:|:--------|
| SVD | 0.873 | 0.686 | 0:01:07 |
| SVD++ (cache_ratings=False) | 0.862 | 0.672 | 0:41:06 |
| SVD++ (cache_ratings=True) | 0.862 | 0.672 | 0:34:55 |
| NMF | 0.916 | 0.723 | 0:01:39 |
| Slope One | 0.907 | 0.715 | 0:02:31 |
| k-NN | 0.923 | 0.727 | 0:05:27 |
| Centered k-NN | 0.929 | 0.738 | 0:05:43 |
| k-NN Baseline | 0.895 | 0.706 | 0:05:55 |
| Co-Clustering | 0.915 | 0.717 | 0:00:31 |
| Baseline | 0.909 | 0.719 | 0:00:19 |
| [Random](http://sur
