Cmfrec
(Python, R, C) Collective (multi-view/multi-way) matrix factorization, including cold-start functionality (recommender systems, imputation, dimensionality reduction)
Collective Matrix Factorization
Implementation of collective matrix factorization, based on "Relational learning via collective matrix factorization", with some enhancements and alternative models for cold-start recommendations as described in "Cold-start recommendations in Collective Matrix Factorization", and adding implicit-feedback variants as described in "Collaborative filtering for implicit feedback datasets".
This is a hybrid collaborative filtering model for recommender systems that takes as input either explicit item ratings or implicit-feedback data, and side information about users and/or items (although it can also fit pure collaborative-filtering and pure content-based models). The overall idea was extended here to also be able to do cold-start recommendations (for users and items that were not in the training data but which have side information available).
Although the package was developed with recommender systems in mind, it can also be used in other domains (e.g. topic modeling, dimensionality reduction, missing value imputation) - just take any mention of users as rows in the main matrix, any mention of items as columns, and use the "explicit" models.
For more information about the implementation here, or if you would like to cite this in your research, see "Cold-start recommendations in Collective Matrix Factorization".
For a similar package with Poisson distributions see ctpfrec.
Written in C with Python and R interfaces. An additional Ruby interface can be found here.
For an introduction to the library and methods, see:
- MovieLens Recommender with Side Information (Python).
- R vignette (R).
Comparison against other libraries
For the full benchmark, code, and details see benchmarks.
Comparing the classical matrix factorization model for explicit feedback without side information in different software libraries (50 factors, 15 iterations, regularization of 0.05, float64 when supported) on the MovieLens10M dataset:
| Library | Method | Biases | Time (s) | RMSE | Additional |
| :---: | :---: | :---: | :---: | :---: | :---: |
| cmfrec | ALS-CG | Yes | 13.64 | 0.788233 | |
| cmfrec | ALS-Chol | Yes | 35.35 | 0.782414 | Implicit features |
| LibMF | SGD | No | 1.79 | 0.785585 | float32 |
| Spark | ALS-Chol | No | 81 | 0.791316 | Manual center |
| cornac | SGD | Yes | 13.9 | 0.816548 | |
| Surprise | SGD | Yes | 178 | 1.060049 | |
| spotlight | ADAM | No | 12141 | 1.054698 | See details |
| LensKit | ALS-CD | Static | 26.8 | 0.796050 | Manual thread control |
| PyRecLab | SGD | Yes | 90 | 0.812566 | Reads from disk |
| rsparse | ALS-Chol | Yes | 30.13 | 0.786935 | |
| softImpute | ALS-Chol | Static | 88.93 | 0.810450 | Unscaled lambda |
| softImpute | ALS-SVD | Static | 195.73 | 0.808293 | Unscaled lambda |
| Vowpal Wabbit | SGD | Yes | 293 | 1.054546 | See details |
Comparing the implicit variant without side information in different software libraries (50 factors, 15 iterations, regularization of 5, float64 when supported) on the LastFM-360K dataset:
| Library | Method | Time (s) | P@10 | MAP | Additional |
| :---: | :---: | :---: | :---: | :---: | :---: |
| cmfrec | ALS-CG | 29.52 | 0.16969 | 0.12135 | |
| cmfrec | ALS-Chol | 51.28 | 0.1701 | 0.121761 | |
| implicit | ALS-CG | 29.0 | 0.17007 | 0.120986 | |
| implicit | ALS-Chol | 98 | 0.17031 | 0.121167 | |
| LensKit | ALS-CG | 68 | 0.17069 | 0.121846 | |
| LensKit | ALS-Chol | 84 | 0.16941 | 0.122121 | |
| cornac | ADAM | 13338 | 0.00889 | 0.006288 | float32 |
| Spark | ALS-Chol | oom | oom | oom | See details |
| rsparse | ALS-CG | 39.18 | 0.16998 | 0.121242 | |
| rsparse | ALS-Chol | 69.75 | 0.16941 | 0.121353 | |
| LibMF | ALS-CD | 143.67 | 0.14307 | 0.093755 | float32 |
| qmf | ALS-Chol | 102 | 0.17019 | 0.122017 | |
Basic Idea
(See introductory notebook above for more details)
The model consists of predicting the rating (or weighted confidence, in the implicit-feedback case) that a user would give to an item by performing a low-rank factorization of an interactions matrix X of size users x items (e.g. ratings)
X ~ A * B.T
(where A and B are the fitted model matrices)
It does so using side information about the items (such as movie tags) and/or the users (such as demographic info), by also factorizing the item side-info matrix and/or the user side-info matrix
U ~ A * C.T, I ~ B * D.T
sharing the same user/item-factor matrices used to factorize the ratings, or sharing only some of the latent factors.
This also has the side effect of allowing recommendations for users and items for which there is side information but no ratings, although these predictions might not be as high quality.
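A minimal numpy illustration of this cold-start idea (not the package's actual solver; all matrices and sizes here are made up): since the user side-info matrix U shares the factor matrix A with the ratings, a new user's latent factors can be estimated from their attributes alone via C, and then scored against the item factors B:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, n_attr, k = 50, 40, 10, 5

# Ground-truth factors used only to simulate consistent data
A = rng.normal(size=(n_users, k))   # user factors
B = rng.normal(size=(n_items, k))   # item factors
C = rng.normal(size=(n_attr, k))    # user-attribute factors

X = A @ B.T   # interactions matrix: users x items
U = A @ C.T   # user side info: users x attributes

# Cold-start: a new user with known attributes but no ratings.
# Estimate their factor vector from the side-info factorization:
#   u_new ~ a_new @ C.T  =>  least-squares solve of C a = u_new
u_new = rng.normal(size=k) @ C.T
a_new, *_ = np.linalg.lstsq(C, u_new, rcond=None)

# Predicted scores for every item for this unseen user
scores = a_new @ B.T
```

Items can then be ranked by `scores` to produce top-N recommendations for a user the model never saw during training.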
Alternatively, it can produce factorizations in which the factor matrices are determined from the attributes directly (e.g. A = U * C), with or without a free offset.
While the method was initially devised for recommender systems, it can also be used as a general technique for dimensionality reduction by taking the A matrix as the low-dimensional factors; these can be calculated for new data too.
Alternatively, it might also produce good results when used as an imputer for missing values in tabular data. The Python version is scikit-learn compatible and has a separate class aimed at being used for imputation in scikit-learn pipelines. Example here.
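As a toy illustration of the imputation use-case (plain numpy, not the package's own imputer class): fit a low-rank reconstruction that agrees with the observed entries and read the missing values off the reconstruction. The iterative-SVD scheme below is a simplified stand-in for the actual model:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 100, 8, 3

# Simulate exactly low-rank data, then hide 20% of the entries
A_true = rng.normal(size=(n, k))
B_true = rng.normal(size=(m, k))
X_full = A_true @ B_true.T
mask = rng.random((n, m)) < 0.2          # True = missing
X = np.where(mask, np.nan, X_full)

# Iterative SVD imputation: initialize missing cells with column
# means, then repeatedly project onto the best rank-k approximation,
# keeping the observed entries fixed at each step
X_hat = np.where(mask, np.nanmean(X, axis=0), X)
for _ in range(100):
    U, s, Vt = np.linalg.svd(X_hat, full_matrices=False)
    low_rank = (U[:, :k] * s[:k]) @ Vt[:k]
    X_hat = np.where(mask, low_rank, X)

err = np.abs(X_hat - X_full)[mask].mean()  # error on the hidden cells
```

The observed entries are untouched by construction; only the masked cells are filled in from the low-rank structure.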
Update 2020-03-20
The package has been rewritten in C with Python wrappers. If you used earlier versions of this package, which relied on TensorFlow for the calculations, note that the optimal hyperparameters will now be very different, as the new version changes some details of the loss function (e.g. some terms are no longer divided by the number of entries).
The new version is faster, multi-threaded, and has some new functionality, but if for some reason you still need the old one, it can be found under the git branch "tensorflow".
Highlights
- Can fit factorization models with or without user and/or item side information.
- Can fit the usual explicit-feedback model as well as the implicit-feedback model with weighted binary entries (see [3]).
- For the explicit-feedback model, can automatically add implicit features (created from the same "X" data).
- Can be used for cold-start recommendations (when using side information).
- Can be compiled for single and double precision (`float32` and `float64`) - the Python package comes with both versions.
- Supports user and item biases (these are not just pre-estimated beforehand as in other software).
- Can fit models with non-negativity constraints on the factors and/or with L1 regularization.
- Provides an API for top-N recommended lists and for calculating latent factors from new data.
- Can work with both sparse and dense matrices for each input (e.g. can also be used as a general missing-value imputer for 2D data - example), and can work efficiently with a mix of dense and sparse inputs.
- Can produce factorizations for variations of the problem such as sparse inputs with missing-as-zero instead of missing-as-unknown (e.g. when used for dimensionality reduction).
- Can use either an alternating least-squares procedure (ALS) or a gradient-based procedure using an L-BFGS optimizer for the explicit-feedback models (the package bundles a modified version of Okazaki's C implementation).
- For the ALS option, can use either the exact Cholesky method or the faster conjugate gradient method (see [4]). Can also use coordinate descent methods (when having non-negativity constraints or L1 regularization).
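To show what one alternating step computes, here is a simplified numpy sketch of the exact (Cholesky-based) ALS update (dense X, no biases, no missing entries - the package itself handles far more general inputs):

```python
import numpy as np

rng = np.random.default_rng(2)
n_users, n_items, k, lam = 30, 20, 4, 0.05

X = rng.normal(size=(n_users, n_items))
A = rng.normal(size=(n_users, k))
B = rng.normal(size=(n_items, k))

def als_update(X, B, lam):
    """Closed-form update for one side: for each row x_u of X,
    solve (B^T B + lam*I) a_u = B^T x_u."""
    G = B.T @ B + lam * np.eye(B.shape[1])
    # With no missing entries, one Cholesky factorization of G
    # is shared by all rows
    L = np.linalg.cholesky(G)
    rhs = B.T @ X.T                       # shape: k x n_rows
    y = np.linalg.solve(L, rhs)           # forward substitution
    return np.linalg.solve(L.T, y).T      # back substitution -> n_rows x k

loss0 = np.linalg.norm(X - A @ B.T)
for _ in range(20):
    A = als_update(X, B, lam)             # update user factors
    B = als_update(X.T, A, lam)           # update item factors
loss = np.linalg.norm(X - A @ B.T)
```

The conjugate-gradient variant replaces the exact triangular solves with a few CG iterations on the same linear system, which trades a small amount of accuracy per step for speed.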
- Can produce models with constant regularization or with dynamically-adjusted regularization as in [7].
- Provides a content-based model and other models aimed at better cold-start recommendations.
- Provides an intercepts-only "most-popular" model for non-personalized recommendations, which can be used as a benchmark as it uses the same hyperparameters as the other models.
- Allows variations of the original collective factorization models such as setting some factors to be used only for one factorization, setting different weights for the errors on each matrix, or setting different regularization parameters for each matrix.
- Can use sigmoid transformations for binary-distributed columns in the side info data.
- Can work with large datasets (supports arrays/matrices larger than `INT_MAX`).
- Supports observation weights (for the explicit-feedback models).
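The distinction between missing-as-unknown and missing-as-zero mentioned in the list above can be made concrete with a small numpy sketch (illustrative only; the sizes and matrices are made up):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(6, 5))
mask = rng.random(X.shape) < 0.4    # True = missing entry
A = rng.normal(size=(6, 2))
B = rng.normal(size=(5, 2))
pred = A @ B.T
resid = X - pred

# Missing-as-unknown: missing entries contribute nothing to the loss
loss_unknown = np.sum(resid[~mask] ** 2)

# Missing-as-zero: missing entries are treated as observed zeros,
# so the model is penalized for predicting non-zero values there
resid_zero = np.where(mask, 0.0 - pred, resid)
loss_zero = np.sum(resid_zero ** 2)
```

The first objective is the usual recommender-systems setting; the second is what one typically wants when using the factorization for dimensionality reduction of genuinely sparse data.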
Installation
- Python:
Note: requires a C compiler configured for Python. See this guide for instructions.
```
pip install cmfrec
```
or if that fails:
