Scalable multi-GPU approach to non-negative matrix tri-factorization.

CROW: Fast Non-Negative Matrix Tri-Factorization

A scalable implementation of non-negative matrix tri-factorization for multi-processor and multi-GPU environments.

Quick Setup

The most convenient way to set up the environment is to use the provided Docker images. Here is a quick setup guide for Ubuntu-based platforms. For other platforms and detailed setup instructions, please refer to the online documentation.

Clone the git repository

    git clone
    cd crow


The install script automatically detects whether GPUs are available.

    make install

Start the Docker container

This script checks whether NVIDIA devices are present and, if none are found, falls back to the CPU-only version (use the -d option to run the process in the background). See the Setup guide for details.


Attach to a running container

To get inside the container, use crow-exec or crow-ssh.


If you encounter problems with the connection, check this section.

Test your configuration

Once the environment is up and running, you can use the crow-test script to check that everything works correctly. It generates a small random dataset and tries to factorize it. To test factorization in the GPU environment, use the -g switch.

    crow-test -g

Volumes and data

Docker volumes

Crow Docker images make use of the following external volumes:

  • crow: path to the crow source code (for development)
  • data: path to directory with data, mounted read-only.
  • cache: path to the directory where the application stores intermediate files.
    Note that the cache can take up several gigabytes, depending on your data. You can
    safely clean this folder, but the data will then have to be processed again, which may take some time.
  • results: this is where the factorized data will be stored.

You can change the volume paths depending on where you store the data on your host system by editing docker-compose.yml before calling docker-compose or nvidia-docker-compose. By default, docker-compose creates the folders in the current directory. Instead of editing the docker-compose file, you can also use symbolic links or mounts to point data, cache or results to a different folder or device.
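As a minimal sketch of the symbolic-link approach, the Python snippet below creates data, cache and results links in the directory that holds docker-compose.yml and points them at folders on another device. It assumes the default volume folders are plain directories with exactly those names and that they do not exist yet; the function name link_volumes is hypothetical and not part of CROW.

```python
import os
from pathlib import Path


def link_volumes(compose_dir, storage_root, names=("data", "cache", "results")):
    """Point the default volume folders next to docker-compose.yml at another disk.

    Hypothetical helper: assumes each volume is a plain folder named after it
    and that the folder does not exist yet in compose_dir.
    """
    compose_dir = Path(compose_dir)
    storage_root = Path(storage_root)
    links = {}
    for name in names:
        target = storage_root / name
        # Create the real folder on the larger device.
        target.mkdir(parents=True, exist_ok=True)
        link = compose_dir / name
        # Refuse to overwrite an existing folder or (possibly dangling) link.
        if link.exists() or link.is_symlink():
            raise FileExistsError(f"{link} already exists; move or remove it first")
        # compose_dir/name -> storage_root/name
        os.symlink(target, link, target_is_directory=True)
        links[name] = link
    return links
```

For example, link_volumes(".", "/mnt/bigdisk/crow") (a hypothetical path) would be run once in the docker-compose directory before starting the container.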


To download preprocessed benchmark datasets, use the provided script.


This script downloads datasets that have already been preprocessed and converted into coordinate list or npz format:

Data format

The data can be provided in coordinate list format (coo), a form of CSV file in which the header stores the matrix dimensions and each subsequent row describes one element of the matrix. The header defines the matrix dimensions n,m. After that, each row of the file represents one non-zero value in the matrix: the first column is the index i along the first dimension, the second column is the index j along the second dimension, and the third column is the value of the data matrix X at location X[i,j].

For example, consider this 2D matrix:

    [[1, 0, 0], [5, 5, 0]]

The corresponding data file would look like this:


For convenient conversion between csv, npz and coo formats, the crow-conv tool is provided in the Docker image. Additional instructions can be found in the Data manipulation section.
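To make the coo layout concrete, the following Python sketch converts a small dense matrix (given as a list of rows) to coo lines and back. It assumes comma separators, a header of the form n,m and 0-based indices; the exact delimiter and index base CROW expects are not stated here, so prefer crow-conv for real data. The helper names are hypothetical.

```python
def dense_to_coo_lines(X):
    """Convert a dense matrix (list of rows) into coo text lines.

    Assumed layout (hypothetical): header "n,m", then one "i,j,value"
    line per non-zero entry, with 0-based indices.
    """
    n, m = len(X), len(X[0])
    lines = [f"{n},{m}"]  # header: matrix dimensions
    for i, row in enumerate(X):
        for j, value in enumerate(row):
            if value != 0:  # only non-zero entries are stored
                lines.append(f"{i},{j},{value}")
    return lines


def coo_lines_to_dense(lines):
    """Inverse of dense_to_coo_lines under the same assumptions."""
    n, m = (int(v) for v in lines[0].split(","))
    X = [[0] * m for _ in range(n)]
    for line in lines[1:]:
        i, j, value = line.split(",")
        X[int(i)][int(j)] = float(value)
    return X
```

Under these assumptions, dense_to_coo_lines([[1, 0, 0], [5, 5, 0]]) yields the header 2,3 followed by 0,0,1, 1,0,5 and 1,1,5.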

Factorize your data


Here, we demonstrate how to quickly factorize one of the provided datasets. To use different data, just replace the filename. The factorization rank (20 in these examples) is set with the -k1 option.

For example, to factorize the ArrayExpress dataset in 1-CPU, 1-GPU and 4-GPU (2x2 blocks) configurations, use the following commands:

    crow -k1 20 ArrayExpress.coo
    crow -g -k1 20 ArrayExpress.coo
    crow -g -b 2x2 -k1 20 ArrayExpress.coo

Once the process is finished, the average iteration time (in seconds) and the value of the Frobenius norm error function are printed. The factorized data is stored in the results folder.
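The printed error is based on the Frobenius norm. A plain-Python sketch of that quantity for a tri-factorization X ≈ U S V^T is shown below. The middle factor S (k1 x k2) and the use of the plain, unsquared and unnormalized norm are assumptions, since CROW may report a squared or relative variant; the function names are hypothetical.

```python
import math


def matmul(A, B):
    """Plain-Python matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Swap rows and columns of a list-of-rows matrix."""
    return [list(col) for col in zip(*A)]


def nmtf_frobenius_error(X, U, S, V):
    """||X - U S V^T||_F for X (n x m), U (n x k1), S (k1 x k2), V (m x k2)."""
    approx = matmul(matmul(U, S), transpose(V))
    # Square root of the sum of squared element-wise residuals.
    return math.sqrt(sum((x - y) ** 2
                         for row_x, row_y in zip(X, approx)
                         for x, y in zip(row_x, row_y)))
```

A perfect reconstruction gives an error of 0; in practice the value decreases over iterations, which is also what the -e option reports per iteration.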

Note that the first run takes longer (up to a few minutes), since the program needs to read large files from disk and convert them to dense numpy arrays or sparse matrices. Subsequent runs on the same data load faster, because the data is cached. More detailed information on how to reproduce and visualize the results can be found in the Benchmark section. We demonstrate an application of NMTF, including interpretation of the results, in a co-clustering example.
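The -b option (together with the -p parallelization degree) splits the data matrix into blocks that are processed in parallel. The exact partitioning CROW uses is not described here; the Python sketch below assumes contiguous, nearly equal row and column ranges and, when the parallelization degree is smaller than the number of blocks, processing the blocks in rounds of that size. All names are hypothetical.

```python
def block_ranges(size, parts):
    """Split range(size) into `parts` contiguous, nearly equal [start, stop) slices."""
    base, extra = divmod(size, parts)
    ranges, start = [], 0
    for p in range(parts):
        # The first `extra` slices each take one leftover row/column.
        stop = start + base + (1 if p < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def block_grid(n, m, config="2x2"):
    """Map a block configuration such as "2x2" to (row_range, col_range) pairs."""
    row_parts, col_parts = (int(x) for x in config.lower().split("x"))
    return [(r, c)
            for r in block_ranges(n, row_parts)
            for c in block_ranges(m, col_parts)]


def schedule(blocks, degree):
    """Group blocks into rounds of at most `degree` blocks handled concurrently."""
    return [blocks[i:i + degree] for i in range(0, len(blocks), degree)]
```

For a 2x2 configuration with a parallelization degree of 2, the four blocks would be handled in two rounds of two, which is why a lower degree can reduce peak GPU memory use.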

Command line arguments

The following options can be set:

  • -b: block configuration, for example 2x2.
  • -e: calculate and print the error function in each iteration. This can slow down factorization considerably.
  • -g: run on GPUs. By default, only CPU cores are used.
  • -i: maximum number of iterations. The default is 10, but you should increase this number to get satisfactory results.
  • -k1: left factorization rank. Defines the number of latent vectors of matrix U.
  • -k2: right factorization rank. Defines the number of latent vectors of matrix V. By default, the value of k1 is used.
  • -o: impose orthogonality on factors U and V. By default, non-orthogonal NMTF is used.
  • -p: parallelization degree. By default, the parallelization degree equals the number of blocks, but you can set it lower than the number of blocks. This is useful for reducing memory requirements in GPU applications.
  • -s: use sparse data structures. Do not use this if the matrix density is larger than 10%.
  • -t: additional stopping criterion. By default, factorization runs for the number of iterations specified by -i. Available arguments are e4, e5, e6 and e7. For example, with e6 the factorization stops once the error function changes by less than 10^-6 between two consecutive iterations.


  • The single positional argument specifies the path to the data file. You can also provide the basename of a data file that exists in the data directory.
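The -t and -s options boil down to simple numeric checks. This Python sketch shows one reading of them: -t eN as an absolute change in the error below 10^-N between consecutive iterations, and -s as worthwhile only when at most 10% of the entries are non-zero. Both interpretations and all names are assumptions, not CROW's own code.

```python
def tolerance_from_flag(flag):
    """Map a -t argument such as "e6" to the threshold 10**-6 (assumed meaning)."""
    if flag not in {"e4", "e5", "e6", "e7"}:
        raise ValueError(f"unsupported stopping criterion: {flag!r}")
    return 10.0 ** -int(flag[1:])


def should_stop(prev_error, curr_error, flag):
    """Stop once the error changes by less than the tolerance between two iterations."""
    return abs(prev_error - curr_error) < tolerance_from_flag(flag)


def sparse_recommended(nnz, n, m):
    """Density check for -s: avoid sparse structures above 10% non-zeros."""
    return nnz / (n * m) <= 0.10
```

For instance, the 2x3 example matrix from the Data format section has 3 non-zeros out of 6 entries (50% density), so -s would not be advisable for it.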