Multi-instance CNN for classification of whole slide tissue images.

Multi-Instance Convolutional Neural Network (MI-CNN)
A Software for Whole Slide Tissue Image Classification

Introduction
MI-CNN is open source software that trains a classifier for automatic Whole Slide Tissue Image (WSI) classification.
You need Docker and, for the GPU version, CUDA to run this software from the Linux command line. Matlab is recommended but not required.
MI-CNN shows state-of-the-art performance on glioma and Non-Small Cell Lung Cancer (NSCLC) subtype classification problems.
MI-CNN consists of a patch-level CNN and an image-level SVM. Details of the algorithm can be found at http://arxiv.org/abs/1504.07947
Please contact le.hou@stonybrook.edu if you have questions.

We highly recommend deploying this software on machines with computational GPUs.
To enable GPU computation, you need to install the following dependencies:
Nvidia driver: http://www.nvidia.com/Download/index.aspx
You can use the command nvidia-smi to check your installation.
CUDA: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/
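
Before enabling GPU mode, it can help to confirm that the required tools are on the PATH. The following is a minimal sketch rather than part of MI-CNN: the helper name have_cmd is ours, and looking for nvcc assumes the CUDA toolkit's bin directory has been added to the PATH.

```shell
# have_cmd is a hypothetical helper: it succeeds if the named command is on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report the status of each prerequisite without aborting on a missing one.
for tool in docker nvidia-smi nvcc; do
    if have_cmd "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done
```

A "missing" line only means the command was not found on the PATH; the component may still be installed elsewhere.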

Download an example:
wget http://vision.cs.stonybrook.edu/~lehhou/micnn/mi-cnn.tar.gz
tar -xzvf mi-cnn.tar.gz

To verify the pipeline quickly before using it in earnest, you can download mi-cnn-test.tar.gz instead of mi-cnn.tar.gz. The CNN in mi-cnn-test.tar.gz is trained for far fewer iterations, which makes the pipeline much faster.

If you want to run on the CPU only (without a GPU), make the following edits:
Set solver_mode: CPU in mi-cnn/patch_cnn/VGG_ILSVRC_16_layers_5X/solver.prototxt and mi-cnn/patch_cnn/VGG_ILSVRC_16_layers_20X/solver.prototxt
Set USE_GPU=0 in mi-cnn/patch_cnn/VGG_ILSVRC_16_layers_5X/m_step.sh and mi-cnn/patch_cnn/VGG_ILSVRC_16_layers_20X/m_step.sh
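
The two edits above can be scripted. This is a sketch only: it assumes each solver.prototxt contains a line starting with solver_mode: and each m_step.sh assigns USE_GPU= at the start of a line, so inspect the files before relying on it. The function name set_cpu_mode is ours, and the in-place flag assumes GNU sed.

```shell
# set_cpu_mode is a hypothetical helper; pass the path to the mi-cnn directory.
set_cpu_mode() {
    root=$1
    for scale in 5X 20X; do
        dir="$root/patch_cnn/VGG_ILSVRC_16_layers_$scale"
        # Assumes a line starting with "solver_mode:" exists in solver.prototxt.
        sed -i 's/^solver_mode:.*/solver_mode: CPU/' "$dir/solver.prototxt"
        # Assumes m_step.sh assigns USE_GPU at the start of a line.
        sed -i 's/^USE_GPU=.*/USE_GPU=0/' "$dir/m_step.sh"
    done
}
```

For example, run set_cpu_mode mi-cnn from the directory where the archive was unpacked, then check the four files by hand.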

Verify the docker environment:
./docker_run.sh uname -a

You can also start an interactive bash session to check the environment in the docker container.
./docker_interactive.sh
cat /code/conf/variable.sh
cat /code/scripts/test.sh
matlab -nodisplay -r "ls; exit;" </dev/null
exit

Input and Output Format

Input. All of the following files have examples given in the environment.
mi-cnn/images/
All Whole Slide Tissue Images (WSI) including the training and testing ones. Note that only .svs format is supported.
mi-cnn/conf/train.txt
List of training images with their ground truth class labels.
mi-cnn/conf/test.txt
List of testing images with their ground truth class labels. If the true labels are unknown and you only want to classify the images, use 0 as the label.
mi-cnn/conf/variable.sh
All of the configurations. Please check the given example file, in which each variable is explained. Make sure the values in this file are correct before you run the software.

Output
mi-cnn/log/wsi_svm/prediction.txt
The final predictions are stored in this file. Each line contains the prediction for one WSI. Note that each WSI generates multiple prediction lines, made by SVMs with different parameters. The format of each line is:
Ground_Truth_Label Predicted_Label WSI_File_Name SVM_Parameters…
mi-cnn/log/wsi_svm/results.txt
The evaluation results. Note that there are multiple evaluation results made by SVM with different parameters. The format of lines is:
acc[Accuracy] map[Mean Average Precision] Conf[SVM Parameters]...
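
To compare SVM parameter settings directly from prediction.txt, the lines can be grouped by their trailing parameter fields. The sketch below is ours, not part of MI-CNN. It assumes the fields are separated by whitespace in the order given above, that WSI file names contain no spaces, and that everything from the fourth field onward identifies the parameter setting.

```shell
# accuracy_by_params is a hypothetical helper.
# Usage: accuracy_by_params PREDICTION_FILE
accuracy_by_params() {
    awk '
    {
        # Rebuild the parameter string from fields 4..NF.
        key = ""
        for (i = 4; i <= NF; i++) key = key (i > 4 ? " " : "") $i
        total[key]++
        # Field 1 is the ground truth label, field 2 the predicted label.
        if ($1 == $2) correct[key]++
    }
    END {
        # One line per setting: accuracy, correct/total, parameters.
        for (k in total)
            printf "%.4f %d/%d %s\n", correct[k] / total[k], correct[k], total[k], k
    }' "$1" | sort
}
```

For example, accuracy_by_params mi-cnn/log/wsi_svm/prediction.txt. The numbers are only meaningful when test.txt holds real ground truth labels, not the placeholder label 0.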

Execution Example
You can run the software using the initial input data we provided in the environment.
cd mi-cnn
cat conf/train.txt conf/test.txt > data/image_list.txt
./docker_run.sh /code/scripts/patch_extraction_and_augmentation.sh
cd data
bash untar_all_worker_data.sh
cd ..
./docker_run.sh /code/scripts/train_and_validate.sh
cat log/wsi_svm/results.txt

Data Preparation

MI-CNN runs on image patches extracted from WSIs, so the first step is to extract patches. You can run multiple instances of this process on one or more machines and then merge the results.

Input:
mi-cnn/images/
mi-cnn/conf/variable.sh
mi-cnn/data/image_list.txt

Output:
mi-cnn/data/worker_data.tar.gz
All of the extracted patches are compressed in this file.
You must run mi-cnn/data/untar_all_worker_data.sh to untar all the patches.
If you have multiple worker_data.tar.gz files generated by different machines, you must put all of them into one mi-cnn/data/ directory (remember to rename them so they do not overwrite each other), and run mi-cnn/data/untar_all_worker_data.sh to untar and merge them all.
Execution
./docker_run.sh /code/scripts/patch_extraction_and_augmentation.sh
As mentioned above, you can copy the environment to multiple locations to run multiple instances of this process. Remember to merge all of the worker_data.tar.gz.
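
Collecting the archives from several copies of the environment can be sketched as follows. The helper collect_worker_archives and the numbered file names are our assumptions; in particular, confirm that untar_all_worker_data.sh picks up archives renamed in this way before relying on it.

```shell
# collect_worker_archives is a hypothetical helper.
# Usage: collect_worker_archives DEST_DATA_DIR COPY_DIR...
# Each COPY_DIR is one copy of the mi-cnn environment.
collect_worker_archives() {
    dest=$1
    shift
    n=0
    for copy in "$@"; do
        n=$((n + 1))
        # Give each archive a unique name so none is overwritten.
        cp "$copy/data/worker_data.tar.gz" "$dest/worker_data_$n.tar.gz" || return 1
    done
    echo "copied $n archive(s) into $dest"
}
```

For example, collect_worker_archives mi-cnn/data /path/to/copy1 /path/to/copy2, where the two copy paths are placeholders, followed by the untar step described above.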

Model Training
When the data is ready, simply run the following command.
./docker_run.sh /code/scripts/train_and_validate.sh

You can use nohup to run it since it might take days.
nohup ./docker_run.sh /code/scripts/train_and_validate.sh &

Keep checking the logs (mi-cnn/log/). If you see obvious shell command or Matlab script failures, please check your configurations. Please report bugs to le.hou@stonybrook.edu.
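
During a long run, a quick scan of the log directory can surface obvious failures. The helper below is a sketch of ours; the keywords error and fail are a guess, since the exact wording of the MI-CNN logs is not documented here.

```shell
# scan_logs is a hypothetical helper: print log lines that mention an error
# or failure, prefixed with file name and line number.
# grep exits non-zero when nothing matches, so "no output" is the good case.
scan_logs() {
    grep -rniE 'error|fail' "$1"
}

# To also capture the console output of the background run in a file
# (train.out is an arbitrary name), start the run like this:
# nohup ./docker_run.sh /code/scripts/train_and_validate.sh > train.out 2>&1 &
```

For example, scan_logs mi-cnn/log || echo "no matches". A clean scan does not prove the run succeeded, so still read the logs themselves.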

Classifying WSIs of Unknown Types
To classify WSIs of unknown types, first run the data preparation step for all of the new WSIs. Then list the new WSIs in mi-cnn/conf/test.txt (set all ground truth labels to 0), modify mi-cnn/conf/variable.sh if needed, and run test.sh to classify the images:
./docker_run.sh /code/scripts/test.sh

Also keep an eye on the logs. Final predictions will be found in the log file:
cat log/wsi_svm/prediction.txt

Pipeline Detail
MI-CNN consists of several modules that can be found under /code/ in the docker container.
./docker_interactive.sh
cd /code/
If some modules fail during model training or classification of unknown WSIs, you can rerun only the failed modules instead of the whole pipeline. However, be very careful when running these modules individually.

Patch Extraction and Data Augmentation
Input.
mi-cnn/images/
mi-cnn/conf/variable.sh
mi-cnn/data/image_list.txt
Output
mi-cnn/data/worker_data.tar.gz
Execution
./docker_run.sh /code/scripts/patch_extraction_and_augmentation.sh

CNN Training
Input.
mi-cnn/data/all_worker_data/
mi-cnn/conf/train.txt
mi-cnn/conf/test.txt
mi-cnn/conf/variable.sh
Output
mi-cnn/patch_cnn/
The trained CNN models are stored under this folder. We use Caffe for CNN implementation.
Execution
./docker_run.sh /code/patch_cnn/go.sh

SVM Training and Classifying WSIs
Input.
mi-cnn/patch_cnn/
mi-cnn/data/all_worker_data/
mi-cnn/conf/variable.sh
Output
mi-cnn/log/wsi_svm/
Execution
./docker_run.sh /code/wsi_svm/go.sh
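
Before rerunning a single module, it may be worth checking that the inputs listed for it exist, since a missing input usually means an earlier module did not finish. The helper require_paths is a sketch of ours and is not part of MI-CNN.

```shell
# require_paths is a hypothetical helper: report every missing path on stderr
# and return non-zero if any path is missing.
require_paths() {
    status=0
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "missing input: $p" >&2
            status=1
        fi
    done
    return $status
}
```

For example, from inside mi-cnn, run require_paths patch_cnn data/all_worker_data conf/variable.sh before rerunning the SVM module. The check only confirms that the paths exist, not that their contents are complete.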

Known issues:
When a WSI is too small (less than 4000 by 4000 pixels), patches at the 5X or 20X scale may not be extracted, resulting in incorrect results.
Some extracted patches can be empty (do not contain tissue). The empty patch elimination step is not perfect. If the tissue in a WSI is too sparse, some or all of the patches extracted would be discarded since those patches would be considered empty.
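
The size limit in the first issue can be screened for ahead of time once the pixel dimensions of a slide are known, for example from a slide viewer. The helper below is a sketch of ours. It treats a slide as too small when either dimension is below 4000 pixels, which is an assumption, since the note above does not say whether one or both dimensions matter.

```shell
# wsi_large_enough is a hypothetical helper.
# Usage: wsi_large_enough WIDTH HEIGHT   (both in pixels)
# Succeeds only when both dimensions reach the 4000-pixel threshold.
wsi_large_enough() {
    width=$1
    height=$2
    [ "$width" -ge 4000 ] && [ "$height" -ge 4000 ]
}
```

For example, wsi_large_enough 3500 12000 || echo "slide may be too small".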

Owner: sbubmi