Lymphocyte Classification Pipeline
Software for identifying lymphocyte-infiltrated areas in histopathology images
- This software package provides a classifier for identifying lymphocyte-infiltrated areas and generates heatmaps that can be visualized in camicroscope.
- You need a CUDA-capable GPU and nvidia-docker to run this software.
- The classifier is based on a Convolutional Neural Network (CNN). Details of the algorithm can be found at https://arxiv.org/abs/1704.00406
- Please contact Le Hou (firstname.lastname@example.org) if you have questions.
- Install the NVIDIA driver (http://www.nvidia.com/Download/index.aspx) and CUDA (https://developer.nvidia.com/cuda-downloads).
- Follow the instructions at https://github.com/NVIDIA/nvidia-docker to install nvidia-docker.
- Download our software environment from http://vision.cs.stonybrook.edu/~lehhou/lym-pipeline.zip and unzip it under your working directory.
The environment contains the following folders:
| Folder | Contents |
| --- | --- |
| conf | Contains the configuration file. Please review all configurations and change them accordingly. |
| data | Contains training and validation datasets for Convolutional Autoencoder (CAE) and Convolutional Neural Network (CNN) training. |
| svs | Contains all whole slide images. |
| log | Contains log files. |
| patches | Contains extracted patches (from whole slide images) and prediction results. |
| heatmap_jsons | Contains generated json files that represent lymphocyte heatmaps. |
| models_training | Contains CAE and CNN models during training. |
| models_prediction | Contains CAE and CNN models during prediction. |
To run any of the tools we provide, first start a docker container:
The docker container will run in the background. The rest of these instructions shows how to run the lymphocyte-infiltrated-area identification pipeline from outside the container (in the working directory on your machine).
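The exact command ships with the downloaded environment; a minimal sketch of such a command, assuming a placeholder image name lym-pipeline and container name lym (neither is confirmed by this document), would be:

```shell
# Sketch only: "lym-pipeline" (image) and "lym" (container) are placeholders;
# use the names shipped with the downloaded environment.
# -d runs the container in the background; nvidia-docker exposes the GPU.
nvidia-docker run -d --name lym \
    -v "$(pwd)":/workdir \
    lym-pipeline
```

The `-v` mount (target path `/workdir` is also a placeholder) makes the working-directory folders listed above visible inside the container.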
Summary of functionalities
This docker image contains tools for identifying lymphocyte-infiltrated areas and for generating heatmaps that visualize the prediction (identification) results. In particular, the pipeline has the following parts:
- Neural network training.
- Extracting all patches from WSIs.
- Running trained Convolutional Neural Network (CNN) models on the extracted patches.
- Generating lymphocyte heatmaps and uploading them to camicroscope.
Neural network training
Run the following script to train the lymphocyte classification Convolutional Neural Network (CNN) and the necrosis segmentation CNN, in sequence:
This uses an existing trained Convolutional Autoencoder (CAE) model, ./models_training/cae_model.pkl, to train the CNN models. If you also want to train the CAE model, please check the Advanced usage section. Note that we have included a small training set under ./data/ for demonstration purposes only.
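The environment ships the actual script; assuming the training stage follows the same start.sh convention that this document names for patch_extraction/start.sh, and that the container is named lym (a placeholder), the invocation from the host would look roughly like:

```shell
# Hypothetical invocation: "lym" is a placeholder container name, and
# training/start.sh assumes the same entry-point convention as
# patch_extraction/start.sh. Check the training/ folder for the real name.
docker exec lym bash /home/lym_pipeline/training/start.sh
```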
Generating heatmaps given whole slide images
If you have a trained CNN model and want to generate and visualize lymphocyte heatmaps for some WSIs in camicroscope, put those whole slide images under ./svs/ and run the following script:
This basically runs steps 2, 3, and 4. We have included trained models under ./models_prediction/. After the command above finishes, you should be able to view the heatmap results in camicroscope.
Please keep an eye on the log files under ./log/. If you have questions, please contact Le Hou at email@example.com
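A sketch of how such a wrapper could drive steps 2, 3, and 4 in sequence, assuming each stage directory follows the start.sh convention of patch_extraction/start.sh and the container is named lym (both placeholders):

```shell
# Hypothetical end-to-end driver: tile the WSIs, run the CNNs on the tiles,
# then build and upload heatmaps. Stage script names other than
# patch_extraction/start.sh are assumptions.
for stage in patch_extraction prediction heatmap_gen; do
    docker exec lym bash "/home/lym_pipeline/${stage}/start.sh"
done
```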
Advanced usage
In this section, we show how to run the pipeline parts separately and how to train the CAE from scratch.
First, start an interactive bash session in the docker container with the following command:
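Assuming the container was started with the name lym (a placeholder for whatever name you gave it), the standard docker invocation is:

```shell
# Attach an interactive bash shell (-i interactive, -t pseudo-TTY) to the
# running container. "lym" is a placeholder for your container's actual name.
docker exec -it lym bash
```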
All of the source code is under the following directory in the docker container:
/# cd /home/lym_pipeline/
conf heatmap_gen patch_extraction patches prediction svs training
The rest of this section assumes you are in the directory above, inside the docker container.
To start the CAE training, just run the following:
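Assuming the training directory exposes a dedicated CAE entry point (the script name below is a placeholder, not confirmed by this document), the call from /home/lym_pipeline would be:

```shell
# Hypothetical: run CAE training from /home/lym_pipeline inside the container.
# The script name "start_cae.sh" is a placeholder; check the training/ folder.
bash training/start_cae.sh
```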
This will overwrite the provided CAE model ./models_training/cae_model.pkl. We have included a small training set under ./data/ for demonstration purposes only.
Extract all patches in WSIs
This step extracts patches from WSIs. A trained CNN will take the extracted patches as input and generate prediction results as output. To extract patches, run the following script:
It starts four threads that break WSIs down into PNG tiles at 20X magnification. It takes the whole slide images under ./svs/ as input and outputs tiles under ./patches/.
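This document names the patch-extraction entry point as patch_extraction/start.sh; from /home/lym_pipeline inside the container it would be invoked as:

```shell
# Break the WSIs under ./svs/ into 20X PNG tiles under ./patches/.
bash patch_extraction/start.sh
```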
Apply trained neural network on extracted patches
Run the following script:
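Assuming the prediction stage follows the same start.sh convention as patch_extraction/start.sh (an assumption; check the prediction/ folder), the call from /home/lym_pipeline would be:

```shell
# Hypothetical: run lymphocyte and necrosis prediction, plus tissue
# segmentation, on the extracted patches under ./patches/.
bash prediction/start.sh
```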
It starts four threads that consume the extracted patches under ./patches/ generated by patch_extraction/start.sh: one thread predicts lymphocytes, one predicts necrosis, and two CPU threads segment tissue from background. Prediction results are also stored under ./patches/.
Note that this requires a trained lymphocyte classification CNN and a trained necrosis segmentation CNN. We provide two pretrained models: ./prediction/models/cnn_model.pkl and ./prediction/models/cnn_model_mu_sigma_necrosis.pkl.
The provided lymphocyte classification CNN was trained on 23,000 LUAD patches at 20X magnification. The provided necrosis segmentation CNN was trained on around 3,000 LUAD patches at 6.67X magnification. To use your own trained models, copy them from ./training/models/ to ./prediction/models/.
Generating lymphocyte heatmaps for camicroscope
Run the following script:
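Assuming heatmap_gen also follows the start.sh convention of patch_extraction/start.sh (an assumption; check the folder contents), this stage would be launched from /home/lym_pipeline as:

```shell
# Hypothetical: build low- and high-resolution heatmap json files under
# ./heatmap_jsons/ and upload them to camicroscope.
bash heatmap_gen/start.sh
```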
It takes the data and prediction results under ./patches/ as input and produces low-resolution and high-resolution heatmaps as json files. Finally, it uploads all json files to camicroscope.