Public Repository

Last pushed: 5 months ago
Short Description
Lymphocyte Classification Pipeline in Histopathology Images
Full Description

Lymphocyte Classification Pipeline

Software for identifying lymphocyte-infiltrated areas in histopathology images


Introduction

  1. This software package provides a classifier that identifies lymphocyte-infiltrated areas and generates heatmaps that can be visualized in camicroscope.
  2. You need a CUDA-capable GPU and nvidia-docker to run this software.
  3. The classifier is based on a Convolutional Neural Network (CNN). Details of the algorithm can be found at https://arxiv.org/abs/1704.00406
  4. Please contact Le Hou le.hou@stonybrook.edu if you have questions.

Installation

  1. Install the NVIDIA driver (http://www.nvidia.com/Download/index.aspx) and CUDA (https://developer.nvidia.com/cuda-downloads).
  2. Follow the instructions at https://github.com/NVIDIA/nvidia-docker to install nvidia-docker.
  3. Download our software environment from http://vision.cs.stonybrook.edu/~lehhou/lym-pipeline.zip and unzip it in your working directory.
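
Before continuing, it can help to confirm that the tools from the installation steps are on your PATH. The following is a minimal sketch, not part of the package: missing_commands is a hypothetical helper, and it assumes the driver provides the nvidia-smi command and that nvidia-docker is installed as a command of that name.

```shell
# Print the name of every command in the argument list that is not on PATH.
# missing_commands is a hypothetical helper, not shipped with the pipeline.
missing_commands() (
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
)

# Example (assumed command names):
# missing=$(missing_commands nvidia-smi docker nvidia-docker)
# [ -z "$missing" ] || echo "Not found on PATH: $missing"
```

An empty result only shows that the commands exist. It does not prove that the GPU is usable from inside a container.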

Environment

The environment contains the following folders:

Folder             Description
conf               Contains the configuration file. Please review all settings and change them as needed.
data               Contains training and validation datasets for Convolutional Autoencoder (CAE) and Convolutional Neural Network (CNN) training.
svs                Contains all whole slide images.
log                Contains log files.
patches            Contains extracted patches (from whole slide images) and prediction results.
heatmap_jsons      Contains generated json files that represent lymphocyte heatmaps.
models_training    Contains the CAE and CNN models used during training.
models_prediction  Contains the CAE and CNN models used during prediction.
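
After unzipping, a quick check that all of the folders listed above are present can catch an incomplete download early. This is a sketch only: missing_dirs is a hypothetical helper, and the folder names are copied from the list above.

```shell
# Print each expected folder that is missing under ROOT
# (default: the current directory). Hypothetical helper.
missing_dirs() (
    root=${1:-.}
    for d in conf data svs log patches heatmap_jsons \
             models_training models_prediction; do
        [ -d "$root/$d" ] || printf '%s\n' "$d"
    done
)

# Example, run from the working directory:
# missing_dirs .
```

If the command prints nothing, every folder was found.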

To run any of the provided tools, first start a docker container:
bash create_container.sh

The docker container will run in the background. The rest of these instructions show how to run the lymphocyte-infiltrated area identification pipeline from outside the container (in the working directory on your machine).


Summary of functionalities

This docker image contains tools that identify lymphocyte-infiltrated areas and generate heatmaps for visualizing the prediction (identification) results. In particular, the pipeline has the following parts:

  1. Train the neural networks.
  2. Extract all patches from the whole slide images (WSIs).
  3. Run the trained Convolutional Neural Network (CNN) models on the extracted patches.
  4. Generate lymphocyte heatmaps and upload them to camicroscope.

Neural network training

Run the following script to train the lymphocyte classification Convolutional Neural Network (CNN) and the necrosis segmentation CNN in sequence:

bash train_models.sh

This uses an existing trained Convolutional Autoencoder (CAE) model, ./models_training/cae_model.pkl, to train the CNN models. If you also want to train the CAE model, see the Advanced Usage section. Note that the small training set included under ./data/ is for demonstration purposes only.
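
Because CNN training depends on the CAE model file, you may want to confirm that the file is present before starting a run. A minimal sketch, assuming the path named above is relative to your working directory (cae_model_ready is a hypothetical helper):

```shell
# Succeed only if the CAE model file exists and is non-empty.
# The default path is the one named in this section.
cae_model_ready() (
    model=${1:-./models_training/cae_model.pkl}
    [ -s "$model" ]
)

# Example:
# cae_model_ready && bash train_models.sh
```

The check looks only at the file's existence and size, not at whether its contents are a valid model.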


Generating heatmaps given whole slide images

If you have a trained CNN model and want to generate lymphocyte heatmaps for some WSIs and view them in camicroscope, put those whole slide images under ./svs/ and run the following script:

bash svs_2_heatmap.sh

This runs steps 2, 3, and 4 of the pipeline. Trained models are included under ./models_prediction/. After the command above finishes, you should be able to view the heatmap results in camicroscope.
Please keep an eye on log files under ./log/. If you have questions, please contact Le Hou le.hou@stonybrook.edu
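
One simple way to keep an eye on the logs is to list the files that mention an error. The sketch below assumes that failures are logged with the word "error" in some letter case, which the package does not guarantee. logs_with_errors is a hypothetical helper.

```shell
# List files under DIR (default: ./log) that contain the word "error"
# in any letter case. Prints nothing if none match or DIR is absent.
logs_with_errors() (
    dir=${1:-./log}
    [ -d "$dir" ] || exit 0
    grep -rli 'error' "$dir" 2>/dev/null
    exit 0
)

# Example, run from the working directory:
# logs_with_errors ./log
```

An empty result does not guarantee success, so it is still worth reading the most recent log files directly.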


Advanced Usage

In this section, we show how to run the individual pipeline parts and train the CAE from scratch.
First, start an interactive bash session in the docker container with the following command:
bash start_interactive_bash.sh

All of the source code is under the following directory in the docker container:
/# cd /home/lym_pipeline/
/home/lym_pipeline# ls
conf heatmap_gen patch_extraction patches prediction svs training

The rest of this section assumes you are in the directory above, inside the docker container.


CAE training

To start the CAE training, run the following:

cd training
bash start_cnn_training.sh

This will overwrite the provided CAE model ./models_training/cae_model.pkl. The small training set included under ./data/ is for demonstration purposes only.


Extract all patches in WSIs

This step extracts patches from WSIs. The trained CNNs take the extracted patches as input and produce prediction results as output. To extract patches, run the following script:

cd patch_extraction
bash start.sh

It starts four threads that break the WSIs down into PNG tiles at 20X. It takes whole slide images under ./svs/ as input and writes tiles under ./patches/.


Apply trained neural network on extracted patches

Run the following script:

cd prediction
bash start.sh

It starts four threads that process the extracted patches under ./patches/, generated by patch_extraction/start.sh: one thread predicts lymphocytes, one thread predicts necrosis, and two CPU threads segment tissue from the background. Prediction results are also stored under ./patches/.

Note that this requires a trained lymphocyte classification CNN and a trained necrosis segmentation CNN. We provide two existing models: ./prediction/models/cnn_model.pkl and ./prediction/models/cnn_model_mu_sigma_necrosis.pkl.
The provided lymphocyte classification CNN was trained on 23,000 LUAD patches at 20X. The provided necrosis segmentation CNN was trained on around 3,000 LUAD patches at 6.67X. To use your own trained models, copy them from ./training/models/ to ./prediction/models/.
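
When copying your own models into place, it may be worth keeping a backup of the provided ones. The following sketch assumes that trained models are saved as .pkl files with the same file names the prediction step expects. install_models is a hypothetical helper, not part of the package.

```shell
# Copy every .pkl file from SRC to DST, keeping a .bak copy of any
# file that would be overwritten. Defaults follow the paths above.
install_models() (
    src=${1:-./training/models}
    dst=${2:-./prediction/models}
    mkdir -p "$dst" || exit 1
    for f in "$src"/*.pkl; do
        [ -e "$f" ] || continue      # glob matched nothing
        name=$(basename "$f")
        if [ -e "$dst/$name" ]; then
            cp -p "$dst/$name" "$dst/$name.bak" || exit 1
        fi
        cp -p "$f" "$dst/$name" || exit 1
    done
)

# Example, run from /home/lym_pipeline inside the container:
# install_models ./training/models ./prediction/models
```

Each run replaces any earlier .bak file, so only the most recently overwritten model is kept as a backup.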


Generating lymphocyte heatmaps for camicroscope

Run the following script:

cd heatmap_gen
bash start.sh

It takes the data and prediction results under ./patches/ as input and produces low-resolution and high-resolution heatmaps as json files. Finally, it uploads all json files to camicroscope.
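
The three steps of this section (patch extraction, prediction, heatmap generation) can be chained so that a failure in one stops the rest. This is a sketch under an assumption the package does not state: that each start.sh returns only after its work is complete and exits non-zero on failure. run_steps is a hypothetical helper.

```shell
# Run start.sh in each named step directory under ROOT, in order.
# Stops at the first step whose start.sh exits non-zero and
# returns that exit status.
run_steps() (
    root=$1
    shift
    for step in "$@"; do
        echo "== $step =="
        ( cd "$root/$step" && bash start.sh ) || exit $?
    done
)

# Example, inside the container:
# run_steps /home/lym_pipeline patch_extraction prediction heatmap_gen
```

If the start.sh scripts launch their threads in the background and return immediately, this chaining is not safe and the steps should be run by hand as described above.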

Owner
lehou0312