crimac/unet

Marine Acoustic Classification: Supervised Semantic Segmentation of Echosounder Data using CNNs

Introduction

The main objective of this repository is to classify acoustic backscatter in echosounder data.
The current implementation is adapted to Sandeel surveys.
This repository is developed by the Norwegian Computing Center and the Norwegian Institute of Marine Research as part of the research projects COGMAR and CRIMAC.

This repository contains scripts for:

Preprocessing acoustic backscatter data into a machine learning friendly format.
Supervised training of a convolutional neural network for semantic segmentation on echosounder data.
Making predictions with the trained network.

Prerequisites

The code in this repository was tested with Python 3.8 and the packages listed in requirements.txt.
Create the file setpyenv.json in the local root directory, replacing each "/dir_path/" with the appropriate directory path:
```
{
  "path_to_echograms": "/dir_path/",
  "path_to_zarr_files": "/dir_path/",
  "path_to_trained_model": "/dir_path/",
  "path_for_saving_preds_labels": "/dir_path/",
  "path_for_saving_figs": "/dir_path/",
  "path_to_korona_data": "/dir_path/",
  "path_to_korona_transducer_depths": "/dir_path/"
}
```
- "path_to_echograms": Directory path to echogram folders stored in memmap format (optional when working with zarr files)
- "path_to_zarr_files": Directory path to echogram folders stored in zarr format (optional when working with memmap files)
- "path_to_trained_model": Directory path to the trained model
- "path_for_saving_preds_labels": [Optional] Directory path for saving predictions and labels after training
- "path_for_saving_figs": [Optional] Directory path for saving figures related to the evaluation of the model
- "path_to_korona_data": [Optional] Directory path to Korona predictions (only used when working with memmap files)
- "path_to_korona_transducer_depths": [Optional] Directory path to Korona transducer depths (only used when working with memmap files)
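
As an illustration of how these entries might be consumed, the sketch below reads setpyenv.json and checks that the paths needed for the chosen data format are present. The helper name `load_paths` and the decision about which keys count as required are assumptions made for this example, not the repository's own loader.

```python
import json
from pathlib import Path


def load_paths(config_file="setpyenv.json", data_mode="zarr"):
    """Hypothetical helper: read setpyenv.json and check the paths needed for data_mode."""
    with open(config_file, encoding="utf-8") as f:
        paths = json.load(f)

    # Only one of the two data directories is needed, depending on the data format.
    data_key = "path_to_zarr_files" if data_mode == "zarr" else "path_to_echograms"
    required = [data_key, "path_to_trained_model"]

    missing = [key for key in required if not paths.get(key)]
    if missing:
        raise KeyError(f"setpyenv.json is missing entries: {missing}")

    # Convert the non-empty entries to Path objects; optional entries may be absent.
    return {key: Path(value) for key, value in paths.items() if value}
```

Calling `load_paths("setpyenv.json", data_mode="memm")` would then fail early if "path_to_echograms" has not been filled in.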

Make predictions with a trained model and save the results

Set the following configuration options in the pipeline_config.yaml file:
- 'data_mode' can be either 'zarr' (if working with zarr files) or 'memm' (if working with memmap files)
- 'unit_frequency' should be set to 'Hz' if 'zarr' mode is selected and to 'kHz' if 'memm' is selected
- 'partition_predict' can be 'selected surveys', 'single survey' or 'all surveys'
- 'selected_surveys' should be a list of the names of the selected surveys. It must not be empty if 'partition_predict' is 'selected surveys' or 'single survey'.
- 'dir_save_preds_labels' should not be None
- 'save_labels': if set to True, the labels are assumed to exist (option 'labels_available'=True) and they will also be saved to disk. Otherwise the labels will not be saved.
- 'eval_mode' can be 'all' (consider all pixels), 'region' (exclude all pixels not in a neighborhood of a labeled school) or 'fish' (only evaluate the discrimination on species). Note that the saved labels will look different depending on the chosen 'eval_mode'.
- 'resume_writing': if set to True it is assumed that a zarr directory of predictions exists and if new raw files are detected, predictions will be appended to the zarr directory
Run the following program: /pipeline_train_predict/save_predict.py
The program will then make predictions with the trained model and save the predictions (and labels) to disk. The path_for_saving_preds_labels entry in setpyenv.json can be used as the target directory.
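
The constraints between the options listed above can be checked before running the program. The following is a minimal sketch that works on a plain dictionary; the function name `check_predict_config` and the survey name in the example are hypothetical, and the repository may validate its configuration differently or not at all.

```python
# Expected 'unit_frequency' for each 'data_mode', as described in the option list.
UNIT_FOR_MODE = {"zarr": "Hz", "memm": "kHz"}
VALID_PARTITIONS = ("selected surveys", "single survey", "all surveys")


def check_predict_config(cfg):
    """Hypothetical helper: return a list of problems found in a config dict."""
    problems = []

    data_mode = cfg.get("data_mode")
    if data_mode not in UNIT_FOR_MODE:
        problems.append("'data_mode' must be 'zarr' or 'memm'")
    elif cfg.get("unit_frequency") != UNIT_FOR_MODE[data_mode]:
        problems.append(
            f"'unit_frequency' must be '{UNIT_FOR_MODE[data_mode]}' for data_mode '{data_mode}'"
        )

    partition = cfg.get("partition_predict")
    if partition not in VALID_PARTITIONS:
        problems.append(f"'partition_predict' must be one of {VALID_PARTITIONS}")
    elif partition != "all surveys" and not cfg.get("selected_surveys"):
        problems.append(f"'selected_surveys' must not be empty for '{partition}'")

    if cfg.get("dir_save_preds_labels") is None:
        problems.append("'dir_save_preds_labels' should not be None")

    return problems


# Example values only; the survey name and directory are made up for illustration.
example_cfg = {
    "data_mode": "zarr",
    "unit_frequency": "Hz",
    "partition_predict": "single survey",
    "selected_surveys": ["example_survey"],
    "dir_save_preds_labels": "/dir_path/",
}
print(check_predict_config(example_cfg))
```

In practice the dictionary would come from parsing pipeline_config.yaml, for instance with PyYAML's `yaml.safe_load`.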

Make predictions with a trained model without saving the results

Set the following configuration options in the pipeline_config.yaml file:
- 'data_mode' can be either 'zarr' (if working with zarr files) or 'memm' (if working with memmap files)
- 'unit_frequency' should be set to 'Hz' if 'zarr' mode is selected and to 'kHz' if 'memm' is selected
- 'partition_predict' can be 'selected surveys', 'single survey' or 'all surveys'
- 'selected_surveys' should be a list of the names of the selected surveys. It must not be empty if 'partition_predict' is 'selected surveys' or 'single survey'.
- 'labels_available' can be set to True to visualize the labels together with the predictions
Run the following program: /pipeline_train_predict/predict.py
The program will then make predictions with the trained model and visualize the output (without saving the predictions), together with a couple of data frequency channels.

Evaluate the quality of the predictions obtained with a trained model

Set the following configuration options in the pipeline_config.yaml file:
- 'data_mode' can be either 'zarr' (if working with zarr files) or 'memm' (if working with memmap files)
- 'partition_predict' can be 'selected surveys', 'single survey' or 'all surveys'.
- 'selected_surveys' should be a list of the names of the selected surveys. It must not be empty if 'partition_predict' is 'selected surveys' or 'single survey', and it should be empty if 'partition_predict' is 'all surveys'.
- 'color_list' should be a list of color strings that will be used for the precision-recall curves. It cannot be empty if the 'zarr' format is selected.
- 'eval_mode' can be 'all' (consider all pixels), 'region' (exclude all pixels not in a neighborhood of a labeled school) or 'fish' (only evaluate the discrimination on species)
Run the following program: /pipeline_train_predict/evaluate.py
The program will then compute and plot evaluation metrics for assessing the quality of the predictions obtained with a trained model. The results will be saved to disk; the path_for_saving_figs entry in setpyenv.json can be used as the target directory.
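
To make the idea of a precision-recall curve concrete, here is a small self-contained sketch that computes precision and recall at a few score thresholds for one class. It is not the code in evaluate.py: the function name, the toy data, and the convention used when nothing is predicted positive are assumptions for illustration, and the pixels are assumed to have been selected according to 'eval_mode' beforehand.

```python
def precision_recall_points(scores, labels, thresholds):
    """Return (threshold, precision, recall) tuples for binary labels (1 = target class)."""
    points = []
    for t in thresholds:
        # A pixel is predicted positive when its score reaches the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        # Convention assumed here: precision is 1.0 when nothing is predicted positive.
        precision = tp / (tp + fp) if tp + fp else 1.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        points.append((t, precision, recall))
    return points


# Toy data, made up for illustration: one predicted score and one true label per pixel.
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 0, 0]
for t, p, r in precision_recall_points(scores, labels, [0.2, 0.5]):
    print(f"threshold={t:.1f} precision={p:.2f} recall={r:.2f}")
```

Plotting the (recall, precision) pairs over many thresholds, one curve per class or survey, gives the kind of figure for which 'color_list' supplies the colors.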

Train the model

Set the following configuration options:
- 'data_mode' should be 'memm'
- 'unit_frequency' should be set to 'kHz' (as 'memm' is selected)
- [Optional] Change hyper-parameters (lr, lr_reduction, data partition, etc.)
Run the following program: /pipeline_train_predict/train.py
The program will train the model and store the parameters to disk. The path_to_trained_model entry in setpyenv.json can be used as the target directory.

NB: The training procedure is not yet adapted to the pre-processed 'zarr' data, since the format of the labels may still change and the sampling of training data depends on it.

Docker Pull Command

docker pull crimac/unet