crimac/unet

By crimac

Updated almost 4 years ago

CRIMAC WP4 UNet Classifier

Marine Acoustic Classification: Supervised Semantic Segmentation of Echosounder Data using CNNs

Introduction

  • The main objective of this repository is to classify acoustic backscatter in echosounder data.
  • The current implementation is adapted to Sandeel surveys.
  • This repository is developed by the Norwegian Computing Center and the Norwegian Institute of Marine Research as part of the research projects COGMAR and CRIMAC.

This repository contains scripts for:

  • Preprocessing acoustic backscatter data into a machine learning friendly format.
  • Supervised training of a convolutional neural network for semantic segmentation on echosounder data.
  • Making predictions with the trained network.

Prerequisites

  • The code in this repository was tested with Python 3.8 and the packages listed in requirements.txt.

  • Create the file setpyenv.json in the local root directory:

          ### setpyenv.json ###
          ### Replace each "/dir_path/" with the appropriate directory path; omit these ### lines from the actual file.
          
          {
            "path_to_echograms": "/dir_path/",
            "path_to_zarr_files": "/dir_path/",
            "path_to_trained_model": "/dir_path/",
            "path_for_saving_preds_labels": "/dir_path/",
            "path_for_saving_figs": "/dir_path/",
            "path_to_korona_data": "/dir_path/",
            "path_to_korona_transducer_depths": "/dir_path/"
          }
    
    • "path_to_echograms": Directory path to echogram folders stored in memmap format (optional when working with zarr files)
    • "path_to_zarr_files": Directory path to echogram folders stored in zarr format (optional when working with memmap files)
    • "path_to_trained_model": Directory path to the trained model
    • "path_for_saving_preds_labels": [Optional] Directory path for saving predictions and labels after training
    • "path_for_saving_figs": [Optional] Directory path for saving figures related to the evaluation of the model
    • "path_to_korona_data": [Optional] Directory path to Korona predictions (only used when working with memmap files)
    • "path_to_korona_transducer_depths": [Optional] Directory path to Korona transducer depths (only used when working with memmap files)
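
The settings file can be sanity-checked before any of the pipeline scripts are run. The following is a minimal sketch, not code from this repository: the helper name `load_setpyenv` is hypothetical, and the choice of which keys to treat as required is an assumption based on the descriptions above.

```python
import json
from pathlib import Path

# Keys described above. Treating only the trained-model path as strictly
# required is an assumption of this sketch, not documented behaviour.
REQUIRED_KEYS = {"path_to_trained_model"}
OPTIONAL_KEYS = {
    "path_to_echograms",
    "path_to_zarr_files",
    "path_for_saving_preds_labels",
    "path_for_saving_figs",
    "path_to_korona_data",
    "path_to_korona_transducer_depths",
}
DATA_KEYS = {"path_to_echograms", "path_to_zarr_files"}


def load_setpyenv(path="setpyenv.json"):
    """Read setpyenv.json and run basic checks (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        # json.load raises json.JSONDecodeError on malformed input,
        # for example when a comma between two entries is missing.
        settings = json.load(f)

    present = set(settings)

    missing = REQUIRED_KEYS - present
    if missing:
        raise KeyError(f"setpyenv.json is missing: {sorted(missing)}")

    unknown = present - REQUIRED_KEYS - OPTIONAL_KEYS
    if unknown:
        raise KeyError(f"setpyenv.json has unknown keys: {sorted(unknown)}")

    # At least one data location is needed: memmap echograms or zarr files.
    if not (DATA_KEYS & present):
        raise KeyError("set path_to_echograms or path_to_zarr_files")

    # Return pathlib.Path objects so callers can join sub-paths safely.
    return {key: Path(value) for key, value in settings.items()}
```

Calling `load_setpyenv()` from the local root directory returns a dictionary of `Path` objects, or raises an error that names the offending key.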

Make predictions with a trained model and save the results

  1. Set the following configuration options in the pipeline_config.yaml file:

    • 'data_mode' can be either 'zarr' (if working with zarr files) or 'memm' (if working with memmap files)
    • 'unit_frequency' should be set to 'Hz' if 'zarr' mode is selected and to 'kHz' if 'memm' is selected
    • 'partition_predict' can be 'selected surveys', 'single survey' or 'all surveys'
    • 'selected_surveys' should be a list of the names of the selected surveys. It must not be empty if 'partition_predict' is 'selected surveys' or 'single survey'.
    • 'dir_save_preds_labels' must not be None
    • 'save_labels': if set to True, the labels are assumed to exist (i.e. 'labels_available' is True) and are saved to disk along with the predictions. Otherwise the labels are not saved.
    • 'eval_mode' can be 'all' (consider all pixels), 'region' (exclude all pixels that are not in the neighborhood of a labeled school) or 'fish' (only evaluate the species discrimination). Note that the saved labels will look different depending on the chosen 'eval_mode'.
    • 'resume_writing': if set to True, a zarr directory of predictions is assumed to exist; when new raw files are detected, their predictions are appended to that directory
  2. Run the following program: /pipeline_train_predict/save_predict.py

  3. The program will then make predictions with the trained model and save the predictions (and labels) to disk. The path_for_saving_preds_labels directory set in setpyenv.json can be used for this.
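
The constraints between the options listed in step 1 can be checked before the script is started. The sketch below is illustrative only: `check_predict_config` is a hypothetical helper, it takes the configuration as an already-parsed dictionary rather than reading pipeline_config.yaml itself, and the rule that 'save_labels' requires 'labels_available' is this sketch's reading of the description above.

```python
# Frequency unit expected for each data mode, as described in step 1.
EXPECTED_UNIT = {"zarr": "Hz", "memm": "kHz"}
VALID_PARTITIONS = {"selected surveys", "single survey", "all surveys"}
VALID_EVAL_MODES = {"all", "region", "fish"}


def check_predict_config(cfg):
    """Return a list of problems found in a prediction config (hypothetical helper)."""
    problems = []

    data_mode = cfg.get("data_mode")
    if data_mode not in EXPECTED_UNIT:
        problems.append("data_mode must be 'zarr' or 'memm'")
    elif cfg.get("unit_frequency") != EXPECTED_UNIT[data_mode]:
        expected = EXPECTED_UNIT[data_mode]
        problems.append(
            f"unit_frequency must be '{expected}' when data_mode is '{data_mode}'"
        )

    partition = cfg.get("partition_predict")
    if partition not in VALID_PARTITIONS:
        problems.append(f"partition_predict must be one of {sorted(VALID_PARTITIONS)}")
    elif partition != "all surveys" and not cfg.get("selected_surveys"):
        problems.append(f"selected_surveys must not be empty for '{partition}'")

    if cfg.get("dir_save_preds_labels") is None:
        problems.append("dir_save_preds_labels must not be None")

    # Assumption: saving labels only makes sense when labels are available.
    if cfg.get("save_labels") and not cfg.get("labels_available"):
        problems.append("save_labels=True assumes labels_available=True")

    if cfg.get("eval_mode") not in VALID_EVAL_MODES:
        problems.append(f"eval_mode must be one of {sorted(VALID_EVAL_MODES)}")

    return problems


# Example: 'kHz' does not match 'zarr', and no surveys are selected.
example = {
    "data_mode": "zarr",
    "unit_frequency": "kHz",
    "partition_predict": "selected surveys",
    "selected_surveys": [],
    "dir_save_preds_labels": "/dir_path/",
    "save_labels": False,
    "eval_mode": "all",
}
for problem in check_predict_config(example):
    print(problem)
```

Returning a list of messages instead of raising on the first error lets all misconfigured options be reported in one pass.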

Make predictions with a trained model without saving the results

  1. Set the following configuration options in the pipeline_config.yaml file:
    • 'data_mode' can be either 'zarr' (if working with zarr files) or 'memm' (if working with memmap files)
    • 'unit_frequency' should be set to 'Hz' if 'zarr' mode is selected and to 'kHz' if 'memm' is selected
    • 'partition_predict' can be 'selected surveys', 'single survey' or 'all surveys'
    • 'selected_surveys' should be a list of the names of the selected surveys. It must not be empty if 'partition_predict' is 'selected surveys' or 'single survey'.
    • 'labels_available' can be set to True to visualize the labels together with the predictions
  2. Run the following program: /pipeline_train_predict/predict.py
  3. The program will then make predictions with the trained model and visualize the output (without saving the predictions), together with a couple of data frequency channels.

Evaluate the quality of the predictions obtained with a trained model

  1. Set the following configuration options in the pipeline_config.yaml file:

    • 'data_mode' can be either 'zarr' (if working with zarr files) or 'memm' (if working with memmap files)
    • 'partition_predict' can be 'selected surveys', 'single survey' or 'all surveys'.
    • 'selected_surveys' should be a list of the names of the selected surveys. It must not be empty if 'partition_predict' is 'selected surveys' or 'single survey', and it should be empty if 'partition_predict' is 'all surveys'.
    • 'color_list' should be a list of color strings that will be used for the precision-recall curves. It must not be empty if 'data_mode' is 'zarr'.
    • 'eval_mode' can be 'all' (consider all pixels), 'region' (exclude all pixels that are not in the neighborhood of a labeled school) or 'fish' (only evaluate the species discrimination)
  2. Run the following program: /pipeline_train_predict/evaluate.py

  3. The program will then compute and plot evaluation metrics for assessing the quality of the predictions obtained with a trained model. The results will be saved to disk; the path_for_saving_figs directory set in setpyenv.json can be used for this.
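
For readers unfamiliar with the precision-recall curves mentioned under 'color_list', the sketch below shows the standard definition for a single class. It is not the code used by evaluate.py: the function name, the flattened per-pixel lists and the threshold values are assumptions made for illustration. Which pixels enter the lists would depend on the chosen 'eval_mode'.

```python
def precision_recall_points(scores, labels, thresholds):
    """Precision and recall of the rule `score >= threshold`, per threshold.

    scores: predicted probability of the class, one value per pixel.
    labels: 1 where the pixel belongs to the class, otherwise 0.
    Returns a list of (threshold, precision, recall) tuples.
    """
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        # Convention used here: precision is 1.0 when nothing is predicted
        # positive, and recall is 0.0 when there are no positive labels.
        precision = tp / (tp + fp) if tp + fp else 1.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        points.append((t, precision, recall))
    return points


# Toy example with four pixels of one class.
scores = [0.9, 0.8, 0.4, 0.2]
labels = [1, 0, 1, 0]
for threshold, precision, recall in precision_recall_points(scores, labels, [0.5, 0.3]):
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

Plotting recall against precision over a fine grid of thresholds gives one curve per class, which is where a list of colors would be consumed.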

Train the model

  1. Set the following configuration options:
    • 'data_mode' should be 'memm'
    • 'unit_frequency' should be set to 'kHz' (as 'memm' is selected)
    • [Optional] Change hyper-parameters (lr, lr_reduction, data partition, etc.)
  2. Run the following program: /pipeline_train_predict/train.py
  3. The program will train the model and save its parameters to disk. The path_to_trained_model directory set in setpyenv.json can be used for this.

NB: The training procedure has not yet been adapted to the preprocessed 'zarr' data, because the format of the labels may still change and the sampling of training data depends on it.

Docker Pull Command

docker pull crimac/unet