Short Description
Image for running Phenotype_Prediction_Pipeline
Full Description

KnowEnG's Phenotype Prediction Pipeline

This is the Phenotype Prediction Pipeline of the Knowledge Engine for Genomics (KnowEnG), an NIH BD2K Center of Excellence. It is used to infer an 'omic'-drug association.

The user must do all of the following so that the system can learn to predict the phenotype through regression:

  • User must submit an ‘omic’ spreadsheet with samples as columns and genes as rows.
  • User will also need to submit a phenotype value for each sample.
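The two inputs above have to agree on sample names. The following is a minimal sketch of that check, assuming a tab-separated spreadsheet whose header row holds the sample names and whose first column holds the gene names. The helper names and the toy data are hypothetical and are not part of the pipeline.

```python
import csv
import io


def read_sample_names(spreadsheet_text, delimiter="\t"):
    """Return the sample names from the header row (genes are the rows)."""
    reader = csv.reader(io.StringIO(spreadsheet_text), delimiter=delimiter)
    header = next(reader)
    # The first header cell labels the gene column; the rest are samples.
    return header[1:]


def missing_phenotypes(sample_names, phenotype_by_sample):
    """List the samples that have no phenotype value."""
    return [name for name in sample_names if name not in phenotype_by_sample]


# Hypothetical toy data: 2 genes (rows) x 3 samples (columns).
omic_text = "gene\tS1\tS2\tS3\nTP53\t0.1\t0.4\t0.9\nBRCA1\t1.2\t0.3\t0.5\n"
phenotype = {"S1": 0.7, "S2": 1.3}

samples = read_sample_names(omic_text)
print(missing_phenotypes(samples, phenotype))  # ['S3']
```

An empty list means every sample column has a phenotype value.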

This will allow one to:

  • Identify the best drug for a patient.

Given the omic spreadsheet and the supplied phenotype values, the user must choose one of these regression methods:

Options       Method    Parameters
Elastic Net   Elastic   elastic_net
Lasso         Lasso     Lasso
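The two options differ in the penalty added to the regression loss. The sketch below shows only the penalty terms, using the `alpha` and `l1_ratio` parameterization found in scikit-learn's `ElasticNet`. Whether this pipeline uses that library or those parameter names is an assumption.

```python
def lasso_penalty(weights, alpha):
    """L1 penalty: alpha * sum(|w|)."""
    return alpha * sum(abs(w) for w in weights)


def elastic_net_penalty(weights, alpha, l1_ratio):
    """Mix of L1 and L2 penalties, weighted by l1_ratio in [0, 1]."""
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return alpha * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2)


# Hypothetical gene coefficients.
weights = [0.5, -2.0, 0.0]
print(lasso_penalty(weights, alpha=1.0))                      # 2.5
print(elastic_net_penalty(weights, alpha=1.0, l1_ratio=1.0))  # 2.5
print(elastic_net_penalty(weights, alpha=1.0, l1_ratio=0.5))  # 2.3125
```

With `l1_ratio=1.0` the elastic net penalty reduces to the lasso penalty. Lower values add an L2 term that shrinks large coefficients more strongly.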

How to run this pipeline with our data

1. Install Docker by following the instructions in the "Install Docker Engine" guide.

2. Get the current Docker image. On a command line with an internet connection, run:

docker pull knowengdev/phenotype_prediction_pipeline:08_01_2017

3. Create or change to a directory to hold the output data.


mkdir local_results
and / or
cd local_results

4. Start the docker image connected to the (local_results) directory.

docker run -v `pwd`:/home/test/run_dir/ -it knowengdev/phenotype_prediction_pipeline:08_01_2017
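Docker's -v option treats a source that is not an absolute path as a named volume rather than a host directory, so the host side of the mount should be an absolute path. The sketch below builds the command with the path resolved. The helper `docker_run_command` is hypothetical, and the block only prints the command instead of running Docker.

```python
import os
import shlex


def docker_run_command(host_dir, image, container_dir="/home/test/run_dir"):
    """Build a `docker run` argument list that bind-mounts host_dir."""
    # Docker treats a non-absolute source as a named volume, so resolve it.
    host_path = os.path.abspath(host_dir)
    return ["docker", "run", "-v", f"{host_path}:{container_dir}", "-it", image]


cmd = docker_run_command(".", "knowengdev/phenotype_prediction_pipeline:08_01_2017")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell. To launch the container from Python instead, pass `cmd` to `subprocess.run`.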

5. At the docker image command prompt change to the test directory.

.../home# cd test

6. Set up the environment.

.../home/test# make env_setup

7. Use one of the following "make" commands to run the corresponding method:


Command                Option
make run_elastic_net   Elastic Net
make run_lasso         Lasso

How to run this pipeline with your data

Perform steps 1-3 as described above and use the local_results directory.

Create a custom YAML file, then move it and your spreadsheet to the local_results directory (see the "Phenotype Prediction ReadMe" on GitHub for directions):

  • Make sure the file named by the YAML key "spreadsheet_name:" is in your local_results directory.
  • The path names inside Docker depend on how you mount the run directory in step 4, so you may have to change the YAML paths to ../../ instead of ../
  • Make sure your custom YAML file is in the local_results directory.

  4. Start Docker with the container connected to your (local_results) directory. Run this from inside local_results:

    docker run -v `pwd`:/home/test/run_dir -it knowengdev/phenotype_prediction_pipeline:08_01_2017

  5. Change to the directory mounted in step 4.

    .../home# cd ./test/run_dir
    .../home/test/run_dir# ls

The files in the local_results directory will be visible in this directory.

  6. Run the Phenotype Prediction Pipeline with the options in your custom YAML file.

    python3 ../../src/ -run_directory ./ -run_file your_custom_run_file.yml

Description of "run_parameters" file

Key                 Value       Comments
Elastic Net         Method
Lasso               Method
results_directory   Directory   Directory to save the output files

spreadsheet_name = features_train_clean.df
response_name = response_train_clean.df
test_spreadsheet_name = features_test_clean.df
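The keys listed above can be collected into a custom run file. The sketch below renders them as flat `key: value` YAML lines without any third-party library. The `method` key is an assumption used for illustration, and the pipeline may need further keys, so check the GitHub ReadMe for the authoritative list.

```python
def render_run_file(parameters):
    """Render a flat dict as simple `key: value` YAML lines."""
    return "".join(f"{key}: {value}\n" for key, value in parameters.items())


run_parameters = {
    "method": "elastic_net",  # hypothetical key name for the chosen method
    "spreadsheet_name": "features_train_clean.df",
    "response_name": "response_train_clean.df",
    "test_spreadsheet_name": "features_test_clean.df",
    "results_directory": "./",
}

run_file_text = render_run_file(run_parameters)
print(run_file_text)
```

Writing `run_file_text` to a file such as your_custom_run_file.yml in the local_results directory makes it visible inside the container.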

Description of output files saved in the results directory

Gene Name     Prediction
User Gene 1   Float
...           ...
User Gene n   Float
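An output file with this layout can be loaded for further analysis. This is a minimal sketch, assuming a tab-separated file with one header row. The actual delimiter and file name are not stated here, and the values below are made up.

```python
import csv
import io


def read_predictions(text, delimiter="\t"):
    """Parse name/prediction rows into a dict of floats, skipping the header."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    next(reader)  # header row
    return {name: float(value) for name, value in reader}


# Hypothetical output content.
sample_output = "Gene Name\tPrediction\nUser Gene 1\t0.82\nUser Gene 2\t-1.5\n"
predictions = read_predictions(sample_output)
print(max(predictions, key=predictions.get))  # User Gene 1
```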