KnowEnG's Phenotype Prediction Pipeline
This is the Phenotype Prediction Pipeline of the Knowledge Engine for Genomics (KnowEnG), an NIH BD2K Center of Excellence. It is used to infer an 'omic'-drug association.
The user must provide the following so that the system can learn to predict the phenotype through regression:
- User must submit an ‘omic’ spreadsheet with samples as columns and genes as rows.
- User will also need to submit a phenotype value for each sample.
This will allow one to:
- Identify the best drug for a patient.
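The input layout described above (genes as rows, samples as columns, and one phenotype value per sample) can be sketched in Python as follows. This is only an illustration and not part of the pipeline: the tab-separated format, the file names, and the gene and sample labels are assumptions made for the example, so check the Phenotype Prediction ReadMe on GitHub for the exact format the pipeline expects.

```python
import csv
import os
import tempfile

# Hypothetical toy data: gene names, sample names, and values are made up.
samples = ["sample_1", "sample_2", "sample_3"]
omic_rows = {
    "GENE_A": [0.5, 1.2, 0.0],
    "GENE_B": [2.1, 0.3, 1.7],
}
phenotype = {"sample_1": 0.8, "sample_2": 0.1, "sample_3": 0.4}


def write_inputs(directory):
    """Write an omic spreadsheet (genes x samples) and a phenotype file."""
    omic_path = os.path.join(directory, "omic_spreadsheet.tsv")    # assumed name
    pheno_path = os.path.join(directory, "phenotype.tsv")          # assumed name

    with open(omic_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow([""] + samples)           # header row: one column per sample
        for gene, values in omic_rows.items():    # one row per gene
            writer.writerow([gene] + values)

    with open(pheno_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        for sample in samples:                    # one phenotype value per sample
            writer.writerow([sample, phenotype[sample]])

    return omic_path, pheno_path


def samples_match(omic_path, pheno_path):
    """Check that the spreadsheet columns and phenotype rows name the same samples."""
    with open(omic_path, newline="") as handle:
        header = next(csv.reader(handle, delimiter="\t"))[1:]
    with open(pheno_path, newline="") as handle:
        named = [row[0] for row in csv.reader(handle, delimiter="\t")]
    return header == named


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        omic, pheno = write_inputs(tmp)
        print(samples_match(omic, pheno))
```

A check like samples_match is a cheap way to catch a sample that has expression values but no phenotype value before starting a run.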
Given an omic spreadsheet of a collection of genes as well as the supplied phenotype value, the user will need to choose one of these options:
How to run this pipeline with our data
1. Install Docker - follow the instructions in this link
2. Get the current Docker Image. On the command line with internet connection:
docker pull knowengdev/phenotype_prediction_pipeline:08_01_2017
3. Create or change to a directory to hold the output data.
mkdir local_results
and / or
cd local_results
4. Start the docker image connected to the (local_results) directory.
docker run -v `pwd`:/home/test/run_dir/ -it knowengdev/phenotype_prediction_pipeline:08_01_2017
5. At the docker image command prompt change to the test directory.
.../home# cd test
6. Set up the environment.
.../home/test# make env_setup
7. Use one of the following "make" commands to select and run a prediction method:
| Command | Method |
|---|---|
| make run_elastic_net | Elastic Net |
How to run this pipeline with your data
Perform steps 1-3 as described above, using the local_results directory to hold your data.
Create a custom YAML run file and move it, together with your spreadsheet files, to the local_results directory (see the Phenotype Prediction ReadMe on GitHub for directions):
- Make sure the file named by the YAML key "spreadsheet_name:" is in your local_results directory.
- The path names inside Docker depend on how you mount the run directory in step 4, so you may have to change the YAML paths to ../../ instead of ../
- Make sure your custom YAML file is in the local_results directory.
4. Start docker with the container connected to your (local_results) directory.
docker run -v `pwd`:/home/test/run_dir -it knowengdev/phenotype_prediction_pipeline:08_01_2017
5. Change to the directory mounted in step 4.
.../home# cd ./test/run_dir
The files in the local_results directory will be visible in this directory.
6. Run Phenotype Prediction with the options in your custom YAML file.
python3 ../../src/phenotype_prediction.py -run_directory ./ -run_file your_custom_run_file.yml
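The number of ../ segments in the command above follows from where you stand inside the container. A small sketch, assuming the script lives at /home/src/phenotype_prediction.py (inferred from the ../../src path used from /home/test/run_dir, not confirmed elsewhere in this document), shows how the relative path changes with the working directory:

```python
import posixpath

# Container-side locations taken or inferred from the steps above.
SCRIPT = "/home/src/phenotype_prediction.py"   # inferred from ../../src seen from run_dir
RUN_DIR = "/home/test/run_dir"                 # mount point used in step 4
TEST_DIR = "/home/test"                        # directory used for the demo run


def relative_to(working_dir, target):
    """Return the relative path from working_dir to target (POSIX rules)."""
    return posixpath.relpath(target, start=working_dir)


print(relative_to(RUN_DIR, SCRIPT))   # ../../src/phenotype_prediction.py
print(relative_to(TEST_DIR, SCRIPT))  # ../src/phenotype_prediction.py
```

The same reasoning applies to relative paths written in the YAML run file: a path that works from /home/test needs one more ../ when the run starts from /home/test/run_dir.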
Description of "run_parameters" file
| Key | Value | Comments |
|---|---|---|
| results_directory | Directory | Directory to save the output files |
spreadsheet_name = features_train_clean.df
response_name = response_train_clean.df
test_spreadsheet_name = features_test_clean.df
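As a rough sketch of how a run file with these entries could be written and checked before starting a run, the Python below uses only the key names and file names shown in this document. The value "./" for results_directory, the flat "key: value" layout, and all helper function names are assumptions for illustration; the real run file may need additional keys, so treat the Phenotype Prediction ReadMe on GitHub as the authority.

```python
import os
import tempfile

# Keys and file names as they appear in this document.
# The results_directory value is a hypothetical placeholder.
RUN_PARAMETERS = {
    "results_directory": "./",
    "spreadsheet_name": "features_train_clean.df",
    "response_name": "response_train_clean.df",
    "test_spreadsheet_name": "features_test_clean.df",
}
INPUT_KEYS = ("spreadsheet_name", "response_name", "test_spreadsheet_name")


def write_run_file(path, parameters):
    """Write flat 'key: value' lines, which is valid YAML for simple strings."""
    with open(path, "w") as handle:
        for key, value in parameters.items():
            handle.write(f"{key}: {value}\n")


def read_run_file(path):
    """Read flat 'key: value' lines back (not a general YAML parser)."""
    parameters = {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line and not line.startswith("#"):
                key, _, value = line.partition(":")
                parameters[key.strip()] = value.strip()
    return parameters


def missing_inputs(run_directory, parameters):
    """List input files named in the run file that are absent from run_directory."""
    return [
        parameters[key]
        for key in INPUT_KEYS
        if not os.path.isfile(os.path.join(run_directory, parameters[key]))
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as run_dir:
        run_file = os.path.join(run_dir, "your_custom_run_file.yml")
        write_run_file(run_file, RUN_PARAMETERS)
        # Nothing has been copied into run_dir yet, so every input is reported.
        print(missing_inputs(run_dir, read_run_file(run_file)))
```

Running missing_inputs against the local_results directory before starting Docker gives an early warning if a spreadsheet named in the run file was not copied over.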
Description of output files saved in the results directory
| User Gene 1 | Float |
| User Gene n | Float |