
Last pushed: 2 years ago
Short Description
Docker image of fast-lmm version 2.07
Full Description
Manual: https://pods.iplantcollaborative.org/wiki/download/attachments/8406418/user-manual.pdf?version=1&modificationDate=1341586517000

Usage:

FastLmmC v2.07.20140304 - Factored Spectrally Transformed Linear Mixed Models [Release]
  Copyright Microsoft Corporation -- Licensed Only for Non-Commercial use.
  Compiled Mar  4 2014 at 10:20:47 by erg00lx for Linux v3.x kernel
  using MKL v11.00.04 - Build: 20130517

 ++    Start Processing CommandLine:

Description:
  FastLmmC computes Linear Mixed Model GWAS

Usage:
  FastLmmC -option <parameter>

  -file basefilename
       basename for PLINK's .map and .ped files

  -bfile basefilename
       basename for PLINK's binary .fam, .bim, and .bed files

  -tfile basefilename
       basename for PLINK's transposed .tfam and .tped files

  -dosage1 basefilename
  -dosage2 basefilename
       Dosage supports PLINK dosage file formats 1 and 2.  Use this
       option to define the appropriate format to read and the basename
       for PLINK's dosage .dat, .fam, and optional .map files

  -afile basefilename [** In Development **]
       basename for imputed genotype files .fam, .bim, and .big
       These files are similar in format to PLINK binary files above.

  -pheno filename
       name of phenotype file

  -mpheno <int>
       Specifies the index for phenotype in -pheno file to process,
       starting at 1 for the first phenotype column. Cannot be used
       together with -pheno-name.
       Default: 1.

  -pheno-name phenotypeName
       phenotype name for phenotype in -pheno file to process.  If
       this option is used, the phenotype name must be specified in
       the header row.  Cannot be used together with -mpheno.
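
The two selectors above address the same column in different ways. As a minimal sketch, the helper below looks up the -mpheno index that corresponds to a phenotype name. It assumes a whitespace-delimited phenotype file whose header row is `FID IID <pheno1> <pheno2> ...` (PLINK's alternate phenotype layout); `pheno_index` and the demo file are hypothetical names, not part of FastLmmC.

```shell
# Print the 1-based -mpheno index of a named phenotype column.
# ASSUMPTION: whitespace-delimited file with header row "FID IID <pheno1> <pheno2> ...".
pheno_index() {
  # $1 = phenotype file, $2 = phenotype name; exits non-zero if the name is absent
  awk -v name="$2" '
    NR == 1 {
      for (i = 3; i <= NF; i++)
        if ($i == name) { print i - 2; found = 1; break }
      exit
    }
    END { exit !found }
  ' "$1"
}

# Demo on a tiny made-up file.
pheno_demo=$(mktemp)
printf 'FID IID height weight\nfam1 ind1 1.70 65\n' > "$pheno_demo"
pheno_index "$pheno_demo" weight   # prints 2
```

Under that assumption, `-mpheno 2` and `-pheno-name weight` would select the same column of the demo file; only one of the two may be given.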

  -NoZeroMeanTestSNPs
       Do not subtract the mean from SNPs tested during pre-processing
       Default: false

  -NoUnitVarTestSNPs
       Do not divide SNPs tested by the standard deviation during
       pre-processing
       Default: false

  -numjobs <int>
       Partition the SNPs into <int> groups and run FaSTLMM on the
       partition specified by -thisjob.
       Requires -thisjob to be set too.

  -thisjob <int>
       Specifies which partition of SNPs created by -numjobs to
       process with FaSTLMM.
       Requires -numjobs to be set too.
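
One way to use this pair of options is to generate one command per partition and hand the list to a scheduler. The sketch below is a dry run that only prints the commands. The help text does not say whether partitions are numbered from 0 or 1, so the first index is passed in explicitly as an assumption to verify; `print_partition_jobs` and the `part_N.out.txt` names are hypothetical.

```shell
# Dry run: print one FastLmmC command per SNP partition.
# ASSUMPTION: first_job (0 or 1) must match how your build numbers partitions.
print_partition_jobs() {
  # $1 = number of partitions, $2 = index of the first partition, rest = shared options
  numjobs=$1; first_job=$2; shift 2
  job=$first_job
  last=$((first_job + numjobs - 1))
  while [ "$job" -le "$last" ]; do
    # A separate -out per partition keeps the jobs from overwriting each other.
    echo "fastlmmc $* -numjobs $numjobs -thisjob $job -out part_$job.out.txt"
    job=$((job + 1))
  done
}

print_partition_jobs 3 0 -tfile geno_test -tfileSim geno_cov -pheno pheno.txt
```

The printed lines can be reviewed first and then piped to `sh` or submitted as separate cluster jobs.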

  -extract filename
       This is a SNP filter option.  FaSTLMM will only analyze the
       SNPs explicitly listed in the 'filename'

  -extractSim filename
       This is a genetic similarity SNP filter option.  FaSTLMM will
       only use SNPs explicitly listed in the 'filename' for computing
       genetic similarity.

  -extractSimTopK filename <int>
       Similar to -extractSim, this is a genetic similarity SNP filter
       option.  FaSTLMM will only use the first <int> SNPs explicitly
       listed in the 'filename' for computing genetic similarity.

  -fileSim basefilename
       basename for PLINK's .map and .ped files for computing
       genetic similarity

  -bfileSim basefilename
       basename for PLINK's binary .fam, .bim, and .bed files for
       building genetic similarity

  -tfileSim basefilename
       basename for PLINK's transposed .tfam and .tped files for
       building genetic similarity

  -dosage1Sim basefilename
  -dosage2Sim basefilename
       Dosage supports PLINK dosage file formats 1 and 2.  Use this
       option to define the appropriate format to read and the basename
       for PLINK's dosage .dat, .fam, and optional .map files for
       building genetic similarity

  -afilesim basefilename
       basename for imputed genotype files .fam, .bim, and .big used
       to build genetic similarity.
       These files are similar to PLINK binary files above.

  -sim filename
       file containing the genetic similarity matrix in tab-delimited
       ascii double values (overrides -fileSim)

  -simOut filename
       write out genetic similarity matrix to filename in tab-delimited
       ascii double values

  -NoZeroMeanSimSNPs
       Do not subtract the mean from similarity SNPs during pre-processing
       Default: false

  -NoUnitVarSimSNPs
       Do not divide similarity SNPs by the standard deviation during
       pre-processing
       Default: false

  -autoSelect filename
       Preprocess the SNP data to determine the best SNPs to include in
       the analysis. Results are written to two files using names derived
       from 'filename' in the following manner.
       Using 'Experiment1.txt' as the 'filename' will produce
          'Experiment1.xval.txt'   contains the details of the cross
                                   validation selection process
          'Experiment1.snps.txt'   contains the list of SNPs selected which
                                   can be passed to the -extractSim option.
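
The naming rule above can be sketched as a small helper. It reproduces the documented 'Experiment1.txt' example by stripping the final extension; how FastLmmC treats names with other extensions, or with none, is not stated here, so that part is an assumption. `autoselect_outputs` is a hypothetical name.

```shell
# Derive the two -autoSelect output names from 'filename'.
# ASSUMPTION: the final extension (e.g. ".txt") is replaced, as in the documented example.
autoselect_outputs() {
  base=${1%.*}
  echo "$base.xval.txt"   # cross validation details
  echo "$base.snps.txt"   # selected SNPs, usable with -extractSim
}

autoselect_outputs Experiment1.txt
```

For the documented example this prints `Experiment1.xval.txt` and `Experiment1.snps.txt`; the second file is the one to pass to -extractSim in a follow-up run.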

  -autoSelectFolds <int>
       During AutoSelect divide the individuals into <int> number of
       groups during the cross validation process.
       Valid range is: 2 <= <int> <= count_of_individuals
       Default: 10

  -autoSelectSearchValues <int_list>
       When running AutoSelect, specify which values to use to identify
       the region to search.  The <int_list> is either a filename or a
       list of integers separated by a whitespace or comma.  The list
       should be enclosed in double quotes.
       Default: 0,1,2,3,4,5,6,7,8,9,10,20,30,40,50,60,70,80,90,
                100,125,160,200,250,320,400,500,630,800,1000

  -autoSelectCriterionLL
       Directs AutoSelect to use out-of-sample log likelihood as the
       selection criterion. [default]

  -autoSelectCriterionMSE
       Directs AutoSelect to use out-of-sample mean-squared error as the
       selection criterion.

  -autoSelectApplyThinning
       Signal the autoSelect process to apply thinning to the search
       values used in AutoSelect.

  -autoSelectMaxSizeForThin <int>
       [NYI] Specify the largest autoSelectSearchValue where thinning will
       apply.
       Default: 2000

  -autoSelectRegionDistance <dbl>
       Filter SNPs from the same 'region' during the autoselection process.
       The <dbl> defines a 'radius' in centimorgans around the SNP to 'thin'
       other SNPs during the autoSelect computation.
       Default: 0.50

  -covar filename
       optional file containing the covariates

  -missingPhenotype <dbl>
       identifier for missing values.  If the phenotype for an
       individual is missing, then the individual is ignored.
       If a covariate value for an individual is missing, then
       it is mean imputed.

  -out filename
       the name of the output file.
       Default: [basefilename].out.txt

  -verboseOutput
       Write more detailed information to the output file.
       This includes things like logDelta, GeneticVar, etc.

  -ML
       use maximum likelihood parameter learning (default: REML)

  -REML
       use restricted maximum likelihood parameter learning (default: REML)

  -simLearnType [FULL/ONCE]
       if set to ONCE, then delta, the ratio of residual to genetic
       variance, is optimized only for the null model and reused for
       each alternative model.  If set to FULL (the default), then
       delta is re-estimated for each alternative model.

  -simType [RRM/COVARIANCE]
       if set to RRM (the default), then the RRM is used for genetic
       similarity.  If set to COVARIANCE, then the empirical SNP
       covariance matrix is used.

  -useHeritability
       perform optimization of variance components in heritability h^2
       instead of delta  (h^2 = sigg2/(sigg2+sige2))
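
The two parameterizations are related by a one-line identity. The block below writes it out, taking delta to be the residual-to-genetic ratio described under -simLearnType and reading sigg2 and sige2 as the genetic and residual variances; the symbols \sigma_g^2 and \sigma_e^2 are notation introduced here for those two quantities.

```latex
% sigg2 = \sigma_g^2 (genetic), sige2 = \sigma_e^2 (residual)
\delta = \frac{\sigma_e^2}{\sigma_g^2},
\qquad
h^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2} = \frac{1}{1 + \delta},
\qquad
\delta = \frac{1 - h^2}{h^2}.
```

So h^2 lies in (0, 1) while delta ranges over all positive values, which is why the Brent options below work on a (log) delta interval when -useHeritability is not set.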

  -brentStarts <int>
       number of interval boundary points for optimization of delta
       (see Section 2.1 of the Supplemental Information).
       Default: 10.

  -brentMaxIter <int>
       maximum number of iterations per interval for the
       optimization of delta.  Default: 1e5.

  -brentMinLogVal <dbl>
       lower interval threshold for (log) delta optimization.
       Default: -5.

  -brentMaxLogVal <dbl>
       upper interval threshold for (log) delta optimization.
       Default: 5.

  -brentTol <dbl>
       convergence tolerance of Brent's method used to optimize delta.
       Default: 1e-6.

  -HardyWeinberg
       use the Hardy-Weinberg estimate for the SNP variance instead
       of empirical estimate. This option is recommended for sample
       sizes that are too small for accurate variance estimation.
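
As a sketch of what the two variance estimates mean for the default pre-processing (subtract the mean, divide by the standard deviation), assume SNP j is coded as an allele count x_{ij} in {0, 1, 2} with allele frequency p_j. The exact estimator FastLmmC uses internally is not given in this help text, so the block shows the textbook forms only.

```latex
% Assumed coding: x_{ij} \in \{0,1,2\} counts copies of one allele of SNP j for individual i.
\tilde{x}_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j},
\qquad
\sigma_j^2 =
\begin{cases}
\text{sample variance of } x_{1j}, \dots, x_{Nj} & \text{(empirical estimate)} \\[4pt]
2\, p_j \,(1 - p_j) & \text{(Hardy-Weinberg estimate)}
\end{cases}
```

The Hardy-Weinberg form follows from treating the allele count as Binomial(2, p_j), so it depends on the data only through the estimated allele frequency.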

  -Beta <dbl_A> <dbl_B>
       Add more detail on Beta here
       Both dbl_A and dbl_B are optional and default to 1.0.  However
       if either dbl_A or dbl_B are specified, both must be entered.

  -runGwasType [RUN/NORUN]
       run GWAS or exit after computing the spectral decomposition of
       the genetic similarity matrix. Use NORUN to cache the spectral
       decomposition.  This option, in combination with the next, is
       useful for parallelizing the tests of many SNPs. Default: RUN.

  -eigen directoryname
       load the spectral decomposition object from the directoryname.
       When specified, the computations leading to the spectral
       decomposition of the genetic similarity matrix are skipped
       (note that the SNP file specifying the genetic similarities
       must still be given).

  -eigenOut directoryname
       save the spectral decomposition object to the directoryname.
       May be used with -runGwasType option.
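
Taken together, -runGwasType, -eigenOut and -eigen suggest a two-stage workflow: compute and cache the decomposition once, then reuse it. The dry-run sketch below prints one command per stage using the file names from the sample command at the end of this help text. `print_eigen_workflow` and the `eigen_cache` directory are hypothetical, and whether stage 1 needs every input shown is an assumption.

```shell
# Dry run: print a two-stage workflow that caches the spectral decomposition.
print_eigen_workflow() {
  # $1 = directory for the cached decomposition
  eigen_dir=$1
  inputs="-tfile geno_test -tfileSim geno_cov -pheno pheno.txt"
  # Stage 1: compute and save the decomposition, then exit without testing SNPs.
  echo "fastlmmc $inputs -runGwasType NORUN -eigenOut $eigen_dir"
  # Stage 2: load the cached decomposition; the similarity SNP file is still given.
  echo "fastlmmc $inputs -eigen $eigen_dir"
}

print_eigen_workflow eigen_cache
```

Stage 2 can be repeated many times, for example once per SNP subset, without recomputing the decomposition.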

  -logDelta <dbl>
       use a pre-specified log(delta) for both null and alternative models

  -linReg
       use a linear regression model instead of a linear mixed model

  -logReg
       use a logistic regression model instead of a linear mixed model

  -groups [filename]
       specifies the filename

  -simGroups [filename]
       file containing the group-by-group similarity matrix

  -SnpPairs
         Compute GWAS using SNP pairs rather than individual SNPs.
         SnpPairs sets the -ML option.  The default behavior computes
         all pairs of SNPs in the input.
         NOTE:  Very compute intensive so use with care.

  -SnpId1 <snpId>
         This option modifies the behavior of SnpPairs to compute
         GWAS using SNP pairs but with Snp1 fixed as 'snpId1'.
         -SnpId1 implies -SnpPairs but is not compatible with
         -BlockSNPs or -Tasks.

  -BlockSNPs <NumberOfBlocks> <Block1> <Block2>
         BlockSNPs is used in conjunction with -SnpPairs flag to partition
         the SNPs into NumberOfBlocks groups and then perform the SNP pair
         analysis by pairing each SNP in Block1 with each SNP in Block2
         Block1 and Block2 are 0 based and 0<=Block1<=Block2<NumberOfBlocks
         must hold.  To cover the entire space, FastLmmC should be invoked
         once with each unique block pair e.g.
           for ( BlockI=0; BlockI<NumberOfBlocks; ++BlockI ) {
             for ( BlockJ=BlockI; BlockJ<NumberOfBlocks; ++BlockJ ) {
               fastlmmc -BlockSNPs NumberOfBlocks BlockI BlockJ
             }
           }
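
The nested loop above can be written as a runnable shell sketch. This version is a dry run that prints one command per unique block pair with Block1 <= Block2; `print_block_pairs` and the `pairs_I_J.out.txt` output names are hypothetical, and the shared input options are taken from the sample command at the end of this help text.

```shell
# Dry run: print one FastLmmC command for every block pair with 0 <= i <= j < n.
print_block_pairs() {
  # $1 = NumberOfBlocks, rest = shared options
  n=$1; shift
  i=0
  while [ "$i" -lt "$n" ]; do
    j=$i
    while [ "$j" -lt "$n" ]; do
      echo "fastlmmc $* -SnpPairs -BlockSNPs $n $i $j -out pairs_${i}_${j}.out.txt"
      j=$((j + 1))
    done
    i=$((i + 1))
  done
}

print_block_pairs 3 -tfile geno_test -tfileSim geno_cov -pheno pheno.txt
```

With 3 blocks this prints 6 commands, for the pairs (0,0), (0,1), (0,2), (1,1), (1,2) and (2,2).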

  -Tasks <NumberOfTasks> <ThisTask>
         Tasks reshapes the BlockSNPs option to run more cleanly as a
         parametric sweep.  Enter the number of tasks you want to break
         the job up into, then which task to run in this invocation, and
         FastLmmC will figure out how to partition it.  There will likely
         be 'empty' files produced by the last few tasks, as the number
         of tasks is defined by the formula (n*(n+1))/2 where n is the
         number of partitions.
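
The formula (n*(n+1))/2 counts the unique block pairs for n partitions. The sketch below tabulates it for small n. How FastLmmC picks n for a given NumberOfTasks is not described here, so reading these counts as "task numbers that leave no empty files" is an assumption; `block_pair_count` is a hypothetical helper.

```shell
# Number of unique block pairs (i <= j) for n partitions: n*(n+1)/2.
block_pair_count() {
  echo $(( $1 * ($1 + 1) / 2 ))
}

for n in 1 2 3 4 5; do
  echo "$n partitions -> $(block_pair_count "$n") block pairs"
done
```

This prints the counts 1, 3, 6, 10 and 15 for n = 1 to 5.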

  Configuration Options:

  -maxChromosomeValue <int>
       PLINK formats assume human chromosome numbers and are thus
       limited to 0-26 (0=missing, 1-22, x=23, y=24, xy=25, mt=26).
       Use -maxChromosomeValue to specify a different limit for the
       maximum chromosome number FastLmm accepts.

  -maxThreads <int>
       Suggests the number of threads for the math library to use

  -maximizeWorkingSet
       Set program options to use most of the available physical RAM.
       This will penalize other programs running on the system.

  -emitSnpIndexOnly
       Some large outputs can reduce file size by not writing the
       SNP id to the file, but only writing the SNP index

  -noDosageRangeCheck 
       Turn off range checking for dosage data and allow any real
       as the dosage value.

  -setOutputPrecision <int>
       Set the reported floating-point precision to be <int>
       digits to the right of the decimal point.  3 <= <int> <= 18
       Default: 16

  -pvaluePrintThreshold <dbl>
       Filter output to only report SNPs with p-values less than
       or equal to <dbl>
       Default: 1.0 (no filtering)

  -randomSeed <int>
       Set the seed value for the random number generator.
       Default: 1

  Debug Options:

  -verbose
       optionally print out log-likelihoods of null and alternative
       models to the standard output

  -log
       Log tells the program to write more progress and debug data
       to a series of *.log files.

  -logDir [directory name]
       LogDir implies -log and sets the directory to write the output
       Default: .\Logs

  -Cluster
       Set when running on a cluster to reduce output and flush i/o
       more frequently to give visibility to headless cluster machines.

-------------------------------------------------------------------
A sample command would be:
  fastlmmc -tfile geno_test -tfileSim geno_cov -pheno pheno.txt -covar covariate.txt

  This requires the following input files:

    geno_test.tfam, geno_test.tped
      These use PLINK's transposed file format for the test SNPs

    geno_cov.tfam, geno_cov.tped
      Same format for the SNPs used to compute the RRM

    pheno.txt
      PLINK's alternate phenotype file

    covariate.txt
      The covariate file follows PLINK's covariate file format.
      No header line is allowed.

  This produces geno_test.output.txt in the same location
  as geno_test files.
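
Before launching the sample command, it can help to confirm that every input file listed above exists. The sketch below prints the names of any that are missing from the current directory; `missing_inputs` is a hypothetical helper, not part of FastLmmC.

```shell
# Print each required input file that is missing from the current directory.
missing_inputs() {
  # $1 = -tfile basename, $2 = -tfileSim basename, rest = extra files (pheno, covar, ...)
  test_base=$1; sim_base=$2; shift 2
  for f in "$test_base.tfam" "$test_base.tped" "$sim_base.tfam" "$sim_base.tped" "$@"; do
    [ -f "$f" ] || echo "$f"
  done
}

missing_inputs geno_test geno_cov pheno.txt covariate.txt
```

Empty output means all six files for the sample command are present.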
Owner
octavianus90
