Introduction

Single marker models that detecting one locus at a time are subject to many problems in genome-wide association studies (GWAS) and quantitative trait locus (QTL) mapping, which includes large matrix inversion, over-conservativeness after Bonferroni correction and difficulty in evaluation of total genetic contribution. Such problems can be solved by a multiple locus model which includes all markers in the same model with effects being estimated simultaneously. The sparse Bayesian learning method (SBL), implemented in sbl package, is a multiple locus model that can handle extremely large sample size (>100,000) and outcompetes other multiple locus GWAS methods in terms of detection power and computing time.

sbl package installation

sbl can be downloaded and installed locally. The download link is here.

install.packages('sbl_0.1.0.tar.gz', repos=NULL, type='source')

SBL for QTL mapping and GWAS

The usage of sbl to perform QTL mapping and GWAS is very simple (Note: Please remove the markers without variation before running the program):

  1. Load input data
    • A vector of response variables (observations of the trait).
    • A design matrix for fixed effects, which represents non-genetic factors that contribute to the phenotypic variance. This can be the intercept only.
    • A matrix storing genotype indicators. The original three genotypes (\(a_1a_1\), \(a_1a_2\), \(a_2a_2\)) should be coded to numeric indicator as (-1, 0, 1) or (0, 1, 2).
library('sbl')

# load example data
data(phe)
data(intercept)
data(gen)
  1. Call sblgwas() function to perform regression and detect significant markers.
# A minimal invocation of "sblgwas()" function looks like: 
fit1<-sblgwas(x = intercept, y = phe, z = gen)

# Restuls of markers surrounding the second simulated QTL with non-zero effect in the example data
fit1$blup[c(17:21),]
##        gamma        vg     wald       p_wald
## 17  0.000000 0.0000000  0.00000 1.000000e+00
## 18  2.146942 0.1789244 25.76150 3.863190e-07
## 19 -2.050378 0.1539640 27.30541 1.737251e-07
## 20  0.000000 0.0000000  0.00000 1.000000e+00
## 21  0.000000 0.0000000  0.00000 1.000000e+00

Users can arbitrarily set the value of t between [0, 2] to control the sparseness of model, default is -1.

# Setting t = 0 leads to the most sparse model  
fit2<-sblgwas(x = intercept, y = phe, z = gen, t = 0)
fit2$parm
##   iter        error       s2     beta       df
## 1   22 5.930216e-07 2.827957 19.78626 4.612319
# Setting t = -2 leads to the least sparse model  
fit3<-sblgwas(x = intercept, y = phe, z = gen, t = -2)
fit3$parm
##   iter       error       s2     beta       df
## 1   31 8.46504e-07 1.003592 20.03221 9.707904

max.iter and min.err are two thresholds to stop the program when either of them is met. max.iter defines the maximum number of iterations that the program is allowed to run, default is 200; min.err defines the minimum threshold of mean squared error of random effects estimated from the current and the previous iteration, default is 1e-6.

# Set max.iter and min.err to control the convergence of the program 
fit4<-sblgwas(x = intercept, y = phe, z = gen, t = -1, max.iter = 300, min.err = 1e-8)
fit4$parm
##   iter        error      s2     beta       df
## 1   18 4.675823e-09 2.30688 19.70123 5.332911