In single-step genomic evaluations using breeding-value based models, the genomic relationship matrix is required as one part of the mixed model equations. In this notebook, we use the data set from exercise 12 to show how the genomic relationship matrix is computed.

s_data_path <- "https://charlotte-ngs.github.io/lbgfs2021/data/geno_data.csv"
tbl_geno_data <- readr::read_csv(file = s_data_path)
Rows: 10 Columns: 4
── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
dbl (4): Animal, SNP A, SNP B, Observation

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
tbl_geno_data

The genomic relationship matrix can be computed as shown on the slides

How To Compute \(G\)

Step 1: Matrix \(W\)

Here we obtain matrix \(W\) from the data

W <- as.matrix(tbl_geno_data[,c(2,3)])
W
      SNP A SNP B
 [1,]     0     0
 [2,]     1     0
 [3,]     0     1
 [4,]     1     0
 [5,]    -1     0
 [6,]    -1     1
 [7,]     0    -1
 [8,]     1     1
 [9,]     1     0
[10,]     0     0

Step 2: Frequency

To compute te frequency of the positive allele it is better to change the genotype encoding from \(-1\), \(0\), \(1\) to \(0\), \(1\) , \(2\). This is done by adding \(1\) to all entries of \(W\).

W <- W + 1
W
      SNP A SNP B
 [1,]     1     1
 [2,]     2     1
 [3,]     1     2
 [4,]     2     1
 [5,]     0     1
 [6,]     0     2
 [7,]     1     0
 [8,]     2     2
 [9,]     2     1
[10,]     1     1

The frequency of the positive alleles can be computed as the column mean divided by \(2\). Hence

freq <- apply(W, 2, mean) / 2
freq
SNP A SNP B 
  0.6   0.6 

The matrix \(S\) is computed as

S <- matrix(2*freq - 1, nrow = nrow(W), ncol = ncol(W), byrow = TRUE)
S
      [,1] [,2]
 [1,]  0.2  0.2
 [2,]  0.2  0.2
 [3,]  0.2  0.2
 [4,]  0.2  0.2
 [5,]  0.2  0.2
 [6,]  0.2  0.2
 [7,]  0.2  0.2
 [8,]  0.2  0.2
 [9,]  0.2  0.2
[10,]  0.2  0.2

The sum over the crossproducts is computed as

sumpq <- 2 * sum(freq * (1-freq))
sumpq
[1] 0.96

The genomic relationsip matrix is the

U = W - S - 1
U
      SNP A SNP B
 [1,]  -0.2  -0.2
 [2,]   0.8  -0.2
 [3,]  -0.2   0.8
 [4,]   0.8  -0.2
 [5,]  -1.2  -0.2
 [6,]  -1.2   0.8
 [7,]  -0.2  -1.2
 [8,]   0.8   0.8
 [9,]   0.8  -0.2
[10,]  -0.2  -0.2
G <- tcrossprod(U) / sumpq
G
             [,1]        [,2]        [,3]        [,4]        [,5]        [,6]        [,7]       [,8]        [,9]       [,10]
 [1,]  0.08333333 -0.12500000 -0.12500000 -0.12500000  0.29166667  0.08333333  0.29166667 -0.3333333 -0.12500000  0.08333333
 [2,] -0.12500000  0.70833333 -0.33333333  0.70833333 -0.95833333 -1.16666667  0.08333333  0.5000000  0.70833333 -0.12500000
 [3,] -0.12500000 -0.33333333  0.70833333 -0.33333333  0.08333333  0.91666667 -0.95833333  0.5000000 -0.33333333 -0.12500000
 [4,] -0.12500000  0.70833333 -0.33333333  0.70833333 -0.95833333 -1.16666667  0.08333333  0.5000000  0.70833333 -0.12500000
 [5,]  0.29166667 -0.95833333  0.08333333 -0.95833333  1.54166667  1.33333333  0.50000000 -1.1666667 -0.95833333  0.29166667
 [6,]  0.08333333 -1.16666667  0.91666667 -1.16666667  1.33333333  2.16666667 -0.75000000 -0.3333333 -1.16666667  0.08333333
 [7,]  0.29166667  0.08333333 -0.95833333  0.08333333  0.50000000 -0.75000000  1.54166667 -1.1666667  0.08333333  0.29166667
 [8,] -0.33333333  0.50000000  0.50000000  0.50000000 -1.16666667 -0.33333333 -1.16666667  1.3333333  0.50000000 -0.33333333
 [9,] -0.12500000  0.70833333 -0.33333333  0.70833333 -0.95833333 -1.16666667  0.08333333  0.5000000  0.70833333 -0.12500000
[10,]  0.08333333 -0.12500000 -0.12500000 -0.12500000  0.29166667  0.08333333  0.29166667 -0.3333333 -0.12500000  0.08333333

The same computation can also be done using the following function

computeMatGrm <- function(pmatData) {
  matData <- pmatData
  # check the coding, if matData is -1, 0, 1 coded, then add 1 to get to 0, 1, 2 coding
  if (min(matData) < 0) matData <- matData + 1
  # Allele frequencies, column vector of P and sum of frequency products
  freq <- apply(matData, 2, mean) / 2
  P <- 2 * (freq - 0.5)
  sumpq <- sum(freq*(1-freq))
  # Changing the coding from (0,1,2) to (-1,0,1) and subtract matrix P
  Z <- matData - 1 - matrix(P, nrow = nrow(matData), 
                             ncol = ncol(matData), 
                             byrow = TRUE)
  # Z%*%Zt is replaced by tcrossprod(Z)
  return(tcrossprod(Z)/(2*sumpq))
}

Once the function is defined, the matrix \(G\) can be computed as

G <- computeMatGrm(pmatData = as.matrix(tbl_geno_data[,c(2,3)]))
G
             [,1]        [,2]        [,3]        [,4]        [,5]        [,6]        [,7]       [,8]        [,9]       [,10]
 [1,]  0.08333333 -0.12500000 -0.12500000 -0.12500000  0.29166667  0.08333333  0.29166667 -0.3333333 -0.12500000  0.08333333
 [2,] -0.12500000  0.70833333 -0.33333333  0.70833333 -0.95833333 -1.16666667  0.08333333  0.5000000  0.70833333 -0.12500000
 [3,] -0.12500000 -0.33333333  0.70833333 -0.33333333  0.08333333  0.91666667 -0.95833333  0.5000000 -0.33333333 -0.12500000
 [4,] -0.12500000  0.70833333 -0.33333333  0.70833333 -0.95833333 -1.16666667  0.08333333  0.5000000  0.70833333 -0.12500000
 [5,]  0.29166667 -0.95833333  0.08333333 -0.95833333  1.54166667  1.33333333  0.50000000 -1.1666667 -0.95833333  0.29166667
 [6,]  0.08333333 -1.16666667  0.91666667 -1.16666667  1.33333333  2.16666667 -0.75000000 -0.3333333 -1.16666667  0.08333333
 [7,]  0.29166667  0.08333333 -0.95833333  0.08333333  0.50000000 -0.75000000  1.54166667 -1.1666667  0.08333333  0.29166667
 [8,] -0.33333333  0.50000000  0.50000000  0.50000000 -1.16666667 -0.33333333 -1.16666667  1.3333333  0.50000000 -0.33333333
 [9,] -0.12500000  0.70833333 -0.33333333  0.70833333 -0.95833333 -1.16666667  0.08333333  0.5000000  0.70833333 -0.12500000
[10,]  0.08333333 -0.12500000 -0.12500000 -0.12500000  0.29166667  0.08333333  0.29166667 -0.3333333 -0.12500000  0.08333333

In most cases the computed genomic relationship matrix is singular and can therefore not be inverted. The solution to that is to add \(0.1\) times the numeratore relationship matrix \(A\) to the matrix \(G\). If there is no information about the pedigree available as in this dataset, then we can set the numerator relationship matrix equal to the identity matrix.

# the following results in an error
solve(G)
Gstar <- G + 0.1 * diag(1, nrow = nrow(G))
Gstar
             [,1]        [,2]        [,3]        [,4]        [,5]        [,6]        [,7]       [,8]        [,9]       [,10]
 [1,]  0.18333333 -0.12500000 -0.12500000 -0.12500000  0.29166667  0.08333333  0.29166667 -0.3333333 -0.12500000  0.08333333
 [2,] -0.12500000  0.80833333 -0.33333333  0.70833333 -0.95833333 -1.16666667  0.08333333  0.5000000  0.70833333 -0.12500000
 [3,] -0.12500000 -0.33333333  0.80833333 -0.33333333  0.08333333  0.91666667 -0.95833333  0.5000000 -0.33333333 -0.12500000
 [4,] -0.12500000  0.70833333 -0.33333333  0.80833333 -0.95833333 -1.16666667  0.08333333  0.5000000  0.70833333 -0.12500000
 [5,]  0.29166667 -0.95833333  0.08333333 -0.95833333  1.64166667  1.33333333  0.50000000 -1.1666667 -0.95833333  0.29166667
 [6,]  0.08333333 -1.16666667  0.91666667 -1.16666667  1.33333333  2.26666667 -0.75000000 -0.3333333 -1.16666667  0.08333333
 [7,]  0.29166667  0.08333333 -0.95833333  0.08333333  0.50000000 -0.75000000  1.64166667 -1.1666667  0.08333333  0.29166667
 [8,] -0.33333333  0.50000000  0.50000000  0.50000000 -1.16666667 -0.33333333 -1.16666667  1.4333333  0.50000000 -0.33333333
 [9,] -0.12500000  0.70833333 -0.33333333  0.70833333 -0.95833333 -1.16666667  0.08333333  0.5000000  0.80833333 -0.12500000
[10,]  0.08333333 -0.12500000 -0.12500000 -0.12500000  0.29166667  0.08333333  0.29166667 -0.3333333 -0.12500000  0.18333333
solve(Gstar)
              [,1]      [,2]       [,3]      [,4]       [,5]         [,6]      [,7]       [,8]      [,9]        [,10]
 [1,]  9.804866991  0.196971  0.3884280  0.196971 -0.5872370 -0.003675975 -0.778694  0.7805320  0.196971 -0.195133009
 [2,]  0.196970997  8.820012  0.5890750 -1.179988  1.5739300  1.966033991 -0.195133 -0.7878840 -1.179988  0.196970997
 [3,]  0.388428031  0.589075  8.2456409  0.589075  0.1877811 -1.955006065  2.531215 -1.5537121  0.589075  0.388428031
 [4,]  0.196970997 -1.179988  0.5890750  8.820012  1.5739300  1.966033991 -0.195133 -0.7878840 -1.179988  0.196970997
 [5,] -0.587237015  1.573930  0.1877811  1.573930  7.2515960 -1.973385941 -1.362255  2.3489481  1.573930 -0.587237015
 [6,] -0.003675975  1.966034 -1.9550061  1.966034 -1.9733859  6.075283969  1.947654  0.0147039  1.966034 -0.003675975
 [7,] -0.778694049 -0.195133  2.5312152 -0.195133 -1.3622551  1.947654115  5.911397  3.1147762 -0.195133 -0.778694049
 [8,]  0.780532036 -0.787884 -1.5537121 -0.787884  2.3489481  0.014703900  3.114776  6.8778719 -0.787884  0.780532036
 [9,]  0.196970997 -1.179988  0.5890750 -1.179988  1.5739300  1.966033991 -0.195133 -0.7878840  8.820012  0.196970997
[10,] -0.195133009  0.196971  0.3884280  0.196971 -0.5872370 -0.003675975 -0.778694  0.7805320  0.196971  9.804866991
LS0tCnRpdGxlOiAiR2Vub21pYyBSZWxhdGlvbnNoaXAgTWF0cml4IgpvdXRwdXQ6IGh0bWxfbm90ZWJvb2sKLS0tCgpgYGB7ciBzZXR1cCwgaW5jbHVkZT1GQUxTRX0Ka25pdHI6Om9wdHNfY2h1bmskc2V0KGVjaG8gPSBUUlVFLCByZXN1bHRzID0gJ21hcmt1cCcpCmBgYAoKSW4gc2luZ2xlLXN0ZXAgZ2Vub21pYyBldmFsdWF0aW9ucyB1c2luZyBicmVlZGluZy12YWx1ZSBiYXNlZCBtb2RlbHMsIHRoZSBnZW5vbWljIHJlbGF0aW9uc2hpcCBtYXRyaXggaXMgcmVxdWlyZWQgYXMgb25lIHBhcnQgb2YgdGhlIG1peGVkIG1vZGVsIGVxdWF0aW9ucy4gSW4gdGhpcyBub3RlYm9vaywgd2UgdXNlIHRoZSBkYXRhIHNldCBmcm9tIGV4ZXJjaXNlIDEyIHRvIHNob3cgaG93IHRoZSBnZW5vbWljIHJlbGF0aW9uc2hpcCBtYXRyaXggaXMgY29tcHV0ZWQuCgpgYGB7ciwgd2FybmluZz1GQUxTRSwgbWVzc2FnZT1GQUxTRX0Kc19kYXRhX3BhdGggPC0gImh0dHBzOi8vY2hhcmxvdHRlLW5ncy5naXRodWIuaW8vbGJnZnMyMDIxL2RhdGEvZ2Vub19kYXRhLmNzdiIKdGJsX2dlbm9fZGF0YSA8LSByZWFkcjo6cmVhZF9jc3YoZmlsZSA9IHNfZGF0YV9wYXRoKQp0YmxfZ2Vub19kYXRhCmBgYAoKClRoZSBnZW5vbWljIHJlbGF0aW9uc2hpcCBtYXRyaXggY2FuIGJlIGNvbXB1dGVkIGFzIHNob3duIG9uIHRoZSBzbGlkZXMKCiMgSG93IFRvIENvbXB1dGUgJEckCiogUmVhZCBtYXRyaXggJFckIAoqIEZvciBlYWNoIGNvbHVtbiAkaiQgb2YgJFckIGNvbXB1dGUgZnJlcXVlbmN5ICRwX2okCiogQ29tcHV0ZSBtYXRyaXggJFMkIGFuZCAkXHN1bV97aj0xfV5rKDEtMnBfaigxLXBfaikpJCBmcm9tICRwX2okCiogQ29tcHV0ZSAkVSQgZnJvbSAkVyQgYW5kICRTJAoqIENvbXB1dGUgJEckCgoKIyMgU3RlcCAxOiBNYXRyaXggJFckCkhlcmUgd2Ugb2J0YWluIG1hdHJpeCAkVyQgZnJvbSB0aGUgZGF0YQoKYGBge3J9ClcgPC0gYXMubWF0cml4KHRibF9nZW5vX2RhdGFbLGMoMiwzKV0pClcKYGBgCgoKIyMgU3RlcCAyOiBGcmVxdWVuY3kKVG8gY29tcHV0ZSB0ZSBmcmVxdWVuY3kgb2YgdGhlIHBvc2l0aXZlIGFsbGVsZSBpdCBpcyBiZXR0ZXIgdG8gY2hhbmdlIHRoZSBnZW5vdHlwZSBlbmNvZGluZyBmcm9tICQtMSQsICQwJCwgJDEkIHRvICQwJCwgJDEkICwgJDIkLiBUaGlzIGlzIGRvbmUgYnkgYWRkaW5nICQxJCB0byBhbGwgZW50cmllcyBvZiAkVyQuIAoKYGBge3J9ClcgPC0gVyArIDEKVwpgYGAKClRoZSBmcmVxdWVuY3kgb2YgdGhlIHBvc2l0aXZlIGFsbGVsZXMgY2FuIGJlIGNvbXB1dGVkIGFzIHRoZSBjb2x1bW4gbWVhbiBkaXZpZGVkIGJ5ICQyJC4gSGVuY2UKCmBgYHtyfQpmcmVxIDwtIGFwcGx5KFcsIDIsIG1lYW4pIC8gMgpmcmVxCmBgYAoKVGhlIG1hdHJpeCAkUyQgaXMgY29tcHV0ZWQgYXMKCmBgYHtyfQpTIDwtIG1hdHJpeCgyKmZyZXEgLSAxLCBucm93ID0gbnJvdyhXKSwgbmNvbCA9IG5jb2woVyksIGJ5cm93ID0gVFJVRSkKUwpgYGAKClRoZSBzdW0gb3ZlciB0aGUgY3Jvc3Nwcm9kdWN0cyBpcyBjb21wdXRlZCBhcwoKYGBge3J9CnN1bXBxIDwtIDIgKiBzdW0oZnJlcSAqICgxLWZyZXEpKQpzdW1wcQpgYGAKCgpUaGUgZ2Vub21pYyByZWxhdGlvbnNpcCBtYXRyaXggaXMgdGhlCgpgYGB7cn0KVSA9IFcgLSBTIC0gMQpVCmBgYAoKYGBge3J9CkcgPC0gdGNyb3NzcHJvZChVKSAvIHN1bXBxCkcKYGBgCgpUaGUgc2FtZSBjb21wdXRhdGlvbiBjYW4gYWxzbyBiZSBkb25lIHVzaW5nIHRoZSBmb2xsb3dpbmcgZnVuY3Rpb24KCmBgYHtyfQpjb21wdXRlTWF0R3JtIDwtIGZ1bmN0aW9uKHBtYXREYXRhKSB7CiAgbWF0RGF0YSA8LSBwbWF0RGF0YQogICMgY2hlY2sgdGhlIGNvZGluZywgaWYgbWF0RGF0YSBpcyAtMSwgMCwgMSBjb2RlZCwgdGhlbiBhZGQgMSB0byBnZXQgdG8gMCwgMSwgMiBjb2RpbmcKICBpZiAobWluKG1hdERhdGEpIDwgMCkgbWF0RGF0YSA8LSBtYXREYXRhICsgMQogICMgQWxsZWxlIGZyZXF1ZW5jaWVzLCBjb2x1bW4gdmVjdG9yIG9mIFAgYW5kIHN1bSBvZiBmcmVxdWVuY3kgcHJvZHVjdHMKICBmcmVxIDwtIGFwcGx5KG1hdERhdGEsIDIsIG1lYW4pIC8gMgogIFAgPC0gMiAqIChmcmVxIC0gMC41KQogIHN1bXBxIDwtIHN1bShmcmVxKigxLWZyZXEpKQogICMgQ2hhbmdpbmcgdGhlIGNvZGluZyBmcm9tICgwLDEsMikgdG8gKC0xLDAsMSkgYW5kIHN1YnRyYWN0IG1hdHJpeCBQCiAgWiA8LSBtYXREYXRhIC0gMSAtIG1hdHJpeChQLCBucm93ID0gbnJvdyhtYXREYXRhKSwgCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgbmNvbCA9IG5jb2wobWF0RGF0YSksIAogICAgICAgICAgICAgICAgICAgICAgICAgICAgIGJ5cm93ID0gVFJVRSkKICAjIFolKiVadCBpcyByZXBsYWNlZCBieSB0Y3Jvc3Nwcm9kKFopCiAgcmV0dXJuKHRjcm9zc3Byb2QoWikvKDIqc3VtcHEpKQp9CmBgYAoKT25jZSB0aGUgZnVuY3Rpb24gaXMgZGVmaW5lZCwgdGhlIG1hdHJpeCAkRyQgY2FuIGJlIGNvbXB1dGVkIGFzCgpgYGB7cn0KRyA8LSBjb21wdXRlTWF0R3JtKHBtYXREYXRhID0gYXMubWF0cml4KHRibF9nZW5vX2RhdGFbLGMoMiwzKV0pKQpHCmBgYAoKCkluIG1vc3QgY2FzZXMgdGhlIGNvbXB1dGVkIGdlbm9taWMgcmVsYXRpb25zaGlwIG1hdHJpeCBpcyBzaW5ndWxhciBhbmQgY2FuIHRoZXJlZm9yZSBub3QgYmUgaW52ZXJ0ZWQuIFRoZSBzb2x1dGlvbiB0byB0aGF0IGlzIHRvIGFkZCAkMC4xJCB0aW1lcyB0aGUgbnVtZXJhdG9yZSByZWxhdGlvbnNoaXAgbWF0cml4ICRBJCB0byB0aGUgbWF0cml4ICRHJC4gSWYgdGhlcmUgaXMgbm8gaW5mb3JtYXRpb24gYWJvdXQgdGhlIHBlZGlncmVlIGF2YWlsYWJsZSBhcyBpbiB0aGlzIGRhdGFzZXQsIHRoZW4gd2UgY2FuIHNldCB0aGUgbnVtZXJhdG9yIHJlbGF0aW9uc2hpcCBtYXRyaXggZXF1YWwgdG8gdGhlIGlkZW50aXR5IG1hdHJpeC4gCgpgYGB7ciwgZXZhbD1GQUxTRX0KIyB0aGUgZm9sbG93aW5nIHJlc3VsdHMgaW4gYW4gZXJyb3IKc29sdmUoRykKYGBgCgpgYGB7cn0KR3N0YXIgPC0gRyArIDAuMSAqIGRpYWcoMSwgbnJvdyA9IG5yb3coRykpCkdzdGFyCmBgYAoKYGBge3J9CnNvbHZlKEdzdGFyKQpgYGAKCg==