Problem 1: Fixed Linear Effects Model

We want to analyse a dataset with genetic information using a fixed linear effects model. The dataset is taken from the course notes and is shown in Table @ref(tab:datatable).

We assume that the SNP loci have a purely additive effect on the trait. That means for a SNP locus \(L\) the absolute value of the genotypic value of the homozygous genotypes (\(L_1L_1\) and \(L_2L_2\)) is taken to be \(a_L\) and the genotypic value of the heterozygous genotype (\(L_1L_2\)) is taken to be \(0\). The fixed linear effects model contains the observation in Table @ref(tab:datatable) as the response variable, an intercept and the genotypic values of the the genotypes at the two SNP Loci \(G\) and \(H\) as predictor variables.

For the observation \(y_i\) of animal \(i\), we can specify the model as

\[y_i = \beta_0 + W_i \cdot a + \epsilon_i\]

where \(\beta_0\) is the intercept, \(a\) is the vector of additive SNP-effects, \(W_i\) is a row vector denoting the SNP-Genotypes and \(\epsilon_i\) is the random error term.

The data can be read from https://charlotte-ngs.github.io/gelasmss2021/data/ex03p01_data.csv. The address from where the data can be downloaded is assigned to a variable.

### # specify path to data file depending on online status
s_data_file <- "https://charlotte-ngs.github.io/gelasmss2021/data/ex03p01_data.csv"

It can be read using the following statement

### # read the data into a tibble
tbl_geno_data <- readr::read_csv(file = s_data_file)
Genotypic Data Used for Fitting a Fixed Linear Effect Model
Animal SNP G SNP H Observation
1 1 0 510
2 0 1 528
3 0 1 505
4 1 -1 539
5 1 1 530
6 0 0 489
7 0 -1 486
8 -1 1 485
9 0 -1 478
10 -1 0 479
11 1 0 520
12 1 1 521
13 -1 0 473
14 -1 0 457
15 0 1 497
16 0 0 516
17 1 0 524
18 1 0 502
19 1 -1 508
20 0 0 506

Your Tasks

  • Specify the fixed linear effects model in matrix-vector notation by putting the information from the dataset into the model. Use the same parametrization as shown in the course notes where the intercept \(\beta_0\) and the vector \(a\) are combined into a single parameter vector \(b\). The design matrix that links elements in \(b\) to observations \(y\) is then called \(X\).
  • Use the function Matrix::rankMatrix() from the Matrix package on the matrix \(X\) to find out the rank of the design matrix.
  • Depending on the rank of \(X\) compute an estimate for \(b\), if the rank of the matrix is equal to the number of columns of matrix \(X\), then the same forumla as was used in the regression model can be used
  • Verify your results using the lm() function

Hints

  • Read the data using the function readr::read_csv()

Your Solution

Start your solution by reading the data. Then specify the design matrix \(X\)

# read data with readr::read_csv

# specify the matrix X

# compute the solution for the intercept and the a-values

Problem 2: Genomic Relationship Matrix

From the given dataset that can be obtained from

https://charlotte-ngs.github.io/gelasmss2021/data/ex03p02_data.csv,

compute the genomic relationship matrix \(G\). The dataset is organised such that animals are in rows and SNPs are in columns.

Hints

  • Read the data using the function readr::read_csv()
  • Convert the input data with the function as.matrix() to a matrix
  • Use the function apply(mat, 2, mean) to compute the columnwise mean of matrix mat
  • Use the matrix function to construct the matrix \(P\) from the vector of SNP allele frequencies

Your Solution

Start your solution by reading the data

# read the data using readr::read_csv

# convert intput data to a matrix

# compute the mean across columns

# compute the matrix P

Additional Problem

Write a function in R that accepts a matrix of genotypes and that computes the genomic relationship matrix. Verify your results from Problem 2.

Hints

  • A function in R can be declared using the keyword function
  • A function in R consists of three parts
    1. the name of the function
    2. the function arguments and
    3. the body of the function.

The following figure shows the general structure of an R-function

Your Solution

Start your solution by reading the data

# read the data using readr::read_csv

# convert intput data to a matrix

# compute the mean across columns

# compute the matrix P
LS0tCnRpdGxlOiBBcHBsaWVkIFN0YXRpc3RpY2FsIE1ldGhvZHMgLSBOb3RlYm9vayAzCmF1dGhvcjogUGV0ZXIgdm9uIFJvaHIKZGF0ZTogMjAyMS0wMy0xNQpvdXRwdXQ6IGh0bWxfbm90ZWJvb2sKLS0tCgpgYGB7ciBzZXR1cCwgaW5jbHVkZT1GQUxTRX0Ka25pdHI6Om9wdHNfY2h1bmskc2V0KGVjaG8gPSBUUlVFLCByZXN1bHRzID0gJ2FzaXMnLCBmaWcucG9zID0gJ0gnKQprbml0cjo6a25pdF9ob29rcyRzZXQoaG9va19jb252ZXJ0X29kZyA9IHJtZGhlbHA6Omhvb2tfY29udmVydF9vZGcpCmBgYAoKCiMjIFByb2JsZW0gMTogRml4ZWQgTGluZWFyIEVmZmVjdHMgTW9kZWwgey19CldlIHdhbnQgdG8gYW5hbHlzZSBhIGRhdGFzZXQgd2l0aCBnZW5ldGljIGluZm9ybWF0aW9uIHVzaW5nIGEgZml4ZWQgbGluZWFyIGVmZmVjdHMgbW9kZWwuIFRoZSBkYXRhc2V0IGlzIHRha2VuIGZyb20gdGhlIGNvdXJzZSBub3RlcyBhbmQgaXMgc2hvd24gaW4gVGFibGUgXEByZWYodGFiOmRhdGF0YWJsZSkuIAoKV2UgYXNzdW1lIHRoYXQgdGhlIFNOUCBsb2NpIGhhdmUgYSBwdXJlbHkgYWRkaXRpdmUgZWZmZWN0IG9uIHRoZSB0cmFpdC4gVGhhdCBtZWFucyBmb3IgYSBTTlAgbG9jdXMgJEwkIHRoZSBhYnNvbHV0ZSB2YWx1ZSBvZiB0aGUgZ2Vub3R5cGljIHZhbHVlIG9mIHRoZSBob21venlnb3VzIGdlbm90eXBlcyAoJExfMUxfMSQgYW5kICRMXzJMXzIkKSBpcyB0YWtlbiB0byBiZSAkYV9MJCBhbmQgdGhlIGdlbm90eXBpYyB2YWx1ZSBvZiB0aGUgaGV0ZXJvenlnb3VzIGdlbm90eXBlICgkTF8xTF8yJCkgaXMgdGFrZW4gdG8gYmUgJDAkLiBUaGUgZml4ZWQgbGluZWFyIGVmZmVjdHMgbW9kZWwgY29udGFpbnMgdGhlIG9ic2VydmF0aW9uIGluIFRhYmxlIFxAcmVmKHRhYjpkYXRhdGFibGUpIGFzIHRoZSByZXNwb25zZSB2YXJpYWJsZSwgYW4gaW50ZXJjZXB0IGFuZCB0aGUgZ2Vub3R5cGljIHZhbHVlcyBvZiB0aGUgdGhlIGdlbm90eXBlcyBhdCB0aGUgdHdvIFNOUCBMb2NpICRHJCBhbmQgJEgkIGFzIHByZWRpY3RvciB2YXJpYWJsZXMuCgpGb3IgdGhlIG9ic2VydmF0aW9uICR5X2kkIG9mIGFuaW1hbCAkaSQsIHdlIGNhbiBzcGVjaWZ5IHRoZSBtb2RlbCBhcwoKJCR5X2kgPSBcYmV0YV8wICsgV19pIFxjZG90IGEgKyBcZXBzaWxvbl9pJCQKCndoZXJlICRcYmV0YV8wJCBpcyB0aGUgaW50ZXJjZXB0LCAkYSQgaXMgdGhlIHZlY3RvciBvZiBhZGRpdGl2ZSBTTlAtZWZmZWN0cywgJFdfaSQgaXMgYSByb3cgdmVjdG9yIGRlbm90aW5nIHRoZSBTTlAtR2Vub3R5cGVzIGFuZCAkXGVwc2lsb25faSQgaXMgdGhlIHJhbmRvbSBlcnJvciB0ZXJtLgoKYGBge3IgZGF0YXVybGRlY2xhcmUsIGVjaG89RkFMU0V9CiMjIyAjIHNwZWNpZnkgcGF0aCB0byBkYXRhIGZpbGUgZGVwZW5kaW5nIG9uIG9ubGluZSBzdGF0dXMKc19kYXRhX2ZpbGUgPC0gImh0dHBzOi8vY2hhcmxvdHRlLW5ncy5naXRodWIuaW8vZ2VsYXNtc3MyMDIxL2RhdGEvZXgwM3AwMV9kYXRhLmNzdiIKYGBgCgpUaGUgZGF0YSBjYW4gYmUgcmVhZCBmcm9tIGByIHNfZGF0YV9maWxlYC4gVGhlIGFkZHJlc3MgZnJvbSB3aGVyZSB0aGUgZGF0YSBjYW4gYmUgZG93bmxvYWRlZCBpcyBhc3NpZ25lZCB0byBhIHZhcmlhYmxlLiAKCmBgYHtyIGRhdGF1cmxkZWNsYXJlLCBlY2hvPVRSVUV9CmBgYAoKCkl0IGNhbiBiZSByZWFkIHVzaW5nIHRoZSBmb2xsb3dpbmcgc3RhdGVtZW50CgpgYGB7ciBkYXRhaW5wdXQsIG1lc3NhZ2U9RkFMU0V9CiMjIyAjIHJlYWQgdGhlIGRhdGEgaW50byBhIHRpYmJsZQp0YmxfZ2Vub19kYXRhIDwtIHJlYWRyOjpyZWFkX2NzdihmaWxlID0gc19kYXRhX2ZpbGUpCmBgYAoKYGBge3IgZGF0YXRhYmxlLCBlY2hvPUZBTFNFLCByZXN1bHRzPSdhc2lzJ30Ka25pdHI6OmthYmxlKHRibF9nZW5vX2RhdGEsCiAgICAgICAgICAgICBib29rdGFicyA9IFRSVUUsIAogICAgICAgICAgICAgbG9uZ3RhYmxlID0gVFJVRSwgCiAgICAgICAgICAgICBjYXB0aW9uID0gIkdlbm90eXBpYyBEYXRhIFVzZWQgZm9yIEZpdHRpbmcgYSBGaXhlZCBMaW5lYXIgRWZmZWN0IE1vZGVsIikKYGBgCgojIyMgWW91ciBUYXNrcyB7LX0KKiBTcGVjaWZ5IHRoZSBmaXhlZCBsaW5lYXIgZWZmZWN0cyBtb2RlbCBpbiBtYXRyaXgtdmVjdG9yIG5vdGF0aW9uIGJ5IHB1dHRpbmcgdGhlIGluZm9ybWF0aW9uIGZyb20gdGhlIGRhdGFzZXQgaW50byB0aGUgbW9kZWwuIFVzZSB0aGUgc2FtZSBwYXJhbWV0cml6YXRpb24gYXMgc2hvd24gaW4gdGhlIGNvdXJzZSBub3RlcyB3aGVyZSB0aGUgaW50ZXJjZXB0ICRcYmV0YV8wJCBhbmQgdGhlIHZlY3RvciAkYSQgYXJlIGNvbWJpbmVkIGludG8gYSBzaW5nbGUgcGFyYW1ldGVyIHZlY3RvciAkYiQuIFRoZSBkZXNpZ24gbWF0cml4IHRoYXQgbGlua3MgZWxlbWVudHMgaW4gJGIkIHRvIG9ic2VydmF0aW9ucyAkeSQgaXMgdGhlbiBjYWxsZWQgJFgkLiAKKiBVc2UgdGhlIGZ1bmN0aW9uIGBNYXRyaXg6OnJhbmtNYXRyaXgoKWAgZnJvbSB0aGUgYE1hdHJpeGAgcGFja2FnZSBvbiB0aGUgbWF0cml4ICRYJCB0byBmaW5kIG91dCB0aGUgcmFuayBvZiB0aGUgZGVzaWduIG1hdHJpeC4KKiBEZXBlbmRpbmcgb24gdGhlIHJhbmsgb2YgJFgkIGNvbXB1dGUgYW4gZXN0aW1hdGUgZm9yICRiJCwgaWYgdGhlIHJhbmsgb2YgdGhlIG1hdHJpeCBpcyBlcXVhbCB0byB0aGUgbnVtYmVyIG9mIGNvbHVtbnMgb2YgbWF0cml4ICRYJCwgdGhlbiB0aGUgc2FtZSBmb3J1bWxhIGFzIHdhcyB1c2VkIGluIHRoZSByZWdyZXNzaW9uIG1vZGVsIGNhbiBiZSB1c2VkCiogVmVyaWZ5IHlvdXIgcmVzdWx0cyB1c2luZyB0aGUgYGxtKClgIGZ1bmN0aW9uCgoKIyMjIEhpbnRzIHstfQoqIFJlYWQgdGhlIGRhdGEgdXNpbmcgdGhlIGZ1bmN0aW9uIGByZWFkcjo6cmVhZF9jc3YoKWAKCiMjIyBZb3VyIFNvbHV0aW9uClN0YXJ0IHlvdXIgc29sdXRpb24gYnkgcmVhZGluZyB0aGUgZGF0YS4gVGhlbiBzcGVjaWZ5IHRoZSBkZXNpZ24gbWF0cml4ICRYJAoKYGBge3J9CiMgcmVhZCBkYXRhIHdpdGggcmVhZHI6OnJlYWRfY3N2CgojIHNwZWNpZnkgdGhlIG1hdHJpeCBYCgojIGNvbXB1dGUgdGhlIHNvbHV0aW9uIGZvciB0aGUgaW50ZXJjZXB0IGFuZCB0aGUgYS12YWx1ZXMKCmBgYAoKCgoKIyMgUHJvYmxlbSAyOiBHZW5vbWljIFJlbGF0aW9uc2hpcCBNYXRyaXggey19CmBgYHtyIGRlY2xhcmVkYXRhZXgwM3AwMiwgZWNobz1GQUxTRX0Kc19kYXRhX2V4MDNwMDIgPC0gImh0dHBzOi8vY2hhcmxvdHRlLW5ncy5naXRodWIuaW8vZ2VsYXNtc3MyMDIxL2RhdGEvZXgwM3AwMl9kYXRhLmNzdiIKYGBgCgpGcm9tIHRoZSBnaXZlbiBkYXRhc2V0IHRoYXQgY2FuIGJlIG9idGFpbmVkIGZyb20gCgpgciBzX2RhdGFfZXgwM3AwMmAsIAoKY29tcHV0ZSB0aGUgZ2Vub21pYyByZWxhdGlvbnNoaXAgbWF0cml4ICRHJC4gVGhlIGRhdGFzZXQgaXMgb3JnYW5pc2VkIHN1Y2ggdGhhdCBhbmltYWxzIGFyZSBpbiByb3dzIGFuZCBTTlBzIGFyZSBpbiBjb2x1bW5zLgoKCiMjIyBIaW50cyB7LX0KKiBSZWFkIHRoZSBkYXRhIHVzaW5nIHRoZSBmdW5jdGlvbiBgcmVhZHI6OnJlYWRfY3N2KClgCiogQ29udmVydCB0aGUgaW5wdXQgZGF0YSB3aXRoIHRoZSBmdW5jdGlvbiBgYXMubWF0cml4KClgIHRvIGEgbWF0cml4CiogVXNlIHRoZSBmdW5jdGlvbiBgYXBwbHkobWF0LCAyLCBtZWFuKWAgdG8gY29tcHV0ZSB0aGUgY29sdW1ud2lzZSBtZWFuIG9mIG1hdHJpeCBgbWF0YAoqIFVzZSB0aGUgYG1hdHJpeGAgZnVuY3Rpb24gdG8gY29uc3RydWN0IHRoZSBtYXRyaXggJFAkIGZyb20gdGhlIHZlY3RvciBvZiBTTlAgYWxsZWxlIGZyZXF1ZW5jaWVzCgoKIyMjIFlvdXIgU29sdXRpb24KU3RhcnQgeW91ciBzb2x1dGlvbiBieSByZWFkaW5nIHRoZSBkYXRhCgpgYGB7cn0KIyByZWFkIHRoZSBkYXRhIHVzaW5nIHJlYWRyOjpyZWFkX2NzdgoKIyBjb252ZXJ0IGludHB1dCBkYXRhIHRvIGEgbWF0cml4CgojIGNvbXB1dGUgdGhlIG1lYW4gYWNyb3NzIGNvbHVtbnMKCiMgY29tcHV0ZSB0aGUgbWF0cml4IFAKCmBgYAoKCgoKIyMgQWRkaXRpb25hbCBQcm9ibGVtIHstfQpXcml0ZSBhIGZ1bmN0aW9uIGluIFIgdGhhdCBhY2NlcHRzIGEgbWF0cml4IG9mIGdlbm90eXBlcyBhbmQgdGhhdCBjb21wdXRlcyB0aGUgZ2Vub21pYyByZWxhdGlvbnNoaXAgbWF0cml4LiBWZXJpZnkgeW91ciByZXN1bHRzIGZyb20gUHJvYmxlbSAyLgoKCiMjIyBIaW50cwogKiBBIGZ1bmN0aW9uIGluIFIgY2FuIGJlIGRlY2xhcmVkIHVzaW5nIHRoZSBrZXl3b3JkIGBmdW5jdGlvbmAgCiAqIEEgZnVuY3Rpb24gaW4gUiBjb25zaXN0cyBvZiB0aHJlZSBwYXJ0cwogICAgIDEuIHRoZSBuYW1lIG9mIHRoZSBmdW5jdGlvbgogICAgIDIuIHRoZSBmdW5jdGlvbiBhcmd1bWVudHMgYW5kIAogICAgIDMuIHRoZSBib2R5IG9mIHRoZSBmdW5jdGlvbi4KICAgICAKVGhlIGZvbGxvd2luZyBmaWd1cmUgc2hvd3MgdGhlIGdlbmVyYWwgc3RydWN0dXJlIG9mIGFuIFItZnVuY3Rpb24KCmBgYHtyIHJmdW5jdGlvbnN0cnVjdHVyZSwgZWNobz1GQUxTRSwgaG9va19jb252ZXJ0X29kZz1UUlVFLCBmaWdfcGF0aD0ib2RnIiwgb3V0LndpZHRoPScxMWNtJ30KI3JtZGhlbHA6OnVzZV9vZGdfZ3JhcGhpYyhwc19wYXRoID0gJ29kZy9yZnVuY3Rpb25zdHJ1Y3R1cmUub2RnJykKa25pdHI6OmluY2x1ZGVfZ3JhcGhpY3MocGF0aCA9ICJvZGcvcmZ1bmN0aW9uc3RydWN0dXJlLnBuZyIpCmBgYAoKCiMjIyBZb3VyIFNvbHV0aW9uClN0YXJ0IHlvdXIgc29sdXRpb24gYnkgcmVhZGluZyB0aGUgZGF0YQoKYGBge3J9CiMgcmVhZCB0aGUgZGF0YSB1c2luZyByZWFkcjo6cmVhZF9jc3YKCiMgY29udmVydCBpbnRwdXQgZGF0YSB0byBhIG1hdHJpeAoKIyBjb21wdXRlIHRoZSBtZWFuIGFjcm9zcyBjb2x1bW5zCgojIGNvbXB1dGUgdGhlIG1hdHJpeCBQCgpgYGAKCiAK