This is the notebook developed in class. It shows how genotype frequencies can be determined using R.

We want to read a dataset with genotype and phenotype information.

s_gen_phen_data <- "https://charlotte-ngs.github.io/lbgfs2022/data/p1_mrk_one_locus.csv"
tbl_gen_phen <- readr::read_csv2(file = s_gen_phen_data)
head(tbl_gen_phen)

The summary of the phenotype column is given by

summary(tbl_gen_phen$Phenotype)

By default, the phenotype column is read as character data. The statement below converts it to numeric values.

tbl_gen_phen$Phenotype <- as.numeric(tbl_gen_phen$Phenotype)
head(tbl_gen_phen)

Now the summary of the phenotypes yields a more meaningful result.

summary(tbl_gen_phen$Phenotype)
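
When a character column is converted with as.numeric(), entries that cannot be parsed become NA and R only issues a warning. The sketch below shows one possible way to check for such entries. The vector phen_chr is a hypothetical stand-in and is not taken from the course dataset.

```r
# Hypothetical character vector standing in for a phenotype column
phen_chr <- c("1.5", "2.25", "n/a", "3")

# Convert to numbers; the warning about introduced NAs is suppressed here
phen_num <- suppressWarnings(as.numeric(phen_chr))

# Positions that were present as text but could not be converted
idx_failed <- which(is.na(phen_num) & !is.na(phen_chr))
idx_failed
```

If idx_failed is empty, every entry was converted successfully.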

Genotype Frequencies

Using the function table from the base package, which counts the observations per genotype

table(tbl_gen_phen$Genotype)
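
Note that table() returns counts, not frequencies. One possible way to obtain relative frequencies in base R is to pass the table to prop.table(). This is a minimal sketch on a hypothetical toy vector; the genotype labels used here are assumptions and need not match those in the dataset.

```r
# Hypothetical genotype vector (labels are assumed for illustration)
vec_geno <- c("AA", "AB", "AB", "BB", "AA", "AB", "AB", "BB", "AB", "AA")

# Counts per genotype
geno_count <- table(vec_geno)

# Relative frequencies: each count divided by the total number of observations
geno_freq <- prop.table(geno_count)
geno_freq
```

The same call should work on the real data as prop.table(table(tbl_gen_phen$Genotype)).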

Alternatively, using the package dplyr

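library(dplyr)
# Group the rows by genotype
gb <- group_by(tbl_gen_phen, Genotype)
# Count the number of observations per genotype
sg <- summarize(gb, geno_count = n())
# Convert the counts to relative frequencies
mutate(sg, geno_freq = geno_count / sum(geno_count))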

Using the pipe operator %>% makes this shorter. Remember the convention for pipes: g(f(data)) can be written as data %>% f() %>% g().

tbl_gen_phen %>%
  group_by(Genotype) %>%
  summarize(geno_count = n()) %>%
  mutate(geno_freq = geno_count / sum(geno_count))
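
To illustrate the pipe convention, the sketch below computes the same counts once in nested form and once in piped form and compares the results. The data frame tbl_toy is a hypothetical toy example, not the course dataset, and its genotype labels are assumed.

```r
library(dplyr)  # also makes the %>% pipe available

# Hypothetical toy data (labels are assumed for illustration)
tbl_toy <- tibble(Genotype = c("AA", "AB", "AB", "BB"))

# Nested form: g(f(data))
res_nested <- summarize(group_by(tbl_toy, Genotype), geno_count = n())

# Piped form: data %>% f() %>% g()
res_piped <- tbl_toy %>%
  group_by(Genotype) %>%
  summarize(geno_count = n())

# Both forms should give the same result
all.equal(res_nested, res_piped)
```

The piped form reads in the order in which the steps are applied, which is why it is often preferred for longer chains.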