This notebook is used to simulate a small genomic dataset that is analysed in the course GELASMSS2019. The material shown here is based on the JuliaCourse held in december 2018 in Munich.
In a first step, packages Distributions
and Random
must be loaded. After that, we are setting a seed.
using Distributions, Random
Random.seed!(2345);
In a first step we take the code from day2/dataSimulation and modify it to a much smaller example.
using XSim
chrLength = 1.0
numChr = 1
numLoci = 100
mutRate = 0.0
locusInt = chrLength/numLoci
mapPos = collect(0:locusInt:(chrLength-0.0001))
geneFreq = fill(0.5,numLoci)
XSim.build_genome(numChr,chrLength,numLoci,geneFreq,mapPos,mutRate)
### #checking the result
XSim.G
We start by generating a small founder population.
### # specify the number of sires and dams
popSizeFounderSire=2
popSizeFounderDam=3
### # sample the founder population
sires = sampleFounders(popSizeFounderSire)
dams = sampleFounders(popSizeFounderDam)
The function sampleFounders()
uses the information from the initialized genome stored in XSim.G
. The returned results are assigned to sires
and dams
which are cohorts.
We do an intermediate check of the sampled genotypes.
animalFounders = concatCohorts(sires,dams)
MFounders = getOurGenotypes(animalFounders)
The function concatCohorts()
is used to combine two cohorts. The function getOurGenotypes()
extracts the genotypes of all animals in the specified cohort.
The founder cohorts are used to generate a first generation via randomly mating the sires and dams from the founder cohort.
ngen,popSize = 1,5
sires1,dams1,gen1 = sampleRan(popSize, ngen, sires, dams);
We use sires1
and dams1
to produce a second generation
ngen, popSize = 1,11
sires2,dams2,gen2 = sampleRan(popSize, ngen, sires1, dams1);
All sampled animals are combined into a single cohort called animals
.
animals=concatCohorts(animalFounders,sires1,dams1,sires2,dams2);
We are extracting the genotypes of all animals into an array
M = getOurGenotypes(animals);
We fix the number of QTL and initialize an index vector with the same length as the number of loci to all FALSE. Then later a random sample in this vector will be set to TRUE. These positions then represent the QTL
nQTL = 5;
selQTL = fill(false,numLoci);
Using the sample()
function to determine the QTL positions
selQTL[sample(1:numLoci, nQTL, replace=false)] .= true;
All positions that are not QTL are set to be markers
selMrk =.!selQTL;
Genotypes of markers and QTL are separated into two different matrices
Q = M[:,selQTL]
X = M[:,selMrk]
We start by setting a fixed number of significant QTL. In our case, this corrsponds to the number of columns of the matrix Q
. These QTL get an associated effect which is then used to generate the breeding values called a
. Then we add an intercept and a random error term.
nSigQTL = size(Q,2)
nObs = size(Q,1)
alphaSd = 1
alpha = rand(Normal(0,alphaSd),nSigQTL)
a = Q*alpha
# scaling breeding values to have variance 25.0
v = var(a)
genVar = 25.0
a *= sqrt(genVar/v)
va = var(a)
# formatted printing
println("genetic variance = $va")
Computation of mean
and sd
require the package Statistics
.
using Statistics
resVar = 75.0
resStd = sqrt(resVar)
e = rand(Normal(0,resStd),nObs)
intercept = 100
y = intercept .+ a + e
yMean = Statistics.mean(y)
yVar = Statistics.var(y)
println("phenotypic mean = $yMean")
println("phenotypic variance = $yVar")
Generated phenotypic values are assigned to the animals
cohort. Starting with a single element.
animals.animalCohort[1].phenVal = y[1]
animals.animalCohort[1].phenVal
In a loop over the vector y
of phenotypic observations, we assign them to the cohort animals
.
for i in 1:nObs
println("Assigning obs: ", i, " to ", y[i])
animals.animalCohort[i].phenVal = y[i]
end
Checking back whether assignment worked with
P = getOurPhenVals(animals, resVar)
Now that we have generated the data, we must write them to files. The data consist of
Before the data is written, we first delete any old files from previous runs. Otherwise new data gets appended to the old files.
outFile = "data_ex04"
# delete old files first
run(`\rm -f $outFile.ped`)
run(`\rm -f $outFile.phe`)
run(`\rm -f $outFile.brc`)
run(`\rm -f $outFile.gen`)
# write new output
outputPedigree(animals, outFile)
;ipython nbconvert SimulateDataEx04.ipynb