Applied Statistical Methods - Solution 8

Author

Peter von Rohr

Published

April 29, 2024

Problem 1: Interactions

Use the following dataset on Breed, Breast.Circumference and Body.Weight to fit a fixed effects linear model with Body.Weight as response and with Breed and Breast.Circumference as predictors, including an interaction term between the two predictors. For every Breed, compute the expected difference in Body.Weight between two animals that differ in Breast.Circumference by \(1\) cm.

The dataset is available at

https://charlotte-ngs.github.io/asmasss2024/data/asm_bw_flem.csv

Tasks

  • Read the data for fitting the linear model
  • Fit the linear model
  • Expected difference in body weight for the three breeds:

Angus: Angus is the reference level of Breed, so the expected difference in body weight (in kg) for a one-centimeter increase in breast circumference is the regression coefficient of Breast.Circumference.

Limousin: For the breed Limousin there is an interaction effect, so the expected difference is the sum of the regression coefficient of Breast.Circumference and the interaction effect Breast.Circumference:BreedLimousin.

Simmental: As for Limousin, the expected difference is the sum of the regression coefficient of Breast.Circumference and the corresponding interaction effect for Simmental.
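The three tasks above can be sketched in R as follows. This is a minimal sketch, not necessarily the original solution code: the helper names `read_bw_data()`, `fit_bw_model()` and `slope_per_breed()` are hypothetical, and the sketch assumes that the file is comma-separated with a header row containing the columns Breed, Breast.Circumference and Body.Weight.

```r
# URL given in the problem description
s_data_url <- "https://charlotte-ngs.github.io/asmasss2024/data/asm_bw_flem.csv"

# Read the data (assumption: comma-separated file with a header row)
read_bw_data <- function(ps_url = s_data_url) {
  read.csv(ps_url)
}

# Fit the model with main effects and the interaction between the predictors.
# The order of the terms is chosen so that the interaction coefficients are
# named like "Breast.Circumference:BreedLimousin".
fit_bw_model <- function(ptbl_bw) {
  lm(Body.Weight ~ Breast.Circumference * Breed, data = ptbl_bw)
}

# Expected difference in body weight per cm of breast circumference for each
# breed: the main regression coefficient plus, for non-reference breeds,
# the matching interaction effect.
slope_per_breed <- function(plm_fit) {
  vec_coef <- coef(plm_fit)
  n_base_slope <- vec_coef[["Breast.Circumference"]]
  vec_breeds <- plm_fit$xlevels[["Breed"]]
  sapply(vec_breeds, function(s_breed) {
    s_ia_name <- paste0("Breast.Circumference:Breed", s_breed)
    if (s_ia_name %in% names(vec_coef)) {
      n_base_slope + vec_coef[[s_ia_name]]
    } else {
      n_base_slope  # reference breed: no interaction term
    }
  })
}
```

Calling `slope_per_breed(fit_bw_model(read_bw_data()))` would then return one slope per breed, with the reference breed (Angus, the first level in alphabetical order) receiving the plain coefficient of Breast.Circumference.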

Problem 2: Simulation

Use the following values for the intercept and the regression slope of Body.Weight on Breast.Circumference to simulate a dataset of size \(N\). How large does \(N\) have to be so that the regression slope estimated from the simulated data matches the true regression slope?

The true values are:

  • Intercept: \(-1070\)
  • Regression slope: \(8.7\)
  • Residual standard error: \(12\)

Hints

  • Start with \(N=10\), simulate a dataset and analyse the data with lm()
  • If the estimated slope (rounded to one digit after the decimal point) is not the same as the true slope, double the size of the dataset, i.e., use \(N=20\)
  • Continue until you get close to the true value.
  • Assume that the random residuals follow a normal distribution with mean zero and standard deviation equal to \(12\)
  • Take breast circumference to be normally distributed with a mean of \(180\) and a standard deviation of \(2.6\)
  • Use a linear regression model with an intercept to model expected body weight based on breast circumference.
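Written out, the hints above amount to the following simulation model. The symbols are notation chosen here for illustration: \(y_i\) is the body weight and \(x_i\) the breast circumference of animal \(i\), with \(\beta_0 = -1070\) and \(\beta_1 = 8.7\).

\[
y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad x_i \sim \mathcal{N}(180,\ 2.6^2), \quad \epsilon_i \sim \mathcal{N}(0,\ 12^2), \quad i = 1, \ldots, N
\]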

Tasks

  • Assign the numbers given in the problem description to variables
  • Start with \(N=10\) and first generate the matrix \(X\), which consists of a column of all ones and a column of breast circumference values in centimeters drawn from the given normal distribution. Whenever we generate random numbers, it is important to first set the seed with the function set.seed(), to which an integer number is passed. This makes sure that the same results are generated when the simulation is repeated.
  • Simulate observations of Body.Weight
  • Analyse the simulated data with a regression model
  • Compute the absolute deviation between the estimated regression slope and the true slope used in the simulation
  • Use a loop to iteratively increase the number of observations until the absolute deviation of the estimated slope from the true value becomes smaller than 0.1.
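The tasks above can be sketched in R as follows. This is a minimal sketch under stated assumptions rather than the original solution code: the seed value `1234`, the helper name `estimate_slope()` and the safety cap `n_max_obs` are arbitrary choices made here. The seed is set once before the first draw, and every doubling step simulates a fresh dataset.

```r
# Values from the problem description
n_intercept <- -1070
n_slope <- 8.7
n_res_sd <- 12
n_bc_mean <- 180
n_bc_sd <- 2.6

# Simulate n_obs records and return the slope estimated by lm()
estimate_slope <- function(n_obs) {
  # Design matrix X: a column of ones and a column of breast circumferences
  mat_X <- cbind(1, rnorm(n_obs, mean = n_bc_mean, sd = n_bc_sd))
  # Body weight: X times the true coefficients plus normal residuals
  vec_y <- as.vector(mat_X %*% c(n_intercept, n_slope)) +
    rnorm(n_obs, mean = 0, sd = n_res_sd)
  tbl_sim <- data.frame(Breast.Circumference = mat_X[, 2],
                        Body.Weight = vec_y)
  lm_sim <- lm(Body.Weight ~ Breast.Circumference, data = tbl_sim)
  coef(lm_sim)[["Breast.Circumference"]]
}

set.seed(1234)    # arbitrary seed (assumption), makes the run reproducible
n_obs <- 10
n_max_obs <- 1e6  # safety cap (assumption) so the loop always terminates
n_abs_dev <- abs(estimate_slope(n_obs) - n_slope)

# Double N until the estimated slope is within 0.1 of the true slope
while (n_abs_dev >= 0.1 && n_obs < n_max_obs) {
  n_obs <- 2 * n_obs
  n_abs_dev <- abs(estimate_slope(n_obs) - n_slope)
}
cat("N =", n_obs, " absolute deviation =", n_abs_dev, "\n")
```

The value of \(N\) at which the loop stops depends on the seed, because a small dataset can land close to the true slope by chance. Repeating the simulation with several seeds gives a better impression of how large \(N\) typically has to be.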