s_data_path <- "https://charlotte-ngs.github.io/pigsciencess2022/data/bw_bc_rand_pred.csv"
tbl_data <- readr::read_csv(file = s_data_path)
Rows: 10 Columns: 4
── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
dbl (4): Animal, Breast Circumference, Body Weight, RandPred
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
tbl_data
pairs(tbl_data)
Full model
lm_bw_bc_rp <- lm(formula = `Body Weight` ~ `Breast Circumference` + RandPred,
data = tbl_data)
summary(lm_bw_bc_rp)
Call:
lm(formula = `Body Weight` ~ `Breast Circumference` + RandPred,
data = tbl_data)
Residuals:
Min 1Q Median 3Q Max
-21.9487 -2.0365 0.3772 6.7539 8.8617
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1464.607 368.944 -3.970 0.005394 **
`Breast Circumference` 8.038 1.408 5.708 0.000729 ***
RandPred 2.845 1.993 1.427 0.196541
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 10.43 on 7 degrees of freedom
Multiple R-squared: 0.8632, Adjusted R-squared: 0.8242
F-statistic: 22.09 on 2 and 7 DF, p-value: 0.0009462
library(dplyr)
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
tbl_ms_vc_olsrr <- tbl_data %>% rename(BW = `Body Weight`, BC = `Breast Circumference`)
lm_ms_vc_olsrr <- lm(BW ~ BC + RandPred, data = tbl_ms_vc_olsrr)
olsrr::ols_step_best_subset(lm_ms_vc_olsrr)
Best Subsets Regression
--------------------------
Model Index Predictors
--------------------------
1 BC
2 BC RandPred
--------------------------
Subsets Regression Summary
----------------------------------------------------------------------------------------------------------------------------------
Adj. Pred
Alternative
MASS::stepAIC(lm_ms_vc_olsrr)
Start: AIC=49.32
BW ~ BC + RandPred
Df Sum of Sq RSS AIC
<none> 760.9 49.320
- RandPred 1 221.5 982.4 49.874
- BC 1 3541.6 4302.5 64.644
Call:
lm(formula = BW ~ BC + RandPred, data = tbl_ms_vc_olsrr)
Coefficients:
(Intercept) BC RandPred
-1464.607 8.038 2.845
aov_bw_bc_rp <- aov(formula = `Body Weight` ~ `Breast Circumference` + RandPred,
data = tbl_data)
(smry_aov_bw_bc_rp <- summary(aov_bw_bc_rp))
Df Sum Sq Mean Sq F value Pr(>F)
`Breast Circumference` 1 4581 4581 42.143 0.000337 ***
RandPred 1 221 221 2.037 0.196541
Residuals 7 761 109
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Sum Sq
)The total sum of squares (\(SSQ_T\)) is computed by summing over the squared deviations of the observed body weights from the mean body weight. This is computed as
n_mean_bw <- mean(tbl_data$`Body Weight`)
ssq_t <- sum((tbl_data$`Body Weight` - n_mean_bw)^2)
ssq_t
[1] 5563.6
This corresponds to the sum of all entries in the column Sum Sq
in the ANOVA-Table shown above.
sum(smry_aov_bw_bc_rp[[1]][, "Sum Sq"])
[1] 5563.6
The sum of squares of the residuals is the sum of the squared deviations of the observed body weights minus the fitted values of the regression model. This corresponds to
ssq_r <- sum((tbl_data$`Body Weight` - fitted(aov_bw_bc_rp))^2)
ssq_r
[1] 760.9409
From the ANOVA table we get
smry_aov_bw_bc_rp[[1]]["Residuals", "Sum Sq"]
[1] 760.9409
The sum of squares that can be attributed to the regression model is computed by the sum of the squared deviations between the fitted values minus the mean body weight.
ssq_m <- sum((fitted(aov_bw_bc_rp) - n_mean_bw)^2)
ssq_m
[1] 4802.659
This corresponds to the sum of the values in column Sum Sq
for Breast Circumference
and RandPred
.
smry_aov_bw_bc_rp[[1]]["`Breast Circumference`", "Sum Sq"] + smry_aov_bw_bc_rp[[1]]["RandPred", "Sum Sq"]
[1] 4802.659
The sum of squares for the predictor Breast Circumference
is computed as the regression sum of squares in a model where only Breast Circumference
is used as the only predictor.
aov_bw_bc <- aov(formula = `Body Weight` ~ `Breast Circumference`, data = tbl_data)
smry_aov_bw_bc <- summary(aov_bw_bc)
ssq_b <- sum((fitted(aov_bw_bc) - n_mean_bw)^2)
ssq_b
[1] 4581.203
From the ANOVA table of the full model
smry_aov_bw_bc_rp[[1]]["`Breast Circumference`", "Sum Sq"]
[1] 4581.203
The sum of squares for the predictor Rand Pred
correspond to the difference between the regression sum of squares minus the sum of squares computed for Breast Circumference
alone
ssq_m - ssq_b
[1] 221.4565
From the ANOVA table, we get
smry_aov_bw_bc_rp[[1]]["RandPred", "Sum Sq"]
[1] 221.4565
The computation of the sum of squares for the single predictor indicate that the order of the predictor matters. The order in which the sum of squares of the single predictors are computed is determined by the order of the predictor in the formula specified in the call to aov()
. Hence the following call results in a different ANOVA table
summary(aov(formula = `Body Weight` ~ RandPred + `Breast Circumference`, data = tbl_data))
Df Sum Sq Mean Sq F value Pr(>F)
RandPred 1 1261 1261 11.60 0.011349 *
`Breast Circumference` 1 3542 3542 32.58 0.000729 ***
Residuals 7 761 109
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Comparing the second version of the ANOVA table to the original one, shows that it is better to order the predictors according to their value in the Sum Sq
column.
A column with the ID of the sire is added.
tbl_ms_vc_sire <- tbl_data
tbl_ms_vc_sire$Sire <- c(rep(1,3), rep(2,3), rep(3,4))
tbl_ms_vc_sire
NA
tbl_ms_vc_sire$Sire <- as.factor(tbl_ms_vc_sire$Sire)
tbl_ms_vc_sire
NA
To have a balanced dataset, we only use the first nine observations.
tbl_ms_vc_sire <- tbl_ms_vc_sire[1:9,]
tbl_ms_vc_sire
Determine sire variance component using aov()
.
aov_ms_vc_sire <- aov(`Body Weight` ~ Sire, data = tbl_ms_vc_sire)
smry_aov_ms_vc_sire <- summary(aov_ms_vc_sire)
smry_aov_ms_vc_sire
Df Sum Sq Mean Sq F value Pr(>F)
Sire 2 2651.6 1325.8 13.68 0.00581 **
Residuals 6 581.3 96.9
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
n_sigma_s2 <- (smry_aov_ms_vc_sire[[1]]["Sire", "Mean Sq"] - smry_aov_ms_vc_sire[[1]]["Residuals", "Mean Sq"]) / 3
n_sigma_s2
[1] 409.6296
REML
lmer_ms_vc_sire <- lme4::lmer(`Body Weight` ~ (1|Sire), data = tbl_ms_vc_sire)
summary(lmer_ms_vc_sire)
Linear mixed model fit by REML ['lmerMod']
Formula: `Body Weight` ~ (1 | Sire)
Data: tbl_ms_vc_sire
REML criterion at convergence: 66.7
Scaled residuals:
Min 1Q Median 3Q Max
-1.62462 -0.20467 -0.03325 0.67790 1.01679
Random effects:
Groups Name Variance Std.Dev.
Sire (Intercept) 409.63 20.239
Residual 96.89 9.843
Number of obs: 9, groups: Sire, 3
Fixed effects:
Estimate Std. Error t value
(Intercept) 490.11 12.14 40.38