s_data_path <- "https://charlotte-ngs.github.io/pigsciencess2022/data/bw_bc_rand_pred.csv"
tbl_data <- readr::read_csv(file = s_data_path)Rows: 10 Columns: 4
── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
dbl (4): Animal, Breast Circumference, Body Weight, RandPred
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.tbl_datapairs(tbl_data)Full model
lm_bw_bc_rp <- lm(formula = `Body Weight` ~ `Breast Circumference` + RandPred, 
                    data = tbl_data)
summary(lm_bw_bc_rp)
Call:
lm(formula = `Body Weight` ~ `Breast Circumference` + RandPred, 
    data = tbl_data)
Residuals:
     Min       1Q   Median       3Q      Max 
-21.9487  -2.0365   0.3772   6.7539   8.8617 
Coefficients:
                        Estimate Std. Error t value Pr(>|t|)    
(Intercept)            -1464.607    368.944  -3.970 0.005394 ** 
`Breast Circumference`     8.038      1.408   5.708 0.000729 ***
RandPred                   2.845      1.993   1.427 0.196541    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 10.43 on 7 degrees of freedom
Multiple R-squared:  0.8632,    Adjusted R-squared:  0.8242 
F-statistic: 22.09 on 2 and 7 DF,  p-value: 0.0009462library(dplyr)
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
    filter, lag
The following objects are masked from ‘package:base’:
    intersect, setdiff, setequal, uniontbl_ms_vc_olsrr <- tbl_data %>% rename(BW = `Body Weight`, BC = `Breast Circumference`)
lm_ms_vc_olsrr <- lm(BW ~ BC + RandPred, data = tbl_ms_vc_olsrr)
olsrr::ols_step_best_subset(lm_ms_vc_olsrr) Best Subsets Regression  
--------------------------
Model Index    Predictors
--------------------------
     1         BC          
     2         BC RandPred 
--------------------------
                                                    Subsets Regression Summary                                                    
----------------------------------------------------------------------------------------------------------------------------------
                       Adj.        Pred                                                              Alternative
MASS::stepAIC(lm_ms_vc_olsrr)Start:  AIC=49.32
BW ~ BC + RandPred
           Df Sum of Sq    RSS    AIC
<none>                   760.9 49.320
- RandPred  1     221.5  982.4 49.874
- BC        1    3541.6 4302.5 64.644
Call:
lm(formula = BW ~ BC + RandPred, data = tbl_ms_vc_olsrr)
Coefficients:
(Intercept)           BC     RandPred  
  -1464.607        8.038        2.845  aov_bw_bc_rp <- aov(formula = `Body Weight` ~ `Breast Circumference` + RandPred, 
                    data = tbl_data)
(smry_aov_bw_bc_rp <- summary(aov_bw_bc_rp))                       Df Sum Sq Mean Sq F value   Pr(>F)    
`Breast Circumference`  1   4581    4581  42.143 0.000337 ***
RandPred                1    221     221   2.037 0.196541    
Residuals               7    761     109                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1Sum Sq)The total sum of squares (\(SSQ_T\)) is computed by summing over the squared deviations of the observed body weights from the mean body weight. This is computed as
n_mean_bw <- mean(tbl_data$`Body Weight`)
ssq_t <- sum((tbl_data$`Body Weight` - n_mean_bw)^2)
ssq_t[1] 5563.6This corresponds to the sum of all entries in the column Sum Sq in the ANOVA-Table shown above.
sum(smry_aov_bw_bc_rp[[1]][, "Sum Sq"])[1] 5563.6The sum of squares of the residuals is the sum of the squared deviations of the observed body weights minus the fitted values of the regression model. This corresponds to
ssq_r <- sum((tbl_data$`Body Weight` - fitted(aov_bw_bc_rp))^2)
ssq_r[1] 760.9409From the ANOVA table we get
smry_aov_bw_bc_rp[[1]]["Residuals", "Sum Sq"][1] 760.9409The sum of squares that can be attributed to the regression model is computed by the sum of the squared deviations between the fitted values minus the mean body weight.
ssq_m <- sum((fitted(aov_bw_bc_rp) - n_mean_bw)^2)
ssq_m[1] 4802.659This corresponds to the sum of the values in column Sum Sq for Breast Circumference and RandPred.
smry_aov_bw_bc_rp[[1]]["`Breast Circumference`", "Sum Sq"] + smry_aov_bw_bc_rp[[1]]["RandPred", "Sum Sq"][1] 4802.659The sum of squares for the predictor Breast Circumference is computed as the regression sum of squares in a model where only Breast Circumference is used as the only predictor.
aov_bw_bc <- aov(formula = `Body Weight` ~ `Breast Circumference`, data = tbl_data)
smry_aov_bw_bc <- summary(aov_bw_bc)
ssq_b <- sum((fitted(aov_bw_bc) - n_mean_bw)^2)
ssq_b[1] 4581.203From the ANOVA table of the full model
smry_aov_bw_bc_rp[[1]]["`Breast Circumference`", "Sum Sq"][1] 4581.203The sum of squares for the predictor Rand Pred correspond to the difference between the regression sum of squares minus the sum of squares computed for Breast Circumference alone
ssq_m - ssq_b[1] 221.4565From the ANOVA table, we get
smry_aov_bw_bc_rp[[1]]["RandPred", "Sum Sq"][1] 221.4565The computation of the sum of squares for the single predictor indicate that the order of the predictor matters. The order in which the sum of squares of the single predictors are computed is determined by the order of the predictor in the formula specified in the call to aov(). Hence the following call results in a different ANOVA table
summary(aov(formula = `Body Weight` ~ RandPred + `Breast Circumference`, data = tbl_data))                       Df Sum Sq Mean Sq F value   Pr(>F)    
RandPred                1   1261    1261   11.60 0.011349 *  
`Breast Circumference`  1   3542    3542   32.58 0.000729 ***
Residuals               7    761     109                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1Comparing the second version of the ANOVA table to the original one, shows that it is better to order the predictors according to their value in the Sum Sq column.
A column with the ID of the sire is added.
tbl_ms_vc_sire <- tbl_data
tbl_ms_vc_sire$Sire <- c(rep(1,3), rep(2,3), rep(3,4))
tbl_ms_vc_sireNAtbl_ms_vc_sire$Sire <- as.factor(tbl_ms_vc_sire$Sire)
tbl_ms_vc_sireNATo have a balanced dataset, we only use the first nine observations.
tbl_ms_vc_sire <- tbl_ms_vc_sire[1:9,]
tbl_ms_vc_sireDetermine sire variance component using aov().
aov_ms_vc_sire <- aov(`Body Weight` ~ Sire, data = tbl_ms_vc_sire)
smry_aov_ms_vc_sire <- summary(aov_ms_vc_sire)
smry_aov_ms_vc_sire            Df Sum Sq Mean Sq F value  Pr(>F)   
Sire         2 2651.6  1325.8   13.68 0.00581 **
Residuals    6  581.3    96.9                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1n_sigma_s2 <- (smry_aov_ms_vc_sire[[1]]["Sire", "Mean Sq"] - smry_aov_ms_vc_sire[[1]]["Residuals", "Mean Sq"]) / 3
n_sigma_s2[1] 409.6296REML
lmer_ms_vc_sire <- lme4::lmer(`Body Weight` ~ (1|Sire), data = tbl_ms_vc_sire)
summary(lmer_ms_vc_sire)Linear mixed model fit by REML ['lmerMod']
Formula: `Body Weight` ~ (1 | Sire)
   Data: tbl_ms_vc_sire
REML criterion at convergence: 66.7
Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.62462 -0.20467 -0.03325  0.67790  1.01679 
Random effects:
 Groups   Name        Variance Std.Dev.
 Sire     (Intercept) 409.63   20.239  
 Residual              96.89    9.843  
Number of obs: 9, groups:  Sire, 3
Fixed effects:
            Estimate Std. Error t value
(Intercept)   490.11      12.14   40.38