The first step is to read the data. The file is semicolon-separated and uses a decimal comma, so it is read with readr::read_csv2().
s_data_file <- "https://charlotte-ngs.github.io/gelasmss2021/data/gel_model_sel_ex02.csv"
tbl_modsel <- readr::read_csv2(s_data_file)
Using ',' as decimal and '.' as grouping mark. Use read_delim() for more control.
Parsed with column specification:
cols(
Id = col_double(),
sex = col_double(),
slh = col_double(),
hrd = col_double(),
age = col_double(),
cw = col_double(),
day = col_double(),
hum = col_double()
)
tbl_modsel
names(tbl_modsel)
[1] "Id" "sex" "slh" "hrd" "age" "cw" "day" "hum"
lm_fit <- lm(cw ~ sex + slh + hrd, data = tbl_modsel)
summary(lm_fit)
Call:
lm(formula = cw ~ sex + slh + hrd, data = tbl_modsel)
Residuals:
    Min      1Q  Median      3Q     Max
 -65.35  -29.00  -11.38   31.40   97.33
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  384.942     12.046  31.955  < 2e-16 ***
sex          -76.322      5.988 -12.745  < 2e-16 ***
slh            1.931      0.598   3.228  0.00125 **
hrd            1.257      0.340   3.698  0.00022 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 35.3 on 5321 degrees of freedom
Multiple R-squared: 0.03375, Adjusted R-squared: 0.03321
F-statistic: 61.96 on 3 and 5321 DF, p-value: < 2.2e-16
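The output shows that lm() fitted a regression with a single slope for each of sex, slh and hrd, because all three columns were read as numeric (col_double). These variables are class codes rather than measurements, so they should be treated as fixed effects. This is done by converting them to factors before fitting the model again.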
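One way to spot columns that hold class codes is to count the distinct values per column. The following is a minimal sketch on made-up data; df_toy and its values are hypothetical stand-ins and are not taken from the exercise file.

```r
# Synthetic stand-in for the exercise data (all values are made up)
df_toy <- data.frame(
  sex = c(1, 2, 1, 2, 1, 2),
  hrd = c(1, 2, 3, 1, 2, 3),
  cw  = c(251.5, 180.2, 263.1, 175.9, 270.4, 190.3)
)

# Count the distinct values in each column.
# A column with only a few distinct values is a candidate class variable.
n_levels <- sapply(df_toy, function(x) length(unique(x)))
n_levels
```

In this toy example sex and hrd take only a handful of values, while cw differs in every row, which is the pattern expected for class variables versus a measured response.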
tbl_modsel$sex <- as.factor(tbl_modsel$sex)
tbl_modsel$slh <- as.factor(tbl_modsel$slh)
tbl_modsel$hrd <- as.factor(tbl_modsel$hrd)
lm_fit2 <- lm(cw ~ sex + slh + hrd, data = tbl_modsel)
summary(lm_fit2)
Call:
lm(formula = cw ~ sex + slh + hrd, data = tbl_modsel)
Residuals:
      Min        1Q    Median        3Q       Max
 -27.4545   -5.4545   -0.1911    5.5455   29.8794
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 271.6235     1.4317  189.72   <2e-16 ***
sex2        -75.3135     1.4107  -53.39   <2e-16 ***
slh2         22.2634     0.2795   79.66   <2e-16 ***
slh3          3.4590     0.2817   12.28   <2e-16 ***
hrd2         87.9970     0.3604  244.16   <2e-16 ***
hrd3          8.6769     0.3605   24.07   <2e-16 ***
hrd4         58.8812     0.3578  164.58   <2e-16 ***
hrd5         19.8107     0.3574   55.44   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 8.315 on 5317 degrees of freedom
Multiple R-squared: 0.9464, Adjusted R-squared: 0.9464
F-statistic: 1.342e+04 on 7 and 5317 DF, p-value: < 2.2e-16
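The change from one slope per predictor to one effect per non-reference level, and the accompanying increase in R-squared, can be reproduced on a small example. The sketch below uses made-up data; df_toy, grp and y are hypothetical names chosen for illustration and are not part of the exercise data.

```r
# Synthetic toy data: a grouping variable coded 1, 2, 3 whose group means
# (10, 30, 12) do not change linearly with the code
df_toy <- data.frame(
  grp = rep(c(1, 2, 3), each = 4),
  y   = c(10, 11, 9, 10, 30, 31, 29, 30, 12, 13, 11, 12)
)

# Numeric coding: the codes are treated as a quantity, giving a single slope
fit_num <- lm(y ~ grp, data = df_toy)

# Factor coding: one effect per level, relative to the first level
fit_fac <- lm(y ~ as.factor(grp), data = df_toy)

# Number of estimated coefficients in each model
n_coef_num <- length(coef(fit_num))
n_coef_fac <- length(coef(fit_fac))

# Proportion of variance explained by each model
r2_num <- summary(fit_num)$r.squared
r2_fac <- summary(fit_fac)$r.squared

coef(fit_fac)
```

With factor coding the intercept estimates the mean of the first group and the remaining coefficients estimate differences from it, which is also how sex2, slh2, slh3 and hrd2 to hrd5 are to be read in the output above.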