eff_2 = 8
eff_3 = 5
d_eff = data.frame(time = 1:3,
                   y = c(0, eff_2, eff_3))

library(ggplot2)
ggplot(d_eff, aes(x = time, y = y)) +
  geom_line()
Module 6: Random effects
Introduction
So far, we have only looked at data where all outcome variables are independent. But some data also have dependent outcome variables. Typical examples include:
- Longitudinal data, where we have multiple measurements for each subject. For example, we can measure a patient’s outcome over time (in addition to measuring multiple patients). While measurements taken on different patients may be assumed to be independent, measurements taken on the same patient are not (they are typically positively correlated).
- Hierarchical data, where observations in the same branch of the hierarchical tree are dependent. For example, if we measure exam results for pupils in different classes (where each pupil belongs to only one class), the results within a class are not independent (again, they are typically positively correlated).
Linear models and generalised linear models assume that the outcome variables are independent. A solution for handling the dependency in these types of data is to use mixed-effects (generalised) linear models.
Example study
We will mainly use simulated data, as this makes it easier to see what’s going on.
Imagine we’ve developed a ‘happy pill’ which (we hope) will make people happier. We measure people’s happiness at three time points – before taking the pill, one month after using the pill and two months after using the pill.
(We can for example measure their happiness by making them fill out a detailed ‘happiness questionnaire’ and calculating a ‘happiness score’ on a 0–100 scale based on their answers.)
Let’s imagine that the pill raises their happiness score by 8 points after one month. After two months, the effect is starting to wear off, and it drops by 3 points, so that their score is only 5 points above the baseline score. The treatment effect looks like this:
But the subjects in our experiment start out with different happiness levels at baseline, so the scores might look something like this:
n_ind = 10   # Number of subjects
y_mean = 70  # Mean happiness level
sd_ind = 5   # SD for baseline happiness level

set.seed(123)
y_1 = rnorm(n_ind, mean = y_mean, sd = sd_ind)
d = data.frame(
  subj = factor(seq_len(n_ind)),
  y_1,
  y_2 = y_1 + eff_2,
  y_3 = y_1 + eff_3
)

library(tidyr) # For transforming data from wide format to long format
d_long = pivot_longer(d, starts_with("y_"),
  names_to = "time", names_prefix = "y_",
  names_transform = as.numeric
)

ggplot(d_long, aes(x = time, y = value, colour = subj)) +
  geom_line()
Noise (error term)
Our real data will of course not look like this. Even if the effect of the pill is the same for all subjects, we will have ‘noise’ in our observations (and an ‘error term’ in our models to capture this noise). This noise comes from two types of sources:
- For a given person, their happiness level varies randomly from day to day. If it’s their birthday, they may be a bit happier that day. If it’s raining, their happiness level may drop a bit.
- We have measurement error in our instrument. Here, the ‘instrument’ is a questionnaire (and the context in which filling out the questionnaire takes place). If the happiness questionnaire only has a small number of questions, and each question only has a small number of response options (e.g., only ‘yes’ or ‘no’), the measurement error will be greater than if it’s a detailed questionnaire with very many questions (perhaps covering various ‘domains’ of happiness) and with many response options (e.g., a 0–10 scale) for each question.
For our data, we can’t separate these two sources of noise, so they are combined into a single error term in our model.
Let’s create a function to simulate data:
sim_ri = function(n_ind = 15, y_mean = 70, eff_2 = 8,
                  eff_3 = 5, sd_ind, sd_resid, seed = 640) {
  set.seed(seed) # Reproducible results

  # ‘Real’ happiness level
  y_1_mean = rnorm(n_ind, mean = y_mean, sd = sd_ind)
  y_2_mean = y_1_mean + eff_2
  y_3_mean = y_1_mean + eff_3

  # ‘Noise’
  err_1 = rnorm(n_ind, sd = sd_resid)
  err_2 = rnorm(n_ind, sd = sd_resid)
  err_3 = rnorm(n_ind, sd = sd_resid)

  # Observed/measured happiness level
  y_1_obs = y_1_mean + err_1
  y_2_obs = y_2_mean + err_2
  y_3_obs = y_3_mean + err_3

  data.frame(subj = factor(seq_len(n_ind)),
             y_1_mean, y_2_mean, y_3_mean,
             y_1_obs, y_2_obs, y_3_obs)
}

# Convert wide data to long format
conv_wdata_to_ldata = function(d_wide) {
  pivot_longer(d_wide, starts_with("y_"),
    names_pattern = "y_(\\d)_(.*)",
    names_to = c("time", ".value"),
    names_transform = list(time = as.numeric)
  )
}
If the variation in (baseline) happiness among subjects corresponds to a standard deviation (SD) of 5 and the residual variation corresponds to a SD of 3, our data will look like this:
d_wide = sim_ri(sd_ind = 5, sd_resid = 3)
d_long = conv_wdata_to_ldata(d_wide)

ggplot(d_long, aes(x = time, y = obs, colour = subj)) +
  geom_line() +
  theme(legend.position = "none")
A basic random intercept model
Now let’s fit a model. This is a linear mixed-effects model. The fixed-effects part is familiar, and we treat time as a categorical variable, so the intercept parameter corresponds to the mean happiness score at baseline (time point 1).
But now we also add a random effect, a term representing the variation between subjects. The ‘intercept’ varies between subjects, so the term is called random intercept.
Basically, we say that the random intercept is a normally distributed random variable with mean 0 and unknown variance (just like the error/residual term), but its value varies with subject. In other words, the value is different from subject to subject, but it’s the same within subject, i.e., for each of the three time points. An important assumption is also that it’s independent of the error/residual term, and also of the predictors (if they‘re random variables, i.e., measured with error).
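Written out for subject i at time point t, the model described above takes the following standard form (the equation is added here for clarity and is not part of the original output):

$$
y_{it} = \beta_1 + \beta_2\,[t = 2] + \beta_3\,[t = 3] + u_i + \varepsilon_{it},
\qquad u_i \sim N(0, \sigma_u^2), \quad \varepsilon_{it} \sim N(0, \sigma^2),
$$

where the random intercept $u_i$ and the error term $\varepsilon_{it}$ are independent, and $u_i$ takes the same value at all three time points for subject $i$.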
The syntax for a random intercept is (1 | subj):
# Random intercept model (a type of mixed-effects model)
library(glmmTMB)
d_long$time_fac = factor(d_long$time)
mod_ri = glmmTMB(obs ~ time_fac + (1 | subj), data = d_long)
summary(mod_ri)
Family: gaussian ( identity )
Formula: obs ~ time_fac + (1 | subj)
Data: d_long
AIC BIC logLik deviance df.resid
273.7 282.8 -131.9 263.7 40
Random effects:
Conditional model:
Groups Name Variance Std.Dev.
subj (Intercept) 26.569 5.155
Residual 9.845 3.138
Number of obs: 45, groups: subj, 15
Dispersion estimate for gaussian family (sigma^2): 9.85
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 70.269 1.558 45.10 < 2e-16 ***
time_fac2 7.539 1.146 6.58 4.69e-11 ***
time_fac3 4.907 1.146 4.28 1.84e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ko = fixef(mod_ri)$cond
ko_form = paste0(f_num(ko, 1), collapse = ",")
The coefficient estimates (70.3, 7.5 and 4.9) can be interpreted like in a normal linear model (the population values they estimate are 70, 8 and 5). But in the ‘Random effects:’ block we now also see estimates of the random effect variances / standard deviations, one for subject and one for the error/residual term. (That last one is identical to the variance shown on the ‘Dispersion estimate’ line.)
The model has only five parameters: three coefficient estimates (for the three time points) and two variance/SD estimates. We don’t get a parameter for each subject’s intercept. However, we can actually estimate each subject’s intercept:
ranef(mod_ri)
$subj
(Intercept)
1 -0.6390825
2 4.3451828
3 -3.5287717
4 -3.3294055
5 3.2754833
6 5.5366426
7 1.4061163
8 -8.1826633
9 -4.4065830
10 3.1360898
11 -8.8484657
12 -1.2533532
13 2.9663908
14 9.1386768
15 0.3837451
The estimates are called conditional modes (which for Gaussian models like this one are also the conditional means), or sometimes BLUPs (best linear unbiased predictors).
Some people object to calling them estimators/estimates, and they’re actually more similar to residuals. But for all practical purposes, we can treat them as estimates (or predictions, if you prefer) of the ‘subject’ effects.
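If we want each subject’s intercept on the original scale (the overall intercept plus the subject’s conditional mode), the coef() method should give it directly. A small sketch, assuming coef() for glmmTMB objects combines fixed and random effects like its lme4 counterpart:

```r
# Subject-specific intercepts = fixed-effect intercept + conditional modes
# (assumes coef() combines the two, as in lme4)
coef(mod_ri)$cond$subj
```

The (Intercept) column should equal 70.269 plus the values from ranef() above.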
And now we can plot the subjects’ estimated ‘happiness trajectory’:
d_long$pred_ri = predict(mod_ri)

ggplot(d_long, aes(x = time, y = pred_ri, colour = subj)) +
  geom_line() +
  theme(legend.position = "none")
A fixed-effects model with subject-level intercepts
How about a normal linear model with a parameter for each subject? We can try to fit one:
# Fixed-effects model
mod_fe = glmmTMB(obs ~ time_fac + factor(subj), data = d_long)
summary(mod_fe)
Family: gaussian ( identity )
Formula: obs ~ time_fac + factor(subj)
Data: d_long
AIC BIC logLik deviance df.resid
248.4 280.9 -106.2 212.4 27
Dispersion estimate for gaussian family (sigma^2): 6.56
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 69.5507 1.5747 44.17 < 2e-16 ***
time_fac2 7.5395 0.9355 8.06 7.67e-16 ***
time_fac3 4.9073 0.9355 5.25 1.56e-07 ***
factor(subj)2 5.5999 2.0918 2.68 0.00743 **
factor(subj)3 -3.2466 2.0918 -1.55 0.12065
factor(subj)4 -3.0226 2.0918 -1.44 0.14847
factor(subj)5 4.3981 2.0918 2.10 0.03551 *
factor(subj)6 6.9385 2.0918 3.32 0.00091 ***
factor(subj)7 2.2978 2.0918 1.10 0.27200
factor(subj)8 -8.4754 2.0918 -4.05 5.09e-05 ***
factor(subj)9 -4.2329 2.0918 -2.02 0.04302 *
factor(subj)10 4.2415 2.0918 2.03 0.04260 *
factor(subj)11 -9.2234 2.0918 -4.41 1.04e-05 ***
factor(subj)12 -0.6901 2.0918 -0.33 0.74146
factor(subj)13 4.0508 2.0918 1.94 0.05281 .
factor(subj)14 10.9855 2.0918 5.25 1.51e-07 ***
factor(subj)15 1.1492 2.0918 0.55 0.58276
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
One thing to note is that the predicted trajectories are not the same in the mixed-effects (solid lines) and in the fixed-effects (dotted lines) models:
d_long$pred_fe = predict(mod_fe)

ggplot(d_long, aes(x = time, y = pred_ri, colour = subj)) +
  geom_line() +
  geom_line(aes(y = pred_fe), linetype = "dotted") +
  theme(legend.position = "none")
The estimates from the random intercept model are closer to the overall mean. We say that the estimates have been shrunk towards the mean. The shrinkage is largest for the most extreme values (very high or very low values).
Informally, what’s happening is this: For the subject with the highest line, the model is thinking ‘The observations for this subject are really high. But given what I know (have estimated) about the mean structure and the standard deviations for subjects and the error term, such values are highly unlikely. So it’s probably a case of a large positive error term. I’ll therefore shrink my predicted values for this subject a bit towards the mean.’
Shrinkage is a good thing, as it gives better predictions. How much shrinkage occurs depends mainly on three things (a small numeric sketch follows the list):
- The size of the estimated SD of the error term. (Higher SD leads to more shrinkage.)
- The size of the estimated SD of the subject term. (Higher SD leads to less shrinkage, since that makes very high or very low observed values more likely to appear. But it’s really the SD for the error term compared to the SD for the subject term that determines the shrinkage.)
- The distance from the overall mean. Observations far from the mean (on either side) will be heavily shrunk towards the mean. Observations close to the mean will get almost no shrinkage.
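For a balanced Gaussian random intercept model like ours, this intuition has a simple closed form: each subject’s conditional mode is essentially the mean of that subject’s deviations from the fixed-effects fit, multiplied by a shrinkage factor. A rough sketch using the fitted variances; the accessor calls (VarCorr() and sigma()) are assumed to work for glmmTMB as they do for lme4:

```r
# Shrinkage factor = var_subj / (var_subj + var_resid / n_time_points)
var_subj = attr(VarCorr(mod_ri)$cond$subj, "stddev")^2 # ~26.6 (see the summary above)
var_resid = sigma(mod_ri)^2                             # ~9.8
var_subj / (var_subj + var_resid / 3)                   # ~0.89, i.e., fairly mild shrinkage
```

A larger residual SD (or fewer time points per subject) would make this factor smaller and the shrinkage stronger.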
The values in the obs panel of the figure below are actually equivalent to estimates from a saturated fixed-effects model – a model with interactions between all subjects and all time points. So you can also think of the values in this panel as model-based estimates.
d_longlong = pivot_longer(d_long, c(mean, obs, pred_ri, pred_fe),
  names_to = "type"
)

ggplot(d_longlong, aes(x = time, y = value, group = subj)) +
  geom_line(alpha = .3) +
  theme(legend.position = "none") +
  facet_wrap(~type, nrow = 1)
Are fixed-effect models like these valid?
We fitted fixed-effects and mixed-effects models on the same data. They both gave reasonable estimates of the treatment effect. Is there anything wrong with just fitting fixed-effects models?
Yes. Even though the fixed-effects model worked fine here, such models are in general not valid. Basically, the problem is that when we increase the number of subjects, we also increase the number of parameters estimated. The number of parameters tends to infinity as the number of observations increases, and the asymptotics of the models break down; the parameter estimators are no longer consistent.
For the mixed-effects model, we don’t have this problem, as we only estimate one parameter for the ‘subject effect’, namely the standard deviation of the random intercept.
Random slope data
Until now, we have assumed that the treatment effect is the same for all subjects. We’ll now look at a model where it isn’t. We’ll assume that the effect is linear over time for each subject, but that the linear effect also differs between subjects.
sim_rs = function(n_ind = 15, y_mean, sd_ind,
                  slope, sd_slope, sd_resid, seed = 640) {
  set.seed(seed) # Reproducible results

  # Since the effect is linear, the code could be simplified
  # and also generalised to handle more than three time points.
  # But having it in this form makes it easier to see
  # exactly what’s going on.

  # Overall mean happiness level
  y_1_mean = y_mean
  y_2_mean = y_1_mean + 1 * slope
  y_3_mean = y_1_mean + 2 * slope

  # Individual mean happiness level
  intercept_ind = rnorm(n_ind, mean = 0, sd = sd_ind)
  slope_ind = rnorm(n_ind, mean = 0, sd = sd_slope)
  y_1_ind = y_1_mean + intercept_ind
  y_2_ind = y_2_mean + intercept_ind + 1 * slope_ind
  y_3_ind = y_3_mean + intercept_ind + 2 * slope_ind

  # ‘Noise’
  err_1 = rnorm(n_ind, sd = sd_resid)
  err_2 = rnorm(n_ind, sd = sd_resid)
  err_3 = rnorm(n_ind, sd = sd_resid)

  # Observed/measured happiness level
  y_1_obs = y_1_ind + err_1
  y_2_obs = y_2_ind + err_2
  y_3_obs = y_3_ind + err_3

  data.frame(subj = factor(seq_len(n_ind)),
             y_1_mean, y_2_mean, y_3_mean,
             y_1_ind, y_2_ind, y_3_ind,
             y_1_obs, y_2_obs, y_3_obs)
}
d_wide = sim_rs(y_mean = 70, slope = 6, sd_slope = 5,
                sd_ind = 4, sd_resid = 3, seed = 451)
d_long = conv_wdata_to_ldata(d_wide)
d_longlong = pivot_longer(d_long, c(mean, ind, obs), names_to = "type")
d_longlong$type = relevel(factor(d_longlong$type), "mean") # Reordering for plotting

ggplot(d_longlong, aes(x = time, y = value, colour = subj)) +
  geom_line() +
  theme(legend.position = "none") +
  facet_wrap(~type, nrow = 1)
The panels in the figure above show from left to right:
- The overall mean trajectory (the expected trajectory for a random subject)
- The individual mean trajectories (where we ignore measurement error)
- The observed data
We see that even though the overall treatment effect is positive (‘on average’, the subjects’ happiness levels increase by 6 points each month), the individual treatment effect varies. For some subjects, the pill actually makes them less happy over time.
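Under the simulation parameters above (mean slope 6, slope SD 5), we can work out how often this happens for a randomly chosen subject:

```r
# Probability that a random subject has a negative slope,
# given that the individual slopes are N(6, 5^2) as in the simulation
pnorm(0, mean = 6, sd = 5) # ~0.12, i.e., roughly one subject in eight
```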
A basic random slope model
We can fit a model that takes the variation in slopes into account by including a random slope in addition to the random intercept. For this, the random-effects part of the formula will look like this: (1 + time | subj)
As before, the 1 corresponds to an ‘intercept’ that is allowed to vary from subject to subject (subj). The time corresponds to the ‘slope’ associated with the time variable and is also allowed to vary from subject to subject. It is assumed that the two random effects (which are allowed to be correlated) have a joint zero-mean Gaussian distribution.
(Since the means of the random effects are zero, we include fixed effects to allow the mean intercept and mean slope to be different from zero.)
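In standard notation (added here for clarity, with t the numeric time variable), the random slope model is:

$$
y_{it} = (\beta_0 + u_{0i}) + (\beta_1 + u_{1i})\,t + \varepsilon_{it},
\qquad
\begin{pmatrix} u_{0i} \\ u_{1i} \end{pmatrix} \sim
N\!\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \sigma_0^2 & \rho\,\sigma_0\sigma_1 \\ \rho\,\sigma_0\sigma_1 & \sigma_1^2 \end{pmatrix}\right),
\quad \varepsilon_{it} \sim N(0, \sigma^2).
$$

Here $\beta_0$ and $\beta_1$ are the fixed intercept and slope, and $u_{0i}$ and $u_{1i}$ are subject $i$’s deviations from them.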
It’s easy to fit the model:
mod_rs = glmmTMB(obs ~ time + (1 + time | subj), data = d_long)
summary(mod_rs)
Family: gaussian ( identity )
Formula: obs ~ time + (1 + time | subj)
Data: d_long
AIC BIC logLik deviance df.resid
298.4 309.2 -143.2 286.4 39
Random effects:
Conditional model:
Groups Name Variance Std.Dev. Corr
subj (Intercept) 76.37 8.739
time 22.30 4.723 -0.90
Residual 12.32 3.510
Number of obs: 45, groups: subj, 15
Dispersion estimate for gaussian family (sigma^2): 12.3
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 66.540 2.647 25.14 < 2e-16 ***
time 5.675 1.377 4.12 3.8e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
It’s easier to identify the parameters if we use a slightly different parametrisation. Instead of time 1, 2, 3, we parametrise the time as 0, 1, and 2:
d_long$time_0b = d_long$time - 1 # 0-based time
mod_rs_0b = glmmTMB(obs ~ time_0b + (1 + time_0b | subj), data = d_long)
summary(mod_rs_0b)
Family: gaussian ( identity )
Formula: obs ~ time_0b + (1 + time_0b | subj)
Data: d_long
AIC BIC logLik deviance df.resid
298.4 309.2 -143.2 286.4 39
Random effects:
Conditional model:
Groups Name Variance Std.Dev. Corr
subj (Intercept) 24.13 4.912
time_0b 22.30 4.723 -0.65
Residual 12.32 3.510
Number of obs: 45, groups: subj, 15
Dispersion estimate for gaussian family (sigma^2): 12.3
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 72.214 1.514 47.69 < 2e-16 ***
time_0b 5.675 1.377 4.12 3.8e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Note which estimates change and which stay the same in this new (but fully equivalent) model.
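For example, the slope estimate, the slope variance and the residual variance stay the same, while the fixed intercept, the random intercept variance and the correlation between intercept and slope change. We can check that the new intercept variance is what the shift of time origin implies (a quick calculation using the printed estimates):

```r
# Shifting the time origin by one unit gives u0_new = u0_old + u1, so
# Var(u0_new) = Var(u0_old) + Var(u1) + 2 * Cov(u0_old, u1)
76.37 + 22.30 + 2 * (-0.90) * 8.739 * 4.723
# ~24.4, matching the 24.13 reported above (the difference is due to rounding)
```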
With a random intercept model, the correlations between observations from different time points were identical. And the variances for observations from different time points were also identical.
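For the random intercept model, this common correlation between measurements at any two time points is the intraclass correlation, which we can compute from the fitted variances (numbers taken from the mod_ri summary above):

```r
# Intraclass correlation = var_subj / (var_subj + var_resid)
26.569 / (26.569 + 9.845) # ~0.73 between any two time points on the same subject
```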
This is not true for a random slope model. Because of the ‘fan’ shape of the individual trajectories (easiest to see if you set both sd_ind and sd_resid to small values), the correlations will decrease over time (e.g., the correlation between the time 1 and time 2 observations will be higher than the correlation between the time 1 and time 3 observations), and the standard deviations will increase over time (e.g., the standard deviation at time 2 will be higher than the standard deviation at time 1, and the standard deviation at time 3 will be higher still).
In our random slope model, we had three time points. Can we also estimate the same model with only two time points?
The answer is yes and no, but mostly no …
The answer is yes, because everything is possible if we make enough assumptions. 😊 And fitting the model using only time points 1 and 2 actually works well. We can estimate the mean treatment effect (slope) and the variation in treatment effect (random slope).
But imagine an alternative (random intercept) model where 1) the treatment effect is the same for each subject, and 2) the residual standard deviation is allowed to be different (higher) at the second time point than at the first. This will actually give us the exact same fit as the above random slope model.
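To see why, compare the covariance structures the two models imply for two time points (a sketch in standard notation: sigma_0, sigma_1 and rho are the random intercept SD, random slope SD and their correlation; sigma is the residual SD; in the alternative model, sigma_u is the random intercept SD and sigma_e1, sigma_e2 are the time-specific residual SDs):

$$
\text{Random slope (time coded 0 and 1):}\quad
\operatorname{Var}(y_1) = \sigma_0^2 + \sigma^2,\quad
\operatorname{Var}(y_2) = \sigma_0^2 + \sigma_1^2 + 2\rho\sigma_0\sigma_1 + \sigma^2,\quad
\operatorname{Cov}(y_1, y_2) = \sigma_0^2 + \rho\sigma_0\sigma_1
$$

$$
\text{Random intercept, time-specific residual SDs:}\quad
\operatorname{Var}(y_1) = \sigma_u^2 + \sigma_{e1}^2,\quad
\operatorname{Var}(y_2) = \sigma_u^2 + \sigma_{e2}^2,\quad
\operatorname{Cov}(y_1, y_2) = \sigma_u^2
$$

Both models have enough free parameters to match the two variances and the covariance seen in the data, so the likelihood cannot tell them apart. With three time points, the random slope model implies a specific pattern (variances increasing and correlations decreasing over time) that the data can either support or contradict.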
So we have two alternative models, one where the treatment effect differs between subjects, and one where it doesn’t. Using data from only two time points, we can’t distinguish between the two models. So if we’re interested in estimating the variation in treatment effect, not just the mean treatment effect, we need (at least) three time points.
That’s why the answer to the question is really mostly no.
Simplicity and real life
For simplicity, we have fitted very simple models on simplified datasets in this exercise. And the interpretation of our effect estimates is based on many assumptions.
For one, we pretended that our estimates corresponded to the effect of taking the ‘happiness pill’. But if there were other things happening at the same time, this would not be true.
For example, if the first measurement was taken in March, the second in April and the third May, the (apparent) increase in mean happiness might be caused by the weather getting warmer and sunnier, not (only) by the effect of the pill.
For a real study, we would instead randomise the subjects to either a) get the pill, or b) get a placebo. And we would then compare the effect of the pill over time against the effect of the placebo over time. We can use a mixed-effect model for this too. (If the first time point is a baseline measurement taken before getting the pill/placebo, we would preferably force the treatment effect to be 0 at this time point.)
Multiple random intercepts or slopes
We can also fit models with multiple random intercepts (and/or random slopes). This is typically used for hierarchical data, where we have one random intercept for each level. Looking back to our exam results example, we can have:
(1 | county) + (1 | school) + (1 | class)
Here we have one random intercept for county, one for school and one for class. (If we had multiple exam results for each pupil, we could also have a ‘pupil effect’, (1 | pupil).)
Exam results for pupils in the same class are dependent due to a ‘class’ effect. For example, they all share the same teacher. A good teacher might raise the exam results of all the pupils (and a bad teacher might lower them). And if the class has a few noisy pupils, this will affect the learning environment and thus the exam results of all the pupils in that class.
Similarly, the ‘school’ effect induces a dependence between pupils at the same school. (For example, different schools may have different teaching strategies or different qualities of teaching equipment.)
And similarly, there could also be a ‘county’ effect.
The correlation between exam results will be greatest for two pupils in the same class, smaller (but still positive) for two pupils in the same school but in different classes, and smaller yet (and still positive) for two pupils in the same county but in different schools.
The random effects are assumed to be independent, and the factor levels of the corresponding variables must be globally unique. (So if the classes are named ‘a’, ‘b’, ‘c’, … in multiple schools, we would have to rename them to be unique, e.g., to ‘school1_a’, ‘school2_a’, ‘school3_a’, …, ‘school1_b’, ‘school2_b’, ‘school3_b’, …)
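A hypothetical sketch of how such relabelling could be done in R (the data frame d_exam and its columns county, school, class and score are invented for illustration):

```r
# Make class labels globally unique by combining school and class
d_exam$class_unique = interaction(d_exam$school, d_exam$class, drop = TRUE)

mod_exam = glmmTMB(score ~ 1 + (1 | county) + (1 | school) + (1 | class_unique),
  data = d_exam
)
```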
A generalised mixed-effects model
We can also combine generalised linear models (GLMs) and random effects. Then we get generalised linear mixed-effects models (GLMMs).
Note that even though the conditional response is no longer Gaussian, the random effects are still assumed to be Gaussian. And they are included on the link scale, just like the fixed effects.
We’ll fit a simple model on inflorescence counts from this week’s videos. We include three separate random intercepts.
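On the log (link) scale, the model we fit below can be written as follows (standard notation added for clarity; plot(i), year(i) and ID(i) denote the plot, year and plant that observation i belongs to):

$$
\texttt{infl}_i \sim \text{Poisson}(\lambda_i),\qquad
\log \lambda_i = \beta_0 + \beta_1\,\texttt{lsize}_i + \beta_2\,\texttt{habitat}_i
+ \beta_3\,\texttt{lsize}_i\,\texttt{habitat}_i
+ u_{\text{plot}(i)} + v_{\text{year}(i)} + w_{\text{ID}(i)},
$$

where each of the three random intercepts has its own zero-mean Gaussian distribution.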
load(url("https://raw.githubusercontent.com/embruna/Brooks_etal_2019_HeliconiaData/master/hdat.RData"))

mod_pois = glmmTMB(infl ~ lsize * habitat + (1 | plot) + (1 | year) + (1 | ID),
  family = poisson(), data = hdat
)
summary(mod_pois)
Family: poisson ( log )
Formula: infl ~ lsize * habitat + (1 | plot) + (1 | year) + (1 | ID)
Data: hdat
AIC BIC logLik deviance df.resid
11153.7 11213.5 -5569.8 11139.7 37749
Random effects:
Conditional model:
Groups Name Variance Std.Dev.
plot (Intercept) 0.09197 0.3033
year (Intercept) 0.28542 0.5342
ID (Intercept) 0.78465 0.8858
Number of obs: 37756, groups: plot, 10; year, 12; ID, 5290
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.8558 0.3084 -22.231 < 2e-16 ***
lsize 2.6336 0.1485 17.740 < 2e-16 ***
habitatcontinuous 0.6373 0.3109 2.050 0.040403 *
lsize:habitatcontinuous -0.6175 0.1602 -3.856 0.000115 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1