Module 6: Random effects

Author

Karl Ove Hufthammer

Introduction

So far, we have only looked at data where all outcome variables are independent. But some data also have dependent outcome variables. Typical examples include:

  • Longitudinal data, where we have multiple measurements for each subject. For example, we can measure a patient’s outcome over time (in addition to measuring multiple patients). While measurements taken on different patients may be assumed to be independent, measurements taken on the same patient are not (they are typically positively correlated).
  • Hierarchical data, where observations in the same branch of the hierarchical tree are dependent. For example, if we measure exam results for pupils in different classes (where each pupil belongs to only one class), the results within a class are not independent (again, they are typically positively correlated).

Linear models and generalised linear models assume that the outcome variables are independent. A solution for handling the dependency in the above model types is mixed-effects (generalised) linear models.

Example study

We will mainly use simulated data, as this makes it easier to see what’s going on.

Imagine we’ve developed a ‘happy pill’ which (we hope) will make people happier. We measure people’s happiness at three time points – before taking the pill, one month after using the pill and two months after using the pill.

(We can for example measure their happiness by making them fill out a detailed ‘happiness questionnaire’ and calculating a ‘happiness score’ on a 0–100 scale based on their answers.)

Let’s imagine that the pill raises their happiness score by 8 points after one month. After two months, the effect is starting to wear off, and it drops by 3 points, so that their score is only 5 points above the baseline score. The treatment effect looks like this:

eff_2 = 8
eff_3 = 5
d_eff = data.frame(time = 1:3,
                   y = c(0, eff_2, eff_3))

library(ggplot2)
ggplot(d_eff, aes(x = time, y = y)) +
  geom_line()

But the subjects in our experiment start out with different happiness levels at baseline, so the scores might look like something like this:

n_ind = 10 # Number of subject
y_mean = 70 # Mean happiness level
sd_ind = 5 # SD for baseline happiness level
set.seed(123)
y_1 = rnorm(n_ind, mean = y_mean, sd = sd_ind)
d = data.frame(
  subj = factor(seq_len(n_ind)),
  y_1,
  y_2 = y_1 + eff_2,
  y_3 = y_1 + eff_3
)

library(tidyr) # For transforming data from wide format to long format
d_long = pivot_longer(d, starts_with("y_"),
  names_to = "time", names_prefix = "y_",
  names_transform = as.numeric
)

ggplot(d_long, aes(x = time, y = value, colour = subj)) +
  geom_line()

Noise (error term)

Our real data will of course not look like this. Even if the effect of the pill is the same for all subjects, we will have ‘noise’ in our observations (and an ‘error term’ in our models to capture this noise). This noise comes from two types of sources:

  • For a given person, their happiness level varies randomly from day to day. If it’s their birthday, they may be a bit happier that day. If it’s raining, their happiness level may drop a bit.
  • We have measurement error in our instrument. Here, the ‘instrument’ is a questionnaire (and the context in which filling out the questionnaire takes place). If the happiness questionnaire only has a small number of questions, and each question only has a small number of response options (e.g., only ‘yes’ or ‘no’), the measurement error will be greater than if it’s a detailed questionnaire with very many questions (perhaps covering various ‘domains’ of happiness) and with many response options (e.g., a 0–10 scale) for each question.

For our data, we can’t separate these two types of noise, so they all go into an error term in our model.

Let’s create a function to simulate data:

sim_ri = function(n_ind = 15, y_mean = 70, eff_2 = 8,
                  eff_3 = 5, sd_ind, sd_resid, seed = 640) {
  set.seed(seed) # Reproducible results
  
  # ‘Real’ happiness level
  y_1_mean = rnorm(n_ind, mean = y_mean, sd = sd_ind)
  y_2_mean = y_1_mean + eff_2
  y_3_mean = y_1_mean + eff_3
  
  # ‘Noise’
  err_1 = rnorm(n_ind, sd = sd_resid)
  err_2 = rnorm(n_ind, sd = sd_resid)
  err_3 = rnorm(n_ind, sd = sd_resid)
  
  # Observed/measured happiness level
  y_1_obs = y_1_mean + err_1
  y_2_obs = y_2_mean + err_2
  y_3_obs = y_3_mean + err_3
  
  data.frame(subj = factor(seq_len(n_ind)),
             y_1_mean, y_2_mean, y_3_mean,
             y_1_obs, y_2_obs, y_3_obs)
}
# Convert wide data to long format
conv_wdata_to_ldata = function(d_wide) {
  pivot_longer(d_wide, starts_with("y_"),
    names_pattern = "y_(\\d)_(.*)",
    names_to = c("time", ".value"),
    names_transform = list(time = as.numeric)
  )
}

If the variation in (baseline) happiness among subjects corresponds to a standard deviation (SD) of 5 and the residual variation corresponds to a SD of 3, our data will look like this:

d_wide = sim_ri(sd_ind = 5, sd_resid = 3)
d_long = conv_wdata_to_ldata(d_wide)
ggplot(d_long, aes(x = time, y = obs, colour = subj)) +
  geom_line() +
  theme(legend.position = "none")

Correlated data

Having a model might make it easier to extract the signal from the noise in data. But first let’s make an observation: Our measurements (happiness scores) are not independent random variables. Scores taken from the same subjects are correlated:

library(dplyr) # For select()
d_y = select(d_wide, ends_with("_obs"))
head(d_y)
   y_1_obs  y_2_obs  y_3_obs
1 65.02578 76.41156 79.66142
2 79.37217 78.75038 79.77595
3 62.69571 76.25443 72.40877
4 67.73079 74.56501 69.73508
5 77.41136 82.65234 74.22932
6 77.32814 83.87062 80.71562
round(cor(d_y), 2)
        y_1_obs y_2_obs y_3_obs
y_1_obs    1.00    0.73    0.78
y_2_obs    0.73    1.00    0.71
y_3_obs    0.78    0.71    1.00

The sample correlations between observations at time points (1,2), (1,3) and (2,3) are all similar. And it’s not difficult to show that the corresponding population correlations are all:

\[\frac{\mathrm{Var(\text{subject})}}{\mathrm{Var(\text{total})}} = \frac{\mathrm{Var(\text{subject})}}{\mathrm{Var(\text{subject})} +\mathrm{Var(\text{noise})}}\]

Since variance is the square of the standard deviation, we have:

var_subj = 5^2 # sd_ind^2
var_total = 5^2 + 3^2 # sd_ind^2 + sd_resid^2
var_subj/var_total
[1] 0.7352941

So the sample correlations were quite close. (We were lucky. The variance estimates are not very precise for such a small sample.)

A basic random intercept model

Now let’s fit a model. This is a linear mixed-effects model. The fixed-effects part is familiar, and we treat time as a categorical variable, so the intercept parameter corresponds to the mean happiness score as baseline (time point 1).

But now we also add a random effect, a term representing the variation between subjects. The ‘intercept’ varies between subjects, so the term is called random intercept.

Basically, we say that the random intercept is a normally distributed random variable with mean 0 and unknown variance (just like the error/residual term), but its value varies with subject. In other words, the value is different from subject to subject, but it’s the same within subject, i.e., for each of the three time points. An important assumption is also that it’s independent of the error/residual term, and also of the predictors (if they‘re random variables, i.e., measured with error).

The syntax for a random intercept is (1|subj):

# Random intercept model (a type of mixed-effects model)
library(glmmTMB)
d_long$time_fac = factor(d_long$time)
mod_ri = glmmTMB(obs ~ time_fac + (1 | subj), data = d_long)
summary(mod_ri)
 Family: gaussian  ( identity )
Formula:          obs ~ time_fac + (1 | subj)
Data: d_long

     AIC      BIC   logLik deviance df.resid 
   273.7    282.8   -131.9    263.7       40 

Random effects:

Conditional model:
 Groups   Name        Variance Std.Dev.
 subj     (Intercept) 26.569   5.155   
 Residual              9.845   3.138   
Number of obs: 45, groups:  subj, 15

Dispersion estimate for gaussian family (sigma^2): 9.85 

Conditional model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   70.269      1.558   45.10  < 2e-16 ***
time_fac2      7.539      1.146    6.58 4.69e-11 ***
time_fac3      4.907      1.146    4.28 1.84e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ko = fixef(mod_ri)$cond
ko_form = paste0(f_num(ko, 1), collapse = ",")

The coefficient estimates (ko_form) can be interpreted like in a normal linear model (the population values they estimate are 70, 8 and 5). But in the ‘Random effects:’ block we now also see estimates of the random effect variances / standard deviations, one for subject and one for the error/residual term. (That last one is identical to the variance shown on the ‘Dispersion estimate’ line.)

The model has only five parameters: three coefficient estimates (for the three time points) and two variance/SD estimates. We don’t get a parameter for each subject’s intercept. However, we can actually estimate each subject’s intercept:

ranef(mod_ri)
$subj
   (Intercept)
1   -0.6390825
2    4.3451828
3   -3.5287717
4   -3.3294055
5    3.2754833
6    5.5366426
7    1.4061163
8   -8.1826633
9   -4.4065830
10   3.1360898
11  -8.8484657
12  -1.2533532
13   2.9663908
14   9.1386768
15   0.3837451
Note

The estimates are called conditional modes (which for Gaussian models like this one are also the conditional means), or sometimes BLUPs (best linear unbiased predictors).

Some people object to calling them estimators/estimates, and they’re actually more similar to residuals. But for all practical purposes, we can treat them as estimates (or predictions, if you prefer) of the ‘subject’ effects.

And now we can plot the subjects’ estimated ‘happiness trajectory’:

d_long$pred_ri = predict(mod_ri)

ggplot(d_long, aes(x = time, y = pred_ri, colour = subj)) +
  geom_line() +
  theme(legend.position = "none")

A fixed-effects model with subject-level intercepts

How about a normal linear model with a parameter for each subject? We can try to fit one:

# Fixed-effects model
mod_fe = glmmTMB(obs ~ time_fac + factor(subj), data = d_long)
summary(mod_fe)
 Family: gaussian  ( identity )
Formula:          obs ~ time_fac + factor(subj)
Data: d_long

     AIC      BIC   logLik deviance df.resid 
   248.4    280.9   -106.2    212.4       27 


Dispersion estimate for gaussian family (sigma^2): 6.56 

Conditional model:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)     69.5507     1.5747   44.17  < 2e-16 ***
time_fac2        7.5395     0.9355    8.06 7.67e-16 ***
time_fac3        4.9073     0.9355    5.25 1.56e-07 ***
factor(subj)2    5.5999     2.0918    2.68  0.00743 ** 
factor(subj)3   -3.2466     2.0918   -1.55  0.12065    
factor(subj)4   -3.0226     2.0918   -1.44  0.14847    
factor(subj)5    4.3981     2.0918    2.10  0.03551 *  
factor(subj)6    6.9385     2.0918    3.32  0.00091 ***
factor(subj)7    2.2978     2.0918    1.10  0.27200    
factor(subj)8   -8.4754     2.0918   -4.05 5.09e-05 ***
factor(subj)9   -4.2329     2.0918   -2.02  0.04302 *  
factor(subj)10   4.2415     2.0918    2.03  0.04260 *  
factor(subj)11  -9.2234     2.0918   -4.41 1.04e-05 ***
factor(subj)12  -0.6901     2.0918   -0.33  0.74146    
factor(subj)13   4.0508     2.0918    1.94  0.05281 .  
factor(subj)14  10.9855     2.0918    5.25 1.51e-07 ***
factor(subj)15   1.1492     2.0918    0.55  0.58276    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

One thing to note is that the predicted trajectories are not the same in the mixed-effects (solid lines) and in the fixed-effects (dotted lines) models:

d_long$pred_fe = predict(mod_fe)

ggplot(d_long, aes(x = time, y = pred_ri, colour = subj)) +
  geom_line() +
  geom_line(aes(y = pred_fe), linetype = "dotted") +
  theme(legend.position = "none")

The estimates from the random intercept model are closer to the overall mean. We say that the estimates have been shrunk towards the mean. The shrinkage is largest for the most extreme values (very high or very low values).

Informally, what’s happening is this: For the subject with the highest line, the model is thinking ‘The observations for this subject are really high. But given what I know (have estimated) about the mean structure and for the standard deviations for subjects and the error term, such values are highly unlikely. So it’s probably a case of a large positive error term. I’ll therefore shrink my ‘predicted values’ for this subject a bit towards the mean.

Shrinkage is a good thing, as it gives better predictions. How much shrinkage occurs depends mainly on three things:

  • The size of the estimated SD of the error term. (Higher SD leads to more shrinkage.)
  • The size of the estimated SD of the subject term. (Higher SD leads to less shrinkage, since that makes very high or very low observed values more likely to appear. But it’s really the SD for the error term compared to the SD for the subject term that determines the shrinkage.)
  • The distance from the overall mean. Observations far from the mean (on either side) will be heavily shrunk towards the mean. Observations close to the mean will get almost no shrinkage.
Exercise 1
  • Use sim_ri() to simulate data from 100 subjects (n_ind = 100), where the inter-subject SD is 3 (sd_ind = 3) and the noise/residual SD is 10 (sd_resid = 30). We now have very much noise, and would expect much shrinkage. Store the results as d_wide.

  • Use conv_wdata_to_ldata() to convert the data to long format. Store the results as d_long.

  • Fit random intercept and fixed-effects models for the d_long data. Store the models as mod_ri and mod_fe.

  • Run the code above to compare:

    • mean: the real subject mean/expected values (which we‘re trying to estimate/predict)
    • obs: the observed, noisy values
    • pred_fe: predictions/estimates from the fixed-effect model – pred_ri: predictions/estimates from the random intercept model
  1. Do you observe shrinkage in the random intercept model?
  2. Which of the three last panels best reflect the true values in the first panel? Which model would you prefer?
Note

The values for the obs panel are actually equivalent to estimates from a saturated fixed-effects model – a model with interactions between all subjects and all time points. So you can think of also the values in this panel as model-based estimates.

d_longlong = pivot_longer(d_long, c(mean, obs, pred_ri, pred_fe),
  names_to = "type"
)
ggplot(d_longlong, aes(x = time, y = value, group = subj)) +
  geom_line(alpha = .3) +
  theme(legend.position = "none") +
  facet_wrap(~type, nrow = 1)

Are fixed-effect models like these valid?

We fitted fixed-effect and mixed-effect models on the same data. The both gave reasonable estimates of the treatment effect. Is there anything wrong with just fitting fixed-effects models?

Yes. Even though the fixed-effects model worked fine here, such models are in general not valid. Basically, the problem is that when we increase the number of subjects, we also increase the number of parameters estimated. The number of parameters tends to infinity as the number of observations increases, and the asymptotics of the models break down; the parameter estimators are no longer consistent.

For the mixed-effects model, we don’t have this problem, as we only estimate one parameter for the ‘subject effect’, namely the standard deviation of the random intercept.

Random slope data

Until now, we have assumed that the treatment effect is the same for all subjects. We’ll now look at a model where it isn’t. We’ll assume that the effect is linear over time for each subject, but that the linear effect also differs between subjects.

sim_rs = function(n_ind = 15, y_mean, sd_ind,
                   slope, sd_slope, sd_resid, seed = 640) {
  set.seed(seed) # Reproducible results
  
  # Since the effect is linear, the code could be simplified
  # and also generalised to handle more than three time points.
  # But having it on this form makes it easier to see
  # exactly what’s going on.
  
  # Overall mean happiness level
  y_1_mean = y_mean
  y_2_mean = y_1_mean + 1 * slope
  y_3_mean = y_1_mean + 2 * slope
  
  # Individual mean happiness level
  intercept_ind = rnorm(n_ind, mean = 0, sd = sd_ind)
  slope_ind = rnorm(n_ind, mean = 0, sd = sd_slope)
  y_1_ind = y_1_mean + intercept_ind
  y_2_ind = y_2_mean + intercept_ind + 1 * slope_ind
  y_3_ind = y_3_mean + intercept_ind + 2 * slope_ind
  
  # ‘Noise’
  err_1 = rnorm(n_ind, sd = sd_resid)
  err_2 = rnorm(n_ind, sd = sd_resid)
  err_3 = rnorm(n_ind, sd = sd_resid)
  
  # Observed/measured happiness level
  y_1_obs = y_1_ind + err_1
  y_2_obs = y_2_ind + err_2
  y_3_obs = y_3_ind + err_3
  
  data.frame(subj = factor(seq_len(n_ind)),
             y_1_mean, y_2_mean, y_3_mean,
             y_1_ind, y_2_ind, y_3_ind,
             y_1_obs, y_2_obs, y_3_obs)
}
d_wide = sim_rs(y_mean = 70, slope = 6, sd_slope = 5,
                sd_ind = 4, sd_resid = 3, seed = 451)
d_long = conv_wdata_to_ldata(d_wide)
d_longlong = pivot_longer(d_long, c(mean, ind, obs), names_to = "type")
d_longlong$type = relevel(factor(d_longlong$type), "mean") # Reordering for plotting
ggplot(d_longlong, aes(x = time, y = value, colour = subj)) +
  geom_line() +
  theme(legend.position = "none") +
  facet_wrap(~type, nrow = 1)

The panels in the figure above show from left to right:

  • The overall mean trajectory (the expected trajectory for a random subject)
  • The individual mean trajectories (where we ignore measurement error)
  • The observed data

We see that even though the overall treatment effect is positive (‘on average’, the subjects’ happiness levels increase by 6 points each month), the individual treatment effect varies. For some subjects, the pill actually makes them less happy over time.

A basic random slope model

We can fit a model that takes the variation in slopes into account by including a random slope in addition to the random intercept. For this, the random-effects part of the formula will look like this: (1 + time | subj)

As before, the 1 corresponds to an ‘intercept’ that is allowed to vary from subject to subject (subj). The time corresponds to the ‘slope’ associated with the time variable as is also allowed to vary from subject to subject. It is assumed that the two random effects (which are allowed to be correlated) have a joint zero-mean Gaussian distribution.

(Since the means of the random effects are zero, we include fixed effects to allow the mean intercept and mean slope to be different from zero.)

It’s easy to fit the model:

mod_rs = glmmTMB(obs ~ time + (1 + time | subj), data = d_long)
summary(mod_rs)
 Family: gaussian  ( identity )
Formula:          obs ~ time + (1 + time | subj)
Data: d_long

     AIC      BIC   logLik deviance df.resid 
   298.4    309.2   -143.2    286.4       39 

Random effects:

Conditional model:
 Groups   Name        Variance Std.Dev. Corr  
 subj     (Intercept) 76.37    8.739          
          time        22.30    4.723    -0.90 
 Residual             12.32    3.510          
Number of obs: 45, groups:  subj, 15

Dispersion estimate for gaussian family (sigma^2): 12.3 

Conditional model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   66.540      2.647   25.14  < 2e-16 ***
time           5.675      1.377    4.12  3.8e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

It’s easier to identify the parameters if we use a slightly different parametrisation. Instead of time 1, 2, 3, we parametrise the time as 0, 1, and 2:

d_long$time_0b = d_long$time - 1 # 0-based time
mod_rs_0b = glmmTMB(obs ~ time_0b + (1 + time_0b | subj), data = d_long)
summary(mod_rs_0b)
 Family: gaussian  ( identity )
Formula:          obs ~ time_0b + (1 + time_0b | subj)
Data: d_long

     AIC      BIC   logLik deviance df.resid 
   298.4    309.2   -143.2    286.4       39 

Random effects:

Conditional model:
 Groups   Name        Variance Std.Dev. Corr  
 subj     (Intercept) 24.13    4.912          
          time_0b     22.30    4.723    -0.65 
 Residual             12.32    3.510          
Number of obs: 45, groups:  subj, 15

Dispersion estimate for gaussian family (sigma^2): 12.3 

Conditional model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   72.214      1.514   47.69  < 2e-16 ***
time_0b        5.675      1.377    4.12  3.8e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note which estimates change and which stay the same in this new (but fully equivalent) model.

Exercise 2

The summary of the last model fit shows many parameter estimates. Identify the estimates of the following:

  • The overall mean value at time: 70
  • The SD corresponding to variation in mean value at the first time point: 4
  • The mean treatment effect / slope: 6
  • The SD corresponding to variation in treatment effect: 5
  • The SD corresponding to the residual noise: 3

Hint: If you’re unsure, try running the model with more subjects (n_ind = 200). Then the parameter estimates will be much closer to the real parameters.

With a random intercept model, the correlations between observations from different time points were identical. And the variances for observations from different time points were also identical.

This is not true for a random intercept model. Because of the ‘fan’ shape of the individual trajectories (easiest to see if you set both sd_ind and sd_resid to small values). the correlations will decrease over time (e.g., the correlation between the time 1 and time 2 observations will be higher than the correlation between the time 1 and time 3 observations) and the standard deviations will increase over time (e.g., the standard deviation at time 2 will be higher than the standard deviation at time 1, and the standard deviation at time 3 will be higher still).

Exercise 3

Fit the same model as above, but with 500 observations.

  1. Estimate all correlations between observations at time points 1, 2 and 3.

  2. Estimate the standard deviations for each time point.

Hint: You don’t have to fit a model for this. And just like we used cor() to extract correlations earlier, we can use var() to extract the variance-covariance matrix. The variances are on the main diagonal of this matrix.

Note

In our random intercept model, we had three time points. Can we also estimate the same model with only two time points?

The answer is yes and no, but mostly no …

The answer is yes, because everything is possible if we make enough assumptions. 😊 And fitting the model using only time points 1 and 2 actually works well. We can estimate the mean treatment effect (slope) and the variation in treatment effect (random slope).

But imagine an alternative (random intercept) model where 1) the treatment effect is the same for each subject, and 2) the residual standard deviation is allowed to be different (higher) at the second time point than at the first. This will actually give us the exact same fit as the above random slope model.

So we have two alternative models, one where the treatment effect differs between subjects, and one where it doesn’t. Using data from only two time points, we can’t distinguish between the two models. So if we’re interested in estimating the variation in treatment effect, not just the mean treatment effect, we need (at least) three time points.

That’s why the answer to the question is really mostly no.

Simplicity and real life

For simplicity, we have fitted very simple models on simplified datasets in this exercise. And the interpretation of our effect estimates are based on many assumptions.

For one, we pretended that the our estimates corresponded to the effect of taking the ‘happiness pill’. But if there were other things happening at the same time, this would not be true.

For example, if the first measurement was taken in March, the second in April and the third May, the (apparent) increase in mean happiness might be caused by the weather getting warmer and sunnier, not (only) by the effect of the pill.

For a real study, we would instead randomise the subjects to either a) get the pill, or b) get a placebo. And we would then compare the effect of the pill over time against the effect of the placebo over time. We can use a mixed-effect model for this too. (If the first time point is a baseline measurement taken before getting the pill/placebo, we would preferably force the treatment effect to be 0 at this time point.)

Multiple random intercepts or slopes

We can also fit models with multiple random intercepts (and/or random slopes). This is typically used for hierarchical data, where we have one random intercept for each level. Looking back to our exam results example, we can have:

(1 | county) + (1 | school) + (1 | class)

Here we have one random intercept for county, one for school and one for class. (If we had multiple exam results for each pupil, we could also have a ‘pupil effect’, (1 | pupil).)

Exam results for pupils in the same class are dependent due to a ‘class’ effect. For example, they all share the same teacher. A good teacher might raise the exam results of all the pupils (and a bad teacher might lower them). And if the class has a few noisy pupils, this will affect the learning environment and thus the exam results of all the pupils in that class.

Similarly, the ‘school’ effect induces a dependence between pupils at the same school. (For example, different schools may have different teaching strategies or different qualities of teaching equipment.)

And similarly, there could also be a ‘county’ effect.

The correlation between exam results will be greatest for two pupils in the same class, smaller (but still positive) for two pupils in the same school but in different classes, and smaller yet (and still positive) for two pupils in the same county but in different schools.

The random effects are assumed to be independent, and the factors levels of the corresponding variables must be globally unique. (So if the classes are named ‘a’, ‘b’, ‘c’, … in multiple schools, we would have to rename them to be unique, e.g., to ‘school1_a,school2_a,school3_a`, … ‘school1_b’, ‘school2_b’, ‘school3_b’ …)

A generalised mixed-effects model

We can also combine generalised linear models (GLMs) and random effects. Then we get generalised linear mixed-effects models (GLMMs).

Note that even though the conditional response is no longer Gaussian, the random effects are still assumed to be so. And they are included on link scale, just like the fixed-effects variables.

We’ll fit a simple model on inflorescence counts from this week’s videos. We include three separate random intercepts.

load(url("https://raw.githubusercontent.com/embruna/Brooks_etal_2019_HeliconiaData/master/hdat.RData"))

mod_pois = glmmTMB(infl ~ lsize * habitat + (1 | plot) + (1 | year) + (1 | ID),
  family = poisson(), data = hdat
)
summary(mod_pois)
 Family: poisson  ( log )
Formula:          infl ~ lsize * habitat + (1 | plot) + (1 | year) + (1 | ID)
Data: hdat

     AIC      BIC   logLik deviance df.resid 
 11153.7  11213.5  -5569.8  11139.7    37749 

Random effects:

Conditional model:
 Groups Name        Variance Std.Dev.
 plot   (Intercept) 0.09197  0.3033  
 year   (Intercept) 0.28542  0.5342  
 ID     (Intercept) 0.78465  0.8858  
Number of obs: 37756, groups:  plot, 10; year, 12; ID, 5290

Conditional model:
                        Estimate Std. Error z value Pr(>|z|)    
(Intercept)              -6.8558     0.3084 -22.231  < 2e-16 ***
lsize                     2.6336     0.1485  17.740  < 2e-16 ***
habitatcontinuous         0.6373     0.3109   2.050 0.040403 *  
lsize:habitatcontinuous  -0.6175     0.1602  -3.856 0.000115 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Exercise 4

The inflorescence counts vary between different plots, years and individuals. Based on the model (and after taking into account the fixed effects), which of these three factors is the largest contributor to variation (on the link scale)?

Exercise 5
  1. Use the plot_predictions() function from the marginaleffects package to plot the expected/predicted number of inflorescences as a function of log-size, plot and habitat.

It’s possible to arrange the estimates/predictions in many ways, but for this example, create the plot so that you get log-size on the x-axis, the number of inflorescences on the y-axis, a line for each plot (the plot variable) and a panel for each type of habitat.

  1. Repeat the exercise, but now with a line for each year (instead of a line for each plot).

Note: Since year is coded as a numeric variable, in order to plot all the years, you have to write year = unique instead of just "year" when specifying the variables to plot (in the condition argument). (Alternatively, you could also convert the year to a factor before fitting the model. The model is treating year as a factor in any case.)