Earlier this year I wrote a post on calculating R squared values for mixed models.
It turned out a lot of people had been having the same problem that I had been having – basically we didn’t know how well our mixed models fit our data.
Thankfully a paper in Methods in Ecology and Evolution by Nakagawa & Schielzeth helped to get close to solving this problem.
The only problem was that the code was a little difficult to implement.
Now Kamil Barton has written a function for this calculation in my current favourite R package – MuMIn. (Edit – as Erika Berenguer pointed out on twitter this doesn’t work for all versions of MuMin only for versions later than 1.19. So if you’ve got an older version use the update.packages() function.)
Basically the function works like this:
- You give it your mixed model
- It spits out marginal and conditional R squared values
The marginal R squared values are those associated with your fixed effects, the conditional ones are those of your fixed effects plus the random effects. Usually we will be interested in the marginal effects.
To show how this works we’ll use the example given in the paper using beetle body length.
First get the dataset we will use from here.
#first load in the packages you want require(lme4) require(MuMIn) #Next the data BeetleBody<-read.csv("Wherever_you_put_the_data") # Fit a model including fixed and all random effects mF <- lmer(BodyL ~ Sex + Treatment + (1 | Population) + (1 | Container), data = BeetleBody) r.squaredGLMM(mF)
This should spit out the following:
R2m R2c
0.3913021 0.7406447
Showing that your marginal R squared is 0.39 and your conditional R squared is 0.74 – not bad.
Of course you should do model simplification or model averaging in an attempt to get a parsimonious model before you do any of this, but I just wanted to flag this up.
Apparently this function is still in it’s ‘experimental stages’ so if you manage to break it let the people who put MuMIn together know.
If you have any comments, as ever put them below.
I’ll get back to writing a non-stats blog post soon.
Jon Lefcheck @jslefche Has coded a really neat function for this too.
http://jslefche.wordpress.com/2013/03/13/r2-for-linear-mixed-effects-models/
Produces a handy table of R2GLMMm, R2GLMMc and AIC. Works well, but requires lme4 v.1.0-4 for use with glmers. Have yet to test the output against that from the MuMIn function – has anyone done this?
I tried both functions (MuMIn and Jon Lefcheck’s), both gave the same results reassuringly!
Simple and helpful, thanks!
Sweet stats god, this just ensured the next week in lab will be glorious! THANK YOU
Glad to be of service 🙂
Hi,
I have one question about formula.
(1 | Population), you used this, is this sample formula or we should use sex+treatment instead of 1. Such as (sex=treatment | Population)
Thanks
Sorry I am away at the moment so haven’t been keeping on top of my blog. (1|population) tells r to fit a different intercept for each different population. You can’t fit the second random effect you specify I don’t think.
Thanks, i understand it. Very helpful
How to get the P-value for each parameter to know their significance in the model (REML). Additionaly, how to get their R^2 pseudo for each model.
Thanks
Is it possible to get conditional and marginal R2 in model averaging? If so, what is the code in R.
Hi, I’m not sure whether you can get the R squared from model averaging. What I have done in the past is to give the R squared values of the top models (for me those with a delta AICc<7). Have you had any ideas on how to do this since you made this comment?
Sorry if this is a dumb question but this function can only be used for “generalized” mixed models and not mixed models with a continuous dependent variable, correct?
Generalised refers to whether your model assumes that the errors around your predictions are normal or not. This method can be used for mixed models that contain continuous, or categorical variables, or a mixture of the two.
Thanks for clarifying!!!
I am getting the same values for R2m and R2c. Does anyone have any thoughts on why this would be? The random effect I am using is individual (i.e. (1 | subject)).
The only difference between R2m and R2c is in the numerator, which is Vfixed for R2m and Vfixed + Vrandom for R2c. If they are equal, Vrandom, which is the intersubject variance in your case, must have been estimated at zero. This suggests that the clustering of observations within subjects is either non-existent or too weak to detect.
Dear all
I am dealing in lmer4 with a binomial(percentages, proportions) GLMM that is a little complex.
My most parsimonius only has one fixed variable. It has two levels A and B with 60 and 6 observations respectively (I think tht this is a aim problem but I am not sure).
The R^2 values are really strange:
R2m R2c
0.009962046 0.009962046
I had a 2.1 overdispersion value. To solve this and due to glmer doesn´t work with quasi families, I use this:
baba$obs<- 1:nrow(baba)
model_OD<-glmer(propNb~Trap+(1|Nest)+(1|obs),data=baba,family="binomial")
This improve the AIC and overdispersion values but not R^2. The most strange thin is thtat the same model works very well with other similar variables
Has somebody anyidea about what could be happening? There is solution? It is the model so bad than looks or I can present itcarefully?
Yhank you so much and sorrif the question is so obvius.
Eneko
Hey Eneko. I’d recommend posting on somewhere like stackoverflow about this as I can’t really help with this problem. See the link on how to make a good, repeatable example if you decide to take this route. http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
Hi Eneko,
I’ve had a look inside r.squaredGLMM and it looks like the function adds an observation-level random effect when it isn’t already present. That’s why you get the same values with and without the obs random effect. you can see this by calculating R2m and R2c both manually and with r.squaredGLMM. I’ve done this with the gm1 and gm2 examples from ?glmer (code below).
I’m surprised that r.squaredGLMM does this. I’ll contact the package developer, Kamil Bartoń, and ask him if this is intended. It also looks like r.squaredGLMM adds the OD random effect where the outcome is binary rather than binomial, which it shouldn’t do.
(It’s my opinion that most Poisson and binomial GLMM fits would be improved by adding an observational random effect, but I think that should be the modellers choice.)
Best wishes,
Paul
PS What is your response propNb? If it’s a proportion then glmer should generate a warning about non-integer successes. However I’m assuming that it has the form cbind(successes, failures).
# without obs random effect
(gm1 <- glmer(cbind(incidence, size – incidence) ~ period + (1 | herd),
data = cbpp, family = binomial))
Vf <- var(model.matrix(gm1) %*% fixef(gm1))
Vherd <- as.vector(VarCorr(gm1)$herd)
Ve <- 0 # because no OD effect
Vdist <- pi^2 / 3
Vtotal <- Vf + Vherd + Ve + Vdist
# R2 values are 9% and 19%:
(R2m <- Vf / (Vtotal))
(R2c <- (Vf + Vherd) / (Vtotal))
# with obs random effect
cbpp$obs <- 1:nrow(cbpp)
(gm2 <- glmer(cbind(incidence, size – incidence) ~ period + (1 | herd) + (1|obs),
data = cbpp, family = binomial))
Vf <- var(model.matrix(gm2) %*% fixef(gm2))
Vherd <- as.vector(VarCorr(gm2)$herd)
Ve <- as.vector(VarCorr(gm2)$obs)
Vdist <- pi^2 / 3
Vtotal <- Vf + Vherd + Ve + Vdist
# R2 values are now 11% and 11%:
(R2m <- Vf / (Vtotal))
(R2c <- (Vf + Vherd) / (Vtotal))
# However using r.squaredGLMM R2 values are 11% & 11% for both models
# because r.squaredGLMM automatically adds the obs random effect
# to gm1 making it equivalent to gm2:
r.squaredGLMM(gm1)
r.squaredGLMM(gm2)
thank you, super helpgul!
However, when trying to fit the following mixed model logistic regression with only intercept as random .
mod6 <- glmer(alert ~ Habitat+NDVI+ (1 | bird.ID), family=binomial , data=alert_df, control=glmerControl(optimizer="bobyqa"),nAGQ=10)
I get this error:
"fitting model with the observation-level random effect term failed. Add the term manually"
Trying Paul Johnson formula description helped, but I cam not sure I have done it correctlly, since I wanted to exclude each row-observation random effect:
Vf <- var(model.matrix(alert_new6) %*% fixef(alert_new6))
Vbird <- as.vector(VarCorr(alert_new6)$bird.ID)
Vdist <- pi^2 / 3
Vtotal <- Vf + Vbird + Vdist
did I get it all wrong?
thank you
This looks right to me. (Note that this simple code is only suitable for random intercepts models, not random slopes models.)
Hiii, Iam just asking if there is a similar method for computing R2m and R2c in spss instead of calculate them manually. In fact, I don’t Know how to use the method of Nakagawa and Schielzeth to calculate R2 by myself.
I don’t know anything about spss I’m afraid. It’s been 10+ years since I have used it!
Thanks
I think you’d have to do it manually using the formulae in N&S. If your model is a simple Gaussian LMM without random slopes, you need the fixed effects variance (Vf), the sum of the random effects variances (Vr) and the residual variance (Ve). Then R2m = Vf / (Vf + Vr + Ve) and R2c = (Vf + Vr) / (Vf + Vr + Ve). Vr and Ve should be in your model output. Vf you could get by making a new column of the fitted (= predicted) values for your data, and taking the variance of these. Alternatively, calculate the variance of the responses (called the dependent variable in SPSS I think) as Vtot, then using Vf = Vtot – Vr – Ve should give you a very similar result. If your model isn’t Gaussian then it’ll be a little more complicated — see Table 2 of N&S.
Thanks alot. This was really very helpful. But when I calculated r2 m and r2 c, both were increasing after adding predictors at the 1st 3 models then only r2m was increasing in the subsequent models while r2 c was decreasing. What dose that mean? Am I wrong in something?
I used linear mixed models analysis in spss for my data.My models are 2 level with one random factor, actually it is growth curves for repeated measures analysis. Do I need calculation both R2 m and R2 c or only R2 m?
Bearing in mind that R2m = Vf / (Vf + Vr + Ve) and R2c = (Vf + Vr) / (Vf + Vr + Ve), then increasing R2m means that Vf is increasing (the fixed effects are explaining more variation, although not necessarily significantly more), while decreasing R2c means that the unexplained variation between individuals (estimated by Vr) is decreasing more than Vf is increasing, meaning that Ve must be increasing. This suggests that the new predictor is explaining variation between individuals (Vr) rather than between observations within individuals (Ve). That’s what you’d expect if the new predictor is an individual-level variable. It seems a bit strange that Ve is increasing, but I’m not sure that’s a problem. If the model fits well, i.e. passes the usual checks, it’s probably ok.
Hello, thanks so much for such a helpful post. There seems to be an error in the beetle dataset–could you please fix it so that we can take a look at it as well?