R squared for mixed models – the easy way

Earlier this year I wrote a post on calculating R squared values for mixed models.

It turned out a lot of people had been having the same problem that I had been having – basically we didn’t know how well our mixed models fit our data.

Thankfully a paper in Methods in Ecology and Evolution by Nakagawa & Schielzeth helped to get close to solving this problem.

The only problem was that the code was a little difficult to implement.

Now Kamil Barton has written a function for this calculation in my current favourite R package – MuMIn. (Edit – as Erika Berenguer pointed out on Twitter, this doesn’t work for all versions of MuMIn, only for versions later than 1.19. So if you’ve got an older version, use the update.packages() function.)
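If you are not sure whether your installed copy of MuMIn is recent enough, one rough way to check is to ask R whether the package actually exports the function, rather than relying on a version number. This is only a sketch: the helper name has_r2_glmm is made up for illustration and is not part of MuMIn.

```r
# Hypothetical helper (not part of MuMIn): TRUE only if MuMIn is installed
# and its namespace exports a function called r.squaredGLMM.
has_r2_glmm <- function() {
  requireNamespace("MuMIn", quietly = TRUE) &&
    "r.squaredGLMM" %in% getNamespaceExports("MuMIn")
}

if (has_r2_glmm()) {
  message("MuMIn ", as.character(packageVersion("MuMIn")),
          " provides r.squaredGLMM")
} else {
  message("r.squaredGLMM not found - try update.packages() ",
          "or install.packages(\"MuMIn\")")
}
```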

Basically the function works like this:

  • You give it your mixed model
  • It spits out marginal and conditional R squared values

The marginal R squared value is the variance explained by your fixed effects alone; the conditional one is the variance explained by your fixed effects plus the random effects. Usually we will be interested in the marginal value.
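To make the difference between the two concrete, here is a minimal sketch of the arithmetic for a Gaussian mixed model, following the definitions in the Nakagawa & Schielzeth paper: marginal R squared is the fixed-effect variance over the total variance, and conditional R squared adds the random-effect variances to the numerator. The function name and the variance values are invented purely for illustration; they are not from the beetle dataset.

```r
# Sketch of the two R squared definitions for a Gaussian mixed model.
# var_fixed:  variance of the fixed-effect predictions
# var_random: vector of random-effect variances (one per random term)
# var_resid:  residual variance
r2_from_variances <- function(var_fixed, var_random, var_resid) {
  total <- var_fixed + sum(var_random) + var_resid
  c(R2m = var_fixed / total,
    R2c = (var_fixed + sum(var_random)) / total)
}

# Hypothetical variance components, chosen only to keep the numbers simple
r2 <- r2_from_variances(var_fixed = 2, var_random = c(1.5, 0.5), var_resid = 1)
print(r2)  # R2m = 0.4, R2c = 0.8
```

Because the random-effect variances only ever get added to the numerator, the conditional value can never be smaller than the marginal one.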

To show how this works we’ll use the example given in the paper using beetle body length.

First get the dataset we will use from here.


#first load in the packages you want
require(lme4)
require(MuMIn)

#Next the data
BeetleBody <- read.csv("Wherever_you_put_the_data")

# Fit a model including fixed and all random effects
mF <- lmer(BodyL ~ Sex + Treatment + (1 | Population) + (1 | Container), data = BeetleBody)

r.squaredGLMM(mF)

This should spit out the following:

R2m R2c
0.3913021 0.7406447

Showing that your marginal R squared is 0.39 and your conditional R squared is 0.74 – not bad.
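If you want to see where numbers like these come from, the calculation can be sketched by hand for a Gaussian model using lme4 alone. The example below uses simulated data rather than the beetle dataset, and the function name r2_by_hand and all variable names are made up for illustration. It assumes random intercepts only (no random slopes), and I would expect it to give values close to r.squaredGLMM for that kind of model, but treat it as a rough check rather than a replacement.

```r
# Hypothetical helper: marginal and conditional R squared by hand for a
# Gaussian lmer fit with random intercepts only.
r2_by_hand <- function(fit) {
  # Variance of the predictions from the fixed effects alone
  var_fixed <- var(as.vector(lme4::getME(fit, "X") %*% lme4::fixef(fit)))
  # Variance components: one row per random intercept plus the residual
  vc <- as.data.frame(lme4::VarCorr(fit))
  var_random <- sum(vc$vcov[vc$grp != "Residual"])
  var_resid  <- vc$vcov[vc$grp == "Residual"]
  total <- var_fixed + var_random + var_resid
  c(R2m = var_fixed / total, R2c = (var_fixed + var_random) / total)
}

if (requireNamespace("lme4", quietly = TRUE)) {
  # Simulated data: 20 groups of 10 observations, one continuous predictor
  set.seed(1)
  d <- data.frame(group = factor(rep(1:20, each = 10)), x = rnorm(200))
  d$y <- 2 + 0.5 * d$x + rnorm(20)[d$group] + rnorm(200)

  fit <- lme4::lmer(y ~ x + (1 | group), data = d)
  r2 <- r2_by_hand(fit)
  print(r2)
}
```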

Of course you should do model simplification or model averaging in an attempt to get a parsimonious model before you do any of this, but I just wanted to flag this up.

Apparently this function is still in its ‘experimental stages’ so if you manage to break it let the people who put MuMIn together know.

If you have any comments, as ever put them below.

I’ll get back to writing a non-stats blog post soon.


R squared for mixed models

Goodness of fit

As a disclaimer, this is probably the geekiest post I have written on here yet. It’s about statistics. You have been warned.

Don’t disappear just yet. It does get better.

For a while I have been dissatisfied with mixed models and how they have been used in ecology. On the one hand they are wonderful and they allow you to deal with nasty problems like pseudoreplication. On the other hand I have always been concerned about the lack of any measure of goodness of fit. Many papers now use mixed models without reporting something like an R2 statistic and so we have little idea how good (or bad) these models are at describing the patterns they are looking at. R2 statistics are important, as Brian McGill has pointed out at the ever excellent dynamic ecology blog (sorry, I originally misattributed this to Jeremy Fox – thanks to Nick Golding on Twitter for the correction), and if our R2 is small we may well be barking up the wrong tree when we’re looking to explain how our system works. In his blog post Brian points out research that suggests ecologists’ work tends to have an R2 of 0.02-0.05. We should be ashamed of this.

Anyway, papers with mixed models often report AIC values, which are really useful as a means of model selection. However, they don’t inherently tell you much about the fit of your model. This means that for a whole swathe of new papers we have little idea how good the fit of the models really is.

This is where a new paper by Shinichi Nakagawa and Holger Schielzeth published in Methods in Ecology and Evolution comes in.

They have come up with R2 equivalents for mixed models. And I love them for it.

These new methods allow us to calculate R2 values for the fixed effects in our models alone and for the fixed and random effects combined. Being statistical types they gave them catchy names like R2GLMM(m) and R2GLMM(c). Even though the names are clunky, if you are an ecologist (or even a non-ecologist) who uses mixed models you must read this paper. It includes good worked examples and, importantly, an R script which will allow you to calculate the statistic (find this in the supplementary material, which you can access for free; I have included it below as well but please go and read the paper).

I hope that this will lead to papers that use mixed models reporting their goodness of fit more regularly, so we can assess how good people’s models actually are. I for one will certainly be using it.

*Update – I have added a blog post showing how to calculate R2 for mixed models more easily, here.