Using metafor and ggplot together: part 1

I really like meta-analysis, as some of you might already know.

It has its critics, but used well it can help us achieve one of the major goals of applied ecology: generalisation. This way we can say how we think things generally work and draw conclusions bigger than individual studies.

However, for some reason people have got it into their heads that meta-analysis is hard. It’s not. The hardest thing is the same as always: coming up with a good question/problem.

However, for a first timer it can seem daunting. Luckily there are a few good R packages for doing meta-analysis, my favourite of which is metafor.

The aim of these posts is to take some of the mystery out of how to do this; they were prompted by a comment from Jarret Byrnes on a previous blog post.

In the first post I will deal with effect sizes, basic meta-analysis, how to draw a few plots and some analyses to explore bias. In the second I will go through meta-regression/sub-group analysis and a few more plots. It shouldn’t be too hard to understand.

For this I will use a dataset based on some work I am doing at the moment looking at the effects of logging on forest species richness. However, I don’t really feel like sharing unpublished data on the web so I’ve made some up based on my hypotheses prior to doing the work.

First of all we need to install the right packages and load them:

install.packages("metafor")
install.packages("ggplot2")

library(metafor)
library(ggplot2)

Next we need the data; download it from here and get R to load it up:

logging<-read.csv("C:whereveryouputthefile/logging.csv")

This file contains data on the species richness of unlogged (G1) and logged (G2) forests. It’s not perfect, as all of the data is made up, but it should be useful for an exercise.
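
If you can’t download the file, here is a minimal sketch of stand-in data with the same column names the code below expects (Study, G1, G1SD, G1SS, G2, G2SD, G2SS); the values are invented purely so the rest of the post can be run:

# invented stand-in data: 30 comparisons of unlogged (G1) vs logged (G2) forest
set.seed(1)
k<-30 # the real dataset has 30 comparisons
logging<-data.frame(Study=paste("Study",1:k),
                    G1=rlnorm(k,meanlog=4,sdlog=0.3), # unlogged mean richness
                    G1SD=runif(k,5,15), # unlogged standard deviation
                    G1SS=sample(5:30,k,replace=TRUE), # unlogged sample size
                    G2=rlnorm(k,meanlog=3.6,sdlog=0.3), # logged mean richness
                    G2SD=runif(k,5,15), # logged standard deviation
                    G2SS=sample(5:30,k,replace=TRUE)) # logged sample size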

First we need to calculate the effect size for each study. This is essentially a measure of the difference between a treatment and a control group, in our case logged (treatment) and unlogged (control) forests. We can do this in different ways; the most commonly used in ecology are the standardised mean difference (SMD) and the log response ratio (lnRR).

The SMD is calculated like this:

\frac{\bar{X}_{1}-\bar{X}_{2}}{SD_{pooled}}

where \bar{X}_{1} and \bar{X}_{2} are the two group means and SD_{pooled} is the pooled standard deviation (I won’t explain this here, but there are good explanations elsewhere). Essentially this gives an effect size based on the difference between groups, scaled so that, for the same mean difference, studies with more variable data get a smaller effect size.
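
For completeness, for two groups with standard deviations SD_{1} and SD_{2} and sample sizes n_{1} and n_{2}, the pooled standard deviation is usually calculated as:

SD_{pooled} = \sqrt{\frac{(n_{1}-1)SD_{1}^{2}+(n_{2}-1)SD_{2}^{2}}{n_{1}+n_{2}-2}}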

The lnRR is calculated as:

\ln\bar{X}_{1}-\ln\bar{X}_{2} = \ln\left(\frac{\bar{X}_{1}}{\bar{X}_{2}}\right)

which gives the log of the ratio between the two group means. I quite like this as it is easier to interpret than the SMD: values can be back-transformed to percentage changes. For example, an lnRR of -0.41 corresponds to a change of (exp(-0.41)-1)*100, roughly -34%.

In metafor you calculate SMD by doing this:

SMD<-escalc(data=logging,measure="SMD",m2i=G1,sd2i=G1SD,n2i=G1SS,m1i=G2,sd1i=G2SD,n1i=G2SS,append=T)

and lnRR like this:

ROM<-escalc(data=logging,measure="ROM",m2i=G1,sd2i=G1SD,n2i=G1SS,m1i=G2,sd1i=G2SD,n1i=G2SS,append=T)

From now on in this post I will use the lnRR effect size.

Next we carry out a meta-analysis, using the variability and effect sizes of each set of comparisons to calculate an overall mean effect size, with the idea that we want to give most weight to the most precise studies.

In metafor you do that like this:

ROM.ma<-rma.uni(yi,vi,method="REML",data=ROM)
summary(ROM.ma)

This is the code for a random-effects meta-analysis, which assumes there are real differences between study effect sizes and then calculates a mean effect size. This seems sensible in ecology, where everything is variable, even in field sites which are quite close to each other.
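
For contrast, if you were willing to assume that every study estimates one identical true effect size, you could fit a fixed-effect model instead by swapping the method argument; this is just a sketch of the alternative, and it is rarely sensible for ecological data:

# fixed-effect version, shown for comparison only
ROM.fe<-rma.uni(yi,vi,method="FE",data=ROM)
summary(ROM.fe)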

The output from summary(ROM.ma) can be a bit daunting. It looks like this:

Random-Effects Model (k = 30; tau^2 estimator: REML)

  logLik  deviance       AIC       BIC
-27.8498   55.6997   59.6997   62.4343

tau^2 (estimated amount of total heterogeneity): 0.3859 (SE = 0.1044)
tau (square root of estimated tau^2 value):      0.6212
I^2 (total heterogeneity / total variability):   98.79%
H^2 (total variability / sampling variability):  82.36

Test for Heterogeneity:
Q(df = 29) = 2283.8442, p-val < .0001

Model Results:

estimate       se     zval     pval    ci.lb    ci.ub
 -0.4071   0.1151  -3.5363   0.0004  -0.6327  -0.1815      ***

Essentially what it’s saying is that logged forests tend to have lower species richness than unlogged forests (the model results bit), and that there is quite a lot of heterogeneity between studies (the test for heterogeneity bit).
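
If you also want confidence intervals around the heterogeneity estimates (tau^2, I^2 and H^2), metafor will give you those too:

# confidence intervals for the heterogeneity estimates
confint(ROM.ma)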

We can calculate the percentage difference between groups by taking the estimate and standard error from the model results and doing this:

(exp(-0.4071)-1)*100 # back-transform the estimate: about -33%
(exp(0.1151*1.96)-1)*100 # back-transform the 95% confidence interval: about 25%

to get things in percentages. So now we can say that logged forests have around 33% fewer species than unlogged forests, +/- a confidence interval of around 25%.
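
Rather than typing the numbers back in by hand, you can pull the same quantities straight from the fitted model object; note that the confidence interval stops being symmetrical once back-transformed:

# back-transform the estimate and its 95% CI limits to percentage change
(exp(coef(ROM.ma))-1)*100
(exp(c(ROM.ma$ci.lb,ROM.ma$ci.ub))-1)*100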

We can summarise this result using a forest plot in metafor, but this is a bit ugly.

forest.rma(ROM.ma)

[Figure: the default forest plot produced by forest.rma]

What did I tell you? Pretty ugly right?

Much better to use ggplot2 to do this properly. But this is a bit of a pain in the arse:

theme_set(theme_bw(base_size=10))
# combine the individual study results and the summary estimate into one data frame
forrest_data<-rbind(data.frame(ES=ROM.ma$yi,SE=sqrt(ROM.ma$vi),Type="Study",Study=logging$Study),data.frame(ES=ROM.ma$b,SE=ROM.ma$se,Type="Summary",Study="Summary"))
# reverse the ordering so the summary sits at the bottom after coord_flip()
forrest_data$Study2<-factor(forrest_data$Study,levels=rev(unique(as.character(forrest_data$Study))))
levels(forrest_data$Study2) # check the ordering
plot1<-ggplot(data=forrest_data,aes(x=Study2,y=ES,ymax=ES+(1.96*SE),ymin=ES-(1.96*SE),size=factor(Type),colour=factor(Type)))+geom_pointrange()
plot2<-plot1+coord_flip()+geom_hline(yintercept=0,lty=2,size=1)+scale_size_manual(values=c(0.5,1))
plot3<-plot2+xlab("Study")+ylab("log response ratio")+scale_colour_manual(values=c("grey","black"))
plot3+theme(legend.position="none")

[Figure: the same forest plot drawn with ggplot2]

There. Much better.

With a little tweak we can even have it displayed as percentage change, though your confidence intervals will not be symmetrical any more.

theme_set(theme_bw(base_size=10))
plot1<-ggplot(data=forrest_data,aes(x=Study2,y=(exp(ES)-1)*100,ymax=(exp(ES+(1.96*SE))-1)*100,ymin=(exp(ES-(1.96*SE))-1)*100,size=factor(Type),colour=factor(Type)))+geom_pointrange()
plot2<-plot1+coord_flip()+geom_hline(yintercept=0,lty=2,size=1)+scale_size_manual(values=c(0.5,1))
plot3<-plot2+xlab("Study")+ylab("Percentage change in richness\n as a result of logging")+scale_colour_manual(values=c("grey","black"))
plot3+theme(legend.position="none")+scale_y_continuous(breaks=c(-50,0,50,100,150,200))

[Figure: the ggplot2 forest plot on the percentage-change scale]

There, that’s a bit more intuitively understandable, isn’t it?

Right, I will sort out the next post on meta-regression and sub-group analysis soon.

In the meantime feel free to make any comments you like.

The role of ecologists in influencing policy

Policy not based on evidence is dangerous.

Or at least this is what Ben Goldacre would have you believe. Goldacre, a doctor, writer and journalist, claims that much policy making is ad hoc and that we should be carrying out more randomised controlled trials of policy-relevant questions.

I agree with him.

However, there is much to be positive about as well.

The UK government recognises the importance of science in forming policy. It acknowledges that evidence-based environmental policy is important, and there are various independent bodies which help politicians access this evidence. There have also been positive moves in following expert advice, such as the recent approval of Nature Improvement Areas, as suggested in the Making Space for Nature report. In addition, at the recent British Ecological Society Annual Conference I heard about government funding for work on Ash dieback in an attempt to combat the disease.

So obviously the government does value science in policy making. We are actually reasonably lucky in this country compared to some less enlightened places.

Now the bad.

The real problem for a lot of environmental issues is not that there is not enough evidence, but rather that they are fringe issues. While reducing crime will always be a priority, many issues related to conservation and environmental policy making are lower down the pecking order.

These kinds of things need work to make them an issue even before we try to work out what to do about them. Even policies on climate change, which is widely seen as a disaster waiting to happen, have shown little meaningful progress over the last 20 years.

This is simply because any changes would require a change in more or less everyone’s lifestyle. The public don’t want this and governments around the world want to stay in power, so they don’t do anything.

Even when issues don’t conflict with people’s day-to-day lives, getting good environmental policy can take a long time. Ecologists often want quick action on their pet topic, but this rarely happens.

Just how long things can take was emphasised when I saw Hugh Possingham speak in Glasgow this year. He has spent 15 years working on designing protected areas so that they are as representative of the marine environment as possible. In the last year or two the Australian government has started taking this seriously and is now planning the largest MPA network in the world (though Possingham doesn’t think it’s perfect). Although he wasn’t the only one involved in bringing this to fruition, it has been a long road for Possingham and the MPA network. This is just one example of how long things can take to develop into policies.

So how can we put environmental concerns higher up the political pecking order? Firstly, we need to talk to people more about why we care about the environment.

We need to work in schools and do more outreach projects. The public need to care before most politicians will do much about things.

Secondly, we should be willing to communicate policy relevant work to politicians directly. We should be happy to suggest policies to politicians – though it is down to them as to what they do with this advice.

There have been many column inches filled about the lack of scientific education amongst politicians. This is true but there is also a lack of political education amongst scientists. We should be looking to create the biggest impact possible with our work. We won’t do this by sitting in rooms talking to each other.

Testing environmental policies with randomised controlled trials

How do we know that what we do to manage biodiversity is doing what we want?

The sad truth is that a lot of the time we don’t. A lot of conservation management is done in particular ways because a mate once told the ranger a certain method was good, or because managers wish to maintain the status quo; very little is actually based on evidence.

The best way we have of understanding the effects of management on biodiversity is through properly controlled studies. Despite the increasing complexity of ecology, which increasingly resembles a branch of applied mathematics, many conservation questions are simply about whether treatment A or treatment B is better than doing nothing at all.

This is one of the few ways we can attribute causation in ecology and is held as the gold standard for other disciplines. It bears repeating that correlation does not equal causation and most ecology is observational in nature, rather than manipulative.

Although there is a push by some to answer these questions and compile evidence for particular management techniques (notably the excellent conservationevidence.com), much remains unevaluated. Some publishers feel that such experiments are boring, which both discourages researchers from pursuing this type of research and reduces replication even when such studies are published.

Though we sometimes have little idea about how to manage biodiversity at the site scale, environmental policy can be equally difficult to implement. Randomised controlled trials (RCTs) have the potential to be powerful tools to aid this decision making. Many policy questions relate to how people interact with their environment. For example, is it best to pay up-front for agri-environment schemes or to pay based on the results they deliver?

Would it be better to pay farmers up front for planting wildflower margins or should we pay them based on delivery?

Questions like this could feasibly be addressed by a large RCT. However, these would need government backing and funding. Some people might see the use of RCTs, when government is cutting such large parts of its budget, as an unnecessary indulgence. However, without these trials we are fumbling around in the dark, trying to find a sticking plaster when we might be in need of a visit to the hospital. It’s a clumsy analogy, but without RCTs of such policy options we do not know whether what we are doing is the best thing to do, or even whether it is working at all.

This type of thing is being championed by Ben Goldacre and Tim Harford, who argue we should have RCTs for almost everything. Goldacre has also written a report for the Cabinet Office in the UK, so it looks like this idea is gaining a bit of traction. RCTs have been identified as having potential for social policies, such as the use of different teaching techniques to aid reading. However, there is no reason why similar trials couldn’t be designed to test environmental policy.

This big idea would put an end to uninformed policy making on the environment. It would mean that we wouldn’t have to guess at which policy delivers the best results. Politicians would still get it wrong some of the time but at least they’d be presented with good evidence to help them make these decisions.

Politics would still, however, come first. Making decisions is a messy process and science doesn’t have the last say on such issues. As seen with the recent badger cull debate, even the best evidence can have little value when faced with the might of industry lobbyists. But without this evidence, policy making will continue to fumble around in the dark.