One of the exercises in Ecological Statistics
fits a species area curve to data from the Galapogos Islands. I ask students
to compare the estimated power coefficient \(z\) with the value 0.27 reported in a meta-analysis by Drakare et al. 2006. But Drakare et al. only reported the mean, not the standard error.^{1}

To do the comparison properly, I need to know how uncertain the average \(z\) is. I looked through the paper carefully, but couldn’t find a standard error anywhere! The individual estimates used in the meta-analysis are in an appendix, but the standard errors are not reported there either. They do exist, because Drakare et al. discuss the value of a “weighted” average, where poor estimates (large SE) of \(z\) get reduced weight. Oh well, I’ll do the best I can. I think if I ignore the variation between studies I’ll get an overestimate of the significance of the difference; inflated Type I error probability.

```
library("tidyverse")
library("NRES803") #devtools::install_github("atyre2/NRES803")
library("faraway") # for the data
drakare <- read_csv("https://drewtyre.rbind.io/post/data/Drakare_etal_2006_species-area-curves-review.csv")
```

The data come from Johnson and Raven (1973); I should check if any of the estimates in the Drakare data come from there.

`grep("Raven", drakare$Oref)`

`## [1] 194 196`

`drakare[193:197,"Oref"]`

```
## # A tibble: 5 x 1
## Oref
## <chr>
## 1 Hulten E. (1960) Flora of the Aleutian Islands. Cramer, Weinheim.
## 2 Johnson M.P. & Raven P.H. (1973) Species number and endemism: the galapagos A~
## 3 Johnson M.P. & Simberloff D.S. (1974) Environmental determinants of island sp~
## 4 Johnson M.P., Mason L.G. & Raven P.H. (1968) Ecological parameters and plant ~
## 5 <NA>
```

OK, we need to remove row 194. I included rows before and after the hits because sometimes more than one estimate is taken from a single reference, and the references are missing for those extra rows.

`drakare <- drakare[-194,]`

OK. Now I’ll get a mean and standard error for all the studies.

```
drakare %>%
summarize(mean_z = mean(z_power),
se_z = sd(z_power)/sqrt(n()))
```

```
## # A tibble: 1 x 2
## mean_z se_z
## <dbl> <dbl>
## 1 NA NA
```

Oh right, there are some studies that didn’t report z for the power method.
I could use `na.rm = TRUE`

for both `mean()`

and `sd()`

, but then
`n()`

would be reporting the incorrect number.

```
drakare %>%
filter(!is.na(z_power)) %>%
summarize(mean_z = mean(z_power),
se_z = sd(z_power)/sqrt(n()),
n = n())
```

```
## # A tibble: 1 x 3
## mean_z se_z n
## <dbl> <dbl> <int>
## 1 0.297 0.00936 784
```

Cooool. The SE is tiny because there are so many observations. Also, weighting the observations by their variance makes quite a difference – the weighted value reported is 0.27 compared to 0.30 unweighted.

OK, but before I get carried away, Drakare et al. did find differences by method (nested vs. independent) and “realm” (aquatic vs. terrestrial).

```
z_estimates <- drakare %>%
filter(!is.na(z_power)) %>%
group_by(method, Realm) %>%
summarize(mean_z = mean(z_power),
se_z = sd(z_power)/sqrt(n()),
n = n())
```

`## `summarise()` has grouped output by 'method'. You can override using the `.groups` argument.`

`z_estimates`

```
## # A tibble: 8 x 5
## # Groups: method [4]
## method Realm mean_z se_z n
## <chr> <chr> <dbl> <dbl> <int>
## 1 both aqu 0.122 0.0218 5
## 2 both ter 0.143 0.0203 6
## 3 ind aqu 0.255 0.0201 121
## 4 ind ter 0.281 0.0135 433
## 5 nes aqu 0.285 0.0189 41
## 6 nes ter 0.367 0.0184 127
## 7 <NA> aqu 0.267 0.0490 12
## 8 <NA> ter 0.442 0.0618 39
```

The Galapagos data is terrestrial and independent, because each observation is a distinct island. Now get the estimated coefficient of z for the Galapagos islands.

```
gala.1 <- lm(log(Species)~log(Area),data=gala)
summary(gala.1)
```

```
##
## Call:
## lm(formula = log(Species) ~ log(Area), data = gala)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.5442 -0.4001 0.0941 0.5449 1.3752
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.9037 0.1571 18.484 < 2e-16 ***
## log(Area) 0.3886 0.0416 9.342 4.23e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7842 on 28 degrees of freedom
## Multiple R-squared: 0.7571, Adjusted R-squared: 0.7484
## F-statistic: 87.27 on 1 and 28 DF, p-value: 4.23e-10
```

Consulting wikipedia I believe I want

\[ t = \frac{\bar{X_1}-\bar{X_2}}{s_{\bar{\Delta}}} \]

where

\[ s_{\bar{\Delta}} = \sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}} \]

I already have the things inside the square root; they are the square of the estimated standard errors. So …

```
gala.coefs <- tidy(gala.1)
s_combined <- sqrt(z_estimates[4, "se_z"]^2 + gala.coefs[2, "std.error"]^2)
s_combined
```

```
## se_z
## 1 0.04372116
```

which is ever so slightly bigger than the se from the model alone.

```
t <- (gala.coefs[2, "estimate"] - z_estimates[4, "mean_z"])/s_combined
t
```

```
## estimate
## 1 2.452262
```

OK, to get a p value for that, I need to get the degrees of freedom … it’s ugly.

```
(z_estimates[4, "se_z"]^2 + gala.coefs[2, "std.error"]^2)^2 /
(z_estimates[4, "se_z"]^4/433 + gala.coefs[2, "std.error"]^4/28)
```

```
## se_z
## 1 34.15
```

That makes sense … it is a little bigger than the sample size for the larger variance.

```
# have to coax t into a naked number, many ways to do this
pt(t[1,1, drop = TRUE], df = 34.15, lower.tail = FALSE)
```

`## [1] 0.009730172`

Nice – so that is a test of the hypothesis that the estimated coefficient
from Galapagos plants is the same as the unweighted average for all independently sampled
terrestrial systems. The p value is the probability of getting an estimate
as large as 0.39 *or larger*, given the null hypothesis was true. So the
Galapagos Islands plant communities are not like the other things. I’m not
sure what to make of that ecologically, but it’s cool.