Page 11 - Lovison_alii_2010
G. LOVISON ET AL.
5.2. GLMMs as a solution to the “pseudo-replication” problem
Hurlbert (1984), in a seminal paper, drew attention to the risks of applying standard statistical techniques to non-independent observations,
which he called “pseudo-replications”.
Hurlbert defines the pseudo-replication problem as “...the use of inferential statistics to test for treatment effects with data from
experiments where either treatments are not replicated (though samples may be) or replicates are not statistically independent”.
Pseudo-replication most commonly results from wrongly treating multiple samples from one experimental unit as multiple (independent)
experimental units. Such improper statistical treatment of the data implies an overestimate of the true variation, an increased risk of
Type II error and, as a consequence, the danger of invalid conclusions.
A partial, and often inadequate, solution to the problem is “sub-sampling”: given repeated measurements on the same unit,
only a random sub-sample of these measurements is analyzed, in order to attenuate correlation and obtain approximately independently
distributed observations, to which standard statistical methods can then be applied. In its most extreme version, only one measurement is
randomly drawn for each unit, i.e., the sub-sample size is one. As will be shown in Section 5.3, although sub-sampling attenuates
correlation, it also entails a loss of information (due to the reduction of the total sample size) and therefore requires a larger number of
sampling units to ensure a specified level of efficiency (in estimation) and power (in testing).
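The sub-sampling scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `subsample_per_unit`, the `(unit_id, value)` record layout and the toy numbers are hypothetical and do not come from the PosiData-1 data set.

```python
import random
from collections import defaultdict


def subsample_per_unit(records, k=1, seed=0):
    """Keep at most k randomly chosen measurements for each unit.

    `records` is a list of (unit_id, value) pairs (hypothetical layout).
    With k=1 this is the extreme version: one measurement per unit.
    """
    rng = random.Random(seed)
    by_unit = defaultdict(list)
    for unit_id, value in records:
        by_unit[unit_id].append(value)

    kept = []
    for unit_id, values in by_unit.items():
        # Draw without replacement; units with fewer than k values keep all.
        chosen = rng.sample(values, min(k, len(values)))
        kept.extend((unit_id, v) for v in chosen)
    return kept


# Toy data (made-up numbers): three units with 4, 2 and 1 measurements.
records = [("A", 1.0), ("A", 1.2), ("A", 0.9), ("A", 1.1),
           ("B", 2.0), ("B", 2.1),
           ("C", 3.0)]
sub = subsample_per_unit(records, k=1)
print(len(records), "->", len(sub))  # 7 -> 3
```

The drop from 7 to 3 observations in the toy example is the loss of information mentioned above: correlation within units is removed, but only by discarding most of the data.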
5.3. Longitudinal analysis of Sicily PosiData-1
To illustrate how different approaches to dependence in longitudinal data can lead to different results, we
selected the data from the Petrosino site. At this site 112 shoots are available, for a total of 880 observations, i.e., an average
of about 7.86 lepidochronological years per shoot; the number of years, i.e., the length of the longitudinal series, actually ranges
from a minimum of 1 to a maximum of 21. The three sampled meadows are located at depths of 6, 15, and 25 m, respectively.
We set out to model the dependence of Rhizome elongation on Year, Age of the shoot and Depth.
We first do so while completely ignoring the longitudinal nature of the data and treating them as a sample of 880 independent observations, by
fitting a standard GLM with Gamma distribution and logarithmic link. The results are shown in Table 5.
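For reference, the independence model can be written out as follows. The notation is an assumption introduced here, not taken from the paper: $i$ indexes shoots, $j$ indexes lepidochronological years within a shoot, $\nu$ denotes the Gamma shape parameter, and Depth is treated as a single numeric covariate constant within a shoot, as the single Depth coefficient in Table 5 suggests. The linear predictor contains the terms listed in Table 5.

```latex
y_{ij} \sim \mathrm{Gamma}(\mu_{ij}, \nu) \quad \text{independently for all 880 observations},
\qquad
\log \mu_{ij} = \beta_0
  + \beta_1\,\mathrm{Year}_{ij}
  + \beta_2\,\mathrm{Age}_{ij}
  + \beta_3\,\mathrm{Depth}_{i}
  + \beta_4\,\mathrm{Year}_{ij}\,\mathrm{Age}_{ij}
  + \beta_5\,\mathrm{Year}_{ij}\,\mathrm{Depth}_{i}
```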
These results suggest significant (negative) main effects of Year and Depth, while Age has no significant effect, either by itself
or in interaction with Year.
But if we take dependence into account, we get a completely different picture. The results of fitting a GLMM with Gamma distribution and
logarithmic link, which accounts for dependence by assuming random intercepts for the 112 shoots, are reported in Table 6.
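A random-intercept model of this kind can be written as below. The notation is an assumption introduced here: $i = 1, \dots, 112$ indexes shoots, $j$ indexes years within a shoot, $\nu$ is the Gamma shape parameter, and $b_i$ is the shoot-specific random intercept. The normal distribution for $b_i$ is the customary choice and is assumed rather than stated in the text; $\sigma_b$ corresponds to the quantity reported as "St. Dev (Intercept)" in Table 6.

```latex
y_{ij} \mid b_i \sim \mathrm{Gamma}(\mu_{ij}, \nu),
\qquad
\log \mu_{ij} = \beta_0 + b_i
  + \beta_1\,\mathrm{Year}_{ij}
  + \beta_2\,\mathrm{Age}_{ij}
  + \beta_3\,\mathrm{Depth}_{i}
  + \beta_4\,\mathrm{Year}_{ij}\,\mathrm{Age}_{ij}
  + \beta_5\,\mathrm{Year}_{ij}\,\mathrm{Depth}_{i},
\qquad
b_i \sim N(0, \sigma_b^2)
```

Observations from the same shoot share the same $b_i$ and are therefore correlated, while observations from different shoots remain independent.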
Not only are the main effects of all three explanatory variables now highly significant, but the interactions of Age with Year
and of Depth with Year are also significant, suggesting that environmental conditions have worsened over time and have a negative effect on
the growth performance of P. oceanica, not only directly but also through the interaction with Age.
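The contrast between Tables 5 and 6 can be made concrete by comparing the standard errors term by term. The short Python sketch below uses the estimates and standard errors exactly as printed in the two tables; the dictionary layout and the helper name `t_ratio` are illustrative choices. Recomputed t-ratios can differ slightly from the printed ones because the inputs are rounded to four decimals.

```python
# (estimate, standard error) pairs transcribed from Table 5 (GLM, independence).
glm = {
    "Intercept": (5.0007, 0.7046),
    "Year": (0.0477, 0.0164),
    "Age": (0.0452, 0.0655),
    "Depth": (0.7715, 0.3151),
    "Year:Age": (0.0011, 0.0015),
    "Year:Depth": (0.0109, 0.0073),
}

# (estimate, standard error) pairs transcribed from Table 6 (GLMM, random intercepts).
glmm = {
    "Intercept": (6.4224, 0.5910),
    "Year": (0.0824, 0.0141),
    "Age": (0.1032, 0.0396),
    "Depth": (1.5152, 0.2237),
    "Year:Age": (0.0022, 0.0008),
    "Year:Depth": (0.0274, 0.0049),
}


def t_ratio(estimate, std_error):
    """Wald-type ratio of an estimate to its standard error."""
    return estimate / std_error


# Ratio of GLM to GLMM standard errors; values above 1 mean the
# independence model reports the larger standard error for that term.
se_ratio = {term: glm[term][1] / glmm[term][1] for term in glm}

for term in glm:
    print(f"{term:<11} SE ratio {se_ratio[term]:.2f}  "
          f"t (GLM) {t_ratio(*glm[term]):.2f}  "
          f"t (GLMM) {t_ratio(*glmm[term]):.2f}")
```

For every term the standard error under independence is larger than under the random-intercept model, which is consistent with the changes in significance described above.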
The striking difference between the results obtained when ignoring and when accounting for intra-shoot dependence shows not only that such
dependence exists, but also that its inclusion in the model strongly affects the substantive interpretation of the data. This conclusion contradicts that of
Table 5. Petrosino: GLM Gamma-Log assuming independence
Term          Estimate  Std. Error  t-ratio  p-value
Intercept       5.0007      0.7046    7.098    0.000
Year            0.0477      0.0164    2.907    0.004
Age             0.0452      0.0655    0.690    0.490
Depth           0.7715      0.3151    2.448    0.014
Year:Age        0.0011      0.0015    0.716    0.474
Year:Depth      0.0109      0.0073    1.499    0.134
Table 6. Petrosino: GLMM Gamma-Log assuming random intercepts
Term                   Estimate  Std. Error  t-ratio  p-value
Intercept                6.4224      0.5910   10.866    0.000
Year                     0.0824      0.0141    5.833    0.000
Age                      0.1032      0.0396    2.603    0.009
Depth                    1.5152      0.2237    6.773    0.000
Year:Age                 0.0022      0.0008    2.574    0.010
Year:Depth               0.0274      0.0049    5.542    0.000
St. Dev. (Intercept)     0.4115
wileyonlinelibrary.com/journal/environmetrics Copyright © 2010 John Wiley & Sons, Ltd. Environmetrics 2011; 22: 370–382