MODELING POSIDONIA OCEANICA GROWTH DATA
explanatory variable in this case) acts linearly on the expected value of Growth. Although the logarithmic link only prevails in 32
meadow-year combinations, it is interesting to underline that in most of the 213 cases in which the identity link is chosen, the logarithmic link
is the second best, with an AIC very close to that of the identity link. In conclusion, a Generalized Linear Model with Gamma distribution and
identity or log link seems to represent a sensible choice in most cases. Nevertheless, the fact that other choices sometimes give the best fit
suggests that no “general recipe” should be trusted and a preliminary model selection should always be conducted when analyzing
P. oceanica growth data by means of GLM’s.
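The kind of preliminary link selection recommended above can be sketched as follows. This is a minimal illustration on simulated data, not the procedure used for the meadow-year fits: the function name `fit_gamma_glm`, the single covariate, the simulated coefficients, and the plug-in (Pearson) dispersion used in the AIC are all assumptions made for this example.

```python
import math
import random

def fit_gamma_glm(x, y, link, n_iter=50):
    """Fit a Gamma GLM with one covariate by IRLS; return (b0, b1, aic)."""
    n = len(y)
    mu = list(y)  # start the iterations from the observed responses
    b0 = b1 = 0.0
    for _ in range(n_iter):
        if link == "log":
            # g'(mu) = 1/mu and V(mu) = mu^2 give constant working weights
            z = [math.log(m) + (yi - m) / m for yi, m in zip(y, mu)]
            w = [1.0] * n
        else:
            # identity link: working response is y, weights are 1/mu^2
            z = list(y)
            w = [1.0 / (m * m) for m in mu]
        # weighted least squares via the 2x2 normal equations
        sw = sum(w)
        sx = sum(wi * xi for wi, xi in zip(w, x))
        sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        sz = sum(wi * zi for wi, zi in zip(w, z))
        sxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * sxx - sx * sx
        b0 = (sxx * sz - sx * sxz) / det
        b1 = (sw * sxz - sx * sz) / det
        mu = [math.exp(b0 + b1 * xi) if link == "log" else b0 + b1 * xi
              for xi in x]
        if min(mu) <= 0:
            raise ValueError("non-positive fitted mean under the %s link" % link)
    # plug-in dispersion from the Pearson statistic; shape = 1 / dispersion
    phi = sum(((yi - m) / m) ** 2 for yi, m in zip(y, mu)) / (n - 2)
    k = 1.0 / phi
    loglik = sum(k * math.log(k / m) + (k - 1.0) * math.log(yi)
                 - k * yi / m - math.lgamma(k) for yi, m in zip(y, mu))
    aic = -2.0 * loglik + 2.0 * 3  # two coefficients plus the dispersion
    return b0, b1, aic

# Simulated data (hypothetical): the true mean is log-linear in x.
random.seed(1)
TRUE_SHAPE = 20.0
x = [random.uniform(0.0, 3.0) for _ in range(300)]
y = [random.gammavariate(TRUE_SHAPE, math.exp(0.5 + 0.8 * xi) / TRUE_SHAPE)
     for xi in x]

fits = {link: fit_gamma_glm(x, y, link) for link in ("identity", "log")}
best_link = min(fits, key=lambda link: fits[link][2])
for link, (b0, b1, aic) in fits.items():
    print(f"{link:8s} b0={b0:.3f} b1={b1:.3f} AIC={aic:.1f}")
print("selected link:", best_link)
```

In practice the same comparison would be run with a statistical package and repeated for each meadow-year combination, keeping the link (and distribution) with the lowest AIC.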
5. LONGITUDINAL ANALYSIS OF P. OCEANICA GROWTH DATA
We now turn to the statistical problems posed by the full use of complete series of past growth data reconstructed through lepidochronology.
The longitudinal nature of P. oceanica growth data obtained from this reconstructive technique can be considered from two different points
of view.
On the one hand, it can be seen as a nuisance, since in making inference (estimation, hypothesis testing, etc.), particularly on the effects of
non-time-varying variables (like, for example, depth or site), we would like to be able to use standard methods available for independent
units. But ignoring the dependence among observations within a cluster will in general result in inefficient parameter estimates and less powerful tests,
and hence in misleading inference. We must therefore turn to methods suitable for correlated data.
On the other hand, lepidochronology must be considered a powerful technique because it allows the study of dynamic features of
P. oceanica not only at the time of sampling, but also for a number of preceding years, without requiring repeated sampling efforts (Pergent,
1990).
5.1. Extending GLM’s to handle longitudinal data: from Generalized Linear Models to Generalized Linear Mixed Models
In our introduction to GLMs in Section 4 we pointed out that in standard GLM’s the observations are assumed to be independent,
conditionally on the predictors. In order to adapt these models for situations with a form of grouping, or clustering, in the data, and hence to
account for possible intra-cluster correlation, various developments have been put forward in the last 20 years.
There are broadly three approaches for extending the Generalized Linear Model to take dependence into account: marginal (also called
GEE) models (Liang and Zeger, 1986), mixed models (Laird and Ware, 1982), and conditional models (Cox, 1972). A good review can be
found in Fahrmeir and Tutz (2001). Mixed models regard the dependence among observations as a consequence of neglecting heterogeneity
between sampling units. The basic idea here is that units belonging to the same cluster, e.g., to the same longitudinal set, “resemble” each
other more closely than units belonging to different clusters because they share some common traits. In statistical terms, this implies that they
are independent, conditionally on the effects they have in common, but turn out to be dependent when such effects are considered marginally,
i.e., averaging over units. The application of mixed models to various contexts characterized by correlated data, such as longitudinal studies and small area
estimation, has been well described in Breslow and Clayton (1993).
This mechanism generating dependence appears to be the most likely in the analysis of longitudinal data of P. oceanica annual growth. It is
reasonable to assume that shoots in the same meadow are heterogeneous, owing to biological and genetic inter-individual diversity
(Tomasello et al., 2009) or other unobservable/unobserved variables (Balestri et al., 2003), and therefore display serial autocorrelation when
such heterogeneity is ignored by marginalization. For this reason, we have chosen to analyze longitudinal data on P. oceanica annual growth
by GLMM, and now give a brief account of this extension of the standard GLM only.
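The mechanism described above (independence given the shoot, dependence once the shoot effect is averaged out) can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis of real data: the number of shoots, the random-intercept standard deviation, the Gamma shape, the fixed intercept, and the log link are all hypothetical values chosen for illustration.

```python
import math
import random

random.seed(42)

M = 2000        # number of shoots (clusters); hypothetical
SIGMA_B = 0.4   # sd of the shoot-level random intercept (log scale); hypothetical
SHAPE = 10.0    # Gamma shape parameter; hypothetical
BETA0 = 1.0     # fixed intercept on the log scale; hypothetical

def corr(u, v):
    """Pearson correlation of two equally long lists."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

y1, y2 = [], []   # two annual growth values per shoot
r1, r2 = [], []   # the same values divided by the shoot's own conditional mean
for _ in range(M):
    b0i = random.gauss(0.0, SIGMA_B)          # shoot-specific intercept
    mu = math.exp(BETA0 + b0i)                # conditional mean (log link)
    first = random.gammavariate(SHAPE, mu / SHAPE)
    second = random.gammavariate(SHAPE, mu / SHAPE)
    y1.append(first)
    y2.append(second)
    r1.append(first / mu)
    r2.append(second / mu)

# Ignoring the shoot effect: the two measurements are clearly correlated.
marginal_corr = corr(y1, y2)
# After removing the shoot effect: the correlation is close to zero.
conditional_corr = corr(r1, r2)

print(f"correlation ignoring shoot effect:  {marginal_corr:.3f}")
print(f"correlation given the shoot effect: {conditional_corr:.3f}")
```

The two measurements on each shoot are drawn independently once the shoot's mean is fixed, yet they appear correlated across shoots because they share the same random intercept.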
Let $Y_{it}$ denote the $t$-th response in cluster $i$, $i = 1, 2, \ldots, m$; $t = 1, 2, \ldots, n_i$. Let $x_{it}$ denote a column vector of explanatory variables for the
$t$-th response in cluster $i$ and $\beta$ the corresponding vector of fixed, unknown parameters. Let $z_{it}$ denote a vector of coefficients of random
effects, and $b_i$ the $q$-vector of random parameters for cluster $i$. A GLMM assumes that:
the conditional distribution of $Y_{it}$, given the random parameter $b_i$, belongs to the Natural Exponential family:
$$f(y_{it} \mid b_i) = \exp\left\{ \frac{y_{it}\,\theta_{it} - b(\theta_{it})}{\phi} + c(y_{it}, \phi) \right\}$$
the conditional mean $\mu_{it} = E[y_{it} \mid b_i]$ is related to a linear predictor containing both fixed and random effects by the link function $g(\cdot)$:
$$g(\mu_{it}) = x_{it}^{T}\beta + z_{it}^{T} b_i$$
the $q$-dimensional random parameter $b_i$ is distributed as a r.v. with mean $0_q$ and variance–covariance matrix $\Sigma_B$.
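As a worked instance of the exponential-family form above, the Gamma distribution (the response distribution retained in Section 4) can be written in that form. The sketch below assumes a mean-shape parameterization, with $\theta_{it}$ the canonical parameter and $\phi$ the dispersion; the shape symbol $\nu$ is notation introduced here for illustration and does not appear in the original.

```latex
% Gamma density with conditional mean \mu_{it} and shape \nu (\nu is assumed notation)
\begin{align*}
f(y_{it} \mid b_i) &= \frac{1}{\Gamma(\nu)} \left(\frac{\nu}{\mu_{it}}\right)^{\nu}
                      y_{it}^{\,\nu-1} \exp\left(-\frac{\nu\, y_{it}}{\mu_{it}}\right) \\
% Matching terms with \exp\{[y\theta - b(\theta)]/\phi + c(y,\phi)\}:
\theta_{it} &= -\frac{1}{\mu_{it}}, \qquad
b(\theta_{it}) = -\log(-\theta_{it}), \qquad
\phi = \frac{1}{\nu}, \\
c(y_{it}, \phi) &= \nu \log(\nu\, y_{it}) - \log y_{it} - \log \Gamma(\nu) \\
% Mean and variance follow from the derivatives of b(\cdot):
E[Y_{it} \mid b_i] &= b'(\theta_{it}) = \mu_{it}, \qquad
\operatorname{Var}(Y_{it} \mid b_i) = \phi\, b''(\theta_{it}) = \frac{\mu_{it}^{2}}{\nu}
\end{align*}
```

The conditional variance is therefore proportional to the squared conditional mean, which is the constant coefficient of variation property of the Gamma family.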
In particular, in our application of GLMM to P. oceanica, we have opted for a simple version of this model, in which again the response
distribution is Gamma, the link function is either log or identity, and the only random parameter is the intercept for each shoot:
$$g(\mu_{it}) = x_{it}^{T}\beta + b_{0i}$$
with $b_{0i} \sim N(0, \sigma_B^{2})$.
This choice corresponds to assuming that the shoots in the same meadow are heterogeneous, since some tend to have a higher average
annual growth (and hence a higher intercept) and some others a lower growth (and hence a lower intercept), but, once this heterogeneity has
been accounted for via the variance of the distribution of the random intercept, the repeated measures of annual growth for the available
lepidochronological years are independent. This model fits satisfactorily, and gives more accurate estimates of the standard errors of the fixed
effects (age, calendar year, site, depth, etc.), providing more power for the tests of significance of these explanatory variables.
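To make the implied dependence explicit, the marginal moments of the random-intercept model can be derived from the conditional specification. The derivation below is a sketch that uses only the law of total expectation and covariance together with the moments of the normal and lognormal distributions; it assumes two distinct years $t \neq s$ on the same shoot $i$.

```latex
% Conditional independence given b_{0i} removes the first term of the total covariance
\begin{align*}
\operatorname{Cov}(Y_{it}, Y_{is})
  &= E\left[\operatorname{Cov}(Y_{it}, Y_{is} \mid b_{0i})\right]
   + \operatorname{Cov}\left(\mu_{it}, \mu_{is}\right)
   = \operatorname{Cov}\left(\mu_{it}, \mu_{is}\right), \qquad t \neq s \\[4pt]
% Identity link: \mu_{it} = x_{it}^{T}\beta + b_{0i}
E[Y_{it}] &= x_{it}^{T}\beta, \qquad
\operatorname{Cov}(Y_{it}, Y_{is}) = \sigma_B^{2} \\[4pt]
% Log link: \mu_{it} = \exp(x_{it}^{T}\beta + b_{0i}), with \exp(b_{0i}) lognormal
E[Y_{it}] &= \exp\left(x_{it}^{T}\beta + \tfrac{1}{2}\sigma_B^{2}\right), \\
\operatorname{Cov}(Y_{it}, Y_{is})
  &= \exp\left(x_{it}^{T}\beta + x_{is}^{T}\beta + \sigma_B^{2}\right)
     \left(e^{\sigma_B^{2}} - 1\right)
\end{align*}
```

In both cases the covariance is positive whenever $\sigma_B^{2} > 0$ and vanishes when $\sigma_B^{2} = 0$, so the variance of the random intercept is what carries the within-shoot dependence.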
Environmetrics 2011; 22: 370–382 Copyright © 2010 John Wiley & Sons, Ltd. wileyonlinelibrary.com/journal/environmetrics