
MODELING POSIDONIA OCEANICA GROWTH DATA


           explanatory variable in this case) acts linearly on the expected value of Growth. Although the logarithmic link only prevails in 32
           meadow-year combinations, it is interesting to underline that in most of the 213 cases in which the identity link is chosen, the logarithmic link
           is the second best, with an AIC very close to that of the identity link. In conclusion, a Generalized Linear Model with Gamma distribution and
           identity or log link seems to represent a sensible choice in most cases. Nevertheless, the fact that other choices sometimes give the best fit
           suggests that no "general recipe" should be trusted and that a preliminary model selection should always be conducted when analyzing
           P. oceanica growth data by means of GLMs.
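As a rough illustration of such a preliminary model selection, the following Python sketch fits a Gamma GLM with a single covariate under the log and the identity link and compares the resulting AIC values. It is not the procedure used in the paper: the data are simulated, the names (`fit_gamma_glm`, `age`, `growth`) are hypothetical, and the dispersion is estimated with a simple moment (Pearson) estimator, so the AIC values need not match those reported by any particular statistical package.

```python
import math
import random


def fit_gamma_glm(x, y, link, n_iter=50):
    """Fit E[y] = g^{-1}(b0 + b1 * x) with Gamma errors by IRLS.

    Returns (b0, b1, aic). Illustrative helper, not a library function.
    """
    g, g_inv, dg = {
        "log": (math.log, math.exp, lambda m: 1.0 / m),
        "identity": (lambda m: m, lambda e: e, lambda m: 1.0),
    }[link]
    mu = list(y)  # start from the data, which are strictly positive
    for _ in range(n_iter):
        # Working response and weights; the Gamma variance function is mu^2
        z = [g(m) + (yi - m) * dg(m) for yi, m in zip(y, mu)]
        w = [1.0 / (m * m * dg(m) ** 2) for m in mu]
        # Weighted least squares of z on (1, x), solved in closed form
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        b1 = (sw * swxz - swx * swz) / (sw * swxx - swx ** 2)
        b0 = (swz - b1 * swx) / sw
        mu = [g_inv(b0 + b1 * xi) for xi in x]
    n = len(y)
    # Assumption: moment (Pearson) estimate of the dispersion; shape = 1 / phi
    phi = sum(((yi - m) / m) ** 2 for yi, m in zip(y, mu)) / (n - 2)
    shape = 1.0 / phi
    loglik = sum(
        shape * math.log(shape * yi / m) - shape * yi / m
        - math.log(yi) - math.lgamma(shape)
        for yi, m in zip(y, mu)
    )
    aic = -2.0 * loglik + 2 * 3  # two coefficients plus the dispersion
    return b0, b1, aic


# Simulated (hypothetical) data: growth increases linearly with age
rng = random.Random(1)
age = [rng.uniform(1.0, 10.0) for _ in range(200)]
growth = [rng.gammavariate(20.0, (5.0 + a) / 20.0) for a in age]

fits = {link: fit_gamma_glm(age, growth, link) for link in ("log", "identity")}
aic = {link: fit[2] for link, fit in fits.items()}
best = min(aic, key=aic.get)
print(aic, "-> lowest AIC:", best)
```

In practice the same comparison would be repeated for each meadow-year combination and for every candidate distribution and link, keeping the combination with the lowest AIC.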


           5. LONGITUDINAL ANALYSIS OF P. OCEANICA GROWTH DATA

           We now turn to the statistical problems posed by the full use of complete series of past growth data reconstructed through lepidochronology.
           The longitudinal nature of P. oceanica growth data obtained from this reconstructive technique can be considered from two different points
           of view.
            On the one hand, it can be seen as a nuisance, since in making inference (estimation, hypothesis testing, etc.), particularly on the effects of
           time-invariant variables (for example, depth or site), we would like to be able to use standard methods available for independent
           units. But ignoring the dependence among observations within a cluster will in general result in inefficient parameter estimates and less
           powerful tests, and hence in misleading inference. We must therefore turn to methods suited to correlated data.
            On the other hand, lepidochronology must be considered a powerful technique because it allows the study of dynamic features of
           P. oceanica not only at the time of sampling, but also for a number of preceding years, without requiring repeated sampling efforts (Pergent,
           1990).


           5.1. Extending GLMs to handle longitudinal data: from Generalized Linear Models to Generalized Linear Mixed Models
           In our introduction to GLMs in Section 4 we pointed out that in standard GLMs the observations are assumed to be independent,
           conditionally on the predictors. In order to adapt these models for situations with a form of grouping, or clustering, in the data, and hence to
           account for possible intra-cluster correlation, various developments have been put forward in the last 20 years.
            There are broadly three approaches for extending the Generalized Linear Model to take dependence into account: marginal (also called
           GEE) models (Liang and Zeger, 1986), mixed models (Laird and Ware, 1982), and conditional models (Cox, 1972). A good review can be
           found in Fahrmeir and Tutz (2001). Mixed models consider the dependence among observations as a consequence of neglecting heterogeneity
           between sampling units. The basic idea here is that units belonging to the same cluster, e.g., to the same longitudinal set, ‘‘resemble’’ each
           other more closely than units belonging to different clusters because they share some common traits. In statistical terms, this implies that they
           are independent, conditionally on the effects they have in common, but turn out to be dependent when such effects are considered marginally,
           i.e., averaging over units. The application of mixed models to various contexts characterized by correlated data, such as longitudinal
           studies and small area estimation, is well described in Breslow and Clayton (1993).
            This mechanism generating dependence appears to be the most likely in the analysis of longitudinal data of P. oceanica annual growth. It is
           reasonable to assume that shoots in the same meadow are heterogeneous, owing to biological and genetic inter-individual diversity
           (Tomasello et al., 2009) or other unobservable/unobserved variables (Balestri et al., 2003), and therefore display serial autocorrelation when
           such heterogeneity is ignored by marginalization. For this reason, we have chosen to analyze longitudinal data on P. oceanica annual growth
           with GLMMs, and we now give a brief account of this extension of the standard GLM only.
            Let $Y_{it}$ denote the $t$-th response in cluster $i$, $i = 1, 2, \ldots, m$; $t = 1, 2, \ldots, n_i$. Let $x_{it}$ denote a column vector of
           explanatory variables for the $t$-th response in cluster $i$ and $\beta$ the corresponding vector of fixed, unknown parameters. Let $z_{it}$
           denote a vector of coefficients of random effects, and $b_i$ the $q$-vector of random parameters for cluster $i$. A GLMM assumes that:

            the conditional distribution of $Y_{it}$, given the random parameter $b_i$, belongs to the natural exponential family:

            $$f(y_{it} \mid b_i) = \exp\left\{ \frac{y_{it}\theta_{it} - b(\theta_{it})}{\phi} + c(y_{it}, \phi) \right\}$$

            the conditional mean $\mu_{it} = E[y_{it} \mid b_i]$ is related to a linear predictor containing both fixed and random effects by the link function $g(\cdot)$:

            $$g(\mu_{it}) = x_{it}^T \beta + z_{it}^T b_i$$

            the $q$-dimensional random parameter $b_i$ is distributed as a random variable with mean $0_q$ and variance-covariance matrix $\Sigma_B$.
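The way these assumptions generate dependence can be made explicit with the law of total covariance. This is a standard identity rather than a formula stated in the paper, and it additionally uses the conditional independence of the responses within a cluster given $b_i$. For $t \neq s$:

$$\mathrm{Cov}(Y_{it}, Y_{is}) = E\left[\mathrm{Cov}(Y_{it}, Y_{is} \mid b_i)\right] + \mathrm{Cov}\left(E[Y_{it} \mid b_i], E[Y_{is} \mid b_i]\right) = 0 + \mathrm{Cov}(\mu_{it}, \mu_{is})$$

Because $\mu_{it}$ and $\mu_{is}$ both depend on the same $b_i$, the last term is in general nonzero, so responses from the same cluster are marginally correlated even though they are independent given $b_i$.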
            In particular, in our application of GLMM to P. oceanica, we have opted for a simple version of this model, in which again the response
           distribution is Gamma, the link function is either log or identity, and the only random parameter is the intercept for each shoot:

           $$g(\mu_{it}) = x_{it}^T \beta + b_{0i}$$

           with $b_{0i} \sim N(0, \sigma_B^2)$.
            This choice corresponds to assuming that the shoots in the same meadow are heterogeneous, since some tend to have a higher average
           annual growth (and hence a higher intercept) and some others a lower growth (and hence a lower intercept), but, once this heterogeneity has
           been accounted for via the variance of the distribution of the random intercept, the repeated measures of annual growth for the available
           lepidochronological years are independent. This model fits satisfactorily, and gives more accurate estimates of the standard errors of the fixed
           effects (age, calendar year, site, depth, etc.), providing more power for the tests of significance of these explanatory variables.
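The implication of the random-intercept assumption can be checked with a small simulation. The sketch below is only an illustration under assumed, hypothetical parameter values (`sigma_b`, `shape`, `beta0`) and without covariates: it draws two annual growth values per shoot from a Gamma distribution with log link and a shared normal random intercept, then compares the correlation between the two years before and after removing the shoot-specific mean.

```python
import math
import random


def pearson(a, b):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def simulate(n_shoots, sigma_b, shape, beta0, rng):
    """Return (marginal, conditional) correlation between two years of growth."""
    year1, year2, scaled1, scaled2 = [], [], [], []
    for _ in range(n_shoots):
        b0i = rng.gauss(0.0, sigma_b)      # random intercept of shoot i
        mu = math.exp(beta0 + b0i)         # log link: conditional mean
        # Given b0i, the two years are independent Gamma draws with mean mu
        y1 = rng.gammavariate(shape, mu / shape)
        y2 = rng.gammavariate(shape, mu / shape)
        year1.append(y1)
        year2.append(y2)
        # Dividing by mu removes the shoot effect (conditioning on b0i)
        scaled1.append(y1 / mu)
        scaled2.append(y2 / mu)
    return pearson(year1, year2), pearson(scaled1, scaled2)


rng = random.Random(2010)
marginal, conditional = simulate(
    n_shoots=5000, sigma_b=0.5, shape=10.0, beta0=1.0, rng=rng
)
print("marginal correlation:", round(marginal, 3))
print("conditional correlation:", round(conditional, 3))
```

With these settings the marginal correlation should come out clearly positive and the conditional one close to zero, which is the pattern the random-intercept model is meant to capture.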


           Environmetrics 2011; 22: 370–382  Copyright © 2010 John Wiley & Sons, Ltd.  wileyonlinelibrary.com/journal/environmetrics