
G. LOVISON ET AL.


            the distribution of the response variable Y, conditional on the covariates, need not be Normal, but can be any distribution belonging to the
            Natural Exponential Family (denoted by N.E.F.$(\theta_i, \phi)$); i.e., the probability density function of the response must be of the form:

            $$f(y_i; \theta_i, \phi) = \exp\left\{ \frac{y_i \theta_i - b(\theta_i)}{a_i(\phi)} + c(y_i, \phi) \right\} \quad \forall i \qquad (1)$$

            where the natural parameter $\theta_i$ is a known function of the expected value $\mu_i = E(Y_i \mid x_i)$, i.e., $\theta_i = m(\mu_i)$, while the dispersion parameter $\phi$
            plays a role similar to that of the variance $\sigma^2$ in the Normal distribution. It is important to recall that many distributions used in modeling
            biological phenomena, like the Normal, Gamma, Binomial, and Poisson, belong to this family;
            the scale on which the explanatory variables act linearly need not be the original one of the expected values $\mu_i$, but can now be any
            monotonic transformation; in other words, the link function $g(\cdot)$, connecting the linear predictor and the expected value of the response,
            can be any invertible function, not just the identity. This can be rephrased in terms of the response function $g^{-1}(\cdot) = h(\cdot)$, by saying that the
            expected values $\mu_i = E(Y_i \mid x_i)$ are modeled by a nonlinear (but invertible) function of the linear predictor.
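
As an illustration of form (1), the following minimal sketch checks numerically that the Poisson distribution can be written in that form. The decomposition used here (natural parameter log(mu), b(theta) = exp(theta), a(phi) = 1, c(y, phi) = -log(y!)) is the standard textbook one and is not spelled out in the text above; the function names are hypothetical.

```python
import math

def nef_density(y, theta, b, a_phi, c):
    """Evaluate exp{[y*theta - b(theta)] / a(phi) + c(y, phi)}, i.e. form (1)."""
    return math.exp((y * theta - b(theta)) / a_phi + c(y))

def poisson_pmf(y, mu):
    """Usual Poisson probability mass function, for comparison."""
    return mu ** y * math.exp(-mu) / math.factorial(y)

def poisson_as_nef(y, mu):
    """Poisson probability written in the exponential-family form (1)."""
    theta = math.log(mu)                       # natural parameter theta = log(mu)
    return nef_density(
        y, theta,
        b=math.exp,                            # b(theta) = exp(theta) = mu
        a_phi=1.0,                             # a(phi) = 1 (no free dispersion)
        c=lambda v: -math.lgamma(v + 1),       # c(y, phi) = -log(y!)
    )

# The two expressions agree for every count tried here.
for count in range(6):
    assert math.isclose(poisson_as_nef(count, 2.5), poisson_pmf(count, 2.5), rel_tol=1e-9)
```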
            Summarizing, and following the same scheme used above, any GLM is characterized by:

            error distribution: $Y_i \mid x_i \sim \text{N.E.F.}(\theta_i, \phi)$ with $\theta_i = m(\mu_i)$
            linear predictor: $\eta_i = x_i^T \beta$
            link function: $g(\mu_i) = \eta_i$ with $g(\cdot)$ any invertible function
            (or, response function: $\mu_i = g^{-1}(\eta_i)$).
            Finally, as in classical linear models, standard GLMs assume independence among observations:
            $$Y_i \perp Y_j \quad \forall i \neq j$$
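
The three ingredients of a GLM (error distribution, linear predictor, link function) can be made concrete with a small sketch. It assumes a Poisson error distribution, a log link, and a single covariate plus intercept, and fits the model by iteratively reweighted least squares, a standard fitting algorithm for GLMs that the text above does not itself describe. The function name and the data are invented for illustration only.

```python
import math

def fit_poisson_glm(x, y, n_iter=25):
    """Fit E(Y_i | x_i) = exp(b0 + b1 * x_i) by iteratively reweighted least squares."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0    # start from the intercept-only fit
    for _ in range(n_iter):
        # Accumulate the weighted normal equations (X'WX) beta = X'Wz.
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi                 # linear predictor eta_i
            mu = math.exp(eta)                 # response function mu_i = g^{-1}(eta_i)
            w = mu                             # IRLS weight for Poisson with log link
            z = eta + (yi - mu) / mu           # working response
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the 2x2 system in closed form.
        det = s_w * s_wxx - s_wx ** 2
        b0, b1 = ((s_wxx * s_wz - s_wx * s_wxz) / det,
                  (s_w * s_wxz - s_wx * s_wz) / det)
    return b0, b1

# Invented counts, for illustration only.
x_demo = [0, 1, 2, 3, 4, 5]
y_demo = [1, 2, 2, 5, 7, 12]
b0_hat, b1_hat = fit_poisson_glm(x_demo, y_demo)
fitted = [math.exp(b0_hat + b1_hat * xi) for xi in x_demo]  # means on the response scale
```

The fitted values are obtained by applying the inverse link to the linear predictor, so they live directly on the original scale of the counts.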

           4.2. GLM versus Linear Models for transformed data
           As recalled in Section 2, an approach that has been extensively used in ecological applications to fit a Gaussian linear model to data with
           nonlinear relationships, and/or unequal variances and/or non-Normal distribution, consists in transforming the data so that the new scores
           satisfy at least approximately the assumptions of the Gaussian linear model (Digby and Kempton, 1987). This “data transformation”
           approach has a long history in Statistics, its formal introduction dating back to Box and Cox (1964) and Grizzle et al. (1969).
            In this approach, the goal is to find a function tðÞ such that, at least approximately, the standard methods of inference for Gaussian linear
           models can be applied to tðY i Þ rather than to Y i directly.
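
One well-known family of candidate transformations is the Box-Cox power family. The sketch below shows its usual one-parameter form for positive responses; the text above cites Box and Cox (1964) but does not state which transformation, if any, was applied to the data, so this is only an example of what such a function can look like. The function names are hypothetical.

```python
import math

def box_cox(y, lam):
    """Box-Cox power transformation of a positive value y."""
    if y <= 0:
        raise ValueError("the Box-Cox transformation requires y > 0")
    if lam == 0:
        return math.log(y)                 # limiting case as lam -> 0
    return (y ** lam - 1.0) / lam

def inverse_box_cox(t, lam):
    """Map a transformed value back to the original scale."""
    if lam == 0:
        return math.exp(t)
    return (lam * t + 1.0) ** (1.0 / lam)
```

Note that the inverse maps individual values back to the original scale; it does not, in general, map the mean of the transformed data to the mean of the original data.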
            Although this approach is conceptually and computationally very simple and appealing, it is not without drawbacks. The two
           main problems are:
           (1) in general, it is hard to find a single transformation that satisfies all assumptions simultaneously;
           (2) even when a single transformation is able to account for all the departures from the Gaussian linear model, the use of transformations can
              still be problematic, because the results become harder to interpret. If the original scale of the data is ecologically
              meaningful, then it may be required to express the results back on this original scale, but this is not, in general, an easy task.
            This naturally raises the question of the relative merits of the “data transformation” approach versus the GLM approach.
            The main advantage of the transformation approach is its conceptual and computational simplicity. Basically, it builds upon the huge and
           widespread body of statistical knowledge on Gaussian linear models, trying to “force” all types of data to fit, at least approximately, into the
           assumptions of such well-known models. However, this is also its main weakness: the attempt to find a single transformation which satisfies
           all such assumptions usually fails because they are often separately violated by real data. In this respect, a Generalized Linear Model is
           superior because it is less rigid, since it addresses the violations separately: e.g., a nonlinear regression function can be combined with a
           Normal distribution for the conditional distribution of the response, or a linear regression with a non-Normal, heteroscedastic distribution,
           etc. Moreover, the interpretation of the results on the original scale is more natural within a GLM than within the transformation approach,
           since the invertibility of the link function provides a straightforward way to transfer the results from the link scale to the response scale. This
           is particularly appealing in the ecological context, since model parameters, predictions, etc. usually have a definite biological meaning only
           on the original scale.
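
The difference in back-transformation can be seen in a minimal sketch with an intercept-only model and a log scale. It relies on two standard facts not derived in the text: exponentiating the mean of the logged data gives the geometric mean, and the maximum likelihood fit of an intercept-only GLM for an exponential-family response with common dispersion reproduces the sample mean. The data are invented for illustration.

```python
import math

y = [2.0, 3.0, 5.0, 8.0, 20.0]   # invented positive responses, for illustration only

arithmetic_mean = sum(y) / len(y)

# Transformation approach: average log(y), then back-transform with exp.
# This recovers the geometric mean, which targets exp(E[log Y]) rather than E[Y].
back_transformed = math.exp(sum(math.log(v) for v in y) / len(y))

# GLM approach with log link: the link is applied to the mean, g(mu) = log(mu),
# so the intercept-only fit is beta_0 = log(sample mean) and the response
# function exp(beta_0) returns the mean on the original scale.
beta_0 = math.log(arithmetic_mean)
glm_fitted_mean = math.exp(beta_0)

print(back_transformed, glm_fitted_mean, arithmetic_mean)
```

Here the back-transformed value is smaller than the arithmetic mean, while the GLM fitted mean coincides with it.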


           4.3. Cross-sectional analysis of Sicily PosiData-1
           The methodological issues discussed in the previous Sections are here illustrated with real examples taken from the Sicily PosiData-1 dataset.
           In order to keep the sample sizes reasonably large, the analysis has been carried out on the lepidochronological years from 1991 to 1998, for
           which a larger number of shoots is available. This gives a total of 400 year/meadow combinations, i.e., 8 years for three stations at different
           depths for 16 sites, plus two stations at a site where only two depths were available.
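
The count of year/meadow combinations can be checked with a few lines of arithmetic; the variable names are hypothetical.

```python
years = 1998 - 1991 + 1            # 8 lepidochronological years, 1991 to 1998 inclusive
stations = 16 * 3 + 1 * 2          # 16 sites with three depths, one site with two
combinations = years * stations    # year/meadow combinations
assert combinations == 400
```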
           4.3.1. Exploring violations: the departure from Normality
           In order to check the Normality assumption in our dataset, we performed both informal graphical checks and formal tests systematically on
           all of the year/meadow combinations available in Sicily PosiData-1. We chose to work on Rhizome elongation as response variable and


           wileyonlinelibrary.com/journal/environmetrics  Copyright © 2010 John Wiley & Sons, Ltd.  Environmetrics 2011; 22: 370–382