Page 5 - Bacc_Pagoto_&C_2015
P. 5

Author's personal copy

           4                                                                             Ann Microbiol (2015) 65:1–13

           et al. 1999; Tabacchioni et al. 2000; Mengoni and Bazzicalupo  (http://cran.r-project.org/web/packages/ggplot2/index.html).
           2002; Mengoni et al. 2009;Piniet al. 2012).        Relative abundances were then used to make a clustering
                                                              using the UPGMA algorithm and “Bray-Curtis” distance. At
                                                              phylum level the absolute abundance was resampled in order
           Sequence analysis and bioinformatics
                                                              to check if the differences between the datasets obtained were
                                                              statistically significant. A number of 10,000 resamples was
           Sequence reads (of approx. 100 bp in length, covering the V3
                                                              done for every pair of datasets. Then, the differences obtained
           region of 16S rRNA gene) were subjected to a first quality
                                                              with the 10,000 resamples were tested against the real
           control step in order to eliminate low complexity reads and
                                                              difference between each samples pair using a Wilcoxon
           low quality bases. Three other quality control steps were
                                                              signed rank test with a 95% confidence interval (a p-value
           performed: i) firstly, the presence of Illumina adaptors and
                                                              less than 0.05 was considered significant) (Figure S1). For
           primers was checked by using a standard procedure, as re-
                                                              each phylum a boxplot of the 10,000 differences obtained was
           ported in the FASTQC manual (http://www.bioinformatics.
                                                              plotted for each samples pair (3 boxplots in total) using R
           babraham.ac.uk/projects/fastqc/) that is by collecting
                                                              graphics package ggplot2 (see above) (Figure S2).
           sequences of 50 bp that were overrepresented in the reads
                                                                As previously suggested (Hill et al. 2003; Schloss and
           file and analyzing them using blast against the nt database; ii)
                                                              Handelsman 2006;Shawetal. 2008;Gerber et al. 2012)there
           a second step was performed using a dynamic trimming
                                                              is not a unique indicator of community diversity, consequently
           algorithm to trim the poor quality bases in all the reads of
                                                              four diversity indices were then calculated (Hill et al. 2003):
           the samples, in order to delete possible ambiguous bases. A                       −1
                                                              inverse Simpson index (defined as D ), Shannon-Weaver
           quality cut off of 28 was used to obtain an error percentage at
                                                              index, Richness index (as estimators of alpha diversity),
           least of 0.16 %; and iii) finally, sequences files were analyzed
                                                              Evenness index and beta-diversity (defined as “total number
           using FastQC in order to check for increased quality of reads.
                                                              of taxa in one site”/”means of taxa count in all sites” -1).
             For sequence analysis and taxonomy classification, reads
                                                                All the script and classes used in this work were written in
           with less than 50 bp were deleted from the datasets. Then the
                                                              Java or R and are available upon request.
           resulting reads, with length > 50 bp were analyzed by using
           the RDP Classifier and taxonomically assigned. The results
           obtained by the RDP Classifier were imported into MEGAN
                                                              Results
           (MEta Genome Analyzer, Huson et al. 2007) to calculate a
           taxonomic classification of the reads. The taxonomic classifi-
                                                              T-RFLP profiling of bacterial community function
           cation obtained was collapsed to different taxonomic levels
                                                              and diversity
           (phylum, class, order, family and genus) with the purpose of
           analyze the absolute read abundance attributed to that taxo-
                                                              Application of 16S rRNA gene T-RFLP bacterial community
           nomic level on each dataset. Recommendations for thresholds
                                                              profiling to the 27 total DNA samples (nine samples for three
           as in Mizrahi-Man et al. 2013 were followed, which allow an
                                                              sampling points per locality) allowed the identification of 30
           error rate up to 5% at the genus level using a confidence
                                                              different T-RFs (Terminal-Restriction Fragments), two of
           threshold of 95% (Mizrahi-Man et al. 2013).
                                                              which were present in all samples, while the others were
             To assess taxa richness, rarefaction analyses were per-
                                                              detected in 1 –11 samples. The ribotypic diversity of commu-
           formed using the R package Vegan [http://cc.oulu.fi/
                                                              nities (as number of T-RFs, Table 1) was relatively low and
           ~jarioksa/softhelp/vegan.html], after collapsing reads to the
                                                              varied from 4.7 mean T-RFs (Praja upper-line) to 9.7
           taxonomic levels of genus, family, order and class, with a
                                                              (Faraglioni shore-line). In general shore-line samples had
           probability of assignment of 80% or above.
                                                              more T-RFs than upper-line samples. Bacteria titres (no. of
             Finally, data obtained from MEGAN were analyzed using
                                                              cells/g of sand), estimated by quantitative PCR, were not
           R (package vegan - http://cran.r-project.org/web/packages/
                                                              different (one-way ANOVA) between samples and varied
           vegan/index.html). Firstly, absolute abundances were             5
                                                              from 1.3±0.6 10 cells/g (Faraglioni shore-line) to 1.2±
           transformed to relative ones using this formula:        4
                                                              1.1 10 cells/g (Lido Burrone upper-line). No relationships
           X  norm  ¼  X ij =X jþ                             between organic carbon content, ribotype richness and bacte-
             ij
                                                              rial cell estimates (as 16S rRNA gene copies) were found (data
             Where: i and j are the matrix rows and columns, respec-  not shown).
                                                   norm
           tively, X ij is the value in the row i and the column j, X ij  is the  CCA (Fig. 1a) showed that samples are mainly grouped
           normalized value in the row i and the column j and X j+ is the  according to their position along the Y-axis (from left to right)
           sum of all value in the column j. Then, a heat map was made  and that humidity is clearly related with such a axis, while
           for every different taxonomic level by using custom R scripts  organic carbon content is more related to differences among
           (available upon request) and the graphic package ggplot2  localities. However, also some groupings related to the
   1   2   3   4   5   6   7   8   9   10