Page 6 - Bacci_Pagoto_etali_2015
P. 6
Author's personal copy
4 Ann Microbiol (2015) 65:1–13
et al. 1999; Tabacchioni et al. 2000; Mengoni and Bazzicalupo (http://cran.r-project.org/web/packages/ggplot2/index.html).
2002; Mengoni et al. 2009;Piniet al. 2012). Relative abundances were then used to make a clustering
using the UPGMA algorithm and “Bray-Curtis” distance. At
phylum level the absolute abundance was resampled in order
Sequence analysis and bioinformatics
to check if the differences between the datasets obtained were
statistically significant. A number of 10,000 resamples was
Sequence reads (of approx. 100 bp in length, covering the V3
done for every pair of datasets. Then, the differences obtained
region of 16S rRNA gene) were subjected to a first quality
with the 10,000 resamples were tested against the real
control step in order to eliminate low complexity reads and
difference between each samples pair using a Wilcoxon
low quality bases. Three other quality control steps were
signed rank test with a 95% confidence interval (a p-value
performed: i) firstly, the presence of Illumina adaptors and
less than 0.05 was considered significant) (Figure S1). For
primers was checked by using a standard procedure, as re-
each phylum a boxplot of the 10,000 differences obtained was
ported in the FASTQC manual (http://www.bioinformatics.
plotted for each samples pair (3 boxplots in total) using R
babraham.ac.uk/projects/fastqc/) that is by collecting
graphics package ggplot2 (see above) (Figure S2).
sequences of 50 bp that were overrepresented in the reads
As previously suggested (Hill et al. 2003; Schloss and
file and analyzing them using blast against the nt database; ii)
Handelsman 2006;Shawetal. 2008;Gerber et al. 2012)there
a second step was performed using a dynamic trimming
is not a unique indicator of community diversity, consequently
algorithm to trim the poor quality bases in all the reads of
four diversity indices were then calculated (Hill et al. 2003):
the samples, in order to delete possible ambiguous bases. A −1
inverse Simpson index (defined as D ), Shannon-Weaver
quality cut off of 28 was used to obtain an error percentage at
index, Richness index (as estimators of alpha diversity),
least of 0.16 %; and iii) finally, sequences files were analyzed
Evenness index and beta-diversity (defined as “total number
using FastQC in order to check for increased quality of reads.
of taxa in one site”/”means of taxa count in all sites” -1).
For sequence analysis and taxonomy classification, reads
All the script and classes used in this work were written in
with less than 50 bp were deleted from the datasets. Then the
Java or R and are available upon request.
resulting reads, with length > 50 bp were analyzed by using
the RDP Classifier and taxonomically assigned. The results
obtained by the RDP Classifier were imported into MEGAN
Results
(MEta Genome Analyzer, Huson et al. 2007) to calculate a
taxonomic classification of the reads. The taxonomic classifi-
T-RFLP profiling of bacterial community function
cation obtained was collapsed to different taxonomic levels
and diversity
(phylum, class, order, family and genus) with the purpose of
analyze the absolute read abundance attributed to that taxo-
Application of 16S rRNA gene T-RFLP bacterial community
nomic level on each dataset. Recommendations for thresholds
profiling to the 27 total DNA samples (nine samples for three
as in Mizrahi-Man et al. 2013 were followed, which allow an
sampling points per locality) allowed the identification of 30
error rate up to 5% at the genus level using a confidence
different T-RFs (Terminal-Restriction Fragments), two of
threshold of 95% (Mizrahi-Man et al. 2013).
which were present in all samples, while the others were
To assess taxa richness, rarefaction analyses were per-
detected in 1 –11 samples. The ribotypic diversity of commu-
formed using the R package Vegan [http://cc.oulu.fi/
nities (as number of T-RFs, Table 1) was relatively low and
~jarioksa/softhelp/vegan.html], after collapsing reads to the
varied from 4.7 mean T-RFs (Praja upper-line) to 9.7
taxonomic levels of genus, family, order and class, with a
(Faraglioni shore-line). In general shore-line samples had
probability of assignment of 80% or above.
more T-RFs than upper-line samples. Bacteria titres (no. of
Finally, data obtained from MEGAN were analyzed using
cells/g of sand), estimated by quantitative PCR, were not
R (package vegan - http://cran.r-project.org/web/packages/
different (one-way ANOVA) between samples and varied
vegan/index.html). Firstly, absolute abundances were 5
from 1.3±0.6 10 cells/g (Faraglioni shore-line) to 1.2±
transformed to relative ones using this formula: 4
1.1 10 cells/g (Lido Burrone upper-line). No relationships
X norm ¼ X ij =X jþ between organic carbon content, ribotype richness and bacte-
ij
rial cell estimates (as 16S rRNA gene copies) were found (data
Where: i and j are the matrix rows and columns, respec- not shown).
norm
tively, X ij is the value in the row i and the column j, X ij is the CCA (Fig. 1a) showed that samples are mainly grouped
normalized value in the row i and the column j and X j+ is the according to their position along the Y-axis (from left to right)
sum of all value in the column j. Then, a heat map was made and that humidity is clearly related with such a axis, while
for every different taxonomic level by using custom R scripts organic carbon content is more related to differences among
(available upon request) and the graphic package ggplot2 localities. However, also some groupings related to the