Comparative transcriptome analysis reveals relationship of three major domesticated varieties of Auricularia auricula-judae

Bioinformatics

De novo assembly of the A. auricula-judae transcriptome

Each of the individual RNA samples from Quanjin (n = 3), Banjin (n = 3), and Wujin (n = 3) (Fig. 1) were used to construct cDNA libraries for sequencing on an Illumina HiSeq2000 system. After removal of low quality reads, 69,814,486 reads, 62,918,023 reads, and 69,589,663 reads were obtained for Quanjin, Banjin, and Wujin, respectively (Supplementary Table S1). These high-quality reads were assembled de novo using Trinity23, and 33,316 contigs, 29,523 contigs, and 33,682 contigs were generated for Quanjin, Banjin and Wujin, respectively. To obtain a non-redundant library of consensus sequences, contigs of the three varieties were merged using TGICL24. Primary unigenes (54,320) were obtained containing 20,129 distinct clusters and 34,191 distinct singletons. The sequencing reads were further processed in edgeR25 to obtain FPKM (Fragments Per Kilobase of transcript per Million mapped fragments) values for each primary unigene in each of the biological samples (n = 9). For each variety, median FPKM value of the three replicates was used to represent the expression level. Only unigenes with a FPKM > 1.0 in any of the three varieties were collected for further analysis (Fig. 2). Ultimately, a final unigene library containing 13,937 unigenes (length range 201–30,448 bp) were established. The average and N50 lengths of unigenes were 2,594 and 3,521 bp, respectively (Supplementary Table S2).

Figure 2
Figure 2

Analytical framework of the A. auricula-judae transcriptome. Raw reads of each of the three varieties were filtered and then de novo assembled into contigs by Trinity. Unigenes were obtained by clustering the contigs with TGICL. FPKM values were calculated for each of the unigenes, and only unigenes with FPKM >1.0 in any of the three varieties were used for further analyses and annotation against the public databases.

Annotation of unigenes

To identify if the unigenes were homologues of any known genes, a BLASTn search against the NCBI non-redundant nucleotide (nt) databases was performed and 12,011 unigenes were annotated. These unannotated unigenes were then searched against the Swiss-Prot and NCBI non-redundant protein (nr) databases using BLASTp with an E-value cutoff of 1e-4. Among the remaining 1,926 unigenes, 802 had significant similarities to known proteins. In total, 12,813 unigenes were annotated to known genes/proteins (Supplementary Dataset S1S3). Only 0.0988% (median, n = 3), 0.101% (median, n = 3) and 0.101% (median, n = 3) of the unigenes were annotated to A. auricula-judae from varieties of Quanjin, Banjin and Wujin, respectively (Supplementary Table S3). Instead, the majority of unigenes were annotated to Auricularia subglabra (Fig. 3), highlighting lack of A. auricula-judae reference sequences in the public databases. In total, there were 1,124 unannotated unigenes among the nine samples (Supplementary Dataset S4).

Figure 3
Figure 3

Summary of unigene annotation. Unigenes were divided into five major annotation groups (percentage >1% of total) and the proportion of each group is shown for each of the individual samples of Wujin (n = 3, Panels a–c), Banjin (n = 3, Panels d–f), and Quanjin (n = 3, Panels g–i).

Transcriptomic relationships between the three A. auricula-judae varieties

To determine the gene expression pattern relationships among the three varieties of A. auricula-judae, a cluster analysis using final reads set (Fig. 2) was performed. A Euclidean distance tree revealed that all three biological triplets clustered together in all varieties (Fig. 4a), confirming good sample quality. Quanjin (n = 3) was clearly isolated from Banjin (n = 3) and Wujin (n = 3), indicating an overall difference between their transcriptomic characteristics (Fig. 4a). PCA was also performed to confirm the relationships between the three varieties (Fig. 4b). The PCA revealed that Quanjin (n = 3) was clearly separated from Banjin (n = 3) and Wujin (n = 3) at both PC1 and PC2, whereas Banjin and Wujin appeared distinguishable only at PC2. At PC2, however, Quanjin (n = 3) was flanked by Banjin (n = 3) and Wujin (n = 3), indicating that Quanjin had more similarities to Banjin and Wujin than those shared between the latter two varieties in certain aspects (Fig. 4b). Therefore, the three mushroom varieties had a triangular relationship in their gene expression patterns. In addition to the between-variety relationships, the PCA results also revealed a visible difference in between-individual variations among the three varieties. Banjin displayed the least variation, whereas Wujin was the most variable, though the between-individual variations were mainly evident at PC2 (Fig. 4b).

Figure 4
Figure 4

Gene expression relationships among the three A. auricula-judae varieties. (Panel a) Raw reads counts of unigenes in each of the samples were transformed into regularized log–transformed data by rlogTransformation. Then, the distance between samples was calculated and displayed in a hierarchical clustering tree by hclust. The X-axis labels samples, and the Y-axis represents the Euclidean distance between samples. (Panel b) The top two principal components (X-axis, PC1; Y-axis, PC2) deciphered 88.1% of all difference among the three varieties. Banjin and Wujin were similar to each other in PC1 but showed a greater difference in PC2 when compared to Quanjin.

Among the 12,813 unigenes annotated to known genes/proteins, the expression of 11,371 (88.7%) unigenes was shared in all three varieties (Fig. 5a). Quanjin uniquely expressed 637 annotated unigenes, whereas only 36 and 21 unigenes were uniquely detected in Banjin and Wujin, respectively. However, the latter two shared an intersection of 523 annotated unigenes whose expression was not detected in Quanjin (Fig. 5a). Between these sets of 637 Quanjin unique unigenes and 523 Banjin-Wujin shared unigenes, 141 pairs of unigenes had nucleotide sequence similarities (determined by local BLASTn) and 57 pairs of unigenes had amino acid sequence similarities (determined by local BLASTx), suggesting that they may perform similar functions but have polymorphisms in the sequences. Taken together, the above results indicated that Banjin and Wujin transcriptomes are closely related to each other, while they were distant to Quanjin in the gene expression landscape.

Figure 5
Figure 5

Unigene expression. The numbers of unigenes detected in the three A. auricula-judae varieties are displayed in Venn charts. (Panel a) Expression of unigenes annotated to Auricularia subglabra, E. glandulosa, and other species (Fig. 3). (Panel b) Expressions of unannotated unigenes (Fig. 3).

Among the 1,124 unannotated unigenes, 672 (59.8%) were shared by all of the three A. auricula-judae varieties. Quanjin uniquely expressed 235 novel unigenes, whereas 169 novel unigenes were only detected in the intersection between Banjin and Wujin (Fig. 5b). Thirty pairs of homologues were detected in these two sets of novel unigenes by local BLAST. The expression levels of these novel unigenes in all nine samples are displayed in a clustering heat map using the FKPM values (Fig. 6a). The expression patterns of the novel unigenes (Fig. 6a) conformed to the whole transcriptome characteristics and supported that Quanjin was isolated from Banjin and Wujin (Fig. 4).

Figure 6
Figure 6

Expression profiles of novel and differentially expressed unigenes. The FPKM of all A. auricula-judae unigenes in nine samples were scaled together and transformed into Z-scores ((x-mean)/(s.d.)) using an in-house Perl program and are represented in a clustering heat map by Heatmap 2. The Z-scores are color coded as shown in the color bar. (Panel a) Expression profile of novel unigenes. (Panel b) Expression profile of differentially expressed unigenes.

Differentially expressed unigenes

Differentially expressed unigenes (DEGs) were determined for each of the two variety pairs using EdgeR with a cutoff threshold of P < 0.05. There were 2,861 DEGs in Banjin vs. Wujin, 5,992 DEGs in Quanjin vs. Banjin, and 6,549 DEGs in Quanjin vs. Wujin (Supplementary Table S4). An expression clustering heat map of all identified DEGs was produced for all nine samples using the FKPM values (Fig. 6b). The DEG clustering results also conformed to the whole transcriptome clarification that Quanjin was more divergent compared to the other two A. auricula-judae varieties (Fig. 4).

Gene ontology (GO) annotation and cluster analysis of function groups

The 12,813 BLAST-annotated A. auricula-judae unigenes were further annotated to 162 GO functions. Among them, 55 GO functions belonged to the category of molecular function, 50 belonged to cellular component, and 57 belonged to biological process (Fig. 7, Supplementary Dataset S5). Based on the number of unigenes in each GO function detected in each of the three varieties, a GO function clustering analysis was performed to summarize the relationships among the three varieties at the GO function level. The numbers of unigenes were transformed together into Z-scores and displayed separately as DEGs and un-DEGs (Fig. 7). Again, Banjin and Wujin displayed greater similarities than those of Banjin vs. Quanjin and Wujin vs. Quanjin. This is shown in Fig. 7 in two aspects as the greatest number of un-DEGs and the lowest numbers of DEGs between Banjin and Wujin (Fig. 7 and Supplementary Dataset S5), indicating similarities between the two varieties. There was further supporting evidence from the DEGs section, where the columns of up-regulated Quanjin unigenes clustered together (bq-q and qw-q, Fig. 7) and was isolated from the cluster containing columns of down-regulated Quanjin unigenes (bq-b and qw-w, Fig. 7), indicating similar patterns for the difference between Qianjin vs. Banjin and Qianjin vs. Wujin. Based on these results, the functional characteristics of Quanjin transcriptome were indeed unique compared to those of Banjin and Wujin.

Figure 7
Figure 7

GO of DEGs and unDEGs. The numbers of unigenes in each GO term were collected for DEGs (P < 0.05) and unDEGs, respectively, for each pairwise comparison among the three varieties. All of the unigene numbers were normalized into Z-scores ((x-mean)/(s.d.)) using an in-house Perl program and are represented in a clustering heat map by Heatmap 2. The Z-scores are color coded as shown in the color bar. Color word labeling was used to represent different classes of GO categories of molecular function (blue), cellular component (red), and biological process (black).

KEGG pathway enrichment associated with surface wrinkle

983 unigenes were extracted for wrinkle down-regulation (FPKM: Wujin > Banjin > Quanjin) and 988 unigenes were extracted for wrinkle up-regulation (FPKM: Wujin < Banjin < Quanjin). The total of 1971 unigenes were pooled and determined for significant enrichments (with corrected P < 0.05, by Benjamini-Hochberg FDR method) in six KEGG pathways, such as starch and sucrose metabolism (Rich Factor, number of enriched genes/number of total genes in the pathway, RF = 0.20), MAPK signaling pathway-yeast (RF = 0.19), biosynthesis of amino acids (RF = 0.17), biosynthesis of secondary metabolites (RF = 0.14), biosynthesis of antibiotics (RF = 0.14) and metabolic pathways (RF = 0.13) (Fig. 8, Supplementary Dataset S6). To our knowledge, this is the first description of biological functions possibly associated with fruiting body wrinkles, a major morphological feature of Auricularia mushrooms6,7,8.

Figure 8
Figure 8

KEGG pathways associated with fruiting body wrinkle Scatter plot of KEGG pathways enriched for unigenes that were up- or down-regulated by the wrinkle morphology. The X-axis displays the Rich Factor which is the ratio of numbers of the unigenes in a pathway term to all gene numbers annotated in the same pathway term. The size of bubble represents the number of unigenes and the color gradient indicates the corrected P value by Benjamini-Hochberg FDR method. Only pathways with corrected P < 0.05 were shown.

Expressions of peroxidase-like unigenes

To further validate the relationship of functional characteristics among the three A. auricula-judae varieties, we examined the expression patterns of peroxidase-like unigenes in all nine samples. In total, 43 unigenes displayed homologies to known peroxidases (Supplementary Dataset S7), including 14 heme-thiolate peroxidase (HTP)-like unigenes, four DyP-type peroxidase-like unigenes, and three manganese peroxidase (MnP)-like unigenes (Supplementary Figure S1). Figure 9 shows a clustering heat map based on the FKPM values. Consistent with previous results (Figs 4, 6 and 7), Quanjin (n = 3) was isolated from Banjin (n = 3) and Wujin (n = 3) samples, confirming that the expression profiles of peroxidase-like genes in Quanjin were unique when compared to those of Banjin and Wujin (Fig. 9). However, from the peroxidase expression profiles, clustering analysis could not sufficiently separate Banjin and Wujin, mainly due to a large between-individual variation in Wujin (Fig. 9, Wujin-2). This result conformed to the PCA results that the greatest between-individual variation was observed in Wujin when all unigenes were used (Fig. 4b).

Figure 9
Figure 9

Expression profiles of peroxidase-related unigenes. The Z-scores ((x-mean)/(s.d.)) of FPKM values of A. auricula-judae peroxidase-related unigenes were extracted from the Z-scores of all unigenes and are represented in a clustering heat map by Heatmap 2. The Z-scores are color coded as shown in the color bar. Forty-three unigenes were functionally annotated to peroxidase by GO and were labelled using the unigene ID followed by the BLASTn/x annotations (Supplementary Dataset S7).

Real time PCR validation of selected peroxidase-like unigene transcriptions

To validate the transcriptome results, 16 peroxidase-like unigenes (Supplementary Figure S1) were selected for RT-qPCR analysis. Firstly, Sanger sequencing of the RT-qPCR products (Supplementary Dataset S8) confirmed genuine amplifications of all of the 16 peroxidase-like transcripts tested (Supplementary Dataset S9). Secondly, the ΔCt values had negative correlations (P < 0.05) to the FKPM values in all three cultivars (Supplementary Table S5), showing compatibility of expression levels determined by the two methods. Interestingly, transcriptions of A. aj DyP1 (JQ650250)26 were detected by RT-qPCR with relatively high ΔCt values (Supplementary Table S6) in all three cultivars although its homologue was not included in our final unigene set of FPKM > 1.0. Further investigation using contigs detected signals of two non-overlapping JQ650250 homologues with low FPKM values (Supplementary Table S6), emphasizing that the final unigene set excluded lowly expressed A. auricula-judae genes.

Articles You May Like

Solar-powered moisture harvester collects and cleans water from air
Massive U.S. Machines That Hunt For Ripples In Space-Time Just Got An Upgrade
Novel electrocardiogram uses signals from ear and hand to check heart rhythm
‘Inflamm-aging’ causes loss of bone healing ability in the elderly
Adam Spencer: Why Are Monster Prime Numbers Important?

Leave a Reply

Your email address will not be published. Required fields are marked *