De novo assembly of the A. auricula-judae transcriptome
Each of the individual RNA samples from Quanjin (n = 3), Banjin (n = 3), and Wujin (n = 3) (Fig. 1) were used to construct cDNA libraries for sequencing on an Illumina HiSeq2000 system. After removal of low quality reads, 69,814,486 reads, 62,918,023 reads, and 69,589,663 reads were obtained for Quanjin, Banjin, and Wujin, respectively (Supplementary Table S1). These high-quality reads were assembled de novo using Trinity23, and 33,316 contigs, 29,523 contigs, and 33,682 contigs were generated for Quanjin, Banjin and Wujin, respectively. To obtain a non-redundant library of consensus sequences, contigs of the three varieties were merged using TGICL24. Primary unigenes (54,320) were obtained containing 20,129 distinct clusters and 34,191 distinct singletons. The sequencing reads were further processed in edgeR25 to obtain FPKM (Fragments Per Kilobase of transcript per Million mapped fragments) values for each primary unigene in each of the biological samples (n = 9). For each variety, median FPKM value of the three replicates was used to represent the expression level. Only unigenes with a FPKM > 1.0 in any of the three varieties were collected for further analysis (Fig. 2). Ultimately, a final unigene library containing 13,937 unigenes (length range 201–30,448 bp) were established. The average and N50 lengths of unigenes were 2,594 and 3,521 bp, respectively (Supplementary Table S2).
Annotation of unigenes
To identify if the unigenes were homologues of any known genes, a BLASTn search against the NCBI non-redundant nucleotide (nt) databases was performed and 12,011 unigenes were annotated. These unannotated unigenes were then searched against the Swiss-Prot and NCBI non-redundant protein (nr) databases using BLASTp with an E-value cutoff of 1e-4. Among the remaining 1,926 unigenes, 802 had significant similarities to known proteins. In total, 12,813 unigenes were annotated to known genes/proteins (Supplementary Dataset S1–S3). Only 0.0988% (median, n = 3), 0.101% (median, n = 3) and 0.101% (median, n = 3) of the unigenes were annotated to A. auricula-judae from varieties of Quanjin, Banjin and Wujin, respectively (Supplementary Table S3). Instead, the majority of unigenes were annotated to Auricularia subglabra (Fig. 3), highlighting lack of A. auricula-judae reference sequences in the public databases. In total, there were 1,124 unannotated unigenes among the nine samples (Supplementary Dataset S4).
Transcriptomic relationships between the three A. auricula-judae varieties
To determine the gene expression pattern relationships among the three varieties of A. auricula-judae, a cluster analysis using final reads set (Fig. 2) was performed. A Euclidean distance tree revealed that all three biological triplets clustered together in all varieties (Fig. 4a), confirming good sample quality. Quanjin (n = 3) was clearly isolated from Banjin (n = 3) and Wujin (n = 3), indicating an overall difference between their transcriptomic characteristics (Fig. 4a). PCA was also performed to confirm the relationships between the three varieties (Fig. 4b). The PCA revealed that Quanjin (n = 3) was clearly separated from Banjin (n = 3) and Wujin (n = 3) at both PC1 and PC2, whereas Banjin and Wujin appeared distinguishable only at PC2. At PC2, however, Quanjin (n = 3) was flanked by Banjin (n = 3) and Wujin (n = 3), indicating that Quanjin had more similarities to Banjin and Wujin than those shared between the latter two varieties in certain aspects (Fig. 4b). Therefore, the three mushroom varieties had a triangular relationship in their gene expression patterns. In addition to the between-variety relationships, the PCA results also revealed a visible difference in between-individual variations among the three varieties. Banjin displayed the least variation, whereas Wujin was the most variable, though the between-individual variations were mainly evident at PC2 (Fig. 4b).
Among the 12,813 unigenes annotated to known genes/proteins, the expression of 11,371 (88.7%) unigenes was shared in all three varieties (Fig. 5a). Quanjin uniquely expressed 637 annotated unigenes, whereas only 36 and 21 unigenes were uniquely detected in Banjin and Wujin, respectively. However, the latter two shared an intersection of 523 annotated unigenes whose expression was not detected in Quanjin (Fig. 5a). Between these sets of 637 Quanjin unique unigenes and 523 Banjin-Wujin shared unigenes, 141 pairs of unigenes had nucleotide sequence similarities (determined by local BLASTn) and 57 pairs of unigenes had amino acid sequence similarities (determined by local BLASTx), suggesting that they may perform similar functions but have polymorphisms in the sequences. Taken together, the above results indicated that Banjin and Wujin transcriptomes are closely related to each other, while they were distant to Quanjin in the gene expression landscape.
Among the 1,124 unannotated unigenes, 672 (59.8%) were shared by all of the three A. auricula-judae varieties. Quanjin uniquely expressed 235 novel unigenes, whereas 169 novel unigenes were only detected in the intersection between Banjin and Wujin (Fig. 5b). Thirty pairs of homologues were detected in these two sets of novel unigenes by local BLAST. The expression levels of these novel unigenes in all nine samples are displayed in a clustering heat map using the FKPM values (Fig. 6a). The expression patterns of the novel unigenes (Fig. 6a) conformed to the whole transcriptome characteristics and supported that Quanjin was isolated from Banjin and Wujin (Fig. 4).
Differentially expressed unigenes
Differentially expressed unigenes (DEGs) were determined for each of the two variety pairs using EdgeR with a cutoff threshold of P < 0.05. There were 2,861 DEGs in Banjin vs. Wujin, 5,992 DEGs in Quanjin vs. Banjin, and 6,549 DEGs in Quanjin vs. Wujin (Supplementary Table S4). An expression clustering heat map of all identified DEGs was produced for all nine samples using the FKPM values (Fig. 6b). The DEG clustering results also conformed to the whole transcriptome clarification that Quanjin was more divergent compared to the other two A. auricula-judae varieties (Fig. 4).
Gene ontology (GO) annotation and cluster analysis of function groups
The 12,813 BLAST-annotated A. auricula-judae unigenes were further annotated to 162 GO functions. Among them, 55 GO functions belonged to the category of molecular function, 50 belonged to cellular component, and 57 belonged to biological process (Fig. 7, Supplementary Dataset S5). Based on the number of unigenes in each GO function detected in each of the three varieties, a GO function clustering analysis was performed to summarize the relationships among the three varieties at the GO function level. The numbers of unigenes were transformed together into Z-scores and displayed separately as DEGs and un-DEGs (Fig. 7). Again, Banjin and Wujin displayed greater similarities than those of Banjin vs. Quanjin and Wujin vs. Quanjin. This is shown in Fig. 7 in two aspects as the greatest number of un-DEGs and the lowest numbers of DEGs between Banjin and Wujin (Fig. 7 and Supplementary Dataset S5), indicating similarities between the two varieties. There was further supporting evidence from the DEGs section, where the columns of up-regulated Quanjin unigenes clustered together (bq-q and qw-q, Fig. 7) and was isolated from the cluster containing columns of down-regulated Quanjin unigenes (bq-b and qw-w, Fig. 7), indicating similar patterns for the difference between Qianjin vs. Banjin and Qianjin vs. Wujin. Based on these results, the functional characteristics of Quanjin transcriptome were indeed unique compared to those of Banjin and Wujin.
KEGG pathway enrichment associated with surface wrinkle
983 unigenes were extracted for wrinkle down-regulation (FPKM: Wujin > Banjin > Quanjin) and 988 unigenes were extracted for wrinkle up-regulation (FPKM: Wujin < Banjin < Quanjin). The total of 1971 unigenes were pooled and determined for significant enrichments (with corrected P < 0.05, by Benjamini-Hochberg FDR method) in six KEGG pathways, such as starch and sucrose metabolism (Rich Factor, number of enriched genes/number of total genes in the pathway, RF = 0.20), MAPK signaling pathway-yeast (RF = 0.19), biosynthesis of amino acids (RF = 0.17), biosynthesis of secondary metabolites (RF = 0.14), biosynthesis of antibiotics (RF = 0.14) and metabolic pathways (RF = 0.13) (Fig. 8, Supplementary Dataset S6). To our knowledge, this is the first description of biological functions possibly associated with fruiting body wrinkles, a major morphological feature of Auricularia mushrooms6,7,8.
Expressions of peroxidase-like unigenes
To further validate the relationship of functional characteristics among the three A. auricula-judae varieties, we examined the expression patterns of peroxidase-like unigenes in all nine samples. In total, 43 unigenes displayed homologies to known peroxidases (Supplementary Dataset S7), including 14 heme-thiolate peroxidase (HTP)-like unigenes, four DyP-type peroxidase-like unigenes, and three manganese peroxidase (MnP)-like unigenes (Supplementary Figure S1). Figure 9 shows a clustering heat map based on the FKPM values. Consistent with previous results (Figs 4, 6 and 7), Quanjin (n = 3) was isolated from Banjin (n = 3) and Wujin (n = 3) samples, confirming that the expression profiles of peroxidase-like genes in Quanjin were unique when compared to those of Banjin and Wujin (Fig. 9). However, from the peroxidase expression profiles, clustering analysis could not sufficiently separate Banjin and Wujin, mainly due to a large between-individual variation in Wujin (Fig. 9, Wujin-2). This result conformed to the PCA results that the greatest between-individual variation was observed in Wujin when all unigenes were used (Fig. 4b).
Real time PCR validation of selected peroxidase-like unigene transcriptions
To validate the transcriptome results, 16 peroxidase-like unigenes (Supplementary Figure S1) were selected for RT-qPCR analysis. Firstly, Sanger sequencing of the RT-qPCR products (Supplementary Dataset S8) confirmed genuine amplifications of all of the 16 peroxidase-like transcripts tested (Supplementary Dataset S9). Secondly, the ΔCt values had negative correlations (P < 0.05) to the FKPM values in all three cultivars (Supplementary Table S5), showing compatibility of expression levels determined by the two methods. Interestingly, transcriptions of A. aj DyP1 (JQ650250)26 were detected by RT-qPCR with relatively high ΔCt values (Supplementary Table S6) in all three cultivars although its homologue was not included in our final unigene set of FPKM > 1.0. Further investigation using contigs detected signals of two non-overlapping JQ650250 homologues with low FPKM values (Supplementary Table S6), emphasizing that the final unigene set excluded lowly expressed A. auricula-judae genes.