### Synthetic data

In this section, we examine the performance of MuClone on simulated data. In what follows, we generate *N* loci from *M* samples with *K* underlying tumour mutation clusters with sequencing error rate (epsilon) and tumour content *t*_{m}.

We first randomly generate an evolutionary relationship between clusters, viewed as a binary phylogenetic tree. Each node in the tree represents a mutation cluster. The root node represents the ancestral cluster. For each sample, the cellular prevalence of the first descendant, ({phi} _{1^{mathrm{st}}}), is sampled from a Uniform distribution over [0, *ϕ*_{parent}], where *ϕ*_{parent} is the cellular prevalence of the parent node (cluster). The cellular prevalence of the second descendant, ({phi} _{2^{mathrm{nd}}}), is sampled from a Uniform distribution over ([ {0,{phi} _{{mathrm{parent}}} – {phi} _{1^{mathrm{st}}}}]), defined so that the sum of the children’s prevalences do not exceed their parent’s cellular prevalence. The absence or presence of each cluster, in each sample, is sampled from a Bernoulli distribution that assigns equal probability to both outcomes. If a cluster is not present in a sample, the corresponding cellular prevalence will be 0. See Supplementary Fig. 1 for an example of this process. Loci are assigned to a cluster uniformly at random from {0, …, *K*}, where cluster 0 represents the wildtype cluster and {1, …, *K*} are mutation clusters. For each locus, in each sample, the number of reads overlapping the locus (depth) is sampled from a Poisson distribution with mean *d*_{m}. Wildtype copy number is deterministically set to 2, and a copy number profile (major and minor copy number) is generated according to the following steps: The total copy number, *C*^{t}, is sampled uniformly at random from {1, …, *C*_{max}}. An integer number, *C*^{b}, is randomly (following a discrete Uniform distribution) picked from 1 to *C*^{t}, and *C*^{a} is defined as *C*^{a} = *C*^{t} − *C*^{b}. Lastly, the major copy number is set to the maximum of *C*^{b} and *C*^{a}; the minor copy number is set to the minimum of those two values. Then, corresponding to each cluster, the number of variant reads are sampled from the Beta-Binomial distribution described in Eq. (7) with precision parameter equal to 1000.

### Synthetic data evaluation

We simulated 10 synthetic data for 20,000 loci from 4 samples of a patient, with 5 underlying clusters, including an ancestral cluster. The maximum copy number was 5, and the error rate was 0.01. The average sequencing depth was assumed to be 100 for all samples.

To assess the performance and robustness of MuClone, we systematically shielded MuClone from clonal information (Fig. 1). In particular, the cellular prevalence information was perturbed by (i) adding noise to its value, or (ii) removing the cellular prevalence information of the clusters. The noise was generated from a normal distribution with mean zero and standard deviations: 0, 0.01, 0.1, and 0.2. The noise value, *ν*, was added to the cellular prevalence of the cluster, while bounding the resulting value between 0 and 1, that is,

$${phi} _m^{ ast z} = {mathrm{min}}left( {{mathrm{max}}({phi} _m^z + nu ,0),1} right),$$

where ({phi} _m^{ ast z}) and ({phi} _m^z) are the perturbed and original cellular prevalence of cluster *z* and sample *m*, respectively. The clusters which their clonal information was removed, were randomly chosen with equal probabilities. As expected, both sensitivity and specificity were highest with complete and accurate clonal information; see Fig. 1. This suggests that incorporating clonal information can improve mutation detection accuracy and gives evidence to support MuClone’s approach. Furthermore, since the sensitivities were only marginally impacted by adding noise to the clonal information, MuClone should be able to cope with modest misspecification of the prior. However, specificity can decrease if the cellular prevalence is reduced to levels associated with the wildtype cluster and sensitivity can improve if adding noise increases the cellular prevalence to levels associated with a removed mutation cluster.

Naturally, accuracy is most severely impacted with reduced/corrupted clonal information; see Fig. 1. For the modest level of noise (noise standard deviation 0 and 0.01), the sensitivity and specificity of removing various numbers of clusters were compared through a Kruskal–Wallis test (4e−5 ≤ *p*-values ≤ 1e−4) which shows that the change in performance due to clonal information is significant. In noiseless settings, the confidence interval for the difference (of zero and four removed clusters) in mean sensitivity and specificity are [0.16, 0.26] and [0.08, 0.32], respectively. When the noise standard deviation is equal to 0.01, these intervals are [0.11, 0.21] and [0.09, 0.36].

We also explored how sensitivity and specificity change as a function of the wildtype prior and the threshold *Φ*_{T} used to distinguish the cellular prevalence cutoff of a mutation cluster. In Fig. 2, we tested MuClone with wildtype prior values 0.5, 0.75, and 0.99, and with *Φ*_{T} values 0.001, 0.01, 0.02, 0.03, and 0.04. In the case that the wildtype prior equals 0.5, we assumed that a locus is equally likely to be a mutation or not (when no other information is provided). MuClone’s sensitivity and specificity are near 1 for *Φ*_{T} equal to 0.02 and wildtype prior equal to 0.5. As expected, with small values of *Φ*_{T}, the sensitivity and specificity decrease since it is difficult to distinguish between wildtypes and mutations with small cellular prevalences. The sensitivity also decreases for large values of *Φ*_{T} because mutations are miscalled as wildtypes. When the error rate was 0.01, and wildtype prior was 0.5, the optimal *Φ*_{T} was about 0.02. We used these values for the following experiments.

The performance of MuClone was tested with various tumour content (from 0.1 to 0.99) and different error rates (0.01 and 0.001); see Fig. 3. For samples with tumour content greater than 0.5, the sensitivity and specificity remain close to 1. Sensitivity and specificity decreased to only about 0.9 when the tumour content in the sample is as low as 0.1. These results establish promising performance over different ranges of tumour content with different error rates (likely scenarios in real data).

In addition, we also explored the performance of MuClone for samples with different coverage (mean depth): 30, 60 and 100; see Fig. 4. Intuitively, the performance is higher when we have more coverage. Since MuClone leverages cellular prevalence information to improve the performance of mutation detection, the performance gain is noticeable when the variant allelic ratio resolution supports the given cellular prevalence resolution (in our analysis the cellular prevalence of mutations is greater than 0.02).

Figure 5 demonstrates how well mutations are classified by MuClone. The input clonal information was perturbed by adding noise from zero mean normal distribution with standard deviation 0.01 to simulate a more realistic scenario. In Fig. 5a, each bin (*i*, *j*) shows the fraction of mutations in cluster *i* that are classified into cluster *j* by MuClone. Figure 5a shows that 85% of mutations are classified into the correct cluster.

In order to show that the classification errors occurred between clusters with small phylogenetic distance, we define a misclassification index to quantify phylogenetic distance; calculated as

$${mathrm{Misclassification}},{mathrm{index}} = frac{{mathop {sum}nolimits_{i ne j} {kern 1pt} q_{(i,j)} times frac{{{mathrm{dist}}_{(i,j)} – {mathrm{dist}}_i^{{mathrm{min}}}}}{{{mathrm{dist}}_i^{{mathrm{max}}} – {mathrm{dist}}_i^{{mathrm{min}}}}}}}{{mathop {sum}nolimits_{i ne j} {kern 1pt} q_{(i,j)}}},$$

(1)

where *q*_{(i,j)} is the number of mutations in cluster *i* that are classified into cluster *j*, and the Euclidean distance between the cellular prevalences of cluster *i* and *j* is denoted by dist_{(i,j)}. The distance of the closest and farthest cluster to cluster *i* is denoted by (mathrm{dist}_i^{{mathrm{min}}}) and (mathrm{dist}_i^{{mathrm{max}}}), respectively. In Fig. 5b, small misclassification indices demonstrate that misclassified mutations occur between close clusters. This can be interpreted as phylogenetically recently separated clusters.

### Real data

Two real data sets with multiple samples for each patient were used to evaluate the performance of MuClone. The first data set was multiple whole genome sequencing data from 7 patients with high grade serous ovarian cancer. The second data set was multiple whole exome sequencing data from 8 patients with non-small-cell lung cancer (NSCLC).

### High grade serous ovarian cancer

We tested MuClone’s performance on whole genome sequencing data (with depth 30x) from multiple tumour samples surgically resected from high grade serous ovarian cancer patients^{5}. The samples were obtained from different spatially distributed metastatic sites. Brief details about the number of samples for each patient, sample sites and the number of validated loci for each patient are shown in Supplementary Table 1. Germline mutations were excluded from the list.

The copy number, tumour purity, and mutation cluster information for experimentally re-validated mutation status were taken from the phylogenetic study of high-grade serous ovarian cancer (see the supplementary note of the paper^{5}). Mutation clusters were estimated with PyClone^{12} on the deep targeted sequencing data (>1000x coverage) from the same samples and in three patients with accompanying single cell sequencing data (see Table S16 in the phylogenetic study of high-grade serous ovarian cancer paper^{5}). Copy number and tumour purity estimates were calculated with the TITAN software^{28}. In order to eliminate germlines, loci with variant nucleotides in the corresponding normal sample were removed from the dataset. Then, the performance of MuClone was benchmarked against Strelka^{18}(v2.0.15), MutationSeq^{20}(v4.3.7), MuTect^{30}, FreeBayes^{25}(v1.2.0-2), MultiSNV^{23} and naive MuClone. Naive MuClone is a version of MuClone where no clonal information is provided (that is, all mutations are from an ancestral cluster).

In Fig. 6, the performance of MuClone is compared with other methods executed with default settings. For each patient, *p*, we assessed performance by averaging Youden’s index, sensitivity, and specificity across different samples. For patient *p*, with *n*_{p} samples, these are calculated as

$${mathrm{Sensitivity}}_p = frac{1}{{n_p}}mathop {sum}limits_{i = 1}^{n_p} {kern 1pt} {mathrm{Sensitivity}}_p^i,$$

$${mathrm{Specificity}}_p = frac{1}{{n_p}}mathop {sum}limits_{i = 1}^{n_p} {kern 1pt} {mathrm{Specificity}}_p^i,$$

(2)

$${hbox{Youden’s index}}_p = frac{1}{{n_p}}mathop {sum}limits_{i = 1}^{n_p} left( {{mathrm{Sensitivity}}_p^i + {mathrm{Specificity}}_p^i – 1} right),$$

where ({mathrm{Sensitivity}}_p^i), ({mathrm{Specificity}}_p^i) and ({hbox{Youden’s index}}_{p}^{i}) are the sensitivity, specificity and Youden’s index of sample *i* and patient *p*, respectively. In aggregate, MuClone outperforms other methods by improving sensitivity without compromising specificity; see Fig. 6. False negatives arise mainly because the WGS data is under-represented (the average depth of the WGS data is about 30x) and lacks variant alleles that are present in the targeted sequencing data. False positives arise due to erroneous signal from sequencing technical artefacts.

In Fig. 6, Strelka, MutationSeq, MuTect and Naive MuClone have lower performance as they do not incorporate information across multiple samples. FreeBayes was run on multiple samples and germlines were removed manually, but since the method only considers tumour samples, it has the lower performance versus other methods.

To assess the performance of MuClone and MultiSNV, we conducted a two-sided *t*-test for the difference in the mean of Youden’s index evaluated on mutation calls from MuClone and MultiSNV. The 95% confidence interval is [0.03, 0.1], with *p*-value equal to 0.0006; this shows that the difference is statistically significant. Importantly, MuClone improves sensitivity, enabling the detection of more mutations across the whole genome.

Figure 7 depicts the classification of mutations into clusters relative to the ground truth, as defined by running PyClone on the data (omitting singleton clusters^{5}). Each bin (*i*, *j*) of Fig. 7 shows the fraction of mutations in cluster *i* that are classified into cluster *j* by MuClone; 93% are correctly classified by MuClone. Moreover, we notice that misclassified mutations are classified into phylogenetically similar clusters (the misclassification index for patient 1 was 0.015).

### Non-small-cell lung cancer

We tested MuClone’s performance on early-stage NSCLC samples from the TRACERx data set^{7}. To help obtain the clonal and subclonal census, multiple tumour regions for each patient were sequenced by Illumina HiSeq. We used the copy number, purity estimate, and the mutation cluster information available in TRACERx study Supplementary Material^{7}. In the TRACERX study, the cellular prevalence was calculated from the whole exome sequencing data on a set of stringent mutations that were selected from MuTect and VarScan2 results with post-processing. In addition, the TRACERx study added a few mutations to reduce missed subclonal mutations; see Supplementary Appendix of TRACERx study^{7}.

To compare the performance of MuClone with Strelka, MultiSNV, and MuTect, we randomly selected 8 patients with subclonal mutations from the TRACERx data set (see Table S2 Supplementary Appendix 1 of the paper^{7}). The TRACERx study generated a re-validated and curated list of mutations for their analysis; see Supplementary Appendix 2 of TRACERx study^{7}. The mutations with full copy number information across all 8 patients were used as ground truth to evaluate performance.

We evaluated the false negative rates of mutation calling across several methods; see Table 1. Altogether, out of 7238 mutations, MuClone missed 475 mutations while Strelka, MultiSNV and MuTect missed 7205, 5720, and 1086 mutations respectively. Hence, borrowing statistical strength, as done in MuClone, across samples likely increases sensitivity to real mutations.

We next ran MuClone, MultiSNV and MuTect on the whole exome data from multiple samples of 8 patients to ascertain specificity. We note that MuClone removes reads with mapping quality less than 5 and for positions that have (i) a variant nucleotide in a normal sample, (ii) more than 40% filtered basecalls (A basecall is filtered if more than 3 mismatches occur between the read and the reference within a window of 20 bases on each side of the site.); or (iii) more than 75% of the reads that cross the site have deletions in any of the samples^{18}. For exome sequencing data, mutations were called if the corresponding MuClone mutation probability is greater than 0.9. The other methods were executed with default settings. The total number of calls and the number of common calls between different methods (restricted to positions with copy number information) at the patient level is depicted in Fig. 8. A high degree of variation across callers is observed. Altogether, MuClone called 13,556 mutations while MultiSNV and MuTect called 31,374 and 11,915, respectively. MultiSNV output the largest number of calls in all of the samples, while MuTect and MuClone output similar number of calls. Figure 8 also demonstrates the mutations used in the TRACERx study and their overlap with the mutation calls in different methods. The set of mutations overlapping between MuClone and TRACERx is most similar; this suggests that the increase in sensitivity conferred by MuClone does not come at the expense of specificity.

We also explored the performance of MuClone when clonal information differs in the number of input clusters or the value of the cellular prevalence; see Fig. 9. We perturbed the value of the cellular prevalences (estimated by PyClone) by adding noise from a mean zero normal distribution with different standard deviations: 0, 0.01, 0.1, and 0.2. We see that MuClone is robust to slight changes of cellular prevalence values. We also shielded MuClone from different fractions of the clonal information and that decreased the performance more than adding noise. In general, this result shows that more accurate clonal information provides better mutation detection.

### Conclusion

We studied the use of clonal information for the purpose of somatic mutation detection and classification in multi-sample whole-genome sequencing data. The proposed statistical framework uses the clusters cellular prevalences and copy number information for detection and classification of low prevalence mutations. Our proposal, MuClone, outperformed other popular mutation detection tools, while providing the added benefit of classifying whole genome sequencing mutations into biologically relevant groups. Both synthetic and real data results showed that using the cellular prevalences of tumour clusters can improve mutation detection sensitivity. Importantly, our results suggest improvement in sensitivity can be achieved without compromising specificity.

Since the accuracy of detecting mutations can affect the performance of phylogenetic analysis, we suggest improvement from using MuClone will impact the field of multi-region sequencing for cancer evolution studies. As the field matures, we expect that the method presented here will be incorporated into more analytically comprehensive modelling of whole genome sequencing data when multiple samples are used to infer properties of clonal dynamics. Next steps are in developing a unified iterative algorithm that alternates between identifying the phylogenetic structure of the constituent clones comprising each tumour sample, and detection of mutations leveraging the new phylogenetic structure.

As sequencing costs continue to decrease (e.g., with Illumina’s NovoSeq platform), multi-sample whole genome sequencing of tumours will continue to proliferate (e.g., rapid autopsy program) as a viable experimental design. Thus, MuClone will be an asset in the arsenal of analytical methods deployed to interpret evolutionary properties of cancer and to gain insights into clonal dynamics in time and space.