We have described a rank-based single sample gene set scoring method, implemented in the R/Bioconductor package singscore. Our method can easily be applied on any high throughput transcriptional data from microarray or RNA-seq experiments. While our method is non-parametric, genes with low read counts should be filtered out, adjusted for gene length [24, 25], and ideally for GC content bias  and other technical artefacts [27, 28], because these may alter gene ranks within individual samples. Accordingly, RNA abundance data formatted as RPKM, TPM, or RSEM can be used, with or without log-transformation.
Using microarray and RNA-seq platforms of the TCGA breast cancer data, we show that our singscore approach yields stable scores for individual samples because they are treated independently from other samples, in contrast to GSVA, PLAGE, z-score, and ssGSEA (Fig. 1a). Although modifying ssGSEA to exclude the final normalization step (ssGSEA!Norm) also results in stable scores, the raw scores produced by the ssGSEA!Norm algorithm cannot be directly interpreted (e.g. a value of 0 carries no context). This issue became apparent when comparing unnormalized ssGSEA scores from either the GSVA or GenePattern implementations (using the same parameters) where it was observed that while the scores are highly correlated they are not directly comparable (Additional file 1: Figure S3). While normalisation procedures used by GSVA and ssGSEA can be useful with large representative data sets, scoring data subsets where the relative composition of sample types varies (such as can occur with permutation testing) can cause the score of an individual sample to be unstable. Evaluation of the type of small, imbalanced dataset which may be encountered in a clinical context is shown in Additional file 1: Figures S6 and S7. The singscore method includes per sample normalisation and scaling by considering the theoretical minima and maxima for scores in each sample, and can be applied to a single sample in isolation. We further show that this method has a high power to distinguish samples with distinct biology which is comparable to other methods, as well as high F1 scores when identifying gene sets with differential expression, similar to the GSVA and z-score methods (Fig. 1b, c and d).
We show that current implementations of both ssGSEA!Norm and ssGSEA through the GSVA package are computationally much slower than all other methods when scoring samples against a large number of random signatures (Additional file 1: Figure S8), while our approach is fast. We note that while the performance of PLAGE is poor across the majority of comparisons performed in this study, this may be attributed to the fact that the activity of a gene set is computed by projecting samples onto the first eigen-vector of the expression matrix. Due to this computation, scores vary a lot with changes to sample composition, and the values may rotate around 0, similar to how projections of observations can vary when performing PCA on sub-sampled datasets. These observations suggest that the PLAGE method is fundamentally different from all the other scoring approaches and should be used only for within dataset analysis and not analyses between data sets.
We compared breast cancer cell lines overlapping between the CCLE and Daemen et al. data and showed high consistency in epithelial and mesenchymal scores obtained by our scoring approach for the majority of cell lines (Fig. 2b). Because only a small subset of cell lines show large differences in epithelial or mesenchymal scores between the two data sets it is tempting to speculate that variations in scores are not due to the differences in technical or computational pipelines, which would have affected all the cell lines in the analysis. Rather it is possible that differences reflect real biological variation in the molecular phenotypes of some cells: the more-variable cell lines within the Daemen et al. data have hybrid epithelial-mesenchymal phenotypes (i.e. high epithelial and mesenchymal scores); these cell lines may have a greater degree of epithelial-mesenchymal plasticity allowing variations in their EMT phenotype under different experimental conditions. Interestingly, all cell lines with relatively large variations along the mesenchymal axis and smaller differences in epithelial scores are luminal cell lines which in general are shown to have strong epithelial phenotype (Fig. 2b, and ), while most cell lines with large shifts on the epithelial axis and little change in mesenchymal scores are claudin-low cell lines which have been shown to be strongly mesenchymal (Fig. 2b, and ).
More recent single sample methods such as personalized pathway alteration analysis (PerPAS)  have not been discussed here, as they are fundamentally different from our approach. For example, PerPAS needs topological information for each gene set to perform pathway analysis, and further uses either a control sample or a cohort of normal samples based on which the gene expression data in single samples are normalised ; these requirements make this method unsuitable for many available datasets.
We also note that methods requiring a large number of samples and a balanced composition to calculate a precise and stable score for individual samples may need to be re-run several times across a large data set when new samples are added. This adds extra complexity which may not be obvious to most users running such scoring methods. Our singscore method provides a simple and easy-to-understand pipeline which is also computationally fast. This method performs as well as the comparable scoring methods in large data sets in terms of stability, while outperforming them in smaller data sets by providing more stable scores, which are also easily interpretable. We further show that, excluding PLAGE which had relatively poor performance in these tests, all methods had a similar power, type I error, and/or F1 score when the signal to noise ratio was high, however, the singscore, GSVA and z-score performed slightly better for data with a less prominent signal. Several visualisation options at both the bulk and single sample level are provided in the R/Bioconductor package singscore to enable users to explore genes, gene signatures, and samples in more depth.