Deterministic column subset selection for single-cell RNA-Seq



Analysis of single-cell RNA sequencing (scRNA-Seq) data often involves two steps: filtering out uninteresting or poorly measured genes, and reducing dimensionality to suppress noise and simplify data visualization. However, techniques such as principal component analysis (PCA) do not preserve the non-negativity and sparsity of the original matrices, and the coordinates of the projected cells are not easily interpretable. Commonly used thresholding methods for filtering genes avoid those pitfalls but ignore collinearity and covariance in the original matrix.

Researchers from Cal Berkeley show that a deterministic column subset selection (DCSS) method possesses many of the favorable properties of common thresholding methods and PCA while avoiding the pitfalls of both. They derive new spectral bounds for DCSS and apply the method to two measures of gene expression from two scRNA-Seq experiments with different clustering workflows, comparing it to three thresholding methods. In each case study, the clusters based on the small subset of the complete gene expression profile selected by DCSS are similar to the clusters produced from the full set, and the resulting clusters are informative for cell type.
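To make the idea concrete, here is a minimal sketch of one common formulation of deterministic column selection: rank columns by their rank-k leverage scores and keep the smallest leading set whose scores sum to more than k − ϵ (and at least k columns). This is an assumption about how the selection step works, not the code from the authors' package; the function name `dcss_select` and the simulated Poisson matrix are hypothetical.

```python
import numpy as np


def dcss_select(A, k, eps):
    """Select columns of A by thresholding rank-k leverage scores.

    Illustrative sketch only (hypothetical helper, not the dcss_single_cell API).
    Returns the sorted indices of the selected columns and all leverage scores.
    """
    A = np.asarray(A, dtype=float)
    # Rows of Vt are the right singular vectors of A, ordered by singular value.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = Vt[:k, :]  # top-k right singular vectors, shape (k, n_columns)

    # Leverage score of column j: squared norm of column j of Vk.
    # The scores are non-negative and sum to k.
    scores = np.sum(Vk ** 2, axis=0)

    # Sort columns from highest to lowest leverage score.
    order = np.argsort(scores)[::-1]
    cumulative = np.cumsum(scores[order])

    # Smallest c whose cumulative score exceeds k - eps.
    c = int(np.searchsorted(cumulative, k - eps, side="right")) + 1
    # Keep at least k columns and never more than the matrix has.
    c = min(max(c, k), A.shape[1])
    return np.sort(order[:c]), scores


# Example on simulated counts (cells in rows, genes in columns).
rng = np.random.default_rng(0)
counts = rng.poisson(lam=rng.gamma(2.0, 2.0, size=50), size=(200, 50))
selected, scores = dcss_select(counts, k=5, eps=0.1)
print(len(selected), "columns selected")
```

Because the selection depends only on the singular value decomposition and a sort, repeated runs on the same matrix return the same columns, which is what distinguishes this approach from randomized leverage-score sampling.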

Count-variance plot for each column of A, where A is the data matrix from the mouse cortex scRNA-Seq experiment and the clustering workflow.

The color of each column indicates whether it is selected by DCSS with k = 5 and ϵ = 0.1. The plot also shows the thresholds for count, variance, and index of dispersion that yield the same number of selected columns as DCSS. The columns selected by DCSS are highly variable and have large counts.
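For comparison, the three thresholding baselines can be sketched as simple top-c rankings of a per-column statistic. The sketch below assumes that "count" means the column total and that the index of dispersion is the variance divided by the mean; the function name `threshold_select` and the toy matrix are hypothetical and are not taken from the authors' package.

```python
import numpy as np


def threshold_select(A, c, statistic):
    """Return the sorted indices of the c columns with the largest statistic.

    Illustrative sketch only. `statistic` is one of
    "count", "variance", or "dispersion".
    """
    A = np.asarray(A, dtype=float)
    mean = A.mean(axis=0)
    var = A.var(axis=0)

    if statistic == "count":
        score = A.sum(axis=0)  # assumed definition: total count per column
    elif statistic == "variance":
        score = var
    elif statistic == "dispersion":
        # Index of dispersion = variance / mean; all-zero columns score 0.
        score = np.divide(var, mean, out=np.zeros_like(var), where=mean > 0)
    else:
        raise ValueError(f"unknown statistic: {statistic!r}")

    top = np.argsort(score)[::-1][:c]
    return np.sort(top)


# Toy matrix: 4 cells (rows) by 4 genes (columns).
toy = np.array([
    [10, 0, 1, 0],
    [10, 0, 1, 10],
    [10, 0, 1, 0],
    [10, 8, 1, 10],
])
for name in ("count", "variance", "dispersion"):
    print(name, threshold_select(toy, c=2, statistic=name))
```

Each statistic is computed one column at a time, so none of these rankings can account for collinearity between genes, which is the limitation the article attributes to thresholding.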

Availability – The Python package containing code to perform the methods described in the article can be found at https://github.com/srmcc/dcss_single_cell.git.


McCurdy SR, Ntranos V, Pachter L (2019) Deterministic column subset selection for single-cell RNA-Seq. PLoS ONE 14(1): e0210571. [article]
