Quality control (QC) in single-cell RNA sequencing is extremely important at every stage of the workflow. Researchers are well aware of this and may have already performed various types of QC upstream of their analysis in SeqGeq. For those who haven’t yet, or who want to refine or simply illustrate their QC in SeqGeq, we have tools that can help.
Here we’ll often refer to “cells” in the expression matrix as observations or events (the latter term is more common in flow cytometry). Because the purpose of cell QC is to remove ambiguous or questionable data points, which may not be intact cells at all, these terms are more accurate in this context.
QC involves three main steps, usually performed in this order early in the analysis: (i) visualizing, (ii) assessing, and (iii) filtering. We’ll discuss each of them here.
Quality Control Plots
You can create a set of parameters for use in QC by clicking the Quality Control button within the Analyze tab of SeqGeq:
This generates three plots: one in Cell View and two in Gene View.
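To make these QC parameters concrete, here is a minimal Python sketch of how they could be computed from a raw count matrix. It assumes cells are rows and genes are columns, counts a gene as “expressed” in a cell when its count is above zero, and uses the population variance for the dispersion index. SeqGeq’s own definitions may differ, and the function names are hypothetical.

```python
from statistics import mean, pvariance


def cell_qc(matrix):
    """Cell-level QC: number of genes with a nonzero count in each cell (row)."""
    return [sum(1 for value in row if value > 0) for row in matrix]


def gene_qc(matrix):
    """Gene-level QC for each gene (column): cells expressing it, mean, dispersion."""
    stats = []
    for column in zip(*matrix):
        mu = mean(column)
        stats.append({
            "cells_expressing": sum(1 for value in column if value > 0),
            "mean": mu,
            # Dispersion index = variance / mean (population variance assumed);
            # it is undefined for a gene that is zero in every cell.
            "dispersion": pvariance(column) / mu if mu > 0 else float("nan"),
        })
    return stats


# Hypothetical counts: 3 cells x 3 genes.
counts = [
    [0, 3, 1],
    [2, 3, 0],
    [0, 3, 5],
]
print(cell_qc(counts))  # [2, 2, 2]
```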
Assessing Cell Quality
One very common way to check cell quality is to plot each observation’s rank (by the number of genes it expresses) against the number of genes expressed per cell, and then set a filter on the top-ranked observations. This approach is also called “knee calling” because of the shape of the resulting plot. To create a double knee plot in SeqGeq, simply change the x-axis of the cell QC plot to “Rank” (leaving the y-axis at its default, “genes_expressed_per_cell”), then set both axis scales to log:
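The rank-versus-genes plot can be sketched outside SeqGeq as follows. This minimal Python version assumes you already have a list of per-cell gene counts (the hypothetical `per_cell` values below), sorts it so that rank 1 is the cell expressing the most genes, and log10-scales both axes to mirror the log-log display.

```python
import math


def knee_plot_points(genes_expressed_per_cell):
    """Pair each cell's rank with its gene count, both log10-scaled.

    Rank 1 is the cell expressing the most genes, so the list runs from
    the top plateau of the plot down toward the debris tail.
    """
    ordered = sorted(genes_expressed_per_cell, reverse=True)
    points = []
    for rank, n_genes in enumerate(ordered, start=1):
        # log10(0) is undefined, so cells with no detected genes are skipped.
        if n_genes > 0:
            points.append((math.log10(rank), math.log10(n_genes)))
    return points


# Hypothetical per-cell gene counts.
per_cell = [2500, 40, 2300, 600, 5, 550, 0]
for x, y in knee_plot_points(per_cell):
    print(f"log10(rank)={x:.2f}  log10(genes)={y:.2f}")
```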
Note: The top plateau of cells contains those of the highest quality, the middle plateau is more questionable (likely dead or dying cells), and the final plateau represents debris and empty reads.
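Once you have read two cutoffs off your own double knee plot, assigning observations to the three plateaus is a simple comparison. In this hedged sketch the cutoff values (1000 and 100 genes) and the label strings are hypothetical, and nothing detects the knees automatically.

```python
from collections import Counter


def assign_plateau(genes_expressed_per_cell, upper_cutoff, lower_cutoff):
    """Label each cell by the plateau its gene count falls in.

    upper_cutoff: boundary between the top and middle plateaus.
    lower_cutoff: boundary between the middle and final plateaus.
    """
    if lower_cutoff >= upper_cutoff:
        raise ValueError("lower_cutoff must be below upper_cutoff")
    labels = []
    for n_genes in genes_expressed_per_cell:
        if n_genes >= upper_cutoff:
            labels.append("high quality")
        elif n_genes >= lower_cutoff:
            labels.append("questionable")
        else:
            labels.append("debris/empty")
    return labels


per_cell = [2500, 40, 2300, 600, 5, 550]
labels = assign_plateau(per_cell, upper_cutoff=1000, lower_cutoff=100)
print(Counter(labels))
```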
Parameters (genes) that are detected in only a few cells, and that have only minimal expression in any cell, are typically of less interest than others and can easily be gated out in the Gene View QC graph window (note that the axis scales here have also been changed to log-log):
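The same idea can be expressed as two numeric thresholds. This minimal sketch keeps a gene only if it is detected in enough cells and reaches a minimum count in at least one cell; the default thresholds and gene names are illustrative assumptions, not SeqGeq settings.

```python
def quality_genes(matrix, gene_names, min_cells=3, min_max_count=2):
    """Return names of genes worth keeping.

    matrix: raw counts, one row per cell and one column per gene.
    A gene is kept if it is detected (count > 0) in at least `min_cells`
    cells AND reaches at least `min_max_count` in some cell.
    """
    keep = []
    for name, column in zip(gene_names, zip(*matrix)):
        cells_expressing = sum(1 for count in column if count > 0)
        if cells_expressing >= min_cells and max(column) >= min_max_count:
            keep.append(name)
    return keep


# Hypothetical counts: 4 cells x 3 genes.
counts = [
    [1, 0, 0],
    [1, 5, 0],
    [1, 2, 9],
    [0, 3, 0],
]
print(quality_genes(counts, ["GeneA", "GeneB", "GeneC"]))  # ['GeneB']
```

Here GeneA is seen in three cells but never above a count of 1, and GeneC reaches 9 but in only one cell, so only GeneB passes both tests.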
Gating in the first Gene View Graph Window produces a gene set containing the parameters that fall inside the gate. You can filter on GeneSets within a Gene View Graph Window by selecting the GeneSet in the top left-hand corner of the window:
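Conceptually, a drawn gate is a polygon, and the resulting gene set is every gene whose plotted position falls inside it. This sketch uses the standard ray-casting point-in-polygon test; the gene coordinates, gate vertices, and function names are hypothetical, and coordinates are compared on a linear scale for simplicity.

```python
def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def gate_to_gene_set(gene_coords, vertices):
    """Genes whose (x, y) position on the plot falls inside the gate."""
    return {name for name, (x, y) in gene_coords.items()
            if point_in_polygon(x, y, vertices)}


# Hypothetical positions of genes on two gene-level parameters.
gene_coords = {"GeneA": (50, 4.0), "GeneB": (2, 0.1), "GeneC": (80, 12.0)}
gate = [(10, 1), (100, 1), (100, 20), (10, 20)]
gene_sets = {"QualityGenes": gate_to_gene_set(gene_coords, gate)}

# "Selecting" a gene set: restrict the plotted genes to its members.
selected = {name: xy for name, xy in gene_coords.items()
            if name in gene_sets["QualityGenes"]}
print(sorted(selected))  # ['GeneA', 'GeneC']
```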
Selecting the QualityGenes GeneSet while viewing the Cells_Expressing vs. Dispersion(1) parameters (in the second Quality Control Gene View graph window) allows you to filter further on Consistently Over-expressed Genes (aka “COGs”):
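As a rough illustration, assume a COG is a gene that is expressed in nearly every cell and has a low dispersion index (consistent expression). This sketch flags such genes within a chosen gene set. The two criteria, their default values, and the statistics below are assumptions for illustration; whether you then keep or exclude the flagged genes depends on your analysis.

```python
def flag_cogs(gene_stats, gene_set, n_cells, min_fraction=0.9, max_dispersion=1.0):
    """Return genes in `gene_set` expressed in at least `min_fraction` of cells
    whose dispersion index (variance / mean) is at most `max_dispersion`."""
    return sorted(
        name
        for name in gene_set
        if name in gene_stats
        and gene_stats[name]["cells_expressing"] / n_cells >= min_fraction
        and gene_stats[name]["dispersion"] <= max_dispersion
    )


# Hypothetical gene-level statistics for a 100-cell sample.
gene_stats = {
    "GeneA": {"cells_expressing": 95, "dispersion": 0.5},
    "GeneB": {"cells_expressing": 92, "dispersion": 3.0},
    "GeneC": {"cells_expressing": 40, "dispersion": 0.2},
    "GeneD": {"cells_expressing": 99, "dispersion": 0.1},  # not in QualityGenes
}
quality_genes = {"GeneA", "GeneB", "GeneC"}
print(flag_cogs(gene_stats, quality_genes, n_cells=100))  # ['GeneA']
```

GeneD would also meet both criteria, but it is ignored because filtering starts from the QualityGenes set.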
Users can enter thresholds in Graph Windows using the Manual Gating tool, found in the Graph tab of the Graph Window:
The resulting dialog lets you enter a custom set of filtering thresholds:
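The dialog’s exact fields aren’t described here, but a set of manually entered thresholds can be modeled as a minimum/maximum pair per parameter, where a missing bound leaves that side open. In this hedged sketch the function names and the 200-gene cutoff are hypothetical.

```python
from typing import Dict, Optional, Tuple

# parameter name -> (minimum, maximum); None leaves that side open-ended.
Bounds = Dict[str, Tuple[Optional[float], Optional[float]]]


def check_bounds(bounds: Bounds) -> None:
    """Reject threshold pairs whose minimum exceeds their maximum."""
    for param, (low, high) in bounds.items():
        if low is not None and high is not None and low > high:
            raise ValueError(f"{param}: minimum {low} exceeds maximum {high}")


def passes(record: Dict[str, float], bounds: Bounds) -> bool:
    """True if every bounded parameter of `record` lies within its range."""
    for param, (low, high) in bounds.items():
        value = record[param]
        if low is not None and value < low:
            return False
        if high is not None and value > high:
            return False
    return True


bounds: Bounds = {"genes_expressed_per_cell": (200, None)}
check_bounds(bounds)
cells = [{"genes_expressed_per_cell": n} for n in (2500, 40, 600)]
kept = [c for c in cells if passes(c, bounds)]
print(len(kept))  # 2
```

Treating each bound as optional mirrors the common case of entering only a lower limit, such as a minimum number of genes per cell.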
(1) Dispersion index = variance / mean of a gene’s expression across cells.
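As a worked example of this formula, take one gene with hypothetical counts of 1, 0, and 5 in three cells. This assumes the population variance (dividing by n). The text does not say whether SeqGeq divides by n or n-1; with n-1, the same counts would give a variance of 7 and an index of 3.5.

```latex
\mu = \frac{1 + 0 + 5}{3} = 2, \qquad
\sigma^2 = \frac{(1-2)^2 + (0-2)^2 + (5-2)^2}{3} = \frac{14}{3}, \qquad
D = \frac{\sigma^2}{\mu} = \frac{14/3}{2} = \frac{7}{3} \approx 2.33
```

For Poisson-distributed counts the variance equals the mean, so D = 1. Values well above 1 indicate a gene whose expression varies across cells more than Poisson noise alone would produce.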
If you have any questions regarding QC, feel free to reach out and send whatever examples you can (screenshot illustrations, or even whole GeqZip files): email@example.com