Gene sets (hallmark gene lists, or gene tables) are lists of genes known to be related or expressed in the same cell population.

The term simply refers to a list of genes, usually associated with a particular biological function or cell population. The Gene Set is a fundamental concept of SeqGeq, which helps to organize the immense number of genes associated with single-cell RNA-seq data into smaller, more easily wielded pieces.

In SeqGeq gene sets function as filters, heatmapping tools, and independent parameters within the graph window and the layout editor. These artifacts help researchers gain quick and powerful insights for a dataset based on the transcriptional profile of populations therein. For example, a gene set might comprise known genes required in a given pathway, or known to be expressed by a particular cell subset.

In SeqGeq, these gene sets come in two flavors:

Static Gene Set

A list of genes that can be imported from a CSV file, or created manually from the current list of parameters within your data within SeqGeq’s Gene Sets tools.

GeneSet Tools

SeqGeq workspace window, illustrating the Gene Sets tools.

Analytic Gene Set

A gate drawn on a pivot window illustrating normalized genes between two populations (geometric in nature) -or- A set of genes defined algorithmically; for example using tSNE, or PCA.

How to Use Gene Sets

Gene sets help you organize your parameter lists into more manageable groups, essentially filtering down to genes of interest. Gene sets can also be used to quickly heatmap populations of cells within the Graph window or in the Layout Editor. Additionally, gene sets can themselves be used as parameters on which to gate cells directly. Within SeqGeq, these parameters are called “Synthetic Parameters,” and work much like Derived Parameters do in FlowJo. By default, synthetic parameters are algebraically simply a sum of transcripts recorded from each gene comprising a given gene set, for each cell of a given population. The normalization can be adjusted by users to calculate the sum (default), mean, median, or geometric mean of the genes in the set.

For example, a researcher might find the list of genes that identify killer T cells (in other words, CTL) in a journal article, and use that to create a killer T cell gene set. If the researcher then identifies a separate list of genes transcribed uniquely in activated cytotoxic T cells, this would allow them to further distinguish between activated and naive CTLs by investigating a graph of the CTL gene set versus the activation gene set.