Certain sequencing methods use index sorting by flow cytometry as an upstream step in sample preparation. In these cases, many researchers don’t realize they have already gathered a powerful dataset that can complement the sequencing analysis.
SeqGeq makes it easy to combine per-cell fluorescence intensities for each flow channel (or per-sample intensities, in the case of bulk sequencing) with the expression matrix from sequencing.
Most of the data illustrated in this workflow is available as a single, already-combined demo data file, generously provided by the Accelerating Medicines Partnership (AMP) group. The raw data are also available on the ImmPort data-sharing site under study code SDY997.
The Index Sorting plugin for FlowJo can export, in CSV format, the fluorescence intensity values for each well of an index-sorted FCS file.
In this case, cells are named by their plate and well IDs.
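Building the Cell IDs from plate and well can be done in a few lines. A minimal sketch in plain Python, assuming a CSV with hypothetical `Plate` and `Well` columns followed by one column per channel, and a hypothetical `<plate>_<well>` naming convention with zero-padded wells (so `A1` becomes `A01`). The actual plugin export may use different column names and formats.

```python
import csv
import io
import re

# Hypothetical index-sort export: column names and values are illustrative only.
INDEX_CSV = """Plate,Well,CD3,CD19,CD14
P1,A1,5120.0,80.5,120.0
P1,A12,95.0,4300.0,60.2
P2,H3,70.1,65.0,8800.0
"""

def normalize_well(well: str) -> str:
    """Zero-pad a well ID such as 'A1' to 'A01' (row letter + 2-digit column)."""
    match = re.fullmatch(r"([A-Pa-p])(\d{1,2})", well.strip())
    if match is None:
        raise ValueError(f"unrecognized well ID: {well!r}")
    row, col = match.groups()
    return f"{row.upper()}{int(col):02d}"

def make_cell_id(plate: str, well: str) -> str:
    """Build a Cell ID like 'P1_A01' (hypothetical naming convention)."""
    return f"{plate.strip()}_{normalize_well(well)}"

def read_index_sort(text: str) -> dict:
    """Map each Cell ID to its per-channel fluorescence intensities."""
    cells = {}
    for row in csv.DictReader(io.StringIO(text)):
        cell_id = make_cell_id(row.pop("Plate"), row.pop("Well"))
        cells[cell_id] = {channel: float(value) for channel, value in row.items()}
    return cells

flow = read_index_sort(INDEX_CSV)
print(sorted(flow))  # ['P1_A01', 'P1_A12', 'P2_H03']
```

Whatever convention you choose, the point is to apply it identically to the sequencing side.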
Such data can easily be merged with the expression matrix from single-cell sequencing (or potentially from bulk sequencing as well). The one catch is that your Flow data must contain a Cell ID column that exactly matches the Cell ID column in your single-cell sequencing data file. This is worth planning carefully when designing an experiment that combines Flow and scRNA-Seq techniques:
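Because the match must be exact, comparing the two ID lists before merging can catch formatting mismatches early. A minimal sketch with hypothetical IDs, where one well was written without zero-padding on the sequencing side:

```python
def compare_cell_ids(flow_ids, seq_ids):
    """Report which Cell IDs match exactly and which appear in only one data set."""
    flow_set, seq_set = set(flow_ids), set(seq_ids)
    return {
        "matched": sorted(flow_set & seq_set),
        "flow_only": sorted(flow_set - seq_set),
        "seq_only": sorted(seq_set - flow_set),
    }

# Hypothetical IDs: 'P2_H03' vs 'P2_H3' will not match exactly.
flow_ids = ["P1_A01", "P1_A12", "P2_H03"]
seq_ids = ["P1_A01", "P1_A12", "P2_H3"]
report = compare_cell_ids(flow_ids, seq_ids)
print(report["flow_only"], report["seq_only"])  # ['P2_H03'] ['P2_H3']
```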
Once you’ve established a proper Cell ID in both data matrices, you can simply drag and drop the Flow data file right on top of the sequencing matrix in SeqGeq’s workspace to merge the two.
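Conceptually, the merge is a join on Cell ID that appends the Flow channels as extra features for each cell. A minimal sketch of that idea as an inner join; this is an assumption, since the source does not say how SeqGeq treats cells present in only one file. The table layout and feature names are hypothetical.

```python
def merge_on_cell_id(expression, flow):
    """Inner-join two {cell_id: {feature: value}} tables on Cell ID.

    Cells missing from either table are dropped (an assumption of this sketch).
    Flow channels are appended as extra features; name collisions raise an error.
    """
    merged = {}
    for cell_id, genes in expression.items():
        if cell_id not in flow:
            continue
        overlap = genes.keys() & flow[cell_id].keys()
        if overlap:
            raise ValueError(f"{cell_id}: duplicate feature names {sorted(overlap)}")
        merged[cell_id] = {**genes, **flow[cell_id]}
    return merged

# Hypothetical expression matrix and Flow table.
expression = {
    "P1_A01": {"CD3E": 12.0, "MS4A1": 0.0},
    "P1_A12": {"CD3E": 0.0, "MS4A1": 9.0},
    "P9_B02": {"CD3E": 3.0, "MS4A1": 1.0},  # no matching Flow record
}
flow = {"P1_A01": {"CD3": 5120.0}, "P1_A12": {"CD3": 95.0}}
merged = merge_on_cell_id(expression, flow)
print(sorted(merged))      # ['P1_A01', 'P1_A12']
print(merged["P1_A01"])    # {'CD3E': 12.0, 'MS4A1': 0.0, 'CD3': 5120.0}
```

Raising on a name collision keeps a protein channel such as `CD3` from silently overwriting a gene with the same name.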
Quality control of cells in mixed-omics data works much as it does for other data sets, removing outlier events such as doublets or empty reads:
Filtering outliers is also recommended for genes. Genes highly expressed across all cells are likely housekeeping genes, while weakly expressed genes contribute noise to downstream clustering. In addition, restricting the analysis to highly dispersed genes tends to yield better downstream clustering:
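One common way to flag such outliers, used here as an assumption rather than SeqGeq's documented method, is to threshold each cell's total counts at the median plus or minus k median absolute deviations (MAD). Unusually high totals may indicate doublets and unusually low totals empty reads. The cell names, counts, and k = 3 are illustrative.

```python
import statistics

def mad_bounds(values, k=3.0):
    """Return (low, high) = median -/+ k * MAD (median absolute deviation)."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    return med - k * mad, med + k * mad

def qc_cells(total_counts, k=3.0):
    """Split cells into kept and flagged sets by their total counts.

    High outliers may be doublets; low outliers may be empty reads.
    """
    low, high = mad_bounds(list(total_counts.values()), k)
    kept = {cell for cell, total in total_counts.items() if low <= total <= high}
    return kept, set(total_counts) - kept

# Hypothetical total counts per cell.
totals = {"c1": 10000, "c2": 11000, "c3": 9500, "c4": 10500, "c5": 40000, "c6": 150}
kept, flagged = qc_cells(totals)
print(sorted(flagged))  # ['c5', 'c6']
```

MAD is used instead of the standard deviation because the outliers being removed would otherwise inflate the threshold.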
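A minimal sketch of both gene filters, assuming a `{gene: [count per cell]}` layout. The fraction of cells in which a gene is detected serves as a stand-in for "expressed in all cells" and "dimly expressed", and dispersion is computed as variance divided by mean. The cutoffs and gene values are hypothetical and would need tuning for real data.

```python
import statistics

def filter_genes(matrix, min_frac=0.1, max_frac=0.95):
    """Keep genes detected (count > 0) in between min_frac and max_frac of cells.

    Genes detected in nearly every cell are treated as housekeeping-like and
    genes detected in very few cells as noise. Cutoffs are hypothetical.
    """
    kept = {}
    for gene, counts in matrix.items():
        frac = sum(1 for x in counts if x > 0) / len(counts)
        if min_frac <= frac <= max_frac:
            kept[gene] = counts
    return kept

def top_dispersed(matrix, n):
    """Return the n gene names with the highest dispersion (variance / mean)."""
    def dispersion(counts):
        mean = statistics.fmean(counts)
        return statistics.pvariance(counts) / mean if mean > 0 else 0.0
    return sorted(matrix, key=lambda gene: dispersion(matrix[gene]), reverse=True)[:n]

# Hypothetical counts for six cells.
matrix = {
    "ACTB": [50, 48, 52, 49, 51, 50],  # detected in every cell: housekeeping-like
    "CD3E": [10, 0, 12, 0, 0, 9],
    "MS4A1": [0, 8, 0, 7, 0, 0],
    "LYZ": [1, 1, 2, 1, 0, 1],
    "GENE_X": [0, 0, 0, 0, 0, 0],  # never detected
}
filtered = filter_genes(matrix)
top = top_dispersed(filtered, 2)
print(top)  # ['CD3E', 'MS4A1']
```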
Flow markers can be grouped into their own parameter set, using a static Geneset, for reference throughout the analysis:
Combining proteomic (Flow) and transcriptomic (sequencing) data for clustering and dimensionality reduction typically gives better separation of biologically distinct clusters.
A Boolean union of genesets combines these parameter sets:
Results of dimensionality reduction on the highly dispersed genes combined with the Flow fluorescence parameters, color-mapped by cell type:
The improved separation arises because Flow parameters are generally richer than sequencing data, which is inherently sparse: they are more bimodal, span a greater dynamic range, and are not subject to dropouts.
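In set terms, the combined parameter set is the union of the two genesets. A minimal sketch with hypothetical parameter names, keeping the first-seen order and dropping duplicates:

```python
# Hypothetical parameter names: a static Geneset of Flow channels and a
# list of highly dispersed genes from the gene-filtering step.
flow_markers = ["CD3", "CD19", "CD14"]
dispersed_genes = ["CD3E", "MS4A1", "LYZ"]

def geneset_union(*genesets):
    """Boolean union of genesets, preserving first-seen order without duplicates."""
    return list(dict.fromkeys(name for gs in genesets for name in gs))

combined = geneset_union(flow_markers, dispersed_genes)
print(combined)  # ['CD3', 'CD19', 'CD14', 'CD3E', 'MS4A1', 'LYZ']
```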
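Because Flow intensities and transcript counts sit on very different scales, features are typically standardized before dimensionality reduction so that neither modality dominates the distances. The source does not describe SeqGeq's own preprocessing, so the per-feature z-score below is an assumption; in practice a variance-stabilizing transform (such as log1p for counts or arcsinh for fluorescence) would usually be applied first. The data are hypothetical.

```python
import statistics

def zscore_features(merged, features):
    """Build a cells x features matrix with each feature z-scored across cells.

    Assumption: this sketch standardizes raw values only. Features with zero
    variance are set to 0.0.
    """
    cells = sorted(merged)
    columns = []
    for feature in features:
        values = [merged[cell][feature] for cell in cells]
        mean = statistics.fmean(values)
        sd = statistics.pstdev(values)
        columns.append([(v - mean) / sd if sd > 0 else 0.0 for v in values])
    # Transpose so each row holds one cell's standardized features.
    rows = [list(row) for row in zip(*columns)]
    return cells, rows

# Hypothetical merged data: one Flow channel (CD3) and one gene (CD3E).
merged = {
    "c1": {"CD3": 5000.0, "CD3E": 12.0},
    "c2": {"CD3": 100.0, "CD3E": 0.0},
    "c3": {"CD3": 4000.0, "CD3E": 9.0},
    "c4": {"CD3": 50.0, "CD3E": 1.0},
}
cells, X = zscore_features(merged, ["CD3", "CD3E"])
# X can now be passed to PCA or tSNE; every column has mean 0 and unit variance.
```

Without scaling, the CD3 column (values in the thousands) would swamp the CD3E counts in any Euclidean distance.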
Combining these two modalities is particularly powerful because it removes the ambiguity from one of the most difficult steps in sequencing data analysis: Cell Calling.
Identifying cell types by their surface receptors can establish a ground truth for cluster identity in single-cell sequencing experiments. This is done exactly the way a gating tree is built in a conventional Flow analysis:
Used alone or together with Multigraph Color Mapping, these gates make the phenotype of each island in tSNE space clear:
With cell identities established, the transcriptome data provides an excellent starting point for a deep exploration of the data and many opportunities for discovery. See the information on the Monocle plugin for SeqGeq and on Differential Expression Analysis for more details.
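A gating tree is a hierarchy of predicates on Flow channels, with each cell assigned to the deepest gate it passes. A minimal sketch in which the gate names, markers, and 1000-unit thresholds are all hypothetical; real gates are drawn interactively on plots and sibling gates are normally mutually exclusive.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Gate:
    """One node of a gating tree: a named predicate on a cell's Flow channels."""
    name: str
    predicate: Callable[[dict], bool]
    children: list = field(default_factory=list)

def classify(cell, gate, path=()):
    """Return the gate names a cell passes, following the first matching child."""
    if not gate.predicate(cell):
        return path
    path = path + (gate.name,)
    for child in gate.children:
        deeper = classify(cell, child, path)
        if len(deeper) > len(path):
            return deeper
    return path

# Hypothetical gating tree; markers and thresholds are illustrative only.
root = Gate("Ungated", lambda c: True, [
    Gate("T cells", lambda c: c["CD3"] > 1000, [
        Gate("CD4+ T cells", lambda c: c["CD4"] > 1000),
    ]),
    Gate("B cells", lambda c: c["CD19"] > 1000),
    Gate("Monocytes", lambda c: c["CD14"] > 1000),
])

# Hypothetical index-sorted intensities keyed by Cell ID.
cells = {
    "P1_A01": {"CD3": 5120.0, "CD4": 2500.0, "CD19": 80.5, "CD14": 120.0},
    "P1_A12": {"CD3": 95.0, "CD4": 40.0, "CD19": 4300.0, "CD14": 60.2},
    "P2_H03": {"CD3": 70.1, "CD4": 900.0, "CD19": 65.0, "CD14": 8800.0},
    "P2_H04": {"CD3": 60.0, "CD4": 30.0, "CD19": 50.0, "CD14": 40.0},
}
labels = {cell_id: classify(values, root)[-1] for cell_id, values in cells.items()}
print(labels)
```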
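As a numeric complement to the visual check, each cluster can be named after the most common gated cell type among its members, with the purity of that label reported alongside it. This majority-vote approach is an illustrative assumption, not a SeqGeq feature; the cluster assignments and labels are hypothetical.

```python
from collections import Counter, defaultdict

def label_clusters(cluster_of, cell_type_of):
    """Name each cluster after the most common gated cell type among its cells.

    Returns {cluster: (label, purity)}, where purity is the fraction of the
    cluster's gated cells carrying that label. Ungated cells are skipped.
    """
    members = defaultdict(list)
    for cell, cluster in cluster_of.items():
        if cell in cell_type_of:
            members[cluster].append(cell_type_of[cell])
    result = {}
    for cluster, types in members.items():
        label, count = Counter(types).most_common(1)[0]
        result[cluster] = (label, count / len(types))
    return result

# Hypothetical cluster assignments (e.g. islands in tSNE space) and gate labels.
cluster_of = {"c1": 0, "c2": 0, "c3": 0, "c4": 1, "c5": 1, "c6": 2}
cell_type_of = {"c1": "T cells", "c2": "T cells", "c3": "B cells",
                "c4": "B cells", "c5": "B cells", "c6": "Monocytes"}
result = label_clusters(cluster_of, cell_type_of)
print(result)
```

A low purity flags an island whose identity the surface markers do not settle on their own.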
If you have questions or comments about your multi-omics analyses, we’d love to hear from you: firstname.lastname@example.org