The CellOntology plugin is extremely useful for discovery work, especially when investigating new populations of cells that you may be unfamiliar with. Consider the following example:
We have a multi-parametric data set with a panel that includes the standard T, B, and monocyte subsets. In addition to the standard panels, we have included markers that we are less familiar with but would like to know if their expression pattern follows a particular lineage. We would also like to know if these markers, in conjunction with the standard panels, define a known cell type or not.
Step 1 – Clean up gates
To begin our analysis, we will create a series of gates to remove debris, doublets, and dead cells. We will then run our “discovery” analysis on the cleaned-up live cells population.
Step 2 – Terminal populations
We will continue building our gating hierarchy by defining broad categories (T-cells, B-cells) and any familiar terminal populations we have markers for (e.g. CD3+/HLADR-/CD45RA+/CD8+ and CD3-/HLADR-/CD19+ cells). The broad and/or terminal populations will help identify the context of the cells we are “discovering” when compared in an overlay or NxN plot.
Step 3 – Downsample
We intend to use a clustering algorithm to reduce the dimensional space of our data set. Because this is computationally expensive, we will first reduce the number of events fed into the clustering algorithm to 15,000. We begin our data reduction by running the downsample plugin on the live cells population.
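Conceptually, downsampling can be thought of as a uniform random draw without replacement from the parent population. The following is a minimal Python sketch of that idea, assuming the events are held in a plain list. The function name `downsample` and the fixed seed are illustrative only; the plugin's own implementation and options may differ.

```python
import random

def downsample(events, n=15000, seed=0):
    """Return up to n events drawn uniformly at random without replacement.

    Hypothetical helper for illustration; not the plugin's implementation.
    """
    if len(events) <= n:
        return list(events)      # nothing to reduce
    rng = random.Random(seed)    # fixed seed makes the draw reproducible
    return rng.sample(events, n)

# Toy stand-in for the live cells population: 100,000 event indices.
live_cells = list(range(100_000))
subset = downsample(live_cells)
print(len(subset))
```

Fixing the seed means the same subset is drawn on every run, which helps when comparing downstream results.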
Step 4 – Dimensionality reduction/Clustering
With the remaining 20+ parameters, we would like to cluster cells into groups that share common features. There are several algorithms that can perform this function, but we will stick with tSNE (t-distributed Stochastic Neighbor Embedding) for this example. We start tSNE by selecting the downsampled population from Step 3 and then selecting the tSNE plugin from the populations band.
Step 5 – Define the tSNE space
The data now contain tSNE-X and tSNE-Y parameters. Using these in a graph window, we can see how our data have separated (quite nicely) into distinct “continents”. The distance between continents implies a difference in the phenotype of the cells in one group versus another. At this point, we would like to know where our known subsets of cells map onto the tSNE space. To do this, we will create an overlay in the Layout Editor:
- Open the Layout Editor.
- Drag in the population containing the tSNE parameters (e.g. Downsample-Live cells).
- The tSNE plot should appear.
- Drag a defined population (e.g. CD8 T-cells, B-cells, Naive CD4) over the tSNE plot in the Layout editor.
- The overlay will depict where the defined population exists in the context of the reduced dimensionality space (continents). Unknown populations remain as red clusters; a representative group is indicated by the black arrows in the figure.
At this point, the phenotype of the unknown populations depicted in the overlay is: CD3-/CD4-/CD8-/CD45RA-/CD19-/CD14-
Step 6 – Identify the unknown continents
To identify which markers are responsible for creating the remaining continents (indicated by black arrows in the figure above; also any red clusters) we will create a gate on a continent and explore the remaining markers’ expression using an NxN plot. To do this:
- Drag the Downsample-Live population into a fresh layout.
- Right-click on the graph in the Layout Editor and select “Multigraph Overlays”, then select “NxN” plot.
- Half of a 10×10 plot should appear next to the plot.
- Right-click/control-click on the NxN plot to modify the relevant markers. (Remove markers that have already defined continents in Step 5 – Define the tSNE space.)
- Return to the graph window from step 1.
- Create a gate on an “unknown” continent within the downsample population and give it a name. (Note: it may be easiest to use the autogating tool).
- Drag this population over the plot from step 1. The graph and the NxN plot should become an overlay.
- Interrogate the NxN plot for expression of markers that seem to correlate with one another. (For the sake of simplicity, we have removed extraneous negative markers from the NxN plot.)
In this example, one of the continents was found to be CD11c+/CD38+/CD14-/CD16-
Step 7 – Identify the cell type
From the example above, our collective marker set is: CD3-/CD4-/CD8-/CD19-/CD14-/CD16-/CD11c+/CD38+. To identify what this cell type might be, we will return to the FlowJo workspace and create a generic gate at any level in the hierarchy. We will name the gate “CD3-CD4-CD8-CD19-CD14-CD16-CD11c+CD38+”. Next, we will use the CellOntology plugin to find the name of this population (if it exists).
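Because the gate name is simply the marker set written without separators, it can be assembled mechanically to avoid typing errors. The Python sketch below illustrates this; the helper `gate_name` is hypothetical and not part of FlowJo or the plugin.

```python
def gate_name(phenotype):
    """Concatenate (marker, sign) pairs into a gate name such as 'CD3-CD4-'.

    Hypothetical helper for illustration only.
    """
    for marker, sign in phenotype:
        if sign not in ("+", "-"):
            raise ValueError(f"unexpected sign for {marker}: {sign!r}")
    return "".join(marker + sign for marker, sign in phenotype)

# The collective marker set from the example in the text.
phenotype = [
    ("CD3", "-"), ("CD4", "-"), ("CD8", "-"), ("CD19", "-"),
    ("CD14", "-"), ("CD16", "-"), ("CD11c", "+"), ("CD38", "+"),
]
print(gate_name(phenotype))
```

The printed string can be pasted directly as the name of the generic gate.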
At this point we will open the CSV file produced from activation of the CellOntology plugin. The CSV contains several columns indicating whether your query matched an item in the database, the gene and protein names of the markers, a ranking score and a Cell ID among others.
Format the CSV to view the column contents. Ensure that short marker names match your generic gate name (leftmost column). Next, check out the score. The closer the value is to 1, the higher the probability that your cell/population type is the one found in the Cell ID column. Copy the Cell ID corresponding to the highest ranking score and paste into your web browser. Links containing your Cell ID and to the CellOntology database will appear. Click on the link to see what is known about your cell type.
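Picking the highest-scoring row can also be done programmatically. The Python sketch below shows one way to do so with the standard `csv` module. The column headers ("Score", "Cell ID", "Short marker names") and the scores in the toy data are assumptions made for illustration; check them against the headers in the file the plugin actually writes.

```python
import csv
import io

def best_match(csv_file, score_col="Score", id_col="Cell ID"):
    """Return (cell_id, score) for the row with the highest score.

    Column names are hypothetical; adjust them to the headers in your file.
    """
    rows = list(csv.DictReader(csv_file))
    if not rows:
        raise ValueError("CSV contains no data rows")
    top = max(rows, key=lambda r: float(r[score_col]))
    return top[id_col], float(top[score_col])

# Toy data with made-up scores, standing in for the plugin's output file.
toy = io.StringIO(
    "Short marker names,Score,Cell ID\n"
    "CD3-CD4-CD8-CD19-CD14-CD16-CD11c+CD38+,0.25,CL_0000000\n"
    "CD3-CD4-CD8-CD19-CD14-CD16-CD11c+CD38+,0.75,CL_0001026\n"
)
cell_id, score = best_match(toy)
print(cell_id, score)
```

With a real file, pass `open(path, newline="")` in place of the `io.StringIO` object.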
In our example, it appears the CL_0001026 Cell ID corresponds to a common myeloid progenitor, with some relationship to dendritic cells.
We can repeat the procedure from Steps 6 & 7 to identify more unknown cell types.
Functionally, flowCL decomposes a gated population name from a FlowJo gating hierarchy (e.g. CD3+CD4+CD8-) into its individual markers (CD3, CD4, CD8) and translates each marker's relative abundance into a relation used in the Cell Ontology database (CL) [2] (such as + for “has plasma membrane part”). It then performs the following steps:
- A SPARQL (http://www.w3.org/TR/rdf-sparql-query/) query against the CL fetches the labels and IDs corresponding to the input markers by text matching to the label or synonyms fields in the CL;
- The marker labels are used to retrieve a list of cell types that contain (or lack) the marker labels;
- The set of markers that make up each cell type is then retrieved;
- A final query retrieves all parents up to the root of the CL for each cell type to build a tree diagram of the results.
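The decomposition that precedes these queries can be sketched in a few lines of Python. This is an illustration of the idea only, not flowCL's actual code: the regular expression, the function name `decompose`, and the relation name used for "-" are assumptions (the text above names only the "+" relation).

```python
import re

# "+" is the relation named in the text; the "-" relation name is an assumption.
RELATIONS = {
    "+": "has_plasma_membrane_part",
    "-": "lacks_plasma_membrane_part",
}

# A marker is a run of letters/digits followed by one sign character.
MARKER_RE = re.compile(r"([A-Za-z0-9]+)([+-])")

def decompose(gate_name):
    """Split a gate name like 'CD3+CD4+CD8-' into (marker, relation) pairs."""
    pairs = MARKER_RE.findall(gate_name)
    # Reject names the simple pattern cannot fully account for.
    if "".join(m + s for m, s in pairs) != gate_name:
        raise ValueError(f"cannot parse gate name: {gate_name!r}")
    return [(marker, RELATIONS[sign]) for marker, sign in pairs]

for marker, relation in decompose("CD3+CD4+CD8-"):
    print(marker, relation)
```

Note that this simple pattern treats every "-" as a sign, so a hyphenated marker name such as HLA-DR would be split incorrectly; writing it without the hyphen (HLADR), as in the gates above, avoids the problem.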
1. Courtot, M. et al. (2014) flowCL: ontology-based cell population labelling in flow cytometry. Bioinformatics, 31(8):1337-1339.
2. Diehl, A.D. et al. (2011) Hematopoietic cell types: prototype for a revised cell ontology. J. Biomed. Inf., 44, 75-79.
For more information on installing and running specific Plugins:
Questions? Send us an email at Techsupport [at] FlowJo [dot] com