Discovery Example

The CellOntology plugin is extremely useful for discovery work, especially when investigating new populations of cells that you may be unfamiliar with. Consider the following example:

We have a multi-parametric data set with a panel that includes the standard T, B, and monocyte subsets. In addition to the standard panels, we have included markers that we are less familiar with but would like to know if their expression pattern follows a particular lineage. We would also like to know if these markers, in conjunction with the standard panels, define a known cell type or not.

Step 1 – Clean up gates

To begin our analysis, we will create a series of gates to remove debris, doublets, and dead cells. We will begin our “discovery” analysis on the cleaned-up live-cell population.

Step 2 – Terminal populations

We will continue building our gating hierarchy by defining broad categories (T-cells, B-cells) and any familiar terminal populations we have markers for (e.g. CD3+/HLADR-/CD45RA+/CD8+ and CD3-/HLADR-/CD19+ cells). The broad and/or terminal populations will help identify the context of the cells we are “discovering” when compared in an overlay or NxN plot.

Step 3 – Downsample

We intend to use a clustering algorithm to reduce the dimensional space of our data set. Since this is computationally expensive, we will reduce the number of events fed into the clustering algorithm to 15,000. We begin our data reduction by using the downsample plugin on the live cells population.
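As a rough illustration of what downsampling to 15,000 events means, here is a minimal sketch of uniform random sampling without replacement. It is not the downsample plugin's implementation; the function name `downsample`, the fixed seed, and the stand-in event list are assumptions made for the example.

```python
import random


def downsample(events, n=15_000, seed=0):
    """Return a random subset of at most n events, sampled without replacement."""
    if len(events) <= n:
        return list(events)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return rng.sample(events, n)


# Hypothetical stand-in for the live-cell population:
# one tuple of (made-up) parameter values per event.
live_cells = [(i, i * 0.5) for i in range(100_000)]

subset = downsample(live_cells)
print(len(subset))
```

Fixing the seed keeps the subset stable between runs, which helps when gates drawn on the reduced data need to be revisited later.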

Step 4 – Dimensionality reduction/Clustering

With the remaining 20+ parameters, we would like to cluster cells into groups that share common features. There are several algorithms that can perform this function, but we will stick with tSNE (t-distributed Stochastic Neighbor Embedding) for this example. We begin the tSNE algorithm by selecting the downsample population from Step 3 and then selecting the tSNE plugin from the populations band.

Step 5 – Define the tSNE space

The data now contain tSNE-X and tSNE-Y parameters. Using these in a graph window, we can see how our data have separated (quite nicely) into distinct “continents”. The distance between continents implies a difference in the phenotype of the cells in one group versus another. At this point, we would like to know where our known subsets of cells map onto the tSNE space. To do this, we will create an overlay in the Layout Editor:

  1. Open the Layout Editor.
  2. Drag in the population containing the tSNE parameters (e.g. Downsample-Live cells).
  3. The tSNE plot should appear.
  4. Drag a defined population (e.g. CD8 T-cells, B-cells, Naive CD4) over the tSNE plot in the Layout Editor.
  5. The overlay will depict where the defined population exists in the context of the reduced dimensionality space (continents). Unknown populations remain as red clusters; a representative group is indicated by the black arrows in the figure.

At this point, the phenotype of the unknown populations depicted in the overlay is: CD3-/CD4-/CD8-/CD45RA-/CD19-/CD14-

Step 6 – Identify the unknown continents

To identify which markers are responsible for creating the remaining continents (indicated by black arrows in the figure above; also any red clusters) we will create a gate on a continent and explore the remaining markers’ expression using an NxN plot. To do this:

  1. Drag the Downsample-Live population into a fresh layout.
  2. Right-click on the graph in the Layout Editor and select “Multigraph Overlays”, then select “NxN” plot.
  3. Half of a 10×10 plot should appear next to the plot.
  4. Right-click/control-click on the NxN plot to modify the relevant markers. (Remove markers that have already defined continents in Step 5 – Define the tSNE space.)
  5. Return to the graph window from step 1.
  6. Create a gate on an “unknown” continent within the downsample population and give it a name. (Note: it may be easiest to use the autogating tool.)
  7. Drag this population over the plot from step 1. The graph and the NxN plot should become an overlay.
  8. Interrogate the NxN plot for expression of markers that seem to correlate with one another. (For the sake of simplicity, we have removed extraneous negative markers from the NxN.)

In this example, one of the continents was found to be CD11c+/CD38+/CD14-/CD16-

Step 7 – Identify the cell type

From the example above, our collective marker set is: CD3-/CD4-/CD8-/CD19-/CD14-/CD16-/CD11c+/CD38+. To identify what this cell type might be, we will return to the FlowJo workspace and create a generic gate at any level in the hierarchy. We will name the gate “CD3-CD4-CD8-CD19-CD14-CD16-CD11c+CD38+”. Next, we will use the CellOntology plugin to look up the name of this population (if it exists).
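Typing a long gate name by hand is error-prone, so it can help to generate it from the marker list. The sketch below simply concatenates marker/sign pairs in the order given; the helper name `gate_name` and the list layout are assumptions for illustration, not part of FlowJo or the plugin.

```python
def gate_name(phenotype):
    """Join (marker, sign) pairs into a single gate name such as 'CD3-CD4-'."""
    for marker, sign in phenotype:
        if sign not in ("+", "-"):
            raise ValueError(f"unexpected sign {sign!r} for marker {marker}")
    return "".join(f"{marker}{sign}" for marker, sign in phenotype)


# Collective marker set from the example, in the order used in the text.
collective = [
    ("CD3", "-"), ("CD4", "-"), ("CD8", "-"), ("CD19", "-"),
    ("CD14", "-"), ("CD16", "-"), ("CD11c", "+"), ("CD38", "+"),
]

name = gate_name(collective)
print(name)
```

The printed string can be pasted directly as the name of the generic gate.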

At this point we will open the CSV file produced from activation of the CellOntology plugin. The CSV contains several columns indicating whether your query matched an item in the database, the gene and protein names of the markers, a ranking score and a Cell ID among others.


Format the CSV to view the column contents. Ensure that the short marker names match your generic gate name (leftmost column). Next, check the score. The closer the value is to 1, the higher the probability that your cell/population type is the one found in the Cell ID column. Copy the Cell ID corresponding to the highest ranking score and paste it into your web browser. Links that contain your Cell ID and point to the CellOntology database will appear. Click on a link to see what is known about your cell type.
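The same lookup can be scripted. The sketch below is only an illustration: the column headers (“Short marker names”, “Score”, “Cell ID”), the second Cell ID, and both score values are invented to exercise the code, and the real CSV written by the plugin may be laid out differently. The URL helper assumes the usual OBO PURL pattern for Cell Ontology identifiers.

```python
import csv
import io

# Invented rows for illustration only; headers and values are assumptions,
# not actual plugin output.
SAMPLE_CSV = """Short marker names,Score,Cell ID
CD3-CD4-CD8-CD19-CD14-CD16-CD11c+CD38+,0.25,CL_0001026
CD3-CD4-CD8-CD19-CD14-CD16-CD11c+CD38+,0.125,CL_EXAMPLE
"""


def best_match(csv_text, gate):
    """Return the highest-scoring row whose marker names equal the gate name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [row for row in reader if row["Short marker names"] == gate]
    if not rows:
        return None
    return max(rows, key=lambda row: float(row["Score"]))


def cell_url(cell_id):
    """Build a browser link for a Cell ID, assuming the OBO PURL pattern."""
    return "http://purl.obolibrary.org/obo/" + cell_id


best = best_match(SAMPLE_CSV, "CD3-CD4-CD8-CD19-CD14-CD16-CD11c+CD38+")
print(best["Cell ID"], cell_url(best["Cell ID"]))
```

Filtering on the gate name first mirrors the manual check that the short marker names match the gate before the score is trusted.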

In our example, it appears the CL_0001026 Cell ID corresponds to a common myeloid progenitor, with some relationship to dendritic cells.

Figure: NCBO BioPortal entry in the Cell Ontology for “CD34-positive, CD38-positive common myeloid progenitor”.

We can repeat the procedure from Steps 6 & 7 to identify more unknown cell types.

Technical Details

Functionally, flowCL [1] decomposes gated population names from a FlowJo gating hierarchy (e.g. CD3+CD4+CD8-) into their individual markers (CD3, CD4, CD8) and translates each marker's relative abundance into a relation used in the Cell Ontology database (CL) [2] (such as “+” for “has plasma membrane part”). It then performs the following steps:

  1. A SPARQL (http://www.w3.org/TR/rdf-sparql-query/) query against the CL fetches the labels and IDs corresponding to the input markers by text matching to the label or synonyms fields in the CL;
  2. The marker labels are used to retrieve a list of cell types that contain (or lack) the marker labels;
  3. The set of markers that make up each cell type is then retrieved;
  4. A final query retrieves all parents up to the root of the CL for each cell type to build a tree diagram of the results.
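The decomposition that precedes these queries can be sketched as below. This is not flowCL's code: the regular expression, the function name `decompose`, and the mapping of “-” to `lacks_plasma_membrane_part` are assumptions for illustration (the text above only states the “+” relation). The pattern also assumes marker names are purely alphanumeric, so a hyphenated name such as HLA-DR would need to be written as HLADR.

```python
import re

# Assumed sign-to-relation mapping; only the "+" case is stated in the text.
RELATIONS = {
    "+": "has_plasma_membrane_part",
    "-": "lacks_plasma_membrane_part",
}

# An alphanumeric marker name followed by a single + or - sign.
MARKER_PATTERN = re.compile(r"([A-Za-z0-9]+)([+-])")


def decompose(gate_name):
    """Split a gate name into (marker, relation) pairs, in order of appearance."""
    return [
        (marker, RELATIONS[sign])
        for marker, sign in MARKER_PATTERN.findall(gate_name)
    ]


for marker, relation in decompose("CD3+CD4+CD8-"):
    print(marker, relation)
```

Because the pattern skips any character that is not part of a marker or its sign, slash-separated names such as CD3-/CD4- decompose the same way as concatenated ones.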

References

  1. Courtot, M. et al. (2014) flowCL: ontology-based cell population labelling in flow cytometry. Bioinformatics, 31(8):1337-1339.
  2. Diehl, A.D. et al. (2011) Hematopoietic cell types: prototype for a revised cell ontology. J. Biomed. Inf., 44, 75-79.

For more information on installing and running specific Plugins:

Questions? Send us an email at Techsupport [at] FlowJo [dot] com