T-Distributed Stochastic Neighbor Embedding (tSNE) is an algorithm for performing dimensionality reduction, allowing visualization of complex multi-dimensional data in fewer dimensions while still maintaining the structure of the data.
tSNE is an unsupervised nonlinear dimensionality reduction algorithm useful for visualizing high dimensional flow or mass cytometry data sets in a dimension-reduced data space. The tSNE platform computes two new derived parameters from a user defined selection of cytometric parameters. The tSNE-generated parameters are optimized in such a way that observations/data points which were close to one another in the raw high dimensional data are close in the reduced data space. Importantly, tSNE can be used as a piece of many different workflows. It can be used independently to visualize an entire data file in an exploratory manner, as a preprocessing step in anticipation of clustering, or in other related workflows. Please see the references section for more details on the tSNE algorithm and its potential applications [1,2].
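The idea that neighbors in the raw data should remain neighbors in the reduced space can be checked numerically. The following is a minimal sketch, not part of FlowJo or the tSNE plugin: the function names and the toy coordinates are made up for illustration, and the score (the fraction of each event's k nearest neighbors that survive in the 2D map) is just one simple way to express the property described above.

```python
import math

def k_nearest(points, i, k):
    """Return the indices of the k points closest to points[i] (Euclidean)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return set(others[:k])

def neighbor_preservation(high_dim, low_dim, k):
    """Average fraction of each point's k nearest neighbors kept in the embedding."""
    n = len(high_dim)
    kept = 0
    for i in range(n):
        kept += len(k_nearest(high_dim, i, k) & k_nearest(low_dim, i, k))
    return kept / (n * k)

# Two well-separated groups in 3 dimensions (made-up values).
high = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.1), (0.0, 0.1, 0.0),
        (5.0, 5.0, 5.0), (5.1, 5.0, 5.0), (5.0, 5.1, 5.1)]
# A hypothetical 2D map that keeps the two groups apart.
good = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (9.0, 9.0), (9.2, 9.0), (9.0, 9.2)]
# A hypothetical 2D map that interleaves the two groups along one axis.
bad = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (1.0, 0.0), (3.0, 0.0), (5.0, 0.0)]

score_good = neighbor_preservation(high, good, k=2)
score_bad = neighbor_preservation(high, bad, k=2)
print(score_good, score_bad)
```

A score near 1 means local neighborhoods were preserved; a low score means events that were close in the raw data ended up far apart in the map.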
FlowJo v10.1r7 uses a Plugin mechanism for running tSNE. Plugins can be accessed and run through the Plugins menu (Workspace tab –> Populations band).
- Please see the Installing Plugins page for detailed information on how to obtain and set up FlowJo plugins.
- The tSNE plugin does not require R.
While tSNE is a powerful visualization technique, running the algorithm is computationally expensive, and the output is sensitive to the input data. This section will briefly cover a few key points in the area of preparing your data.
- Cleaning up your data – The best analyses begin with cleaning up raw data to exclude doublets, debris, and dead cells. This step reduces noise in the data and can improve the tSNE algorithm output. In addition, gate to include only the cells of interest (e.g., gating on CD3+ if T cells are of primary interest).
- Downsampling – tSNE computation time scales with the number of input events. A Downsample Gate tool is available under the Plugins menu (Workspace tab –> Populations band), allowing a subset of events from a parent population to be selected and placed in a child Downsample Gate. Initiating the tSNE calculation on a Downsample Gate containing 10,000 events, rather than 50,000 or 100,000 events, will significantly reduce calculation time.
- Parameter Selection – In addition to choosing which events to use in your tSNE calculation, it is also important to choose appropriate parameters. If your data set is fluorescence-based, select only compensated parameters (Comp-Parameter::Stain Reagent). Do not include parameters that were collected but not used in the staining panel, and leave out any common parameters that do not vary across the sample population you are investigating. Irrelevant parameters add background noise to the calculation without contributing to the signal.
- Data Scaling – The tSNE algorithm is particularly sensitive to events that exceed the bounds of your plots. Therefore, you may see an improvement in your results if your data is properly transformed and on scale. For more information on this subject, see our help pages on transformations in FlowJo v10.
- Workflow – Because tSNE relies on the initial state of events seeded into the algorithm, the output will not be identical for different samples, so maps calculated separately cannot be compared directly. One rational approach to comparing multiple samples is to downsample and concatenate all samples together, creating a new keyword-based parameter denoting disease state, treatment group, or study arm. Gate on the new parameter to separate events from different study conditions and compare them in a common dimension-reduced space. However, this approach is limited by the low number of events that can be fed into the algorithm. Since it is practically limited to about 100,000 events, 50 samples would allow only 2,000 events from each.
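The downsample-and-concatenate workflow and its event budget can be sketched outside FlowJo as follows. This is an illustration only: the function names, the `total_budget` parameter, and the toy events are assumptions made for the example, and in FlowJo itself these steps are performed with the Downsample Gate and concatenation tools rather than with code.

```python
import random

def events_per_sample(total_budget, n_samples):
    """Split a total event budget evenly across samples (integer division)."""
    return total_budget // n_samples

def downsample_and_concatenate(samples, total_budget, seed=0):
    """Draw an equal random subset from each sample and tag each event with its source.

    `samples` maps a condition label to a list of events; each event is a
    tuple of parameter values. Returns a list of (label, event) pairs.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    quota = events_per_sample(total_budget, len(samples))
    combined = []
    for label, events in samples.items():
        take = min(quota, len(events))  # a small sample contributes all its events
        for event in rng.sample(events, take):
            combined.append((label, event))
    return combined

# The arithmetic from the text: about 100,000 events shared by 50 samples.
quota_50 = events_per_sample(100_000, 50)
print(quota_50)

# Toy data: two conditions with made-up two-parameter events.
toy = {
    "control": [(float(i), float(i) * 2) for i in range(500)],
    "treated": [(float(i), float(i) + 1) for i in range(300)],
}
combined = downsample_and_concatenate(toy, total_budget=400)

# "Gating" on the label recovers the events from one study condition.
control_only = [event for label, event in combined if label == "control"]
print(len(combined), len(control_only))
```

The label stored with each event plays the role of the keyword-based parameter: all conditions share one map, and the label is used afterwards to pull them apart.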
Creating tSNE Parameters
- Open FlowJo v10.1r7 or later.
- Select/highlight a sample or gated population node within the samples pane of the FlowJo workspace. Try starting with a low number of events by creating a Downsample Gate, and initiating tSNE on the downsampled population.
- Under the Plugins menu (Workspace Tab –> Populations Band), select TSne… This will bring up a Choose Selected Parameters window with options.
- Select the parameters to be used for the tSNE calculation. If your data is fluorescence-based, make sure to choose only compensated parameters (denoted by the Comp- prefix).
- Select technical options (optional). Defaults have been provided as a starting point and should be acceptable for many data sets. However, users may need to explore varying these options to produce a good output. Users are urged to read below and refer to the references section for further reading on the tSNE algorithm when varying technical options.
- Initiate the calculation by pressing the Choose Selected Parameters button. The algorithm will run on the input population selected, utilizing selected options. The Platform will create two new parameters, which are the dimension-reduced outputs from the algorithm.
Iterations – Maximum number of iterations the algorithm will run. A value of 300-3000 can be specified.
Perplexity – Perplexity is related to the number of nearest neighbors used in other neighbor-based learning algorithms. In tSNE, the perplexity may be viewed as a knob that sets the number of effective nearest neighbors. The most appropriate value depends on the density of your data. Generally, a larger or denser data set requires a larger perplexity. A value of 2-100 can be specified.
Eta (learning rate) – The learning rate (Eta) controls how much the positions of events in the reduced space are adjusted at each update. In tSNE, it is the step size of the gradient descent update that minimizes the difference between the neighbor probabilities in the original and reduced data spaces. A value of 2-2000 can be specified.
Generate Movie – When checked, a movie of the tSNE calculation is recorded. To view it, click Movie of Previous TSne Run… when the calculation is complete.
Iterations per Frame – When Generate Movie is checked, this value specifies the number of iterations between frames in the movie output.
Limit Duration – When checked, the algorithm will stop running after the specified amount of time.
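To make the "effective number of nearest neighbors" reading of perplexity concrete, the sketch below follows the definition in the original tSNE paper [1]: perplexity is 2 raised to the Shannon entropy of an event's neighbor distribution, and the width of a Gaussian around each event is tuned until that value matches the requested perplexity. The function names, search bounds, and toy distances are assumptions for illustration; how the FlowJo plugin implements this internally is not described here.

```python
import math

def conditional_probabilities(sq_distances, sigma):
    """Gaussian neighbor probabilities p(j|i) from squared distances to other events."""
    d_min = min(sq_distances)  # shift by the minimum to avoid underflow to all zeros
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_distances]
    total = sum(weights)
    return [w / total for w in weights]

def perplexity(probabilities):
    """Perplexity = 2 ** (Shannon entropy in bits)."""
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0.0)
    return 2.0 ** entropy

def sigma_for_perplexity(sq_distances, target, lo=1e-3, hi=1e3, steps=100):
    """Binary-search the Gaussian width whose perplexity matches `target`."""
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if perplexity(conditional_probabilities(sq_distances, mid)) > target:
            hi = mid  # too many effective neighbors: narrow the Gaussian
        else:
            lo = mid  # too few effective neighbors: widen the Gaussian
    return (lo + hi) / 2.0

# Four equally likely neighbors give a perplexity of exactly 4.
uniform_four = perplexity([0.25, 0.25, 0.25, 0.25])

# Made-up squared distances from one event to six others.
sq_d = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0]
sigma = sigma_for_perplexity(sq_d, target=3.0)
achieved = perplexity(conditional_probabilities(sq_d, sigma))
print(uniform_four, round(achieved, 6))
```

A higher requested perplexity yields a wider Gaussian, so each event is influenced by more of its neighbors; this is why denser data sets generally call for larger values.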
Visualizing the tSNE data space
The Graph Window
When the tSNE calculation on a sample is complete, new tSNE parameters will become available within the drop-down parameter list of the graph window. The tSNE parameters can be used in any graphic, gating, or other analysis, just like any of the original sample parameters. Gates drawn on the tSNE parameters should only be created as subsets of the original gate used to calculate those parameters, since the parameters are defined only for the events in that gate.
- Double click on the gated population used to calculate tSNE (e.g., the DownsampleDP.Pop gate). This will open a graph window. Select BTSNE_X vs BTSNE_Y to view the reduced data space in the same orientation as the Create tSNE Parameters window displayed during the calculation.
- Regardless of the plot type, gates can be drawn and subsets of events isolated based on their position within the reduced tSNE data space.
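Conceptually, a gate drawn in the reduced space is a region test on the two tSNE coordinates. The sketch below shows one common way to express a polygon gate, the ray-casting test. It illustrates the idea rather than FlowJo's gating code: the function names, the `(x, y, label)` event layout, and the coordinates are all hypothetical.

```python
def inside_polygon(x, y, vertices):
    """Ray-casting test: True if the point (x, y) lies inside the polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does a horizontal ray to the right of (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing flips inside/outside
    return inside

def apply_gate(events, vertices):
    """Keep the events whose two tSNE coordinates fall inside the gate."""
    return [e for e in events if inside_polygon(e[0], e[1], vertices)]

# A hypothetical square gate drawn on the two tSNE parameters.
gate = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]

# Made-up events: (tSNE x, tSNE y, label).
events = [(5.0, 5.0, "a"), (15.0, 5.0, "b"), (-1.0, -1.0, "c"), (9.5, 0.5, "d")]

gated = apply_gate(events, gate)
gated_labels = [label for _, _, label in gated]
print(gated_labels)
```

Because the gate is defined purely by position in the map, the gated subset can then be examined on any of the original parameters.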
The Layout Editor
Overlays of gated populations with known phenotypes can be displayed in the dimension-reduced tSNE space using the Layout Editor. In the example below, we have taken the original Downsample Gate and overlaid manually gated subsets. Note the distinct separation of markers in different regions of the continent structure.
References
1. van der Maaten, L.; Hinton, G. (2008). “Visualizing Data using t-SNE.” Journal of Machine Learning Research 9: 2579–2605.
2. Wallach, I.; Lilien, R. (2009). “The Protein-Small-Molecule Database, a Non-Redundant Structural Resource for the Analysis of Protein-Ligand Binding.” Bioinformatics 25 (5): 615–620.
For more information on installing and running specific Plugins:
Questions about plugins or FlowJo? Send us an email at FlowJo@BD.com