t-distributed stochastic neighbor embedding (t-SNE) is a machine learning algorithm for dimensionality reduction that is useful for visualizing high-dimensional data sets.
t-SNE is particularly well suited to embedding high-dimensional data in a two-dimensional plot that can be displayed in a graph window. The dimensionality is reduced in such a way that similar cells are modeled by nearby points and dissimilar cells by distant points, and the result often gives valuable insight into subpopulations in the data.
Because t-SNE can take a long time to run on a large number of dimensions, in SeqGeq it is better to start with a PCA and base the t-SNE calculation on the newly created analytical parameters rather than on all gene reads.
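To illustrate why running PCA first helps, here is a minimal Python sketch of the reduction step outside SeqGeq. It assumes NumPy is available; the function name `pca_scores`, the toy count matrix, and the choice of six components are illustrative assumptions, not SeqGeq's implementation.

```python
import numpy as np

def pca_scores(expression, n_components=6):
    """Project a cells-by-genes matrix onto its leading principal components."""
    # Center each gene (column) so components capture variance, not mean offsets.
    centered = expression - expression.mean(axis=0)
    # Thin SVD: rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Scores are the coordinates of each cell along the leading axes.
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
# Hypothetical toy data: 200 cells x 500 genes of Poisson-distributed counts.
counts = rng.poisson(lam=2.0, size=(200, 500)).astype(float)
pcs = pca_scores(np.log1p(counts), n_components=6)
print(pcs.shape)  # (200, 6)
```

A t-SNE run on `pcs` then works with 6 values per cell instead of 500, which is the saving the recommendation above is aiming for.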
How to Start t-SNE in SeqGeq
To start, select a population in the workspace and click the dimensionality reduction icon in the discovery band.
A settings window will prompt you to choose the method you’d like to use. Select t-SNE in the list and choose a name for your run. Next, select the genes you’d like to take into account for the dimensionality reduction. Check the ‘All Genes’ box to use them all, or click ‘Select Genes’ to make a custom selection.
In the selector window, SeqGeq lets you choose individual genes, gene sets, or parameters. To shorten the calculation and improve the t-SNE results, it is generally useful to select analytical parameters rather than genes or gene sets. Select ‘Parameters’ at the top of the window to display all your analytical parameters, then choose the ones you’d like to consider. For example, to use the six PCs generated by a previous PCA run, select them and click ‘Add Selected >’, or simply choose ‘Add All >>’.
Additionally, there are several options to adjust in order to optimize the t-SNE calculation:
- Iterations: Maximum number of iterations to perform.
- Perplexity: Perplexity parameter (roughly, the effective number of nearest neighbors considered for each cell).
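To make the perplexity setting more concrete, the sketch below follows the usual t-SNE definition: for one cell, perplexity is 2 raised to the entropy of its Gaussian neighbor distribution, and the kernel width is tuned until that value matches the target. The function names and the toy distances are hypothetical, and this is not a description of SeqGeq's internal code.

```python
import math

def perplexity(sq_distances, sigma):
    """Perplexity (2**entropy) of the Gaussian neighbor distribution of one point."""
    # Subtract the smallest distance so at least one weight equals 1 (no underflow to 0/0).
    d_min = min(sq_distances)
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_distances]
    total = sum(weights)
    probs = [w / total for w in weights]
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2.0 ** entropy

def sigma_for_perplexity(sq_distances, target, lo=1e-3, hi=1e3, steps=100):
    """Bisect on a log scale for the kernel width whose perplexity matches the target."""
    for _ in range(steps):
        mid = math.sqrt(lo * hi)
        if perplexity(sq_distances, mid) > target:
            hi = mid  # too many effective neighbors: narrow the kernel
        else:
            lo = mid  # too few effective neighbors: widen the kernel
    return math.sqrt(lo * hi)

# Hypothetical squared distances from one cell to its 50 nearest neighbors.
sq_d = [float(i) for i in range(1, 51)]
sigma = sigma_for_perplexity(sq_d, target=10.0)
print(round(perplexity(sq_d, sigma), 3))
```

With equal distances to five neighbors, the perplexity is 5, which is why the setting is often read as a neighbor count.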
Click ‘Advanced Settings’ to access more options:
- The initialization, which can be deterministic (default), meaning the result will always look the same (i.e., the same populations will always be at the same position for the same sample), or random.
- Theta, which sets how coarse the approximation is. A higher value will reduce the calculation time but also the accuracy, whereas a lower value will increase both calculation time and accuracy.
- P Value Adjust Factor, which determines how much larger the space between natural clusters will be in embedded space than it was in the original space.
- P Value Adjust Iteration, which controls the number of iterations run to consolidate the embedded space and thereby tighten the space between clusters.
- Momentum Switch Iteration, which sets the impact of local variations on data trends. A higher value will reduce the impact of such variations. (In standard t-SNE implementations, this is the iteration at which the optimizer switches from the initial to the final momentum.)
- Initial Momentum, which sets the impact of the original trend on the overall trend. A higher value will increase the impact of the initial trend.
- Final Momentum, which sets the impact of the final trend on the overall trend. A higher value will increase the impact of the final trend.
- SNE ETA, the factor indicating how much the step size is adjusted at each update (the learning rate). A higher value takes larger steps, which can shorten the calculation but might give worse results. The recommended range is 100 to 1,000.
(Mousing over any of the settings displays a tooltip describing it, in case you forget what an option does.)
Once done, hit ‘Save’ to go back to the t-SNE run settings window.
After you’ve selected all the options and parameters you’d like to take into account, simply press ‘Run’ in the bottom-right corner.
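For readers who want to see how the momentum, ETA, and P value adjust settings interact, here is a simplified Python sketch of one optimizer step as it appears in common open-source t-SNE implementations. It assumes that SeqGeq's settings correspond to those quantities; the function names and default values are illustrative, not SeqGeq's defaults, and the per-coordinate gain adjustment used by many implementations is omitted.

```python
def tsne_update(y, grad, velocity, iteration, *, eta=200.0,
                initial_momentum=0.5, final_momentum=0.8,
                momentum_switch_iter=250):
    """One momentum gradient-descent step on a flat list of embedding coordinates."""
    # Before the switch iteration the initial momentum applies, afterwards the final one.
    if iteration < momentum_switch_iter:
        momentum = initial_momentum
    else:
        momentum = final_momentum
    # Keep part of the previous step (momentum) and add a step against the gradient (eta).
    new_velocity = [momentum * v - eta * g for v, g in zip(velocity, grad)]
    new_y = [yi + vi for yi, vi in zip(y, new_velocity)]
    return new_y, new_velocity

def p_scale(iteration, adjust_factor=12.0, adjust_iter=250):
    """Multiplier applied to the input similarities during the early iterations."""
    # Assumed mapping: 'P Value Adjust Factor' and 'P Value Adjust Iteration'.
    return adjust_factor if iteration < adjust_iter else 1.0

# Hypothetical two-coordinate embedding with a made-up gradient.
y, velocity = [0.0, 1.0], [0.0, 0.0]
y, velocity = tsne_update(y, [0.01, -0.02], velocity, iteration=0)
print(y, p_scale(0), p_scale(250))
```

In this sketch a larger `eta` moves points further per iteration, and a later `adjust_iter` keeps the similarities scaled up for longer before the multiplier returns to 1.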
t-SNE Output and Exploration
Depending on the computer, the number of cells, and the number of dimensions, the calculation may take several minutes to complete. A new graph window will then be displayed automatically, showing the newly created data space.
You can explore the data directly in the graph window and visualize all the artificial dimensions. To select different analytical parameters, open the x- or y-wing, highlight ‘Analytical Parameters’ in the upper panel and the dimension in the lower one.
Note: Heatmapping in the Color Map Axis area of the graph window can be very useful for exploring the various islands generated in t-SNE space, based on known genes or gene sets.
If you would appreciate personalized feedback on t-SNE plots you’re working with, or simply have general questions about the process, we would welcome emails to firstname.lastname@example.org.