In SeqGeq, the dimensionality reduction platform lets you run several complex algorithms in just a few clicks.
The goal of dimensionality reduction is to reduce the number of variables under consideration (i.e., gene reads) and to obtain a smaller set of principal variables (i.e., analytical parameters). This is particularly useful because working with too many dimensions is overwhelming, and such a large amount of information cannot be visualized directly. Dimensionality reduction takes your high-dimensional dataset and lets you view it on a biaxial plot while preserving the variance contained in biological populations.
There are various aspects to consider when reducing dimensionality, and it’s important to keep in mind that, depending on your findings, you might want to begin the dimensionality reduction cycle again. Each iteration should help you tease out deeper insights from your data.
Algorithms available in SeqGeq
SeqGeq currently offers three ways to reduce dimensionality. Each performs a different calculation, and they can be combined to obtain better results:
- t-SNE (t-distributed stochastic neighbor embedding) is an unsupervised, nonlinear machine learning algorithm for dimensionality reduction, useful for visualizing high-dimensional datasets in a two-parameter, dimension-reduced data space.
- PCA (principal component analysis) creates a reduced-dimensionality projection by multiplying the data by a set of vectors that rotate it to provide the best view of the differences (the directions of greatest variance), for as many principal components as required.
- LDA (linear discriminant analysis) is a similar kind of projection of the data, but it explicitly attempts to model the differences between classes of data rather than the similarities.
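To make the idea of a rotated, variance-preserving projection more concrete, here is a minimal PCA sketch in plain Python. It is not SeqGeq's implementation: the function names, the power-iteration approach, and the tiny made-up expression table (6 cells by 3 genes) are all assumptions chosen for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

def center(rows):
    """Subtract each column's mean so every gene is centered on zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]

def covariance(centered):
    """Sample covariance matrix (genes x genes) of centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(d)] for i in range(d)]

def dominant_direction(matrix, iterations=500):
    """Power iteration: approximate the direction of greatest variance."""
    v = normalize([1.0] * len(matrix))
    for _ in range(iterations):
        v = normalize([dot(row, v) for row in matrix])
    variance = dot(v, [dot(row, v) for row in matrix])
    return v, variance

def pca(rows, n_components=2):
    """Return per-cell scores, component vectors, and variance per component."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components, variances = [], []
    for _ in range(n_components):
        v, var = dominant_direction(cov)
        components.append(v)
        variances.append(var)
        # Deflate: remove the variance already explained by this component.
        cov = [[cov[i][j] - var * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    # Project each centered cell onto the components (the "rotated" view).
    scores = [[dot(row, c) for c in components] for row in centered]
    return scores, components, variances

# Made-up data: rows are cells, columns are three hypothetical genes.
cells = [
    [2.0, 1.9, 3.0],
    [4.0, 4.1, 1.0],
    [6.0, 5.8, 1.0],
    [8.0, 8.2, 3.0],
    [1.0, 1.2, 2.0],
    [9.0, 8.8, 2.0],
]

scores, components, variances = pca(cells, n_components=2)
for cell_scores in scores:
    print([round(s, 2) for s in cell_scores])
```

Each row of `scores` gives one cell's coordinates on the two new axes, which is the kind of pair you would plot on a biaxial graph. In this toy table the first two genes rise and fall together, so the first component captures most of the variance and the second mostly reflects the third gene.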
How to use dimensionality reduction
To use the platform, select a population in the workspace and click the dimensionality reduction icon located in the discovery band.
A new window will pop up prompting you to select the method you’d like to use, name your run, and select the genes to take into account for the dimensionality reduction. Depending on the method chosen, different options appear below that you might wish to adjust. The options specific to each method are described on their own dedicated pages. Unless you have a good reason to do otherwise, it’s better to try the default settings first and adjust afterwards if necessary.
After selecting a method and a name for the run, the most important step is to select the genes (or analytical parameters) to be used for the calculation. Check the ‘All Genes’ box if you’d like to use them all, or click ‘Select Genes’ if you wish to make a custom selection.
A selector window will appear, letting you choose individual genes, gene sets, or parameters. In the absence of specific knowledge about the genes you’re interested in, it’s generally useful to click ‘Add All >>’. If necessary, you can then remove individual genes by selecting them in the right panel and clicking the ‘Remove Selected x’ button. If this isn’t the first cycle and you already have a better idea or special needs, you can base your calculation on certain genes only, using one or more gene sets. For certain applications (t-SNE is a common example), one might want to use analytical parameters instead of genes. In that case, simply check the parameters box at the top of this window and then select the dimensions you’d like to use, just as you would for individual genes.
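As a rough mental model of what the gene selection does to the input of the calculation, the sketch below restricts a small expression table to the chosen genes before any algorithm runs. The function names, the dictionary layout, the gene set names, and the counts are hypothetical and say nothing about how SeqGeq stores data internally.

```python
def genes_from_sets(gene_sets, names):
    """Combine the named gene sets, keeping first-seen order, no duplicates."""
    chosen = []
    for name in names:
        for gene in gene_sets[name]:
            if gene not in chosen:
                chosen.append(gene)
    return chosen

def build_input_matrix(expression, selected_genes):
    """Return one row per cell containing only the selected genes, in order."""
    missing = [g for g in selected_genes if g not in expression]
    if missing:
        raise KeyError(f"genes not in dataset: {missing}")
    n_cells = len(next(iter(expression.values())))
    return [[expression[g][c] for g in selected_genes] for c in range(n_cells)]

# Made-up counts: each gene maps to one value per cell (4 cells here).
expression = {
    "CD3E":  [5, 0, 7, 0],
    "CD19":  [0, 4, 0, 6],
    "MS4A1": [0, 3, 1, 5],
    "GAPDH": [9, 8, 9, 7],
}
gene_sets = {"T cell": ["CD3E"], "B cell": ["CD19", "MS4A1"]}

# Analogous to 'Add All >>' followed by removing one gene.
all_genes = list(expression)
without_gapdh = [g for g in all_genes if g != "GAPDH"]

# Analogous to basing the calculation on one or more gene sets.
chosen = genes_from_sets(gene_sets, ["B cell", "T cell"])
matrix = build_input_matrix(expression, chosen)
print(chosen)
print(matrix)
```

Only the columns that survive this step contribute to the reduced dimensions, which is why a later cycle with a narrower gene set can give a different view of the same population.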
After the calculation
Depending on the computer, the number of cells, and the number of dimensions, the calculation may take several minutes to complete. When it finishes, a new graph window opens automatically, showing the newly dimension-reduced data space. You can explore the data directly in the graph window and visualize all of the artificial dimensions. To select different analytical parameters, open the x- or y-wing, highlight ‘Analytical Parameters’ in the upper panel, and choose the dimension in the lower one.
If you have specific questions about dimensionality reduction that are not covered here, we encourage you to write to firstname.lastname@example.org.