Linear discriminant analysis (LDA) is a dimensionality reduction method that explicitly models the differences between classes of data rather than their similarities.
LDA is a generalization of Fisher’s linear discriminant: it finds a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination can be used in SeqGeq as a linear classifier to better separate populations of interest. This usually makes it easier to distinguish two or more populations based on genes or gene sets.
Because LDA is specifically designed to maximize the separation between two or more populations within a dataset, it only makes sense to use it once some populations have been identified.
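To make the idea concrete, here is a minimal sketch of Fisher’s two-population discriminant in plain Python. It is an illustration only and not SeqGeq’s implementation: the expression values, the two-gene setup, and the names `b_cells`, `t_cells`, and `fisher_direction` are all made up for this example.

```python
# Toy expression values for two hypothetical genes per cell.
# These numbers are invented for illustration; they are not SeqGeq output.
b_cells = [(5.0, 1.0), (6.0, 2.0), (5.5, 1.5), (6.5, 1.0)]
t_cells = [(1.0, 4.0), (2.0, 5.0), (1.5, 4.5), (2.5, 5.5)]


def mean(points):
    """Per-gene mean of a population."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points, m):
    """Entries (sxx, sxy, syy) of the 2x2 within-population scatter matrix."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy


def fisher_direction(a, b):
    """Fisher's direction w = Sw^-1 (mean_a - mean_b) for two genes."""
    ma, mb = mean(a), mean(b)
    axx, axy, ayy = scatter(a, ma)
    bxx, bxy, byy = scatter(b, mb)
    # Pooled within-population scatter matrix Sw.
    sxx, sxy, syy = axx + bxx, axy + bxy, ayy + byy
    det = sxx * syy - sxy * sxy
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    # Closed-form inverse of the 2x2 matrix applied to the mean difference.
    wx = (syy * dx - sxy * dy) / det
    wy = (-sxy * dx + sxx * dy) / det
    return wx, wy


def project(points, w):
    """Score of each cell along the discriminant direction."""
    return [p[0] * w[0] + p[1] * w[1] for p in points]


w = fisher_direction(b_cells, t_cells)
b_scores = project(b_cells, w)
t_scores = project(t_cells, w)
print("discriminant direction:", w)
print("B-cell scores:", b_scores)
print("T-cell scores:", t_scores)
```

On this toy data, every cell in the first population scores higher along the discriminant than every cell in the second, which is the kind of separation a single LDA axis is meant to provide.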
How to start an LDA in SeqGeq
To start LDA, select population(s) in the workspace and click on the dimensionality reduction icon located in the discovery band.
The dimensionality reduction settings window will prompt you to choose the method you’d like to use. Select LDA in the list and, if you wish, change the name of your run. Select the genes or parameters just as you would for any other method on the dimensionality reduction platform. Depending on your goal, you might want to select all of them or specific gene sets only.
Make sure the box includes all the populations you’d like to separate (at least two). If it doesn’t, or if the box is empty, simply drag and drop the populations directly from the workspace into the box. Because LDA maximizes the variance between populations relative to the variance within them, you need at least two populations to start.
The options SeqGeq prompts you to adjust are:
- Accuracy of the calculation. A higher value will increase the calculation time but might give better results.
- The initialization, which can be deterministic (the default, meaning the result will always look the same) or random.
Once you’ve selected the parameters and the populations for your LDA and adjusted the options, simply press ‘Run’ in the bottom right corner to start the calculation.
LDA output and exploration
Once the calculation completes, a table is created listing the discriminants and the variance ratio each one accounts for. Use this table to control how many parameters are created and appended to the workspace when ‘OK’ is clicked. The maximum number of discriminants available depends on how many populations were selected and equals n-1, where n is the number of selected populations.
Depending on the computer, the number of cells, and the number of dimensions selected, the calculation can take several minutes to complete. After completion, a new graph window is displayed automatically, showing the newly reduced data space.
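The n-1 limit follows from the math: the between-population scatter matrix is built from n population means around one overall mean, so it has at most n-1 independent directions. The NumPy sketch below illustrates this on synthetic data. It is not SeqGeq’s code; the data, the function name `lda_variance_ratio`, and the reading of "variance ratio" as each discriminant’s share of the total eigenvalue sum are assumptions made for this example.

```python
import numpy as np

# Synthetic data: 3 hypothetical populations, 50 cells each, 5 hypothetical genes.
rng = np.random.default_rng(0)
n_genes = 5
centers = np.array([
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [4.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 4.0, 0.0, 0.0, 0.0],
])
X = np.vstack([c + rng.normal(size=(50, n_genes)) for c in centers])
y = np.repeat(np.arange(3), 50)


def lda_variance_ratio(X, y):
    """Return (ratio per discriminant, all eigenvalues sorted descending)."""
    labels = np.unique(y)
    overall_mean = X.mean(axis=0)
    n_features = X.shape[1]
    Sw = np.zeros((n_features, n_features))  # within-population scatter
    Sb = np.zeros((n_features, n_features))  # between-population scatter
    for label in labels:
        Xc = X[y == label]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        d = (mc - overall_mean)[:, None]
        Sb += len(Xc) * (d @ d.T)
    # Eigenvalues of Sw^-1 Sb measure separation along each discriminant.
    eigvals = np.linalg.eigvals(np.linalg.solve(Sw, Sb)).real
    eigvals = np.sort(eigvals)[::-1]
    # At most n-1 eigenvalues are non-zero (and never more than the gene count).
    n_disc = min(len(labels) - 1, n_features)
    top = np.clip(eigvals[:n_disc], 0.0, None)
    return top / top.sum(), eigvals


ratio, eigvals = lda_variance_ratio(X, y)
print("variance ratio per discriminant:", ratio)
print("all eigenvalues:", eigvals)
```

With three populations, only two eigenvalues are meaningfully different from zero, so only two discriminants carry information regardless of how many genes were selected.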
You can explore the data directly in the graph window and visualize all the artificial dimensions. To select different analytical parameters, open the x- or y-wing, highlight ‘Analytical Parameters’ in the upper panel and the dimension in the lower one.
However, to identify the different populations in the LDA space, it’s easier to create an overlay in the layout editor. In this example, B cells and T cells were separated using LDA and overlaid in the layout editor.
If you have further questions, or would simply like to discuss your LDA results, please contact firstname.lastname@example.org.