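Principal component analysis (PCA) is a dimensionality reduction method that projects the data onto a small number of new axes (principal components) chosen to capture as much of the variation in the data as possible, giving the best low-dimensional view of the differences in the data.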

PCA is widely used to visualize high-dimensional data (that is, data with many parameters). In SeqGeq you can explore the data directly on a biaxial plot after the projection. This helps organize the data and lets researchers identify cell clusters, which PCA places in different regions of the plot because of differences in their gene expression.
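To make the idea concrete, here is a minimal NumPy sketch of PCA computed via singular value decomposition (SVD) on a toy cells-by-genes matrix. It is not SeqGeq's implementation; the `pca` function, the toy data, and the use of NumPy are assumptions for illustration. The two columns of `scores` are what a PC1-vs-PC2 biaxial plot would show, and the two toy cell types end up far apart along PC1.

```python
import numpy as np

def pca(X, n_components):
    """Project X (cells x genes) onto its first n_components principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each gene around its mean
    # Economy SVD: Xc = U @ diag(S) @ Vt; the rows of Vt are the PC directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]           # PCs x genes (the eigenvectors)
    scores = Xc @ components.T               # cells x PCs (coordinates to plot)
    return scores, components

rng = np.random.default_rng(0)
# Hypothetical toy data: two "cell types" of 50 cells x 20 genes;
# type B expresses genes 0-4 at a higher level.
type_a = rng.normal(0.0, 1.0, size=(50, 20))
type_b = rng.normal(0.0, 1.0, size=(50, 20))
type_b[:, :5] += 5.0
X = np.vstack([type_a, type_b])

scores, components = pca(X, n_components=2)
# scores[:, 0] and scores[:, 1] are the x/y positions on a PC1-vs-PC2 plot.
gap = abs(scores[:50, 0].mean() - scores[50:, 0].mean())
print(scores.shape, round(gap, 1))
```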

Because PCA is generally much faster than t-SNE on a high-dimensional data set, it’s an excellent way to start exploring and clustering the different cell populations without waiting a long time for a calculation to finish. PCA is also useful before t-SNE: it condenses the sparse data matrices typical of single-cell sequencing experiments into a smaller set of parameters that are better suited as t-SNE input (data compression).
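The compression step can be sketched as follows, again with NumPy and hypothetical toy counts rather than SeqGeq's own code: a sparse, log-transformed 200 × 500 matrix is reduced to 10 PC coordinates per cell, and the function reports what fraction of the total variance those 10 PCs retain. In a PCA-then-t-SNE workflow, a t-SNE implementation would receive `reduced` instead of the full matrix.

```python
import numpy as np

def compress(X, n_components):
    """Return cell coordinates on the top PCs and the fraction of total variance kept."""
    Xc = X - X.mean(axis=0)                              # center each gene
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    reduced = U[:, :n_components] * S[:n_components]     # equals Xc @ Vt[:k].T
    kept = (S[:n_components] ** 2).sum() / (S ** 2).sum()
    return reduced, kept

rng = np.random.default_rng(1)
n_cells, n_genes = 200, 500
# Hypothetical toy data: 3 cell types, each with its own mean expression profile.
profiles = rng.gamma(shape=0.3, scale=2.0, size=(3, n_genes))
labels = rng.integers(0, 3, size=n_cells)
counts = rng.poisson(profiles[labels])     # mostly zeros, like scRNA-seq counts
X = np.log1p(counts)                       # a common log transform

reduced, kept = compress(X, n_components=10)
print(reduced.shape)
print(f"zeros in input: {np.mean(counts == 0):.0%}, variance kept: {kept:.0%}")
```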

 

How to start PCA in SeqGeq

To start a PCA, select a population in the workspace and click the dimensionality reduction icon in the discovery band.

 

In the new window, SeqGeq prompts you to choose a method. Select PCA from the list and enter a name for your run. Next, select the genes you’d like to include in the dimensionality reduction: check the ‘All Genes’ box to use every gene, or click ‘Select Genes’ to make a custom selection.

 

A selector window appears, letting you choose individual genes, gene sets, or parameters. Without specific knowledge of the genes you’re interested in, it’s generally useful to click ‘Add All >>’. If necessary, you can remove specific genes by selecting them in the right panel and clicking the ‘Remove Selected x’ button. If this isn’t your first pass and you already have a better idea of which genes matter, or have special requirements, you can restrict the calculation to the genes in one or more gene sets.
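Restricting the calculation to a gene set amounts to keeping only the matching columns of the expression matrix before running PCA. A minimal sketch, assuming a NumPy matrix with one column per gene; `select_genes` and the example gene set are hypothetical, not part of SeqGeq:

```python
import numpy as np

def select_genes(X, gene_names, gene_set):
    """Keep only the columns of X whose gene name is in gene_set (original order kept)."""
    wanted = set(gene_set)
    idx = [i for i, g in enumerate(gene_names) if g in wanted]
    missing = wanted.difference(gene_names)
    if missing:
        # Report set members absent from the data instead of dropping them silently.
        print(f"warning: {len(missing)} gene(s) not found: {sorted(missing)}")
    return X[:, idx], [gene_names[i] for i in idx]

gene_names = ["CD3E", "CD4", "CD8A", "MS4A1", "CD14", "LYZ"]
X = np.arange(4 * 6, dtype=float).reshape(4, 6)   # 4 cells x 6 genes (toy values)
t_cell_set = ["CD3E", "CD4", "CD8A", "FOXP3"]      # hypothetical gene set

X_sub, kept = select_genes(X, gene_names, t_cell_set)
print(kept)          # ['CD3E', 'CD4', 'CD8A']
print(X_sub.shape)   # (4, 3)
```

Reporting missing genes makes naming mismatches between a gene set and the data set easy to spot.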

 

Lastly, SeqGeq lets you change two options for a PCA:

  • The number of principal components (PCs) to calculate. Increasing this number may retain more discriminatory information but can increase the calculation time. For most analyses, about 10 to 12 PCs are sufficient.
  • The initialization, which can be deterministic (the default), meaning the result always looks the same (i.e., the same populations appear at the same positions for the same sample), or random, in which case each reduced dimensionality projection is unique and may look different from run to run.
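One common way to compute a limited number of PCs quickly is a randomized algorithm whose starting matrix is drawn at random; fixing the seed makes it deterministic. This page doesn't say which algorithm SeqGeq uses, so treat the following NumPy sketch (a randomized range finder in the style of Halko, Martinsson and Tropp) as an illustration of the two options, not as SeqGeq's method. The hypothetical `randomized_pca` function's `n_components` plays the role of the number of PCs, and `seed=0` versus `seed=None` mirrors deterministic versus random initialization.

```python
import numpy as np

def randomized_pca(X, n_components, seed=0, n_oversamples=10, n_iter=4):
    """Approximate PCA scores with a randomized range finder.

    seed=<int> reproduces the same result on every run; seed=None starts
    from a fresh random matrix each time.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    k = n_components + n_oversamples
    # Random starting matrix: this is the "initialization".
    Q, _ = np.linalg.qr(Xc @ rng.normal(size=(Xc.shape[1], k)))
    # Power iterations sharpen the estimate of the dominant subspace.
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(Xc.T @ Q)
        Q, _ = np.linalg.qr(Xc @ Q)
    # Exact SVD of the small k x genes matrix, mapped back to cell coordinates.
    Ub, S, _ = np.linalg.svd(Q.T @ Xc, full_matrices=False)
    return (Q @ Ub[:, :n_components]) * S[:n_components]

rng = np.random.default_rng(42)
# Hypothetical toy data: two strong directions of variation plus small noise.
W = np.linalg.qr(rng.normal(size=(50, 2)))[0].T      # 2 x 50, orthonormal rows
X = (rng.normal(size=(100, 2)) * [10.0, 5.0]) @ W + 0.1 * rng.normal(size=(100, 50))

a = randomized_pca(X, n_components=2, seed=0)
b = randomized_pca(X, n_components=2, seed=0)      # "deterministic": same seed
c = randomized_pca(X, n_components=2, seed=None)   # "random": new start
print(np.allclose(a, b))   # True
# Each PC is only defined up to its sign, so compare absolute values.
print(np.allclose(np.abs(a), np.abs(c), atol=1e-6))
```

On data with well-separated PCs like this, a random start mostly changes the sign of a component, which shows up as a mirrored plot; when successive PCs explain similar amounts of variance, the results can differ more noticeably.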

Once you’ve selected the genes or parameters to include and set the PCA options, start the run by pressing ‘Run’ in the bottom right corner.

 

PCA output and exploration

Depending on the computer, the number of cells, and the number of dimensions, the calculation can take several minutes. When it finishes, you’ll be asked to review the run and select the number of dimensions you’d like to keep (this can be fewer than the total number of PCs you calculated).

 

Starting from the first PC, the window lists the percentage of variance explained by each component and the cumulative variance (the sum of the variance explained by that PC and all preceding PCs). For example, the first PC in the picture above explains 74% of the variability within the data. The sixth PC explains only 2%, but together the first six PCs already explain 94%. Keeping all calculated PCs (here, 12) would capture 98% of the variability, a slight improvement over six PCs that may come at the cost of slower downstream analyses.
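The percentages in this window follow from the singular values of the centered data: each PC's share of the variance is its squared singular value divided by the sum of all squared singular values, and the cumulative column is a running sum. The NumPy sketch below computes both and picks the smallest number of PCs that reaches a target, using hypothetical toy data; the function names and the 90% target are illustrative choices, not SeqGeq settings.

```python
import numpy as np

def explained_variance(X):
    """Per-PC and cumulative fraction of total variance, from the centered data's SVD."""
    Xc = X - X.mean(axis=0)
    S = np.linalg.svd(Xc, compute_uv=False)   # singular values, largest first
    ratio = S**2 / np.sum(S**2)               # PC i's variance is proportional to S_i^2
    return ratio, np.cumsum(ratio)

def n_pcs_for(cumulative, target):
    """Smallest number of PCs whose cumulative explained variance reaches `target`."""
    i = int(np.searchsorted(cumulative, target))   # first index with cumulative >= target
    return min(i + 1, len(cumulative))             # guard against rounding just below 1.0

rng = np.random.default_rng(3)
# Hypothetical toy data: 300 cells x 8 genes with steadily shrinking spread.
X = rng.normal(size=(300, 8)) * np.array([8, 4, 2, 1, 0.5, 0.5, 0.5, 0.5])
ratio, cumulative = explained_variance(X)
for i, (r, c) in enumerate(zip(ratio, cumulative), start=1):
    print(f"PC{i}: {r:6.1%} explained, {c:6.1%} cumulative")
print("PCs needed for 90%:", n_pcs_for(cumulative, 0.90))
```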

Note that you can export the eigenvectors as a .csv file: click the ‘Export Eigen Vectors’ button before clicking ‘OK’, then choose a destination in the window that opens.
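The exported eigenvectors (also called loadings) contain one weight per gene for each PC, and the genes with the largest absolute weights are the ones driving that component. The sketch below writes loadings to CSV and ranks genes by loading; the column layout is our own assumption (SeqGeq's export may be organized differently), and `write_eigenvectors` and `top_genes` are hypothetical helpers.

```python
import csv
import io
import numpy as np

def write_eigenvectors(components, gene_names, stream):
    """Write loadings as CSV: one row per gene, one column per PC (assumed layout)."""
    writer = csv.writer(stream)
    writer.writerow(["gene"] + [f"PC{i + 1}" for i in range(components.shape[0])])
    for gene, loadings in zip(gene_names, components.T):
        writer.writerow([gene] + [f"{v:.4f}" for v in loadings])

def top_genes(components, gene_names, pc, n=3):
    """Genes with the largest absolute loading on PC number `pc` (1-based)."""
    order = np.argsort(-np.abs(components[pc - 1]))
    return [gene_names[i] for i in order[:n]]

# Toy example: gene 'B' has by far the largest spread, so it dominates PC1.
gene_names = ["A", "B", "C", "D"]
rng = np.random.default_rng(7)
X = rng.normal(size=(200, 4)) * np.array([1.0, 10.0, 1.0, 0.1])
Xc = X - X.mean(axis=0)
components = np.linalg.svd(Xc, full_matrices=False)[2][:2]   # 2 PCs x 4 genes

buf = io.StringIO()
write_eigenvectors(components, gene_names, buf)
print(buf.getvalue())
print(top_genes(components, gene_names, pc=1, n=1))   # ['B']
```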

After you’ve chosen how many PCs to keep, a new graph window opens automatically, showing the dimension-reduced data on the first two PCs. You can explore the data directly in this window and display any of the new PC dimensions. To plot a different PC, open the x- or y-wing, highlight ‘Analytical Parameters’ in the upper panel, and select the dimension in the lower section.

 

If you’re still having trouble with principal components, we’re happy to help. Send a quick email to seqgeq@flowjo.com describing the issue in detail.