The Taylor Index is a metric primarily for evaluating the quality of clustering results in n-dimensional space.

The Taylor Index is a metric used in the Euclid plugin for evaluating the goodness of a clustering result based on the relative separation of the individual clusters versus their compactness. It is the ratio of the distance between two clusters to the spread of the cells within those clusters.

The Taylor Score is a weighted summation of all Taylor Indices associated with a full set of clustering results.

## Background

The Taylor Index provides information about the similarity or dissimilarity of two clusters based on their phenotypic expression across an n-dimensional space. The algorithm measures the similarity of the cells within a cluster using the robust standard deviation, the position of each cluster using its n-dimensional mean, and the distance between paired clusters using Euclidean distance. The final score, calculated for each pair of clusters, is the ratio of inter-cluster distance to intra-cluster distance. The Euclid plugin displays the results as a heatmap.

## Taylor Index Calculations

• Intra-cluster calculations: The Taylor Index first calculates the compactness of each cluster (the intra-cluster distance) using the robust standard deviation. The smaller the distance between the cells within a cluster, the more tightly packed they are and the more likely they are to be phenotypically alike, because the numbers used to calculate "distance" are the intensity values of the selected parameters.
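• Inter-cluster calculations: The inter-cluster distance is the Euclidean distance between the n-dimensional means of the two clusters. For clusters $A$ and $B$ with means $\mu_A$ and $\mu_B$, the equation for this is: $d(A, B) = \sqrt{\sum_{i=1}^{n} (\mu_{A,i} - \mu_{B,i})^2}$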
• Taylor Index calculation: The Taylor Index is the ratio of the Euclidean distance between two clusters (inter-cluster distance) to the sum of their robust standard deviations (intra-cluster distance) across the n-dimensional space. It is calculated for every pair of clusters produced by the selected clustering method on the given dataset.
• Heatmap creation: Taylor Index values are then visualized on a heatmap. Smaller values indicate a poor clustering outcome: a small distance between two clusters and/or a large spread within at least one of them. Larger values indicate better clustering: well-separated clusters and/or tight, compact clusters. The heatmap color scale runs from darker colors for smaller values (worse separation between a pair of clusters) to hotter colors, culminating in yellow, for well-separated, compact clusters. In the example heatmap, each row and column represents a cluster number, and each intersection is the Taylor Index for those two clusters. In this example, clusters 1 and 3 are the best resolved from each other, and cluster 2 resolves poorly from all other clusters.
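
The calculation steps above can be sketched in Python with NumPy. This is a minimal illustration, not the Euclid plugin's implementation. Two details are assumptions: the robust standard deviation is taken to be 1.4826 times the median absolute deviation, and the per-parameter values are combined into one spread per cluster with a Euclidean norm. The function names `robust_sd` and `taylor_index_matrix` are hypothetical.

```python
import numpy as np


def robust_sd(values):
    """Per-parameter robust SD, ASSUMED here to be 1.4826 * MAD."""
    med = np.median(values, axis=0)
    return 1.4826 * np.median(np.abs(values - med), axis=0)


def taylor_index_matrix(data, labels):
    """Return (cluster_ids, matrix) of pairwise Taylor Index values.

    data:   array of shape (cells, parameters) holding intensity values
    labels: array of shape (cells,) holding the cluster id of each cell
    """
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)

    # Cluster position: n-dimensional mean of the cells in the cluster.
    centers = np.array([data[labels == k].mean(axis=0) for k in ids])

    # Cluster spread: ASSUMED to be the Euclidean norm of the
    # per-parameter robust SDs, giving one number per cluster.
    spread = np.array(
        [np.linalg.norm(robust_sd(data[labels == k])) for k in ids]
    )

    # Inter-cluster distance: Euclidean distance between cluster means.
    inter = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)

    # Intra-cluster term for a pair: sum of the two cluster spreads.
    intra = spread[:, None] + spread[None, :]

    # A cluster is not compared with itself, so the diagonal stays NaN.
    matrix = np.full(inter.shape, np.nan)
    off_diag = ~np.eye(len(ids), dtype=bool)
    matrix[off_diag] = inter[off_diag] / intra[off_diag]
    return ids, matrix


# Usage with synthetic data: clusters 1 and 2 are far apart,
# clusters 1 and 3 overlap heavily.
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(0.0, 1.0, (200, 3)),
    rng.normal(10.0, 1.0, (200, 3)),
    rng.normal(1.0, 1.0, (200, 3)),
])
labels = np.repeat([1, 2, 3], 200)
ids, ti = taylor_index_matrix(data, labels)
```

The returned matrix is symmetric and has the same layout as the heatmap: entry `ti[i, j]` is the Taylor Index for clusters `ids[i]` and `ids[j]`, and the well-separated pair receives a larger value than the overlapping pair.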

## Taylor Score Calculations

The Taylor Score is the weighted summation of all Taylor Indices for a clustering outcome.
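
As a rough sketch of that summation, the Python function below adds up the pairwise Taylor Index values, each multiplied by a weight. The function name `taylor_score` is hypothetical, and the weights are taken as an input rather than derived, so nothing here should be read as the plugin's weighting scheme. Counting each cluster pair once and leaving the sum unnormalized are also assumptions.

```python
import numpy as np


def taylor_score(index_matrix, weights):
    """Weighted sum of pairwise Taylor Index values.

    index_matrix: square, symmetric matrix of Taylor Index values
    weights:      matrix of the same shape, one weight per cluster pair
                  (supplied by the caller; how they are derived is not
                  modeled in this sketch)
    """
    index_matrix = np.asarray(index_matrix, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if index_matrix.shape != weights.shape:
        raise ValueError("index_matrix and weights must have the same shape")

    # Use the strict upper triangle (i < j) so that each cluster pair is
    # counted once and the undefined diagonal is never touched.
    rows, cols = np.triu_indices(index_matrix.shape[0], k=1)
    return float(np.sum(weights[rows, cols] * index_matrix[rows, cols]))


# Usage: three clusters with equal weights on every pair.
example_indices = np.array([
    [np.nan, 2.0, 4.0],
    [2.0, np.nan, 1.0],
    [4.0, 1.0, np.nan],
])
example_weights = np.full((3, 3), 0.5)
score = taylor_score(example_indices, example_weights)  # 0.5 * (2 + 4 + 1) = 3.5
```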

• Weighting: Clusters with more cells are less likely to be a set of outliers, and simpler cluster definitions are a good indicator that over-clustering has been avoided. Hence, the weights on the Taylor Indices are calculated as: