Further explanation of the metric used to assess differences in samples that have been binned using Probability Binning.

T(X) is a statistic that indicates how likely it is that two distributions are different, and it also provides a metric by which multiple distributions can be ranked. The higher the value of T(X), the less the test sample resembles the control sample. T(X) does not depend on the shape of the distribution (i.e., it is nonparametric with regard to the distribution of events).

When T(X) = 0, the two histograms are indistinguishable (p = 0.5); when T(X) = 1, the populations differ by one standard deviation, corresponding to p < 0.17. A value of T(X) > 4 implies that the two distributions are different with p < 0.01 (99% confidence). However, the minimum T(X) value that has biological significance depends on the nature of the data being analyzed and therefore must be determined empirically. Only populations whose T(X) values exceed this empirical minimum can be considered different.
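Because T(X) is described in terms of standard deviations, one way to see where these p values come from is to treat T(X) as a standard normal deviate and take the one-sided upper-tail probability. The sketch below does exactly that; the reading of T(X) as a normal deviate and the helper name `one_sided_p` are assumptions for illustration, not FlowJo's own computation.

```python
import math

def one_sided_p(t):
    """Upper-tail probability of a standard normal deviate at t.

    Assumption: T(X) is read as a number of standard deviations away from
    the expected value, so p = P(Z > t) for Z ~ N(0, 1).
    """
    return 0.5 * math.erfc(t / math.sqrt(2))

for t in (0, 1, 2, 4):
    print(f"T(X) = {t}: p = {one_sided_p(t):.2g}")
```

Under this reading, T(X) = 0 gives exactly 0.5, T(X) = 1 gives about 0.16, and T(X) = 4 gives about 3e-5, so the thresholds quoted in the text act as conservative upper bounds.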

What is max T(X) versus T(X)? A T(X) of 4 has the same significance whether the maximum value is 50 or 50,000, because it indicates only that you are four standard deviations away from the expected value. The max T(X) varies because more events (or more bins) allow greater precision and thus greater significance. For example, if you run an exact Wilcoxon rank-sum test on two groups of 4 events each, the most significant two-sided p value you can achieve is about 0.03 (when there is no overlap between the two groups). With 5 events per group, the most significant p value you can achieve is about 0.008. A p value of 0.05 has the same significance whether the most significant achievable value is 0.03, 0.008, or 10^-44. Similarly, a T(X) of 4 means the same thing regardless of the max T(X); the max T(X) only tells you what the maximum significance could have been.
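The dependence of the best achievable p value on sample size can be checked directly. The sketch below enumerates every way of assigning ranks to the first group of an exact Wilcoxon rank-sum (Mann-Whitney) test and reports the p value obtained when the groups do not overlap at all. It is a brute-force illustration that is practical only for small groups; `min_exact_p` is a hypothetical helper name, and in practice a library routine such as SciPy's `mannwhitneyu` would be used instead.

```python
from itertools import combinations

def min_exact_p(n1, n2, two_sided=True):
    """Smallest exact p value a Wilcoxon rank-sum test can return for
    group sizes n1 and n2, i.e. when the two groups are completely separated."""
    ranks = range(1, n1 + n2 + 1)
    # Rank sum of group 1 under every equally likely assignment (the null distribution).
    sums = [sum(c) for c in combinations(ranks, n1)]
    lowest = min(sums)  # group 1 holds all the smallest ranks: no overlap
    p_one = sum(1 for s in sums if s <= lowest) / len(sums)
    # The null distribution of the rank sum is symmetric, so double for two-sided.
    return min(1.0, 2 * p_one) if two_sided else p_one

for n in (3, 4, 5, 6):
    print(n, round(min_exact_p(n, n), 4))
```

For groups of 3, 4, and 5 events the two-sided floor is 2/20 = 0.1, 2/70 ≈ 0.029, and 2/252 ≈ 0.008: no matter how different the groups are, small samples cannot reach smaller p values, which is the same reason max T(X) grows with the number of events and bins.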

 

Establishing the Baseline T(X):

Comparing a population to itself can give T(X) > 0; fluctuations in machine stability during collection and inherent variability in the FACS data are just two reasons why. Several populations should therefore be compared to determine the minimum meaningful T(X) value.

At a minimum, you can obtain a baseline value by comparing a population to itself: open the Population Comparison platform on a sample and drag the same sample to the control box. FlowJo then compares two halves of the population, one made up of every other cell and the other made up of the cells in between. Better yet, compare the same sample collected twice; collecting at the beginning and at the end of sample acquisition best reveals machine stability. Ideally, compare several different samples that have received the same stimulation (n = 2 or 3 controls).
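The self-comparison split described above can be sketched as follows, assuming the events are held in acquisition order in a Python list (a NumPy array slices the same way). `interleaved_halves` is a hypothetical helper; FlowJo's internal implementation is not described here.

```python
def interleaved_halves(events):
    """Split events into two halves: every other event, and the events in between."""
    return events[0::2], events[1::2]

events = list(range(10))  # stand-in for per-cell measurements, in acquisition order
first, second = interleaved_halves(events)
print(first)   # [0, 2, 4, 6, 8]
print(second)  # [1, 3, 5, 7, 9]
```

Presumably the halves are interleaved rather than taken as the first and second halves of the file so that slow drift during acquisition affects both halves about equally; the self-comparison then reflects sampling variability, while the two-collection comparison is what exposes drift.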

For a well-separated population, about 100 positive events are needed to produce a statistically significant* T(X) value (> 2). This minimum does not depend on the number of negative events and depends only slightly on the number of bins. The less separated the positive and negative events are, the higher the percentage of positive events must be for the algorithm to detect them.

* T(X) will very readily label two samples as MATHEMATICALLY different. It is up to the user to determine whether this difference is BIOLOGICALLY meaningful. Hence, the baseline T(X) must be established with controls before assigning meaning to an unknown sample.
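Once control comparisons have produced a set of T(X) values, applying the baseline to unknowns amounts to a threshold test. The sketch below assumes the baseline is simply the largest T(X) seen among the control comparisons; that choice, the helper names, and all numbers are hypothetical, and a lab may reasonably prefer a margin above this value.

```python
def baseline_tx(control_tx_values):
    """Empirical baseline: here, the largest T(X) seen among control comparisons."""
    if not control_tx_values:
        raise ValueError("need at least one control comparison")
    return max(control_tx_values)

def flag_different(sample_tx, baseline):
    """Return the names of samples whose T(X) exceeds the empirical baseline."""
    return [name for name, tx in sample_tx.items() if tx > baseline]

# Hypothetical values for illustration only.
controls = [1.8, 2.6, 2.1]  # e.g. self-comparison, repeat collection, replicate control
baseline = baseline_tx(controls)
samples = {"stim A": 3.4, "stim B": 2.2, "stim C": 12.0}
print(baseline)                           # 2.6
print(flag_different(samples, baseline))  # ['stim A', 'stim C']
```

A flagged sample is only mathematically different from the controls by more than the controls differ from each other; whether that difference is biologically meaningful remains a judgment for the user.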