This Multi-Group Permutation Test tool extends the standard two-sample permutation test tool to compare two or more groups simultaneously, using the F-ratio as its test statistic. Like the two-sample version, it relies on computation rather than distributional assumptions, making it appropriate for small samples, non-normal data, and ordinal or count responses. The F-ratio is a simple value that summarises how far apart the group averages are, relative to the level of variability within each group (it is essentially a “signal to noise” ratio). If your datasets are all very similar to one another, F will be small; if they are very different from one another, F will be large.
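As a rough sketch of the idea, the F-ratio can be computed as follows. This is written in Python purely for illustration (the tool itself is built on the Simple Statistics JavaScript library, and its internals may differ); all names here are illustrative.

```python
# Illustrative sketch of the F-ratio ("signal to noise" ratio) for
# several groups; not the tool's actual code.
from statistics import mean

def f_ratio(groups):
    """Between-group variability divided by within-group variability."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of values
    grand = mean(x for g in groups for x in g)

    # "Signal": how far each group average sits from the overall average
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # "Noise": how much the values vary inside their own group
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    # Divide each sum of squares by its ANOVA degrees of freedom
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Very similar groups give a small F; well-separated groups give a large F
print(f_ratio([[1, 2, 3], [1.1, 2.1, 3.1], [0.9, 1.9, 2.9]]))
print(f_ratio([[1, 2, 3], [11, 12, 13], [21, 22, 23]]))
```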
Because this version of the permutation tool accommodates multiple datasets, it uses an omnibus test, which first asks whether any of the groups differ at all. If they do, pairwise comparisons are run automatically (i.e., every dataset is compared with every other), with a Bonferroni correction applied to account for the fact that multiple tests are being undertaken.
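The omnibus-then-pairwise procedure can be sketched as follows. Again, this is an illustrative Python sketch under assumed settings (the permutation count, seeds, and function names are hypothetical), not the tool's actual implementation.

```python
# Illustrative sketch of an omnibus permutation test followed by
# Bonferroni-corrected pairwise tests; names and settings are hypothetical.
import random
from itertools import combinations
from statistics import mean

def f_ratio(groups):
    """Between-group variability relative to within-group variability."""
    k, n = len(groups), sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def permutation_p(groups, n_perm=5000, seed=0):
    """Monte Carlo p value: the share of random relabellings whose
    F-ratio is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = f_ratio(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)          # break any real group structure
        shuffled, i = [], 0
        for s in sizes:              # re-deal into groups of the same sizes
            shuffled.append(pooled[i:i + s])
            i += s
        if f_ratio(shuffled) >= observed:
            hits += 1
    return hits / n_perm

def omnibus_then_pairwise(groups, alpha=0.05):
    """Run the omnibus test first; only if it is significant, compare
    every pair of groups, multiplying each p value by the number of
    pairs (Bonferroni correction, capped at 1)."""
    p_omnibus = permutation_p(groups)
    if p_omnibus >= alpha:
        return p_omnibus, {}
    pairs = list(combinations(range(len(groups)), 2))
    m = len(pairs)
    adjusted = {
        (i, j): min(permutation_p([groups[i], groups[j]]) * m, 1.0)
        for i, j in pairs
    }
    return p_omnibus, adjusted
```

Note the design choice: the pairwise tests reuse the same F-ratio statistic on two groups at a time, and the Bonferroni correction simply multiplies each pairwise p value by the number of comparisons made.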
A detailed description of permutation testing (written in the context of teaching statistics) is given by Cobb (2007). A shorter description is given by Allen Downey in his well-known blog articles “There is Only One Test” (2011) and “There is Still Only One Test” (2016). A nice visual illustration of how the method works is given by Jared Wilber (2019).
There is no difference between the values across any of the groups (i.e. they all come from the same distribution).
At least one group has a different distribution of values from the others.
Paste your data for each group from Excel into the boxes below with one value per row. Edit the name at the top of each box to label your groups. Use + Add Dataset to add more groups (up to 6); click × on a group to remove it.
Note that p values may change slightly between multiple runs on the same dataset. This is expected and is not an error: it reflects the random permutations sampled in each run. It does not affect the rigour of the method, and you can safely report a single p value.
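To get a feel for how large this run-to-run variation is: a p value estimated from N random permutations behaves like a binomial proportion, so its run-to-run spread is roughly sqrt(p(1 − p)/N). A tiny sketch (the permutation count here is illustrative, not the tool's actual setting):

```python
# Approximate run-to-run spread of a Monte Carlo permutation p value;
# the choice of 10,000 permutations below is illustrative.
from math import sqrt

def mc_standard_error(p, n_perm):
    """Binomial standard error of a p value estimated from n_perm
    random permutations."""
    return sqrt(p * (1 - p) / n_perm)

# With 10,000 permutations, a true p value near 0.05 typically varies
# by only about 0.002 between runs
print(round(mc_standard_error(0.05, 10_000), 4))
```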
For a relatively straightforward discussion on the interpretation of p-values (including a list of common misinterpretations), see Dahiru (2008). For a more detailed exploration of the challenges associated with the misinterpretation of statistical testing, see Ioannidis (2005). For a discussion on the importance of testing for effect size (not just significance), see Sullivan and Feinn (2012).
Permutation test powered by Simple Statistics
© Prof. Jonny Huck