The Kruskal–Wallis test by ranks, also called the Kruskal–Wallis H test or one-way ANOVA on ranks, is a non-parametric method for testing whether samples originate from the same distribution [1]; it is useful especially when the normality assumptions of ANOVA may not hold. Although the test addresses the hypothesis of identical populations, it is designed to be sensitive to differences in location (unequal means).

Let $ n_{i}\ (i = 1, 2, \ldots, k) $ represent the sample sizes for each of the $k$ groups (i.e., samples) in the data, and let $N = \sum_{i} n_{i}$ be the total number of observations. Rank the combined sample, and let $R_{i}$ be the sum of the ranks for group $i$. The Kruskal–Wallis test statistic is then:

$$ H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_{i}^{2}}{n_{i}} - 3(N+1) $$

Under the null hypothesis of equal populations, this statistic approximately follows a chi-squared distribution with $k-1$ degrees of freedom. Each of the $n_{i}$ should be at least 5 for the approximation to be valid.

Finally, the p-value is approximated by $ \Pr(\chi_{k-1}^{2} \geq H) $.
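As a minimal sketch (the numeric values below are made up purely for illustration), the statistic and its chi-squared p-value can be computed directly in R and checked against kruskal.test:

# toy data: three small groups (values invented for illustration only)
x <- list(a = c(2.1, 3.4, 1.9, 2.8, 3.0),
          b = c(4.2, 3.9, 4.8, 4.1, 5.0),
          c = c(2.9, 3.1, 2.7, 3.5, 3.3))
g <- rep(names(x), lengths(x))        # group labels
v <- unlist(x)                        # combined sample
r <- rank(v)                          # ranks of the combined sample
N <- length(v)
Ri <- tapply(r, g, sum)               # R_i: rank sum per group
ni <- tapply(r, g, length)            # n_i: group sizes
H  <- 12 / (N * (N + 1)) * sum(Ri^2 / ni) - 3 * (N + 1)
p  <- pchisq(H, df = length(x) - 1, lower.tail = FALSE)
c(H = H, p = p)
kruskal.test(v, factor(g))            # should agree (no ties in this toy data)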

If some $ n_{i} $ values are small (less than 5), the probability distribution of H can be quite different from this chi-squared distribution. If a table of the chi-squared probability distribution is available, the critical value of chi-squared, $ \chi_{\alpha,\,k-1}^{2} $, can be found by entering the table at $k-1$ degrees of freedom and looking under the desired significance or alpha level.
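If R is at hand, the same critical value can be obtained without a printed table; a minimal sketch, assuming a significance level of 0.05 and k = 3 groups:

alpha <- 0.05
k <- 3                            # number of groups (assumed for illustration)
qchisq(1 - alpha, df = k - 1)     # upper-tail chi-squared critical value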

If the statistic is not significant, then there is no evidence of stochastic dominance between the samples. However, if the test is significant, then at least one sample stochastically dominates another sample. A researcher might then use sample contrasts between individual sample pairs, or post hoc tests using Dunn's test, which (1) properly employs the same rankings as the Kruskal-Wallis test, and (2) properly employs the pooled variance implied by the null hypothesis of the Kruskal-Wallis test, in order to determine which of the sample pairs differ significantly [2].
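As an illustration of such a post hoc comparison, the sketch below assumes the dunn.test package is installed; the data are generated on the fly and are not part of the example that follows:

# Dunn's test after a significant Kruskal-Wallis result (sketch, assuming
# the 'dunn.test' package is available; the data below are simulated)
library(dunn.test)
values <- c(rnorm(20, 0), rnorm(20, 1), rnorm(20, 0))
groups <- rep(c("a", "b", "c"), each = 20)
dunn.test(x = values, g = groups, method = "bonferroni")  # Bonferroni-adjusted pairwise p-values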


Example

The following example considers atmospheric CO2 concentrations [ppm] in a given city. Three weather stations provide values over a semester and are compared with one another. Then, the values of the main station are compared against a secondary device that stores data at a different resolution.

The comparison is made to determine whether the stations provide similar information or differ significantly (i.e., whether the p-value of the test falls below the chosen significance level).

# setup libraries
library(ggplot2)
library(reshape2)
library(ggthemes)
library(stargazer)

The data from the 3 stations is loaded. The date, while useful for verifying the source of the information, is not relevant to the analysis, so it is dropped.

StationsCity <- read.csv("myData/StationCity.csv", row.names=1)
dplyr::tbl_df(StationsCity)
StationsCity$Date <- NULL
Date Station-GBMcity Station-GBMnorth Station-GBMsouth
1 01/01/2013 377.04 375.71 338.48
2 02/01/2013 375.52 374.66 312.75
3 03/01/2013 380.69 367.69 327.79
4 04/01/2013 379.21 370.54 299.91
5 05/01/2013 377.12 372.72 307.83
6 06/01/2013 378.38 367.8 330.19
7 07/01/2013 379.44 355.17 316.36
8 08/01/2013 376.07 373.47 307.41
9 09/01/2013 377.81 371.22 331.21


StationsCity <- melt(StationsCity)
StationsCity$case <- rep(1:(dim(StationsCity)[1]/3), times = 3)  # observation index within each station
colnames(StationsCity) <- c('station','value','case')

pStation <- ggplot(data=StationsCity, aes(x=case, y=value, colour=station)) +
  geom_point() + geom_line(alpha=0.25) +
    geom_hline(yintercept = median(StationsCity$value), linetype='dashed')
pStation + theme_wsj() + scale_colour_wsj("colors6", "")

pStationb <- ggplot(data=StationsCity, aes(value, colour=station)) +
  geom_density() +
    geom_vline(xintercept = median(StationsCity$value), linetype='dashed')
pStationb + theme_wsj() + scale_colour_wsj("colors6", "")

Figure (Kruskal Wallis Test A): time series of the three stations and their density curves, with the overall median as a dashed line.

testStations <- kruskal.test(value ~ station, data = StationsCity)
testStations
Kruskal-Wallis rank sum test
data:  value by station
Kruskal-Wallis chi-squared = 359.69, df = 2, p-value < 2.2e-16

The main station's values are then compared with those recorded by the secondary device, which stores data at a different (weekly) resolution:

StationsCity <- read.csv("myData/StationCity.csv", row.names=1)
StationOld <- read.csv("myData/StationCityOld.csv", row.names=1)
dplyr::tbl_df(StationOld)
value station week
1 366.86 Station.GBMcity-PG 07/01/2013
2 371.2 Station.GBMcity-PG 14/01/2013
3 375.63 Station.GBMcity-PG 21/01/2013
4 377.99 Station.GBMcity-PG 28/01/2013
5 373.34 Station.GBMcity-PG 04/02/2013
6 368.65 Station.GBMcity-PG 11/02/2013
7 360.67 Station.GBMcity-PG 18/02/2013
8 368.29 Station.GBMcity-PG 25/02/2013
9 369.09 Station.GBMcity-PG 04/03/2013


StationsCity <- StationsCity[,c('Date','Station.GBMcity')]
StationsCity$station <- 'GBMcity'
StationsCity$case <- rownames(StationsCity)
StationsCity$Date <- NULL
colnames(StationsCity) <- c('value','station','case')

StationOld$week <- NULL
StationOld$case <- rownames(StationOld)
StationOld$station <- 'GBMcity.B'

StationSame <- rbind(StationsCity,StationOld)

StationSame$value <- as.numeric(StationSame$value)
StationSame$station <- as.factor(StationSame$station)
StationSame$case <- as.integer(StationSame$case)

pStation <- ggplot(data=StationSame, aes(x=case, y=value, colour=station)) +
  geom_point() + geom_line(alpha=0.25) +
  geom_hline(yintercept = median(StationSame$value), linetype='dashed')
pStation + theme_wsj() + scale_colour_wsj("colors6", "")

pStationSec <- ggplot(data=StationSame, aes(value, colour=station)) +
  geom_density() +
  geom_vline(xintercept = median(StationSame$value), linetype='dashed')
pStationSec + theme_wsj() + scale_colour_wsj("colors6", "")

Figure (Kruskal Wallis Test A): time series and density curves for the two GBMcity devices, with the overall median as a dashed line.

testStations <- kruskal.test(value ~ station, data = StationSame)
testStations
Kruskal-Wallis rank sum test
  data:  value by station
  Kruskal-Wallis chi-squared = 0.0017473, df = 1, p-value = 0.9667

To sum up…

  • Since every group contains far more than 5 observations, the chi-squared approximation is valid and a p-value can be obtained from H.
  • The p-value of Test 1 (three stations) is < 2.2e-16.
  • The p-value of Test 2 (main station vs. secondary device) is 0.9667.
  • Therefore, for Test 2 we fail to reject H0: the two devices provide essentially the same information. For Test 1, H0 is rejected: at least one station differs significantly, and the density plot suggests this is driven mainly by one station whose distribution is clearly shifted with respect to the others. A decision-rule sketch follows below.
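The decisions above follow from comparing each p-value with the significance level; a minimal sketch, assuming alpha = 0.05:

# decision rule (alpha assumed to be 0.05); testStations here holds Test 2
alpha <- 0.05
testStations$p.value < alpha   # TRUE -> reject H0 (at least one group differs)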



Now, how is the test actually computed? The steps below redo it by hand, using the melted three-station data frame built in the first comparison.


StationsCity

maxValue <- max(StationsCity$value)
minValue <- min(StationsCity$value)
sizeVal <- dim(StationsCity)[1]

StationsCityRanked <- StationsCity
StationsCityRanked$value <- rank(StationsCity$value)   # rank the combined sample

StationsCityRanked <- dcast(StationsCityRanked, case ~ station)   # one column of ranks per station
StationsCityRanked$case <- NULL
StationsCity (melted data, excerpt of the Station.GBMcity block):

            station  value case
1   Station.GBMcity 377.04    1
2   Station.GBMcity 375.52    2
3   Station.GBMcity 380.69    3
4   Station.GBMcity 379.21    4
5   Station.GBMcity 377.12    5
6   Station.GBMcity 378.38    6
...
178 Station.GBMcity 375.82  178
179 Station.GBMcity 376.83  179
180 Station.GBMcity 374.03  180
181 Station.GBMcity 366.1   181
182 Station.GBMcity 373.9   182

StationsCityRanked (ranks of the combined sample, one column per station, excerpt):

    Station.GBMcity Station.GBMnorth Station.GBMsouth
1               485              465              158
2               463              444               45
3               525              312              107
4               512              376               14
5               486              409               32
6               506              315              120
...
178             467              541              161
179             482              360                1
180             431            253.5               40
181             277              309               53
182             427              499              207


sumRanks <- colSums(StationsCityRanked)    # R_i: sum of ranks per station
meanRanks <- colMeans(StationsCityRanked)

SumCombined <- sum(sumRanks)
MeanCombined <- sum(meanRanks)

CountCombined <- table(StationsCity$station)   # n_i: number of observations per station

# H = 12/(N*(N+1)) * sum(R_i^2 / n_i) - 3*(N+1), with N = total observations
H <- ( 12 / (dim(StationsCity)[1]*(dim(StationsCity)[1]+1))) *
  ( (sumRanks[1]^2)/CountCombined[1] +
    (sumRanks[2]^2)/CountCombined[2] +
    (sumRanks[3]^2)/CountCombined[3] ) -
  3 * (dim(StationsCity)[1]+1)
H

df <- 3 - 1   # k - 1 groups

crit <- qchisq(0.95, df)   # upper-tail critical value at alpha = 0.05
crit

H > crit   # TRUE -> reject H0
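Note that this hand computation ignores ties in the ranked values (e.g. the rank 253.5 above); kruskal.test divides H by a tie-correction factor, which can be reproduced as a small sketch:

# tie correction applied by kruskal.test
N <- dim(StationsCity)[1]
ties <- table(StationsCity$value)            # sizes of groups of tied values
H / (1 - sum(ties^3 - ties) / (N^3 - N))     # tie-corrected statistic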

Similarly, with the other dataset:

StationSame

maxValue <- max(StationSame$value)
minValue <- min(StationSame$value)
sizeVal <- dim(StationSame)[1]

StationSameRanked <- StationSame
StationSameRanked$value <- rank(StationSame$value)

StationSameRanked <- dcast(StationSameRanked, case ~ station)
StationSameRanked$case <- NULL

sumRanks <- colSums(StationSameRanked, na.rm = T)
meanRanks <- colMeans(StationSameRanked, na.rm = T)

SumCombined <- sum(sumRanks)
MeanCombined <- sum(meanRanks)

CountCombined <- table(StationSame$station)

H <- ( 12 / (dim(StationSame)[1]*(dim(StationSame)[1]+1))) *
  ( (sumRanks[1]^2)/CountCombined[1] +
      (sumRanks[2]^2)/CountCombined[2] ) -
  3 * (dim(StationSame)[1]+1)
H

df <- 2 - 1

crit <- qchisq(0.95, df)
crit

H > crit   # FALSE here, so H0 is not rejected

Finally, the following figure plots density values for the chi-squared distribution with different degrees of freedom. In the examples, H for Test 1 (359.69) greatly exceeds the critical value at df = 3 - 1, so H0 is rejected, whereas H for Test 2 (0.0017) falls well below the critical value at df = 2 - 1, so H0 is not rejected.

dfDist <- c(1, 2, 5, 10)
labels <- paste('df =', dfDist)
colpal <- c('tomato','green','cyan','gold')
j <- 1
plot(x=-1, xlim=c(0,5), ylim=c(0,1), xlab='chi-sq', ylab='density')
for(i in dfDist){
  curve(dchisq(x, i), xlim=c(0,10), col=colpal[j], lwd=1, add=TRUE)
  j <- j + 1
}
legend("topright", bty='n', text.width=1, labels, lwd=2.5,
       lty=c(1, 1, 1, 1), col=colpal)
abline(h=0, lty=3)
abline(v=0, lty=3)

Figure (Kruskal Wallis Test E): chi-squared density curves for df = 1, 2, 5, 10.





  1. Kruskal, W. H.; Wallis, W. A. (1952). “Use of ranks in one-criterion variance analysis”. Journal of the American Statistical Association. 47 (260): 583–621.
  2. Dunn, Olive Jean (1964). “Multiple comparisons using rank sums”. Technometrics. 6 (3): 241–252.