Kruskal–Wallis

The Kruskal–Wallis test by ranks, Kruskal–Wallis H test or One-way ANOVA on ranks is a non-parametric method for testing whether samples originate from the same distribution¹, especially when the ANOVA normality assumptions may not apply. Although this test is for identical populations, it is designed to be sensitive to unequal means. … see more

Let $ n_{i} (i = 1, 2, …, k) $ represent the sample sizes for each of the k groups (i.e., samples) in the data. Next, rank the combined sample. Then compute R_i = the sum of the ranks for group i. Then the Kruskal Wallis test is:

$$ H = \left [ \frac{12}{N(N+1)} \cdot \Sigma \frac{T_{c}^{2}}{\eta _{c}}\right ]-3\cdot (N+1) $$

This statistic approximates a chi-square distribution with k-1 degrees of freedom if the null hypothesis of equal populations is true. Each of the n_i should be at least 5 for the approximation to be valid.

Finally, the p-value is approximated by $ Pr(\chi _{g-1}^{2}\geq H) $.

If some $ n_{i} $ values are small (less than 5) the probability distribution of H can be quite different from this chi-squared distribution. If a table of the chi-squared probability distribution is available, the critical value of chi-squared, $ \chi _{\alpha :g-1}^{2} $, can be found by entering the table at g−1 degrees of freedom and looking under the desired significance or alpha level.

If the statistic is not significant, then there is no evidence of stochastic dominance between the samples. However, if the test is significant then at least one sample stochastically dominates another sample. Therefore, a researcher might use sample contrasts between individual sample pairs, or post hoc tests using Dunn’s test, which (1) properly employs the same rankings as the Kruskal-Wallis test, and (2) properly employs the pooled variance implied by the null hypothesis of the Kruskal-Wallis test in order to determine which of the sample pairs are significantly different².

Example

The following example considers the Atmospheric CO2 Concentrations [ppm] in a given city. Three weather stations provide value for a semester and are compared among them. Then, the main station compares its values with a secondary device which stores data at a different resolution.

The comparison is made to find whether the stations provide similar information or have significant differences (i.e. H < p-value).

# setup libraries
library(ggplot2)
library(reshape2)
library(ggthemes)
library(stargazer)

The data from the 3 stations is loaded. The date, rather than useful to verify the source of information, is not relevant in the analysis.

StationsCity <- read.csv("myData/StationCity.csv", row.names=1)
dplyr::tbl_df(StationsCity)
StationsCity$Date <- NULL

	Date	Station-GBMcity	Station-GBMnorth	Station-GBMsouth
1	01/01/2013	377.04	375.71	338.48
2	02/01/2013	375.52	374.66	312.75
3	03/01/2013	380.69	367.69	327.79
4	04/01/2013	379.21	370.54	299.91
5	05/01/2013	377.12	372.72	307.83
6	06/01/2013	378.38	367.8	330.19
7	07/01/2013	379.44	355.17	316.36
8	08/01/2013	376.07	373.47	307.41
9	09/01/2013	377.81	371.22	331.21

StationsCity <- melt(StationsCity)
StationsCity$case <- seq(1:(dim(StationsCity)[1]/3))
colnames(StationsCity) <- c('station','value','case')

pStation <- ggplot(data=StationsCity, aes(x=case, y=value, colour=station)) +
  geom_point() + geom_line(alpha=0.25) +
    geom_hline(yintercept = median(StationsCity$value), linetype='dashed')
pStation + theme_wsj() + scale_colour_wsj("colors6", "")

pStationb <- ggplot(data=StationsCity, aes(value, colour=station)) +
  geom_density() +
    geom_vline(xintercept = median(StationsCity$value), linetype='dashed')
pStationb + theme_wsj() + scale_colour_wsj("colors6", "")

Kruskal Wallis Test A

testStations <- kruskal.test(value ~ station, data = StationsCity)
testStations

Kruskal-Wallis rank sum test
data:  value by variable
Kruskal-Wallis chi-squared = 359.69, df = 2, p-value < 2.2e-16

StationsCity <- read.csv("myData/StationCity.csv", row.names=1)
StationOld <- read.csv("myData/StationCityOld.csv", row.names=1)
dplyr::tbl_df(StationOld)

	value	station	week
1	366.86	Station.GBMcity-PG	07/01/2013
2	371.2	Station.GBMcity-PG	14/01/2013
3	375.63	Station.GBMcity-PG	21/01/2013
4	377.99	Station.GBMcity-PG	28/01/2013
5	373.34	Station.GBMcity-PG	04/02/2013
6	368.65	Station.GBMcity-PG	11/02/2013
7	360.67	Station.GBMcity-PG	18/02/2013
8	368.29	Station.GBMcity-PG	25/02/2013
9	369.09	Station.GBMcity-PG	04/03/2013

StationsCity <- StationsCity[,c('Date','Station.GBMcity')]
  StationsCity$station <- 'GBMcity'
    StationsCity$case <- rownames(StationsCity)
      StationsCity$Date <- NULL
        colnames(StationsCity) <- c('value','station','case')

StationOld$week <- NULL
  StationOld$case <- rownames(StationOld)
    StationOld$station <- 'GBMcity.B'

StationSame <- rbind(StationsCity,StationOld)

StationSame$value <- as.numeric(StationSame$value)
StationSame$station <- as.factor(StationSame$station)
StationSame$case <- as.integer(StationSame$case)

pStation <- ggplot(data=StationSame, aes(x=case, y=value, colour=station)) +
  geom_point() + geom_line(alpha=0.25) +
  geom_hline(yintercept = median(StationSame$value), linetype='dashed')
pStation + theme_wsj() + scale_colour_wsj("colors6", "")

pStationSec <- ggplot(data=StationSame, aes(value, colour=station)) +
  geom_density() +
  geom_vline(xintercept = median(StationSame$value), linetype='dashed')
pStationSec + theme_wsj() + scale_colour_wsj("colors6", "")

Kruskal Wallis Test A

testStations <- kruskal.test(value ~ station, data = StationSame)
testStations

Kruskal-Wallis rank sum test
  data:  value by variable
  Kruskal-Wallis chi-squared = 0.0017473, df = 1, p-value = 0.9667

To sum up…

Since the sample is larger than 10 elements, H may be used as p-value
H₁ is 2.2e-16
H₂ is 0.9667
Therefore, we may say that Test 2 maintains the relationship among the different stations so the hypothesis H₀ is accepted, whereas Test 1 is significantly different and so H₀ is rejected—despite it can be seen from the graphic that is caused by one of the stations which is significantly different.

Now, how the test is made?

StationsCity

maxValue <- max(StationsCity$value)
minValue <- min(StationsCity$value)
sizeVal <- dim(StationsCity)[1]

StationsCityRanked <- StationsCity
StationsCityRanked$value <- rank(StationsCity$value)

StationsCityRanked <- dcast(StationsCityRanked, case ~ station)
StationsCityRanked$case <- NULL

	station	value	case	Station.GBMcity	Station.GBMnorth	Station.GBMsouth
1	Station.GBMcity	377.04	1	485	465	158
2	Station.GBMcity	375.52	2	463	444	45
3	Station.GBMcity	380.69	3	525	312	107
4	Station.GBMcity	379.21	4	512	376	14
5	Station.GBMcity	377.12	5	486	409	32
6	Station.GBMcity	378.38	6	506	315	120
178	Station.GBMcity	375.82	178	467	541	161
179	Station.GBMcity	376.83	179	482	360	1
180	Station.GBMcity	374.03	180	431	253.5	40
181	Station.GBMcity	366.1	181	277	309	53
182	Station.GBMcity	373.9	182	427	499	207

sumRanks <- colSums(StationsCityRanked)
meanRanks <- colMeans(StationsCityRanked)

SumCombined <- sum(sumRanks)
MeanCombinded <- sum(meanRanks)

CountCombinded <- table(StationsCity$station)

H <- ( 12 / (dim(StationsCity)[1]*(dim(StationsCity)[1]+1))) *
  ( (sumRanks[1]^2)/CountCombinded[1] +
    (sumRanks[2]^2)/CountCombinded[2] +
    (sumRanks[3]^2)/CountCombinded[3] ) -
  3 * (dim(StationsCity)[1]+1)
H

df <- 3 -1

lim.p <- qchisq(0.05, df)
lim.p

H < lim.p

Similarly, with the other dataset:

StationSame

maxValue <- max(StationSame$value)
minValue <- min(StationSame$value)
sizeVal <- dim(StationSame)[1]

StationSameRanked <- StationSame
StationSameRanked$value <- rank(StationSame$value)

StationSameRanked <- dcast(StationSameRanked, case ~ station)
StationSameRanked$case <- NULL

sumRanks <- colSums(StationSameRanked, na.rm = T)
meanRanks <- colMeans(StationSameRanked, na.rm = T)

SumCombined <- sum(sumRanks)
MeanCombinded <- sum(meanRanks)

CountCombinded <- table(StationSame$station)

H <- ( 12 / (dim(StationSame)[1]*(dim(StationSame)[1]+1))) *
  ( (sumRanks[1]^2)/CountCombinded[1] +
      (sumRanks[2]^2)/CountCombinded[2] ) -
  3 * (dim(StationSame)[1]+1)
H

df <- 2 -1

lim.p <- qchisq(0.05, df)
lim.p

H < lim.p

Finally, the following figure plots density values for the chi-square distribution with different degrees of freedom. In the examples, H₁ is significantly less than p with df=3-1, and H₂, on the other hand, is more than p with df=2-1 and so H₀ is accepted.

dfDist <- c(1, 2, 5, 10)
  labels <- paste('df =',dfDist)
  colpal <- c('tomato','green','cyan','gold')
  j = 1
plot(x=-1, xlim=c(0,5), ylim=c(0,1), xlab='chi-sq', ylab='density')
for(i in dfDist){
  curve(dchisq(x,i), xlim=c(0,10), col=colpal[j], lwd=1, add=TRUE)
  j <- j + 1
}
legend("topright", bty='n', text.width=1, labels, lwd=2.5,
       lty=c(1, 1, 1, 1), col=colpal)
abline(h=0,lty=3)
abline(v=0,lty=3)

Kruskal Wallis Test E

Kruskal; Wallis (1952). “Use of ranks in one-criterion variance analysis”. Journal of the American Statistical Association. 47 (260): 583–621 ^[return]
Dunn, Olive Jean (1964). “Multiple comparisons using rank sums”. Technometrics. 6 (3): 241–252 ^[return]