This page contains extra R content not covered in the demonstrations and could be considered supplementary to the module. This content is useful for completing the advanced exercises from Week 5 and focuses on conducting non-parametric tests in R. As covered in the lecture series, non-parametric tests are distribution-free tests. They are useful if your data does not meet the assumptions of the parametric test.
The Mann-Whitney U Test is a non-parametric test for when you have a categorical IV (with two levels) and a continuous DV. The equivalent parametric test is an independent-samples t-test. The function that runs a Mann-Whitney U Test is wilcox.test(). It’s a bit confusing that the Mann-Whitney U Test uses a function with a different name; the reason is that the Mann-Whitney U Test is also called the Wilcoxon Rank Sum Test (though I avoid using this name as it is easily confused with the Wilcoxon Signed Rank Test).
If we wanted to assess whether cat-people are more introverted than dog-people in the class dataset using a Mann-Whitney U Test, we would first prepare the data. This code is identical to the code we used to prepare the data for an independent-samples t-test.
data1.clean <- data %>%
filter(cat.dog != "both") %>%
filter(cat.dog != "neither") %>%
filter(cat.dog != "") %>%
mutate( introvert = introversion2 + introversion5 + introversion7 + introversion8 + introversion10) %>%
select(cat.dog,introvert)
To conduct a Mann-Whitney U Test, you need to specify the two things needed for all analysis functions: the formula, and the data.frame.
wilcox.test(introvert~cat.dog,data = data1.clean)
##
## Wilcoxon rank sum test with continuity correction
##
## data: introvert by cat.dog
## W = 242, p-value = 0.8645
## alternative hypothesis: true location shift is not equal to 0
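To see where the W in this output comes from, the sketch below recomputes it by hand. It uses a small made-up data frame (`toy` and its values are invented for illustration and are not the class data). R's W is the sum of the ranks in the first group minus n1(n1 + 1)/2, where n1 is the size of that group; this is the Mann-Whitney U for that group.

```r
# Toy data invented for illustration; not the class dataset
toy <- data.frame(
  cat.dog   = rep(c("cat", "dog"), each = 5),
  introvert = c(12, 15, 9, 18, 14, 11, 16, 13, 20, 17)
)

res <- wilcox.test(introvert ~ cat.dog, data = toy)

# Rank all scores together, then sum the ranks of the first group ("cat")
ranks    <- rank(toy$introvert)
n_cat    <- sum(toy$cat.dog == "cat")
w_manual <- sum(ranks[toy$cat.dog == "cat"]) - n_cat * (n_cat + 1) / 2

res$statistic  # W reported by wilcox.test()
w_manual       # the same value, computed from the ranks
```

The first group is whichever level of the grouping variable comes first (here "cat", alphabetically), so swapping the level order changes W but not the p-value.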
A boxplot is well suited to visualising non-parametric tests, as it shows the median and the interquartile range. We can combine the boxplot geom with the violin geom to also show the distribution of the data.
ggplot(data1.clean,aes(x = cat.dog,y = introvert,fill = cat.dog)) +
geom_violin() +
geom_boxplot(width = .2) +
theme_classic() +
theme(legend.position = "none")
To report a Mann-Whitney U test, you will need to include the following information:
The U-statistic is the ‘W’ from the output of the wilcox.test() function. This function also gives you the p-value. To get the medians, you can use the summarise() and median() functions:
data1.summary <- data1.clean %>%
group_by(cat.dog) %>%
summarise(introvert_median = median(introvert,na.rm = TRUE))
data1.summary
## # A tibble: 2 × 2
## cat.dog introvert_median
## <chr> <dbl>
## 1 cat 14
## 2 dog 14
Therefore, the write-up becomes:
A Mann-Whitney U test indicated that there was no significant difference in introversion between cat-people (Mdn = 14) and dog-people (Mdn = 14), U = 242, p = 0.864.
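Rather than copying numbers from the console, you can pull them out of the object that wilcox.test() returns. The following is a minimal sketch on made-up data (the `toy` data frame, its `group` and `score` columns, and the `report` string are all invented for illustration); it uses base R's tapply() for the medians so that it runs without any extra packages.

```r
# Toy data invented for illustration; not the class dataset
toy <- data.frame(
  group = rep(c("a", "b"), each = 6),
  score = c(3, 8, 5, 12, 7, 10, 6, 14, 9, 15, 11, 13)
)

res <- wilcox.test(score ~ group, data = toy)

u_stat  <- unname(res$statistic)  # the 'W' line of the printed output
p_value <- res$p.value            # the p-value from the same output
medians <- tapply(toy$score, toy$group, median)

# Assemble the numbers needed for the write-up in one string
report <- sprintf(
  "Mdn(a) = %.1f, Mdn(b) = %.1f, U = %.0f, p = %.3f",
  medians[["a"]], medians[["b"]], u_stat, p_value
)
report
```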
The Wilcoxon Signed Rank Test is a non-parametric test for when you have a within-subjects categorical IV (with two levels) and a continuous DV. The equivalent parametric test is a paired-samples t-test.
Let’s re-evaluate the analysis investigating participants’ mood before and after watching a cute cat video in the class dataset, but this time use the non-parametric test.
Again, this code is identical to the code we used to prepare the data for the paired-samples t-test.
data2.clean <- data %>%
filter(!is.na(pre.mood)) %>%
filter(!is.na(post.mood)) %>%
select(student.no,pre.mood,post.mood) %>%
gather(key = "condition",value = "mood",pre.mood,post.mood) %>%
mutate(condition = factor(condition,levels = c("pre.mood","post.mood")))
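Similar to how an independent-samples t-test and a paired-samples t-test use the same function, the Mann-Whitney U Test and the Wilcoxon Signed Rank Test also use the same function: wilcox.test(). The difference here is that the function must be told the observations are paired. If you pass the two sets of scores as separate vectors, you do this by setting the paired argument to TRUE, the same as with the t.test() function. With the formula interface used below, you instead wrap the two columns in Pair() on the left-hand side of the formula, so the test runs on the original wide-format data rather than on data2.clean.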
Therefore, the code becomes:
# Paired test on the original wide-format data (one column per time point)
wilcox.test(Pair(pre.mood,post.mood) ~ 1,data = data)
##
## Wilcoxon signed rank test with continuity correction
##
## data: Pair(pre.mood, post.mood)
## V = 399, p-value = 1.685e-07
## alternative hypothesis: true location shift is not equal to 0
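The sketch below illustrates two points about this output, using made-up scores (the `toy` data frame and its values are invented for illustration and are not the class data). First, the Pair() formula and the two-vector call with paired = TRUE give the same result. Second, the V statistic is the sum of the ranks of the positive differences, where each difference is the first column minus the second. Pair() requires R 4.0.0 or later.

```r
# Toy data invented for illustration; not the class dataset
toy <- data.frame(
  pre.mood  = c(60, 72, 55, 81, 64, 70, 58, 66),
  post.mood = c(67, 75, 54, 90, 69, 72, 68, 70)
)

# Formula interface: wrap the paired columns in Pair()
res_formula <- wilcox.test(Pair(pre.mood, post.mood) ~ 1, data = toy)

# Two-vector interface: set paired = TRUE
res_vectors <- wilcox.test(toy$pre.mood, toy$post.mood, paired = TRUE)

# V by hand: rank the absolute differences, then sum the ranks
# that belong to positive differences (pre minus post)
d        <- toy$pre.mood - toy$post.mood
v_manual <- sum(rank(abs(d))[d > 0])

res_formula$statistic  # V from the formula interface
res_vectors$statistic  # V from the two-vector interface
v_manual               # V computed from the ranks
```

Because V depends on the direction of subtraction, swapping the order of the two columns changes V but leaves the two-sided p-value unchanged.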
Again, we can use a boxplot and violin plot to visualise the difference before and after exposure:
ggplot(data2.clean,aes(x = condition,y = mood,fill = condition)) +
geom_violin() +
geom_boxplot(width = .2) +
theme_classic() +
theme(legend.position = "none")
As with the Mann-Whitney U test, we need the median for each condition. The code follows the same pattern as above, this time summarising mood:
data2.summary <- data2.clean %>%
group_by(condition) %>%
summarise(mood_median = median(mood,na.rm = TRUE))
data2.summary
## # A tibble: 2 × 2
## condition mood_median
## <fct> <dbl>
## 1 pre.mood 68.5
## 2 post.mood 74
We also need the test statistic (labelled V in the output of wilcox.test() for the signed rank test) and the associated p-value. Once we have all this information, the final write-up looks like this:
A Wilcoxon signed rank test indicated that there was a significant difference in mood before (Mdn = 68.5) and after (Mdn = 74) watching a short cat video, V = 399, p < .001.
The Kruskal-Wallis test is a non-parametric alternative to the one-way ANOVA, meaning it is used when you have a categorical IV with more than 2 groups, and a continuous DV.
As before, we will demonstrate the Kruskal-Wallis test using the class dataset by revisiting the question of whether cat-people, dog-people, those who like both, and those who like neither differ on introversion. First, we must prepare the data - this code is identical to that for the one-way ANOVA.
data3.clean <- data %>%
# filter(cat.dog != "both") %>%
# filter(cat.dog != "neither") %>%
filter(cat.dog != "") %>%
mutate(introvert = introversion2 + introversion5 + introversion7 + introversion8 + introversion10) %>%
select(cat.dog,introvert)
The function to conduct the Kruskal-Wallis test is kruskal.test(). As with all statistical test functions, you need to supply the formula and the data.frame you wish to analyse.
kruskal.test(introvert ~ cat.dog,data = data3.clean)
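To show what kruskal.test() returns, here is a minimal sketch on made-up data (the `toy` data frame and its values are invented for illustration, so the numbers it produces are not the class results). The returned object holds the Kruskal-Wallis chi-squared statistic, the degrees of freedom (number of groups minus 1), and the p-value.

```r
# Toy data invented for illustration; not the class dataset
toy <- data.frame(
  cat.dog   = rep(c("both", "cat", "dog", "neither"), each = 4),
  introvert = c(12, 15, 11, 14, 16, 13, 18, 17, 10, 19, 9, 20, 8, 21, 7, 22)
)

res <- kruskal.test(introvert ~ cat.dog, data = toy)

res$statistic  # Kruskal-Wallis chi-squared
res$parameter  # degrees of freedom: 4 groups - 1 = 3
res$p.value    # p-value for the overall test
```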
Here we can visualise the data using a boxplot and violin plot.
ggplot(data3.clean,aes(x = cat.dog,y = introvert,fill = cat.dog)) +
geom_violin() +
geom_boxplot(width = .2) +
theme_classic() +
theme(legend.position = "none")
To report a Kruskal-Wallis test, you need the following bits of information:
The first two points are included in the output of the kruskal.test() function, and we can use the summarise() and median() functions to get the third point:
data3.summary <- data3.clean %>%
group_by(cat.dog) %>%
summarise(introvert_median = median(introvert,na.rm = TRUE))
data3.summary
## # A tibble: 4 × 2
## cat.dog introvert_median
## <chr> <dbl>
## 1 both 12.5
## 2 cat 14
## 3 dog 14
## 4 neither 13
With this information, you can write up your results. Remember, much like with the one-way ANOVA, the Kruskal-Wallis test will tell you whether the groups are significantly different, but it does not tell you where those differences are. To determine this, you will need to conduct post-hoc comparisons.
A Kruskal-Wallis test indicated that there was no significant difference in introversion between cat-people (Mdn = 14), dog-people (Mdn = 14), those who like both (Mdn = 12.5), and those who like neither (Mdn = 13), Chi-square(3) = 0.44, p = 0.931.
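This page does not say which post-hoc procedure to use. One common option, shown here only as a sketch, is to run pairwise Mann-Whitney U tests with a correction for multiple comparisons, using pairwise.wilcox.test() from base R. The `toy` data and the choice of the Bonferroni correction are assumptions made for illustration; other procedures (such as Dunn's test) are also used.

```r
# Toy data invented for illustration; not the class dataset
toy <- data.frame(
  cat.dog   = rep(c("both", "cat", "dog", "neither"), each = 5),
  introvert = c(12, 15, 11, 14, 10,
                16, 13, 18, 17, 19,
                 9, 21,  8, 20, 22,
                 7, 23,  6, 24,  5)
)

# Every pair of groups is compared with a Mann-Whitney U test;
# the p-values are then adjusted (Bonferroni is an assumption here)
posthoc <- pairwise.wilcox.test(
  toy$introvert,
  toy$cat.dog,
  p.adjust.method = "bonferroni"
)

# Lower-triangular matrix of adjusted p-values, one per pair of groups
posthoc$p.value
```

Post-hoc comparisons like these are normally only followed up when the overall Kruskal-Wallis test is significant.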
If you would like to practise the skills on this page, weekly exercise questions on this content are available in the advanced exercises for Week 5. You can download the interactive exercises by clicking the link below.