This page contains extra R content that is not covered in the demonstrations and can be considered supplementary to the module. It is useful for completing the advanced exercises from Week 6 and focuses on conducting chi-square tests in R: the chi-square goodness of fit test and the chi-square test of independence.

Contingency Tables

To conduct chi-square tests in R, we use the chisq.test() function. Unlike most of the statistical analysis functions we've looked at so far, this function does not accept a formula and a dataset. Instead, its main argument is a set of counts: a vector of frequencies for a goodness of fit test, or a contingency table for a test of independence.

To create a contingency table in R, we can use the table() function. We have briefly talked about using the table() function to view the frequencies for a single variable (see https://antlee53.github.io/stirpsychstats/2data.html#table()_Function), but as a quick recap, if you wanted to view the frequencies of favourite Australian animals in the class dataset, the code would be:

table(data$aus.animal)
## 
##  echidna kangaroo    koala platypus   wombat 
##       28       24       27       21       20

However, the table() function can also be used to create a contingency table. When it is given a data.frame that contains exactly two variables, table() returns a two-way contingency table. Therefore, you can use the select() function (covered in Week 3) to create a data.frame that includes only the two variables you're interested in.

So, for example, if we wanted to create a contingency table of video gamers vs. non-video gamers across the three programmes in the class dataset, the code would look like this:

library(dplyr) # provides select() and the %>% pipe
select(data, video.games, program) %>%
  table()

Alternatively, you can pass the two variables from the dataset directly to table() as two separate vectors. Here is that code:

table(data$video.games,data$program)
##      
##       Conversion MSc Health MSc Other Research MSc
##   No              19         31                 21
##   Yes             15         16                 18
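To check that the two approaches give the same counts, here is a minimal sketch using a small hypothetical data frame (`toy`, with made-up rows rather than the class dataset). It subsets the columns with base R indexing instead of select(), so it runs without dplyr.

```r
# Hypothetical toy data (made-up rows, not the class dataset),
# using the same column names as the class dataset
toy <- data.frame(
  video.games = c("No", "Yes", "No", "No", "Yes", "Yes"),
  program     = c("Conversion MSc", "Health MSc", "Health MSc",
                  "Conversion MSc", "Conversion MSc", "Health MSc")
)

# Approach 1: table() on a data frame with exactly two columns
tab1 <- table(toy[, c("video.games", "program")])

# Approach 2: table() on two separate vectors
tab2 <- table(toy$video.games, toy$program)

tab1

# The counts are identical; only the labels differ, because approach 1
# keeps the column names as the names of the table's dimensions
all(tab1 == tab2)
```

The data-frame version labels the rows and columns with the variable names, which makes the printed table easier to read.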

Chi-square Goodness of Fit Test

As covered in the lecture series, the chi-square goodness of fit test is used to compare the observed distribution of a single categorical variable with an expected distribution.

The chisq.test() function also performs the chi-square goodness of fit test. It requires two inputs: first, a numeric vector of observed frequencies; second, a vector of the expected probability for each category (the argument named p). The probabilities in p must be in the same order as the observed frequencies and must sum to 1.

For instance, if we conducted a study that recorded the favourite colour of 100 people, and 20 people reported “red”, 35 reported “green”, and 45 reported “blue”, then the first argument would be:

c(20,35,45)
## [1] 20 35 45

Note: if the variable we are interested in is in a dataset, we can use the table() function, as described above, to get its frequencies.

If we expect an equal distribution among the three colours, our expected probabilities would be represented as:

c(1/3,1/3,1/3)
## [1] 0.3333333 0.3333333 0.3333333

Altogether, to conduct the chi-square goodness of fit test, we input these vectors into the chisq.test() function:

chisq.test(c(20,35,45),p = c(1/3,1/3,1/3))
## 
##  Chi-squared test for given probabilities
## 
## data:  c(20, 35, 45)
## X-squared = 9.5, df = 2, p-value = 0.008652
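To make the arithmetic behind this output concrete, the sketch below recomputes it by hand for the favourite-colour example. It uses Pearson's formula, chi-square = sum of (O - E)^2 / E, where O are the observed counts and E = N × p are the expected counts, with df = number of categories - 1, and then compares the result with chisq.test().

```r
observed <- c(20, 35, 45)
p <- c(1/3, 1/3, 1/3)

# Expected counts = total sample size x expected probability
expected <- sum(observed) * p

# Pearson's chi-square statistic: sum of (O - E)^2 / E over the categories
x2 <- sum((observed - expected)^2 / expected)

# Degrees of freedom = number of categories - 1
dof <- length(observed) - 1

# Upper-tail p-value from the chi-square distribution
p_value <- pchisq(x2, df = dof, lower.tail = FALSE)

c(statistic = x2, df = dof, p.value = p_value)

# Same statistic as chisq.test() reports
all.equal(unname(chisq.test(observed, p = p)$statistic), x2)
```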

1. Conduct the Statistical Test

Following the example from the lecture series, let's conduct a chi-square goodness of fit test for favourite Australian animals using the class dataset. We want to compare the frequencies in the class dataset with the expected proportions from a national UK poll to see whether the class distribution is similar to national rates, or whether there's something different about this cohort.

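To conduct this analysis, we enter the frequencies from the dataset (obtained with table()) as the first argument, and a vector of the expected probabilities as the second argument. The probabilities must be listed in the same order as the categories in the table, which here is alphabetical: echidna, kangaroo, koala, platypus, wombat.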

chisq.test(table(data$aus.animal),p = c(.0171,.2222,.4615,.1624,.1368))
## 
##  Chi-squared test for given probabilities
## 
## data:  table(data$aus.animal)
## X-squared = 343.83, df = 4, p-value < 2.2e-16
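Because table() lists character categories in alphabetical order, the values in p have to follow that same order. Below is a minimal sketch of one way to guard against a mismatch, assuming the probabilities in the call above belong to echidna, kangaroo, koala, platypus and wombat in that order. The hypothetical vector `animals` simply rebuilds the class frequencies printed earlier (28, 24, 27, 21, 20); the probabilities are labelled by name and then reordered with names(tab).

```r
# Hypothetical vector rebuilt from the frequencies printed earlier
animals <- rep(c("echidna", "kangaroo", "koala", "platypus", "wombat"),
               times = c(28, 24, 27, 21, 20))
tab <- table(animals)

# Expected probabilities from the national poll, labelled by category
# (deliberately typed in a different order)
p_national <- c(koala = .4615, kangaroo = .2222, wombat = .1368,
                platypus = .1624, echidna = .0171)

# Reorder p so that it matches the order of the categories in the table
p_ordered <- p_national[names(tab)]

# The expected count for echidna is 120 x .0171 (about 2.05), which is
# below 5, so R also prints the warning
# "Chi-squared approximation may be incorrect"
res <- chisq.test(tab, p = p_ordered)
res
```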

2. Write-up the Results

To report a chi-square test, you need the following information:

* The chi-square statistic (the test statistic).
* The degrees of freedom.
* The p-value.

Once you have this information, the write-up becomes:

A chi-square goodness of fit test found a significant difference between the class distribution of favourite Australian animals and the expected values based on national rates, chi-square(4) = 343.83, p < .001.
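The three values can also be pulled out of the object that chisq.test() returns, rather than copied from the printed output. A minimal sketch using the favourite-colour example; the formatting choices (two decimals for the statistic, three decimals with no leading zero for the p-value, and "p < .001" for very small values) follow the write-ups on this page, and the variable names are illustrative.

```r
# Goodness of fit test from the favourite-colour example
res <- chisq.test(c(20, 35, 45), p = c(1/3, 1/3, 1/3))

# The three values needed for the write-up are stored in the result object
x2      <- unname(res$statistic)  # chi-square statistic
dof     <- unname(res$parameter)  # degrees of freedom
p_value <- res$p.value            # p-value

# Report very small values as p < .001; otherwise drop the leading zero
p_text <- if (p_value < .001) {
  "p < .001"
} else {
  paste("p =", sub("^0", "", sprintf("%.3f", p_value)))
}

write_up <- sprintf("chi-square(%d) = %.2f, %s", dof, x2, p_text)
write_up
```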

Chi-square Test of Independence

The chi-square test of independence is used to determine whether the distribution of frequencies of a categorical DV differs across the levels of a categorical IV.

The chi-square test of independence uses the same function as the chi-square goodness of fit test, but with a different input. The function decides which test to conduct from the input it receives: a one-dimensional vector of counts gives a goodness of fit test, while a two-way contingency table gives a test of independence.
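A quick way to see this behaviour is to inspect the `method` field of each result. A sketch, assuming the counts are typed in directly: the vector is the favourite-colour example, and the matrix reproduces the video games by programme counts printed earlier.

```r
# A one-dimensional vector of counts -> goodness of fit test
gof <- chisq.test(c(20, 35, 45), p = c(1/3, 1/3, 1/3))
gof$method

# A two-dimensional table of counts (the video games x programme counts
# printed earlier, typed in as a matrix) -> test of independence
counts <- matrix(c(19, 31, 21,
                   15, 16, 18),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(video.games = c("No", "Yes"),
                                 program = c("Conversion MSc", "Health MSc",
                                             "Other Research MSc")))
ind <- chisq.test(counts)
ind$method
```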

1. Conduct the Statistical Test

If the input is a contingency table of two variables, the function conducts a chi-square test of independence. As described above, contingency tables can be created using the table() function.

As such, if we were to test whether the proportion of video gamers was different across the three programmes in the class dataset, the code looks like this:

c.table <- select(data,video.games,program) %>%
  table()

chisq.test(c.table)
## 
##  Pearson's Chi-squared test
## 
## data:  c.table
## X-squared = 1.5059, df = 2, p-value = 0.471
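The expected counts for a test of independence come from the row and column totals: each cell's expected count is (row total × column total) / grand total, and df = (rows - 1) × (columns - 1). A minimal sketch that recomputes the result above by hand from the printed counts, and checks the expected counts against those stored by chisq.test().

```r
# Observed counts from the contingency table printed earlier
observed <- matrix(c(19, 31, 21,
                     15, 16, 18),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("No", "Yes"),
                                   c("Conversion MSc", "Health MSc",
                                     "Other Research MSc")))

# Expected count for each cell = (row total x column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson's chi-square statistic, df = (rows - 1) x (columns - 1)
x2 <- sum((observed - expected)^2 / expected)
dof <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(x2, df = dof, lower.tail = FALSE)

round(expected, 2)
c(statistic = x2, df = dof, p.value = p_value)

# chisq.test() stores the same expected counts in its result
all.equal(chisq.test(observed)$expected, expected)
```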

2. Write-up the Results

To write up a chi-square test of independence, you need the same information as above: the test statistic, its degrees of freedom, and the p-value. Altogether, the write-up can then look something like this:

A chi-square test of independence did not find a significant difference in the proportion of video gamers across the three programmes, chi-square(2) = 1.51, p = .471.

Advanced Exercises

If you would like to practise the skills on this page, exercise questions on this content are available in the advanced exercises for Week 6. You can download the interactive exercises by clicking the link below.

Click here to download this week’s exercises.