This page contains extra R content that is not covered in the demonstrations and can be considered supplementary to the module. It focuses on additional tidyverse functions and is useful for completing the advanced exercises from Week 3. Note that this is still not a comprehensive list of tidyverse functions; for a full list, check out the tidyverse cheatsheets.

arrange()

The arrange() function reorders the rows of your data.frame according to the values in one of its columns. Like all tidyverse functions, its first argument is the data.frame you wish to manipulate. The second argument is the name of the variable you wish to sort by.

arranged.data <- arrange(data,variable_name)

If the variable is numeric, the rows will be sorted into ascending numerical order according to that variable. If the variable is a character variable, the rows will be sorted into alphabetical order according to that variable. To reverse the order, wrap the variable name in the desc() function within arrange():

arranged.data <- arrange(data,desc(variable_name))
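To see these rules in action, here is a small sketch using a made-up data.frame called scores (the data and column names are hypothetical, not taken from the exercises). It also shows that arrange() accepts more than one variable: later variables are used to order rows that tie on earlier ones.

```r
library(dplyr)

# Hypothetical example data (not from the module exercises)
scores <- data.frame(
  name  = c("Cara", "Ali", "Ben"),
  group = c("B", "A", "A"),
  score = c(72, 85, 64)
)

# Ascending numeric order: lowest score first
by_score <- arrange(scores, score)
by_score$name       # "Ben" "Cara" "Ali"

# desc() reverses the order: highest score first
by_score_desc <- arrange(scores, desc(score))
by_score_desc$name  # "Ali" "Cara" "Ben"

# Two variables: sort by group, then by score within each group
by_group <- arrange(scores, group, score)
by_group$name       # "Ben" "Ali" "Cara"
```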

rename()

Use the rename() function to rename the variables in your data.frame. In the code below, we rename three variables in a data.frame:

renamed.data <- rename(data,
                       new.variable.name1 = old.variable.name1,
                       new.variable.name2 = old.variable.name2,
                       new.variable.name3 = old.variable.name3)

A few things to note. First, to improve readability, we have spread this function call across multiple lines. Second, the new name of the variable goes on the left of the = symbol, while the original name goes on the right. A common mistake is to mix these up!
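A minimal sketch of the same idea on a hypothetical two-column data.frame (survey, q1, q2, rating and complete are made-up names used only for illustration):

```r
library(dplyr)

# Hypothetical data with uninformative column names
survey <- data.frame(q1 = c(3, 5), q2 = c("Yes", "No"))

# New name on the left of =, existing name on the right
survey_renamed <- rename(survey,
                         rating = q1,
                         complete = q2)

names(survey_renamed)  # "rating" "complete"
```

If the two sides were swapped (q1 = rating instead of rating = q1), rename() would look for a column called rating in survey and stop with an error, because no such column exists.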

separate()

One of the key principles of data cleaning is that each cell of your data.frame should hold only one value. However, when importing data from other programs, this is not always the case: sometimes a single variable will hold multiple values, as in the example below:

In the data.frame above, there are three columns; however, two of them store multiple values. First, age should have separate columns for years and months; second, the mean and standard deviation scores should also be split into separate columns.

Below is the code that separates the ‘age_y_m’ variable into two:

imported.data2 <- separate(imported.data,
                           col = age_y_m,
                           into = c("age_y","age_m"),
                           sep = "_")

Let’s break down each of the arguments above. As always, the first argument is the data.frame you are performing the tidyverse function on. The ‘col’ argument specifies the name of the variable you wish to split (in the example above, ‘age_y_m’). The ‘into’ argument specifies the names of the new, separated columns. You need to supply the correct number of names in a character vector (in our example, the two new column names are ‘age_y’ and ‘age_m’). The ‘sep’ argument specifies the character(s) separating the values that will go into the new columns (in the example, the age in years and the age in months are separated by an underscore, so we set sep = "_").
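The original data.frame is not reproduced on this page, so the sketch below builds a hypothetical version (the id column and all values are made up) to show what separate() returns. Note that the new columns are character by default; adding convert = TRUE asks separate() to run type.convert() on the new columns so they get a more suitable type.

```r
library(tidyr)

# Hypothetical data: years and months joined by "_" in one column
imported.data <- data.frame(
  id = 1:3,
  age_y_m = c("5_3", "6_11", "7_0")
)

imported.data2 <- separate(imported.data,
                           col = age_y_m,
                           into = c("age_y", "age_m"),
                           sep = "_")

imported.data2$age_y  # "5" "6" "7" (still character)

# Same split, but convert the new columns to a suitable type
imported.data2_num <- separate(imported.data,
                               col = age_y_m,
                               into = c("age_y", "age_m"),
                               sep = "_",
                               convert = TRUE)

class(imported.data2_num$age_y)  # "integer"
```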

Similarly, we can use separate() to split the mean and standard deviation into two variables:

imported.data3 <- imported.data2 %>%
  separate(col = mean_sd,
           into = c("mean","sd"),
           sep = " sd=")
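As a sketch on hypothetical values in the same "mean sd=value" format (the id column and the numbers are made up), the example below splits mean_sd and uses convert = TRUE so that the mean and sd columns end up numeric rather than character. Bear in mind that separate() treats a character sep as a regular expression; " sd=" contains no special characters, so here it simply matches the literal text.

```r
library(dplyr)  # provides the %>% pipe
library(tidyr)

# Hypothetical scores stored as "<mean> sd=<sd>" in a single column
imported.data2 <- data.frame(
  id = 1:2,
  mean_sd = c("12.4 sd=2.1", "15.0 sd=3.6")
)

imported.data3 <- imported.data2 %>%
  separate(col = mean_sd,
           into = c("mean", "sd"),
           sep = " sd=",
           convert = TRUE)  # turn the split text into numbers

imported.data3$mean  # 12.4 15.0
imported.data3$sd    # 2.1 3.6
```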

unite()

Occasionally, you will need to do the opposite of separate() and combine multiple columns into one. This can be achieved with the unite() function.

The code below reverses what we did in the previous two examples. Note: this exact process is not something you would want to do when cleaning data; it is included here for illustrative purposes only.

imported.data2 <- unite(imported.data3,
                       col = "age_y_m",
                       age_y,
                       age_m,
                       sep = "_")
imported.data <- unite(imported.data2,
                       col = "mean_sd",
                       mean,
                       sd,
                       sep = " sd=")

Again, let’s break down each argument. As always, the first argument is the data.frame. The ‘col’ argument is the name of the new variable once the columns have been combined. After this, you simply list the names of the columns you wish to combine. In both examples above, we are only pasting two columns together, but the function easily accommodates more than two. Finally, the ‘sep’ argument dictates how the values will be separated within the new column (in the case of age, this is an underscore: “_”).
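To illustrate combining more than two columns, here is a sketch with a hypothetical data.frame of date parts (dates, day, month, year and date are made-up names):

```r
library(tidyr)

# Hypothetical date parts stored in three separate columns
dates <- data.frame(
  day = c(1, 15),
  month = c(3, 12),
  year = c(2023, 2024)
)

# Combine all three columns into one, with "/" between the values
dates_united <- unite(dates,
                      col = "date",
                      day,
                      month,
                      year,
                      sep = "/")

dates_united$date  # "1/3/2023" "15/12/2024"
```

By default unite() drops the original columns; adding remove = FALSE keeps them alongside the new one.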

recode()

You can use the recode() function to recode individual values within a variable. Because recode() works on a single variable rather than on a whole data.frame, use it inside a mutate() function.

In the example below, we want to recode the values “Yes” and “No” in the ‘complete’ variable to TRUE and FALSE, respectively.

data.recoded <- mutate(data.to.recode,
                       complete_logical = recode(complete,
                                                 "Yes" = TRUE,
                                                 "No" = FALSE)
                       )

To review how the mutate() function works, see this week’s workbook. The first argument for the recode() function is the variable you wish to recode. After that, specify each old value followed by the new value you wish to recode it to, separated by an = symbol. When, as here, the new values are of a different type from the old ones, your list needs to be exhaustive (i.e., every old value needs a new value); any value you leave out becomes NA and recode() gives a warning. Also, regardless of the class of your variable, all old values must be written inside quotation marks.
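The sketch below uses a hypothetical data.to.recode containing an extra value, “Maybe”, to show what happens when the list is not exhaustive. Because the new values are logical while complete is character, the unlisted value becomes NA and recode() warns; supplying .default handles unlisted values explicitly. When the old and new values have the same type, unlisted values are simply left unchanged.

```r
library(dplyr)

# Hypothetical data: "Maybe" is not listed in the recode() calls below
data.to.recode <- data.frame(complete = c("Yes", "No", "Maybe"))

# Changing type (character -> logical) without listing every value:
# the unlisted "Maybe" becomes NA and recode() issues a warning
data.recoded <- mutate(data.to.recode,
                       complete_logical = recode(complete,
                                                 "Yes" = TRUE,
                                                 "No" = FALSE))
data.recoded$complete_logical  # TRUE FALSE NA

# .default gives every unlisted value an explicit result (no warning)
data.recoded2 <- mutate(data.to.recode,
                        complete_logical = recode(complete,
                                                  "Yes" = TRUE,
                                                  "No" = FALSE,
                                                  .default = NA))

# Same type (character -> character): unlisted values are kept as-is
kept <- recode(data.to.recode$complete, "Yes" = "Y", "No" = "N")
kept  # "Y" "N" "Maybe"
```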

Advanced Exercises

If you would like to practice the skills on this page, exercise questions on this content are available in the advanced exercises for Week 2. You can download the interactive exercises using the link below.

Click here to download this week’s exercises.