Abstract
Below are some notes I have taken on David Robinson’s screencasts, with tips and tricks I use for my own R peregrinations in the Tidyverse framework. Hopefully, these notes will be useful to others.
Reference URL: https://oliviergimenez.github.io/tidyverse-tips/
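library(tidyverse)  # dplyr, tidyr, forcats, ggplot2
# Note: starwars_raw and mytheme are assumed to be defined elsewhere in these notes

# complete() adds the missing species/gender combinations; their count n is NA
starwars %>%
  filter(species %in% c('Aleena', 'Droid')) %>%
  count(species, gender) %>%
  complete(species, gender)

# Same thing, but the missing combinations get a count of 0 instead of NA
starwars %>%
  filter(species %in% c('Aleena', 'Droid')) %>%
  count(species, gender) %>%
  complete(species, gender, fill = list(n = 0))

# Mean and standard deviation of every numeric column (e.g. height_mean, height_sd)
starwars_raw %>%
  summarise(across(where(is.numeric),
                   list(mean = ~mean(.x, na.rm = TRUE),
                        sd = ~sd(.x, na.rm = TRUE))))

# Keep the 3 most frequent species, lump the rest into "Other",
# then order the bars by count
starwars %>%
  filter(!is.na(species)) %>%
  count(species = fct_lump(species, 3)) %>%
  mutate(species = fct_reorder(species, n)) %>%
  ggplot(aes(x = n, y = species)) +
  geom_col() +
  mytheme

# Same plot, with the reordering done directly inside aes()
starwars %>%
  filter(!is.na(species)) %>%
  count(species = fct_lump(species, 3)) %>%
  mutate(species = factor(species)) %>%
  ggplot(aes(x = n, y = fct_reorder(species, n, .desc = FALSE))) +
  geom_col() +
  mytheme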
Free the scales
The argument scales = "free" is useful when using facet_wrap(). It allows the X and Y axes to have their own scale in each panel. You can choose to have a free scale on the X axis only with scales = "free_x", and likewise on the Y axis only with scales = "free_y".
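As a rough illustration of freeing one axis, the sketch below facets the starwars data by gender and lets each panel pick its own Y range. The choice of height and mass as variables is an assumption made for this example only.

```r
library(dplyr)
library(ggplot2)

# Scatter plot of mass against height, one panel per gender.
# scales = "free_y" gives each panel its own Y axis; the X axis stays shared.
p_free <- starwars %>%
  filter(!is.na(height), !is.na(mass), !is.na(gender)) %>%
  ggplot(aes(x = height, y = mass)) +
  geom_point() +
  facet_wrap(~ gender, scales = "free_y")

p_free
```

Swapping "free_y" for "free_x" or "free" changes which axes are allowed to vary between panels.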
Flip coordinates
We used to add a coord_flip() following geom_col() to make a bar plot easier to read by putting the categories on the Y axis. This extra line of code is no longer needed, as we can simply swap the x and y variables in aes().
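The two approaches can be compared side by side. This is a minimal sketch assuming ggplot2 3.3.0 or later (where geom_col() accepts a discrete variable on the Y axis); the helper table species_counts is a name introduced here for illustration.

```r
library(dplyr)
library(ggplot2)

# Counts for the five most frequent species (illustrative subset)
species_counts <- starwars %>%
  filter(!is.na(species)) %>%
  count(species, sort = TRUE) %>%
  head(5)

# Older idiom: categories on X, then flip the coordinates
p_flip <- ggplot(species_counts, aes(x = species, y = n)) +
  geom_col() +
  coord_flip()

# Current idiom: put the categories on Y directly in aes()
p_swap <- ggplot(species_counts, aes(x = n, y = species)) +
  geom_col()
```

Both plots show horizontal bars; the second simply avoids the extra coord_flip() call.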
Titles too long
Also, in a facet_wrap(), the title of each panel might be too long to read properly. There are two ways to fix that: either decrease the font size with theme(strip.text = element_text(size = 6)), or truncate the title with mutate(tr_title_text = str_trunc(title_text, 25)).
## [1] "My name..."
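Truncated output like the line above can be produced with str_trunc(). The snippet below is only a sketch: the input string and the width of 10 are made up for illustration and are not taken from the screencasts.

```r
library(stringr)

# Hypothetical long panel title, invented for this example
title_text <- "My name is a very long panel title"

# Keep at most 10 characters in total, including the trailing "..."
short_title <- str_trunc(title_text, 10)
short_title
```

The width counts the ellipsis, so a width of 10 keeps the first 7 characters of the original string.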
Log scale
It often makes sense to plot your data using log scales. This is very easy to do in ggplot2 by adding a scale_x_log10() or a scale_y_log10() to the plot.
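A small sketch of a log-scaled axis, again using the starwars data; picking mass for the Y axis is an assumption made here because it contains one very large value that compresses the rest of the points on a linear scale.

```r
library(dplyr)
library(ggplot2)

# Mass against height, with the Y axis on a log10 scale
p_log <- starwars %>%
  filter(!is.na(height), !is.na(mass)) %>%
  ggplot(aes(x = height, y = mass)) +
  geom_point() +
  scale_y_log10()

p_log
```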
Axes format
To make your figure easier to read, it might be useful to display the unit of an axis as a percentage or to display numbers with commas. The scales package is what you need. For example, add a scale_y_continuous(labels = scales::percent) to have your Y axis in percentages, or a scale_x_continuous(labels = scales::comma) to add commas to the numbers of your X axis.
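To see what these label functions do, here is a minimal sketch on a made-up data frame; the column names and values are hypothetical and exist only to give the axes something to format.

```r
library(ggplot2)
library(scales)

# Made-up data: large counts on X, proportions on Y
df <- data.frame(
  population = c(1000, 25000, 500000),
  share = c(0.05, 0.25, 0.70)
)

p_fmt <- ggplot(df, aes(x = population, y = share)) +
  geom_point() +
  scale_x_continuous(labels = scales::comma) +    # 25000 is shown as 25,000
  scale_y_continuous(labels = scales::percent)    # 0.25 is shown as 25%

# The label functions can also be called directly on numbers
comma_label <- scales::comma(25000)
percent_label <- scales::percent(0.25)
```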
Regular expressions
I find them boring, but regular expressions for describing patterns in strings are very useful when you have to filter rows based on some patterns (str_detect()), remove characters (str_remove()) or separate rows (separate_rows()). Good resources are this book chapter dedicated to strings and the vignette of the stringr package.
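The three uses mentioned above can be sketched on a tiny, made-up table; the films tibble, its column names and the patterns are all assumptions chosen for illustration.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical data, invented for this example
films <- tibble(
  title = c("A New Hope (1977)", "Return of the Jedi (1983)"),
  characters = c("Luke, Leia", "Luke, Han, Leia")
)

# Filter rows whose title matches a pattern
jedi <- films %>%
  filter(str_detect(title, "Jedi"))

# Remove the trailing " (year)" from each title
cleaned <- films %>%
  mutate(title = str_remove(title, " \\(\\d{4}\\)$"))

# Split the comma-separated characters into one row per character
long <- films %>%
  separate_rows(characters, sep = ",\\s*")
```

The sep argument of separate_rows() is itself a regular expression, which is why ",\\s*" handles a comma followed by any amount of whitespace.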