devtools::install_github("wilkelab/ungeviz") # you only need to do this step once
Goh Si Hui
February 1, 2024
February 2, 2024
In this hands-on exercise, we will learn how to:
- plot static error bars using ggplot2,
- plot interactive error bars using ggplot2, plotly and DT, and
- create hypothetical outcome plots (HOPs) using the ungeviz package.
Before we start, let us ensure that the required R packages have been installed and import the relevant data for this hands-on exercise.
For this exercise, we will use the following packages:
- tidyverse: a family of R packages for the data science process,
- plotly: to create interactive plots,
- gganimate: to create animated plots,
- DT: to display interactive HTML tables,
- crosstalk: to implement cross-widget interactions (currently, linked brushing and filtering), and
- ggdist: to visualise distributions and uncertainty.
The code chunk below uses p_load() from the pacman package to check whether the abovementioned packages are installed on the computer. If they are, they will be loaded into R. Otherwise, pacman will install the missing packages before loading them.
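The chunk described above might look like the following minimal sketch. The package list is taken from the list above; the guard that installs pacman itself, and the CRAN mirror URL, are added assumptions for machines where pacman is not yet available.

```r
# Install pacman itself if it is missing (assumes the CRAN mirror is reachable).
if (!requireNamespace("pacman", quietly = TRUE)) {
  install.packages("pacman", repos = "https://cloud.r-project.org")
}

# p_load() installs any package that is missing, then attaches all of them.
pacman::p_load(tidyverse, plotly, gganimate, DT, crosstalk, ggdist)
```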
We will use Exam_data.csv for this exercise.
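A minimal sketch of the import step, assuming the file sits in a data/ subfolder of the project. The path is an assumption, so adjust it to your own setup. The object name exam matches the data frame used later in this exercise.

```r
library(readr)

# Assumed location of the file; change this to wherever Exam_data.csv is stored.
data_path <- "data/Exam_data.csv"

if (file.exists(data_path)) {
  # Read the CSV into a tibble and show the first few rows.
  exam <- read_csv(data_path, show_col_types = FALSE)
  print(head(exam))
} else {
  message("File not found at '", data_path, "'. Update data_path and rerun.")
}
```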
A point estimate is a single number, such as a mean. Uncertainty, on the other hand, is expressed as standard error, confidence interval, or credible interval.
Don’t confuse the uncertainty of a point estimate with the variation in the sample.
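To make the distinction concrete, here is a small sketch using simulated scores (hypothetical data, not the exam data). The standard deviation describes the spread of the observations and stays roughly the same as the sample grows, while the standard error describes the uncertainty of the estimated mean and shrinks. The helper se() is an illustrative name and uses the conventional formula sd / sqrt(n).

```r
set.seed(2024)

# Hypothetical scores drawn from the same population (mean 70, sd 15),
# once with a small sample and once with a large one.
small <- rnorm(25,   mean = 70, sd = 15)
large <- rnorm(2500, mean = 70, sd = 15)

# Standard error of the mean, conventional definition.
se <- function(x) sd(x) / sqrt(length(x))

result <- data.frame(
  n  = c(length(small), length(large)),
  sd = c(sd(small), sd(large)),  # variation in the sample
  se = c(se(small), se(large))   # uncertainty of the point estimate
)
print(result)
```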
To plot error bars of maths scores by race using the data provided, we will first derive the necessary summary statistics using the following code chunk.
group_by() of the dplyr package groups the observations by RACE; summarise() computes the count of observations, the mean and the standard deviation; mutate() derives the standard error of maths scores by RACE. The output is saved as a tibble called my_sum.
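The steps just described can be sketched as follows. To keep the snippet runnable on its own, it uses a small simulated stand-in called exam_demo (hypothetical data); with the real data, replace exam_demo with exam. The standard error formula sd / sqrt(n - 1) is inferred from the values in the table that follows (for example, 15.69040 / sqrt(192) ≈ 1.132357); note that the more common definition divides by sqrt(n).

```r
library(dplyr)

# Hypothetical stand-in for the exam data, with the two columns used here.
set.seed(1234)
exam_demo <- tibble(
  RACE  = rep(c("Chinese", "Indian", "Malay"), times = c(30, 10, 20)),
  MATHS = round(runif(60, min = 20, max = 100))
)

my_sum_demo <- exam_demo %>%
  group_by(RACE) %>%                 # one group per race
  summarise(
    n    = n(),                      # count of observations
    mean = mean(MATHS),              # mean maths score
    sd   = sd(MATHS)                 # standard deviation
  ) %>%
  mutate(se = sd / sqrt(n - 1))      # standard error, as inferred from the table

print(my_sum_demo)
```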
We will then display the my_sum tibble in HTML table format using the following code chunk.
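One way to do this, assuming the knitr package is installed, is kable() with format = "html". This is a sketch rather than necessarily the chunk used originally. So that the snippet runs on its own, my_sum is rebuilt here from the values shown in the table that follows.

```r
library(knitr)

# Summary statistics copied from the table in this section.
my_sum <- data.frame(
  RACE = c("Chinese", "Indian", "Malay", "Others"),
  n    = c(193, 12, 108, 9),
  mean = c(76.50777, 60.66667, 57.44444, 69.66667),
  sd   = c(15.69040, 23.35237, 21.13478, 10.72381),
  se   = c(1.132357, 7.041005, 2.043177, 3.791438)
)

# Render the data frame as an HTML table.
html_table <- kable(my_sum, format = "html")
html_table
```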
| RACE | n | mean | sd | se |
|---|---|---|---|---|
| Chinese | 193 | 76.50777 | 15.69040 | 1.132357 |
| Indian | 12 | 60.66667 | 23.35237 | 7.041005 |
| Malay | 108 | 57.44444 | 21.13478 | 2.043177 |
| Others | 9 | 69.66667 | 10.72381 | 3.791438 |
We can visualise the standard error bars of mean maths score by race using the following code chunk.
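A minimal sketch of such a plot, with my_sum rebuilt from the table above so that the snippet runs on its own. The object name p_se and the title text are illustrative choices.

```r
library(ggplot2)

# Summary statistics copied from the table above.
my_sum <- data.frame(
  RACE = c("Chinese", "Indian", "Malay", "Others"),
  mean = c(76.50777, 60.66667, 57.44444, 69.66667),
  se   = c(1.132357, 7.041005, 2.043177, 3.791438)
)

p_se <- ggplot(my_sum) +
  # Error bars span one standard error on either side of the mean.
  geom_errorbar(
    aes(x = RACE, ymin = mean - se, ymax = mean + se),
    width = 0.2, colour = "black", alpha = 0.9
  ) +
  # Points mark the precomputed means.
  geom_point(
    aes(x = RACE, y = mean),
    stat = "identity", colour = "red", size = 1.5
  ) +
  ggtitle("Standard error of mean maths score by race")

print(p_se)
```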
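Note that the error bars are computed using the formula mean ± se.
:::{.callout-important}
For geom_point(), stat = "identity" plots the precomputed means as they are. This is already the default stat for geom_point(), so stating it mainly makes the intent explicit.
:::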
Instead of plotting the standard error bar of point estimates, we can also plot the confidence intervals of mean maths score by race.
ggplot(my_sum) +
  geom_errorbar(
    aes(x = reorder(RACE, -mean),
        ymin = mean - 1.96 * se,
        ymax = mean + 1.96 * se),
    width = 0.2,
    colour = "black",
    alpha = 0.9,
    linewidth = 0.5) +  # use size = 0.5 on ggplot2 versions before 3.4.0
  geom_point(
    aes(x = reorder(RACE, -mean),
        y = mean),
    stat = "identity",
    color = "red",
    size = 1.5,
    alpha = 1) +
  labs(x = "Race",
       y = "Mean maths score",
       title = "95% confidence interval of mean maths score by race")
The confidence intervals are computed using the formula mean ± 1.96 × se. The error bars are sorted by the average maths scores, in descending order. labs() of ggplot2 is used to set the axis label and the plot title.
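As a worked check of the arithmetic, the sketch below derives the multipliers from the normal distribution and applies them to the values in the summary table. The 1.96 used for 95% intervals and the 2.58 used for 99% intervals in this exercise are the rounded normal quantiles. This relies on a normal approximation, which is rough for the smaller groups (Indian, n = 12; Others, n = 9). The column names lower95, upper95, lower99 and upper99 are illustrative.

```r
# Means and standard errors copied from the summary table.
my_sum <- data.frame(
  RACE = c("Chinese", "Indian", "Malay", "Others"),
  mean = c(76.50777, 60.66667, 57.44444, 69.66667),
  se   = c(1.132357, 7.041005, 2.043177, 3.791438)
)

z95 <- qnorm(0.975)  # about 1.96
z99 <- qnorm(0.995)  # about 2.58

# Lower and upper bounds of the 95% and 99% confidence intervals.
ci <- transform(
  my_sum,
  lower95 = mean - z95 * se,
  upper95 = mean + z95 * se,
  lower99 = mean - z99 * se,
  upper99 = mean + z99 * se
)
print(ci)
```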
We can also plot interactive error bars for the 99% confidence interval of mean maths scores by race using the following code chunk.
shared_df = SharedData$new(my_sum)

bscols(widths = c(4, 8),
       ggplotly((ggplot(shared_df) +
                   geom_errorbar(aes(
                     x = reorder(RACE, -mean),
                     ymin = mean - 2.58 * se,
                     ymax = mean + 2.58 * se),
                     width = 0.2,
                     colour = "black",
                     alpha = 0.9,
                     linewidth = 0.5) +  # use size = 0.5 on ggplot2 versions before 3.4.0
                   geom_point(aes(
                     x = reorder(RACE, -mean),
                     y = mean,
                     # tooltip text shown on hover
                     text = paste("Race:", `RACE`,
                                  "<br>N:", `n`,
                                  "<br>Avg. Scores:", round(mean, digits = 2),
                                  "<br>99% CI:[",
                                  round((mean - 2.58 * se), digits = 2), ",",
                                  round((mean + 2.58 * se), digits = 2), "]")),
                     stat = "identity",
                     color = "red",
                     size = 1.5,
                     alpha = 1) +
                   xlab("Race") +
                   ylab("Average Scores") +
                   theme_minimal() +
                   theme(axis.text.x = element_text(
                     angle = 45, vjust = 0.5, hjust = 1)) +
                   ggtitle("99% confidence interval of average<br>maths scores by race")),
                tooltip = "text"),
       DT::datatable(shared_df,
                     rownames = FALSE,
                     class = "compact",
                     width = "100%",
                     options = list(pageLength = 10,
                                    scrollX = TRUE),
                     # named vector: new header = original column name
                     colnames = c("No. of pupils" = "n",
                                  "Avg Scores" = "mean",
                                  "Std Dev" = "sd",
                                  "Std Error" = "se")) %>%
         formatRound(columns = c('mean', 'sd', 'se'),
                     digits = 2))

ggdist is an R package that provides a flexible set of ggplot2 geoms and stats designed especially for visualising distributions and uncertainty.
It is designed for both frequentist and Bayesian uncertainty visualization, taking the view that uncertainty visualization can be unified through the perspective of distribution visualization.
We can use stat_pointinterval() or stat_gradientinterval() to build a visual for displaying the distribution of maths scores by race.
These two functions come with many arguments; please refer to the syntax reference for more detail.
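A minimal sketch of stat_pointinterval(), assuming ggdist is installed. It uses a simulated stand-in called exam_demo (hypothetical data) so that it runs on its own; with the real data, replace exam_demo with exam. The choice of median_qi and of the 66% and 95% interval widths is illustrative, not a recommendation.

```r
library(ggplot2)
library(ggdist)

# Hypothetical stand-in for the exam data.
set.seed(1234)
exam_demo <- data.frame(
  RACE  = rep(c("Chinese", "Indian", "Malay"), times = c(30, 10, 20)),
  MATHS = round(runif(60, min = 20, max = 100))
)

p_dist <- ggplot(exam_demo, aes(x = RACE, y = MATHS)) +
  stat_pointinterval(
    point_interval = median_qi,  # median with quantile intervals
    .width = c(0.66, 0.95)       # inner and outer interval widths
  ) +
  labs(
    title = "Median and quantile intervals of maths scores by race",
    x = "Race",
    y = "Maths score"
  )

print(p_dist)
```

Swapping stat_pointinterval() for stat_gradientinterval() in the same pipeline would show the intervals as a colour gradient instead; its arguments differ, so check the syntax reference before adapting the call.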
library(ungeviz)    # provides sampler() and geom_hpline()
library(gganimate)  # provides transition_states()

ggplot(data = exam,
       aes(x = factor(RACE), y = MATHS)) +
  geom_point(position = position_jitter(
    height = 0.3,
    width = 0.05),
    size = 0.4,
    color = "#0072B2",
    alpha = 1/2) +
  geom_hpline(data = sampler(25, group = RACE),
              height = 0.6,
              color = "#D55E00") +
  theme_bw() +
  transition_states(.draw, 1, 3)
.draw is a generated column indicating the sample draw.
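To give a feel for what the animation steps through, the sketch below emulates the idea in base R: 25 draws, each picking one observation per RACE group, stacked with a .draw column. This is a rough illustration under that assumption, not ungeviz's actual implementation, and exam_demo is a simulated stand-in (hypothetical data).

```r
# Hypothetical stand-in for the exam data.
set.seed(1234)
exam_demo <- data.frame(
  RACE  = rep(c("Chinese", "Indian", "Malay"), times = c(30, 10, 20)),
  MATHS = round(runif(60, min = 20, max = 100))
)

n_draws <- 25

# Row indices grouped by race.
rows_by_race <- split(seq_len(nrow(exam_demo)), exam_demo$RACE)

draws <- do.call(rbind, lapply(seq_len(n_draws), function(i) {
  # For this draw, pick one row at random from each race group.
  idx <- vapply(
    rows_by_race,
    function(rows) rows[sample.int(length(rows), 1)],
    integer(1)
  )
  # Tag the sampled rows with the draw number.
  cbind(.draw = i, exam_demo[idx, ])
}))

head(draws)
```

Each value of .draw corresponds to one frame of the animation: one hypothetical outcome per group.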