Snippet #4: boxplots and scatterplots: simple recipes

R
tutorials
ggplot
visualization
recipes
Author

Giorgio Luciano

Published

March 11, 2023

  1. Simulate data, check and assign data types
  2. Create a scatterplot with ggplot
  3. Create violin plot with ggstatsplot

Example 1: We want to visualize the difference between two groups of patients who follow two different diets. Group A has an average total cholesterol of 180 with a standard deviation of 20, while Group B has an average of 200 with a standard deviation of 40.

library(MASS)
library(ggplot2)
library(data.table)


npatientsA <- 500
npatientsB <- 520
# Sigma is a variance (covariance matrix), so the standard deviations are squared
cholA <- mvrnorm(n=npatientsA, mu=180, Sigma=20^2, empirical=TRUE)
cholB <- mvrnorm(n=npatientsB, mu=200, Sigma=40^2, empirical=TRUE)

dataA <- cbind(cholA,rep("A",npatientsA))  
dataB <- cbind(cholB,rep("B",npatientsB))  

data <- data.frame(rbind(dataA,dataB))
colnames(data) <- c("Cholesterol","group")
data$Cholesterol <- as.numeric(data$Cholesterol)

p1 <- ggplot(data, aes(x = group, y = Cholesterol)) + geom_jitter(alpha = 0.05)

p1
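
A detail worth checking in the simulation: the `Sigma` argument of `mvrnorm()` is a variance (a covariance matrix in the multivariate case), not a standard deviation, so a standard deviation of 20 corresponds to `Sigma = 20^2`. With `empirical = TRUE`, the sample mean and variance match the requested values up to floating-point error rather than only approximately. A minimal sketch, where the variable names `n` and `x` are illustrative:

```r
library(MASS)

n <- 500  # hypothetical sample size, same order as the groups above

# Request mean 180 and standard deviation 20 (variance 20^2)
x <- mvrnorm(n = n, mu = 180, Sigma = 20^2, empirical = TRUE)

mean(x)  # sample mean of the simulated values
sd(x)    # sample standard deviation of the simulated values
```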

A few observations on the code. First, the data must be in a data.frame, otherwise ggplot will give us an error. Second, cbind() stored the cholesterol values in the same matrix as the chr group labels, which turned them into character strings, so we needed to convert Cholesterol with as.numeric to avoid strange results. Try commenting out the line data$Cholesterol <- as.numeric(data$Cholesterol) and see for yourself what happens. (hint: a “labelstorm!”)
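
The coercion can be seen on a tiny example. The sketch below uses made-up values and hypothetical names (`chol`, `grp`, `m`, `df`); it also shows one possible alternative, building the data.frame directly so that each column keeps its own type and no as.numeric step is needed.

```r
# Made-up values, only for illustration
chol <- c(180.5, 199.2, 210.7)
grp  <- c("A", "A", "B")

# cbind() builds a matrix, and a matrix holds a single type,
# so the numbers are silently converted to character
m <- cbind(chol, grp)
typeof(m)

# data.frame() keeps one type per column
df <- data.frame(Cholesterol = chol, group = factor(grp))
str(df)
```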

Jitter plots are one of my favorite ways to represent data. You can immediately understand the distribution of your data and also avoid the pitfalls of boxplots (see Matejka and Fitzmaurice 2017).
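
If a summary is still wanted, one option is to draw the boxplot and the raw points together. This is a sketch under stated assumptions: the data are simulated with rnorm() instead of mvrnorm(), and the seed, sample size, jitter width, and alpha are arbitrary choices; `df` and `p2` are illustrative names.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only for reproducibility
df <- data.frame(
  group = factor(rep(c("A", "B"), each = 200)),
  Cholesterol = c(rnorm(200, mean = 180, sd = 20),
                  rnorm(200, mean = 200, sd = 40))
)

p2 <- ggplot(df, aes(x = group, y = Cholesterol)) +
  geom_boxplot(outlier.shape = NA) +      # hide outliers, the jitter layer already shows them
  geom_jitter(width = 0.2, alpha = 0.3)   # raw observations on top of the boxes

p2
```

Hiding the boxplot outliers avoids plotting the same extreme points twice.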

If you need inferential statistics on your data, another resource is ggstatsplot (Patil 2021). See the following example with our data. Note that we need to convert the group label with as.factor.

library(ggstatsplot)
#> You can cite this package as:
#>      Patil, I. (2021). Visualizations with statistical details: The 'ggstatsplot' approach.
#>      Journal of Open Source Software, 6(61), 3167, doi:10.21105/joss.03167
data$group <- as.factor(data$group)

pstack <- ggbetweenstats(data, x = group, y = Cholesterol)

pstack
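
The `type` argument of ggbetweenstats selects the family of tests reported in the plot. The sketch below asks for a nonparametric comparison, which may be of interest when the two groups have quite different spreads, as here. The data are simulated with rnorm() and an arbitrary seed, and `df`, `pnp`, and the title text are illustrative choices.

```r
library(ggstatsplot)

set.seed(1)  # arbitrary seed, only for reproducibility
df <- data.frame(
  group = factor(rep(c("A", "B"), each = 60)),
  Cholesterol = c(rnorm(60, mean = 180, sd = 20),
                  rnorm(60, mean = 200, sd = 40))
)

pnp <- ggbetweenstats(
  data = df,
  x = group,
  y = Cholesterol,
  type = "nonparametric",                    # instead of the default "parametric"
  title = "Total cholesterol by diet group"
)

pnp
```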

References

Matejka, Justin, and George Fitzmaurice. 2017. “Same Stats, Different Graphs.” Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, May. https://doi.org/10.1145/3025453.3025912.
Patil, Indrajeet. 2021. “Visualizations with Statistical Details: The ‘ggstatsplot’ Approach.” Journal of Open Source Software 6 (61): 3167. https://doi.org/10.21105/joss.03167.