library(MASS)
library(ggplot2)
library(data.table)
# Group sizes
npatientsA <- 500
npatientsB <- 520
# mvrnorm() expects a variance in Sigma, so each standard deviation is squared
cholA <- mvrnorm(n=npatientsA, mu=180, Sigma=20^2, empirical=TRUE)
cholB <- mvrnorm(n=npatientsB, mu=200, Sigma=40^2, empirical=TRUE)
# cbind() with a character label turns every column into character
dataA <- cbind(cholA,rep("A",npatientsA))
dataB <- cbind(cholB,rep("B",npatientsB))
data <- data.frame(rbind(dataA,dataB))
colnames(data) <- c("Cholesterol","group")
data$Cholesterol <- as.numeric(data$Cholesterol)
# Jitter plot of cholesterol by group
p1 <- ggplot(data, aes(x = group, y = Cholesterol)) + geom_jitter(alpha=0.05)
p1
- Simulate data, check and assign data types
- Create a scatterplot with ggplot
- Create a violin plot with ggstatsplot
Example 1: We want to visualize the difference between two groups of patients that follow two different diets. Group A has an average total cholesterol of 180 with a standard deviation of 20, while Group B has an average of 200 with a standard deviation of 40.
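As a quick sanity check on the simulation, here is a minimal sketch that draws one group with `MASS::mvrnorm` and looks at its sample mean and standard deviation. It assumes that `Sigma` is read as a variance (a 1x1 covariance matrix), so a standard deviation of 20 is written as `20^2`; the name `chol_check` is only illustrative.

```r
library(MASS)

# Illustrative name: one simulated group of 500 patients.
# Sigma is a variance, so SD = 20 is passed as 20^2.
chol_check <- mvrnorm(n = 500, mu = 180, Sigma = 20^2, empirical = TRUE)

# With empirical = TRUE the sample mean and SD match the targets
# up to floating-point error, not just on average.
mean(chol_check)
sd(chol_check)
```

With `empirical = FALSE` (the default) the same two numbers would only be close to 180 and 20, and would change from run to run.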
A few observations on the code. First, we need to pass the data to ggplot as a data.frame, otherwise ggplot will give us an error. Second, because cbind combined the cholesterol values with chr group labels, every column ended up as chr, so we had to convert Cholesterol back with as.numeric to avoid strange results. Try commenting out the line data$Cholesterol <- as.numeric(data$Cholesterol) and you can see for yourself what happens (hint: a “labelstorm”!).
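The type issue can be reproduced on a tiny example. This is a sketch with made-up toy values (`values`, `labels`, `toy` and `toy2` are hypothetical names, not part of the simulation), and it assumes R 4.0 or later, where `data.frame` keeps strings as character rather than converting them to factors.

```r
# Hypothetical toy values, not the simulated cholesterol data.
values <- c(180.5, 199.2, 210.0)
labels <- c("A", "A", "B")

# cbind() builds a matrix, and a matrix holds a single type,
# so the numbers are silently converted to character strings.
m <- cbind(values, labels)
toy <- data.frame(m)
colnames(toy) <- c("Cholesterol", "group")
class(toy$Cholesterol)

# Convert back to numeric before plotting.
toy$Cholesterol <- as.numeric(toy$Cholesterol)
class(toy$Cholesterol)

# Building the data frame column by column avoids the round trip.
toy2 <- data.frame(Cholesterol = values, group = labels)
class(toy2$Cholesterol)
```

Building the data.frame directly, as in the last step, keeps each column in its own type, so no conversion is needed afterwards.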
Jitter plots are one of my favorite ways to represent data. You can see every data point and immediately understand the distribution of your data, and you also avoid the pitfalls of boxplots (see Matejka and Fitzmaurice 2017).
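If you still want the boxplot summary, one option is to draw it underneath the jittered points so both are visible. The sketch below is only an illustration: it uses `rnorm` with an arbitrary seed instead of the simulation above, and the names `demo` and `p_demo` are made up for the example.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed so the sketch is reproducible

# Illustrative data: two groups with different spread.
demo <- data.frame(
  group = rep(c("A", "B"), each = 200),
  Cholesterol = c(rnorm(200, mean = 180, sd = 20),
                  rnorm(200, mean = 200, sd = 40))
)

# Boxplot first, jittered points on top.
# outlier.shape = NA hides the boxplot outliers, because the
# jitter layer already draws every observation.
p_demo <- ggplot(demo, aes(x = group, y = Cholesterol)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.2, alpha = 0.3)
p_demo
```

The `width` argument limits the horizontal spread so the two groups do not overlap.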
If you need inferential statistics on your data, another resource is ggstatsplot (Patil 2021). See the following example with our data. Note that we need to convert the group label with as.factor.
library(ggstatsplot)
You can cite this package as:
Patil, I. (2021). Visualizations with statistical details: The 'ggstatsplot' approach.
Journal of Open Source Software, 6(61), 3167, doi:10.21105/joss.03167
data$group <- as.factor(data$group)
pstack <- ggbetweenstats(data,group,Cholesterol)
pstack
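To cross-check the numbers printed on the plot, the same comparison can be run in base R. This is a sketch under two assumptions: that the parametric two-group comparison reported by ggbetweenstats corresponds to Welch's t-test (please check the ggstatsplot documentation for your version), and that the standard deviations of 20 and 40 are passed to `mvrnorm` as variances. The names `sim` and `welch` are illustrative.

```r
library(MASS)

# Rebuild the two groups directly as a data frame
# (same means, SDs and group sizes as in Example 1).
sim <- data.frame(
  Cholesterol = c(mvrnorm(n = 500, mu = 180, Sigma = 20^2, empirical = TRUE),
                  mvrnorm(n = 520, mu = 200, Sigma = 40^2, empirical = TRUE)),
  group = factor(rep(c("A", "B"), times = c(500, 520)))
)

# Welch's two-sample t-test: t.test() does not assume equal
# variances unless var.equal = TRUE is set.
welch <- t.test(Cholesterol ~ group, data = sim)
welch$estimate    # group means
welch$statistic   # t statistic
welch$p.value
```

Because the data.frame is built column by column here, Cholesterol is already numeric and group is already a factor, so no conversion step is needed.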