2 Data Visualization

Numbers and tables alone do not always provide the information needed for decision making. Visual patterns and trends in the data often support a decision better, and this is where data visualization helps. Basic plots such as scatter plots, bar charts, box plots, and histograms are great tools for getting intuitive information about the data.

2.1 Relation of diabetes and pregnancies

First, we look at how the occurrence of diabetes relates to the subjects' age and number of pregnancies.

2.1.1 Scatter plot

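# Age vs. pregnancies, colored and faceted by outcome, with a LOESS trend and its confidence band
p1 <- ggplot(diab, aes(x = Age, y = Pregnancies, col = Outcome)) +
  geom_point() + geom_smooth(method = "loess", se = TRUE) + facet_grid(. ~ Outcome)
ggplotly(p1)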

The plot also shows a trend line fitted to the data with the LOESS method (locally estimated scatterplot smoothing); the shaded band around it is the confidence interval of the fit.
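To make the LOESS trend more concrete, the following is a minimal sketch of the same kind of fit done directly with base R's `loess()`. The `toy` data frame and its column names are made up for illustration and are not taken from the diabetes data; the span of 0.75 is simply the `loess()` default.

```r
# Toy data, made up for illustration only (not the diabetes data set)
set.seed(1)
toy <- data.frame(age = 21:70)
toy$pregnancies <- pmax(0, round(0.15 * (toy$age - 20) + rnorm(nrow(toy), sd = 1.5)))

# Local regression: span is the fraction of points used in each local fit
fit <- loess(pregnancies ~ age, data = toy, span = 0.75)

# Evaluate the fitted curve and its standard error on a coarse grid of ages
grid <- data.frame(age = seq(21, 70, by = 7))
pred <- predict(fit, newdata = grid, se = TRUE)

# Pointwise 95% band around the fit, based on the t distribution
half_width <- qt(0.975, pred$df) * pred$se.fit
smooth <- data.frame(age = grid$age,
                     fit = pred$fit,
                     lower = pred$fit - half_width,
                     upper = pred$fit + half_width)
print(smooth)
```

A smaller `span` follows the points more closely, while a larger one gives a smoother curve.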

2.1.2 Boxplot

The box plot summarizes how the number of pregnancies is distributed across the age of the subjects, separately for each diabetes outcome.

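# Age is continuous, so group it into 10-year bands to get one box per band
p2 <- ggplot(diab, aes(x = Age, y = Pregnancies)) +
  geom_boxplot(aes(group = cut_width(Age, 10), fill = Outcome)) + facet_wrap(Outcome ~ .)
ggplotly(p2)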
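As a rough illustration of what each box encodes, the sketch below computes the quartiles of a pregnancy count per age band in base R. The `toy` values and the 10-year bands are assumptions chosen for the example, not figures from the diabetes data.

```r
# Toy data, made up for illustration only (not the diabetes data set)
toy <- data.frame(
  age         = c(22, 25, 28, 31, 34, 37, 41, 44, 47, 52, 55, 58),
  pregnancies = c(0, 1, 2, 1, 3, 4, 2, 5, 6, 4, 7, 9)
)

# Bin the continuous age into 10-year bands: (20,30], (30,40], ...
toy$age_band <- cut(toy$age, breaks = seq(20, 60, by = 10))

# Lower quartile, median, and upper quartile per band:
# these are the bottom edge, middle line, and top edge of each box
summ <- sapply(split(toy$pregnancies, toy$age_band),
               quantile, probs = c(0.25, 0.5, 0.75))
print(summ)
```

The whiskers of a box plot then extend from the box to the most extreme points within 1.5 times the interquartile range.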

2.1.3 Density Plot

A density plot shows the distribution of a single variable, in our case the number of pregnancies of the test subjects. The dashed blue line marks the mean number of pregnancies.

p3<-ggplot(diab,aes(x=Pregnancies))+geom_density(aes(fill=Outcome),alpha=0.6)+
  geom_vline(aes(xintercept=mean(Pregnancies)),
            color="blue", linetype="dashed", size=1)+facet_grid(.~Outcome)+scale_fill_aaas()
ggplotly(p3)
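The density curve can also be reproduced outside ggplot2. The sketch below uses base R's `density()` with its default Gaussian kernel on a handful of made-up pregnancy counts (illustrative values only, not from the diabetes data) and checks that the area under the estimated curve is close to 1.

```r
# Toy counts, made up for illustration only (not the diabetes data set)
pregnancies <- c(0, 1, 1, 2, 2, 2, 3, 4, 5, 7, 8, 10)

# Kernel density estimate with the default Gaussian kernel and bandwidth rule
d <- density(pregnancies)

# Area under the estimated curve via the trapezoid rule; should be close to 1
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)

# Position of the dashed vertical line: the sample mean
avg <- mean(pregnancies)

cat("area under curve:", round(area, 3), "\n")
cat("mean:", avg, "\n")
```

Because the kernel smooths around every observation, the estimated curve extends slightly below zero even though a count cannot be negative.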

2.2 Relation between Glucose, Blood Pressure, Age, Pregnancy

2.2.1 Scatter Plot

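# Blood pressure is mapped to color, because fill has no effect on the default point shape.
# geom_jitter() replaces geom_point() so that each subject is drawn only once.
p3 <- ggplot(diab, aes(x = Age, y = Pregnancies, size = Glucose, color = BloodPressure)) +
  geom_jitter(width = 0.4, alpha = 0.2) + facet_grid(. ~ Outcome) +
  scale_x_continuous(limits = c(18, 80)) + scale_color_material("red")
ggplotly(p3)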