2 Data Visualization
Numbers and tables alone do not always provide the information needed for decision making. Visual patterns and trends in the data are often required to support a decision, and this is where data visualization helps by giving a visual perspective of the data. Basic plots such as the scatter plot, bar chart, box plot and histogram are great tools for conveying intuitive information about the data.
2.1 Relation of diabetes and pregnancies
First we look at how the occurrence of diabetes relates to the age of the subjects and their number of pregnancies.
2.1.1 Scatter plot
p1 <- ggplot(diab, aes(x = Age, y = Pregnancies, col = Outcome)) +
  geom_point() +
  geom_smooth(method = "loess", se = TRUE) +
  facet_grid(. ~ Outcome)
ggplotly(p1)
The smoothed line in each panel is fitted with the Loess method for the data provided.
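To make the Loess smoother in the scatter plot more concrete, the following is a minimal sketch of a local regression fit done directly with `stats::loess()`, which is the function `geom_smooth(method = "loess")` relies on. The data frame `toy` and its values are made up for illustration and only stand in for `diab`; the `span` and `degree` values shown are simply the `loess()` defaults.

```r
# Hypothetical data standing in for diab$Age and diab$Pregnancies
set.seed(1)
age <- sample(21:70, 200, replace = TRUE)
pregnancies <- rpois(200, lambda = pmin(age / 10, 6))
toy <- data.frame(Age = age, Pregnancies = pregnancies)

# Local regression; span = 0.75 and degree = 2 are the loess() defaults
fit <- loess(Pregnancies ~ Age, data = toy, span = 0.75, degree = 2)

# Predicted values and standard errors on a grid of ages inside the data range
grid <- data.frame(Age = seq(min(toy$Age), max(toy$Age), length.out = 50))
pred <- predict(fit, newdata = grid, se = TRUE)
smooth <- data.frame(Age = grid$Age, fit = pred$fit, se = pred$se.fit)
head(smooth)
```

The `fit` column corresponds to the smoothed line, and the `se` column is the quantity from which a confidence band such as the one requested with `se = TRUE` can be derived.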
2.1.2 Boxplot
The box plot shows the distribution of pregnancies across the age of the subjects, split by diabetes outcome.
p2 <- ggplot(diab, aes(x = Age, y = Pregnancies)) +
  geom_boxplot(aes(fill = Outcome)) +
  facet_wrap(Outcome ~ .)
ggplotly(p2)
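Because Age is a continuous variable, one way to compare the pregnancy distribution across ages is to bin Age into bands first and summarise each band. The sketch below is only an illustration under assumptions: the data frame `toy`, the column `AgeBand` and the 10-year band width are hypothetical choices, not part of the original analysis.

```r
# Hypothetical data standing in for diab$Age and diab$Pregnancies
set.seed(3)
toy <- data.frame(
  Age = sample(21:70, 200, replace = TRUE),
  Pregnancies = rpois(200, lambda = 3)
)

# Bin the continuous Age into 10-year bands: (20,30], (30,40], ..., (60,70]
toy$AgeBand <- cut(toy$Age, breaks = seq(20, 70, by = 10))

# Tukey five-number summary (min, lower hinge, median, upper hinge, max) per band
band_summary <- tapply(toy$Pregnancies, toy$AgeBand, fivenum)
band_summary
```

A binned column like `AgeBand` could then be mapped to the x axis of a box plot so that each band gets its own box. The values from `fivenum()` are close to, but not necessarily identical with, the quartiles a plotting library draws.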
2.1.3 Density Plot
A density plot shows the distribution of a single variable, in our case the number of pregnancies of the test subjects. The dashed blue line marks the mean number of pregnancies.
p3 <- ggplot(diab, aes(x = Pregnancies)) +
  geom_density(aes(fill = Outcome), alpha = 0.6) +
  geom_vline(aes(xintercept = mean(Pregnancies)),
             color = "blue", linetype = "dashed", size = 1) +
  facet_grid(. ~ Outcome) +
  scale_fill_aaas()
ggplotly(p3)
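As a rough illustration of what a density plot estimates, the sketch below computes a kernel density with base R's `density()`, using the Gaussian kernel and the `nrd0` bandwidth rule, which are also the defaults of `geom_density()`. The vector `pregnancies` is made-up data, not the `diab` values.

```r
# Hypothetical pregnancy counts standing in for diab$Pregnancies
set.seed(2)
pregnancies <- rpois(300, lambda = 4)

# Kernel density estimate: Gaussian kernel, bandwidth chosen by bw.nrd0()
dens <- density(pregnancies, bw = "nrd0", kernel = "gaussian")

# Trapezoidal rule: the area under a density curve should be close to 1
area <- sum(diff(dens$x) * (head(dens$y, -1) + tail(dens$y, -1)) / 2)

# Mean of the variable, the quantity marked by a vertical reference line
mean_preg <- mean(pregnancies)
c(area = area, mean = mean_preg)
```

The curve is a smoothed version of the histogram, so for count data such as pregnancies it should be read as an approximation of the overall shape and not as exact probabilities for each count.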
2.2 Relation between Glucose, Blood Pressure, Age, Pregnancy
2.2.1 Scatter Plot
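# Stored as p4 so p3 is kept; shape = 21 is a fillable point shape, so the BloodPressure fill is visible
p4 <- ggplot(diab, aes(x = Age, y = Pregnancies, size = Glucose, fill = BloodPressure)) +
  geom_point(alpha = 0.2, shape = 21) + geom_jitter(width = 0.4, shape = 21) +
  facet_grid(. ~ Outcome) + scale_x_continuous(limits = c(18, 80)) +
  scale_fill_material("red")
ggplotly(p4)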