Scatter plots allow us to place points that let us see possible correlations between two features of a data set. Let’s see how we can create them with ggplot2 in Power BI.
The standard ggplot2 syntax for creating scatter plots is outlined below; it requires a different set of inputs than the one shown in the previous blog post.
We define a dataframe, df, and then in the aes() statement, we give it an x-variable and a y-variable to plot. I will save it as a ggplot object called p, because we are going to use this as the base and then layer everything else on top:
library(ggplot2)
library(gridExtra)
# Install both of these packages to follow this tutorial.
# Base plot: column names inside aes() are looked up in df, so no df$ prefix is needed.
p <- ggplot(data = df, aes(x = Day, y = Home.Value))
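The post never shows how df is built, so here is a minimal sketch of a stand-in for trying the snippets outside Power BI. The column names (Day, Year, region, Home.Value) come from the code in this post; the values, the region names and the two years are invented purely for illustration. Inside a Power BI R visual the selected fields are exposed as a data frame called dataset, so `df <- dataset` would replace the data.frame() call.

```r
library(ggplot2)

# Hypothetical stand-in data: only the column names come from the tutorial,
# every value below is made up.
set.seed(42)
df <- data.frame(
  Day        = rep(1:30, times = 2),
  Year       = rep(c(2017, 2018), each = 30),
  region     = sample(c("North", "South", "East", "West", "Central"),
                      60, replace = TRUE),
  Home.Value = round(200000 + cumsum(rnorm(60, mean = 500, sd = 3000)))
)

# Base plot object, as in the tutorial
p <- ggplot(data = df, aes(x = Day, y = Home.Value))

# print() makes the drawing step explicit when the script is not run interactively
print(p + geom_point())
```

With this in place, the later snippets that refer to Day, Home.Value, region and Year should run as written.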
For the plot to print, we need to add the next layer, which defines how the data should be drawn: as points or as lines, in what color, at what size, and so on.
Let’s start with points first:
# Print plot with default points
p1 <- p + geom_point()
p1
We can show a third feature by coloring each point along a gradient, or by sizing each point according to its value of that third feature. For example:
# Each variant adds its point layer to the base plot p, so every point is drawn only once.
p2 <- p + geom_point(color = "red")               # set one color for all points
p3 <- p + geom_point(aes(color = Home.Value))     # set color scale by a continuous variable
p4 <- p + geom_point(aes(color = factor(region))) # set color scale by a factor variable
grid.arrange(p2, p3, p4, ncol = 3)
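The difference between setting a color and mapping one is easy to miss, so here is a small sketch of it. The data frame is hypothetical, and layer_data() is used only to inspect which colors ggplot2 actually draws.

```r
library(ggplot2)

# Hypothetical data for illustration only
df <- data.frame(Day = 1:5, Home.Value = c(210, 215, 212, 220, 225) * 1000)
base_plot <- ggplot(df, aes(x = Day, y = Home.Value))

# Outside aes(): a fixed setting, so every point is drawn in red
p_set <- base_plot + geom_point(color = "red")

# Inside aes(): "red" is treated as a data value with a single level, so
# ggplot2 assigns a color from its default palette and adds a legend
p_map <- base_plot + geom_point(aes(color = "red"))

# layer_data() returns the values actually used to draw a layer
unique(layer_data(p_set)$colour)
unique(layer_data(p_map)$colour)
```

As a rule of thumb, constants go outside aes() and columns of the data frame go inside it.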
We can also change the default colors of the graph in ggplot2 like this:
# scale_color_manual() needs at least one color per level of region.
p5 <- p4 + scale_color_manual(values = c("orange", "black", "green", "blue", "red"))
p5
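An unnamed color vector is matched to the factor levels by position, which is easy to get wrong when the levels change. A possible variation, sketched here, is to name each color after the level it belongs to. The region names and the data below are assumptions made for illustration, not values from the original data set.

```r
library(ggplot2)

# Hypothetical data; the five region names are invented
df <- data.frame(
  Day        = 1:10,
  Home.Value = seq(200000, 245000, by = 5000),
  region     = rep(c("North", "South", "East", "West", "Central"), times = 2)
)

# Naming each color ties it to a level regardless of level order
region_colors <- c(North = "orange", South = "black", East = "green",
                   West = "blue", Central = "red")

p5 <- ggplot(df, aes(x = Day, y = Home.Value)) +
  geom_point(aes(color = factor(region))) +
  scale_color_manual(values = region_colors, name = "Region")
print(p5)
```

The name argument only sets the legend title; it can be dropped to keep the default.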
Changing shape or size of points
We’re sticking with the same scatter plot, but now changing the shape and size of the points. The new layers go on the base object p, because p1 already contains a point layer and would draw every point twice.
Let’s first comment out the previous p2, p3 and p4 assignments by selecting them and pressing Ctrl+Shift+C.
p2 <- p + geom_point(size = 5)                  # increase all points to size 5
p3 <- p + geom_point(aes(size = factor(Year)))  # set point size by factor variable
p4 <- p + geom_point(aes(shape = factor(Year))) # set point shape by factor variable
grid.arrange(p2, p3, p4, nrow = 1)
Add lines to scatterplot
You can also add lines to the scatter plot, such as the smoothed trend lines produced by the syntax below.
# Two loess smoothers without points: a wiggly one (span = 0.1) and a smoother one (span = 0.75)
p2 <- p +
  stat_smooth(method = "loess", colour = "green", span = 0.1) +
  stat_smooth(method = "loess", colour = "blue", span = 0.75)
# Red points with a loess trend line and its confidence band
p3 <- p + geom_point(color = "red") + geom_smooth(method = "loess", se = TRUE)
# Points plus a vertical reference line at x = 15
p4 <- p + geom_point() + geom_vline(xintercept = 15, color = "red")
grid.arrange(p2, p3, p4, nrow = 1)
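The smoothers above use loess, which fits a flexible local curve. If a straight regression line is wanted instead, one option is method = "lm". The sketch below assumes hypothetical data with a roughly linear trend; the numbers are invented and not taken from the original data set.

```r
library(ggplot2)

# Hypothetical data with a roughly linear trend
set.seed(1)
df <- data.frame(Day = 1:30)
df$Home.Value <- 200000 + 1500 * df$Day + rnorm(30, sd = 4000)

# Straight-line fit with its confidence band
p_lm <- ggplot(df, aes(x = Day, y = Home.Value)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE, color = "blue")
print(p_lm)

# The same model fitted directly, to read off intercept and slope
fit <- lm(Home.Value ~ Day, data = df)
coef(fit)
```

Setting se = FALSE would hide the shaded confidence band, which can help when several lines share one panel.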
That wraps up this post. I will cover the other formatting options for polishing this visualization in an upcoming post.
Thanks for reading this post and I hope you have extended your visualization skills further using ggplot2 and R.