Create Multiple Scatterplots using ggplot2 in Power BI – Part 1

Scatter plots place one point per observation, which lets us see possible correlations between two features of a data set. Let’s see how we can create them with ggplot2 in Power BI.

The standard ggplot2 syntax for creating scatterplots is outlined below; it requires a different set of inputs than the plots shown in the previous blog post.

We define a dataframe, df, and then in the aes() statement, we give it an x-variable and a y-variable to plot. I will save it as a ggplot object called p, because we are going to use this as the base and then layer everything else on top:

# Install both of these packages before following this tutorial.
library(ggplot2)
library(gridExtra)
# Use bare column names inside aes(); ggplot2 looks them up in df.
p <- ggplot(data = df, aes(x = Day, y = Home.Value))
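To try the snippets in this post outside Power BI, a stand-in data frame is handy. The sketch below assumes the column names used in this post (Day, Home.Value, Year, region); the values are invented purely for illustration. Inside a Power BI R visual, the selected fields arrive as a data frame named dataset, so df <- dataset would take the place of the made-up data.

```r
library(ggplot2)

# Hypothetical sample data: column names follow this post, values are invented.
set.seed(42)
df <- data.frame(
  Day        = rep(1:30, times = 2),
  Year       = rep(c(2016, 2017), each = 30),
  region     = rep(c("North", "South", "East"), length.out = 60),
  Home.Value = round(200000 + 1500 * rep(1:30, times = 2) + rnorm(60, sd = 8000))
)

# Inside a Power BI R visual, use the supplied data frame instead:
# df <- dataset

# Base plot: bare column names inside aes() are looked up in df.
p <- ggplot(data = df, aes(x = Day, y = Home.Value))
```

With this in place, every later snippet that starts from p or df can be run as written.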

For the plot to print, we need to specify the next layer, which controls how the symbols look: do we want points or lines, in what color, at what size, and so on.

Let’s start with points first:

# Print plot with default points 
p1 <- p + geom_point()
p1

[Figure: scatterplot1]

We can add a third feature by coloring each point, or by resizing each point, according to its value for that feature. For example:

# Start from the base plot p so the points are not drawn twice.
p2 <- p + geom_point(color = "red") # set one color for all points
p3 <- p + geom_point(aes(color = Home.Value)) # set color scale by a continuous variable
p4 <- p + geom_point(aes(color = factor(region))) # set color scale by a factor variable
grid.arrange(p2, p3, p4, ncol = 3)

[Figure: scatterplot2]
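The difference between setting a color and mapping a color is easy to miss, so here is a minimal sketch of it. The data frame demo and its values are hypothetical; only the column names come from this post. A constant placed outside aes() applies to every point, while a column placed inside aes() produces one color per value plus a legend.

```r
library(ggplot2)

# Hypothetical data with the column names used in this post.
demo <- data.frame(
  Day        = 1:6,
  Home.Value = c(210, 215, 222, 219, 230, 236),
  region     = c("North", "South", "East", "North", "South", "East")
)
base <- ggplot(demo, aes(x = Day, y = Home.Value))

fixed  <- base + geom_point(color = "red")        # set: a constant, outside aes()
mapped <- base + geom_point(aes(color = region))  # map: one color per region, with a legend

# layer_data() returns the aesthetics ggplot2 computed for a layer.
fixed_colors  <- unique(layer_data(fixed)$colour)
mapped_colors <- unique(layer_data(mapped)$colour)
```

Inspecting fixed_colors and mapped_colors shows a single color for the first plot and one color per region for the second.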

We can also change the default colors of the graph with scale_color_manual(), supplying at least as many colors as the factor has levels:

p5 <- p4 + scale_color_manual(values = c("orange", "black", "green", "blue", "red"))
p5
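An unnamed vector of colors is assigned to the factor levels in order, which can be hard to read back later. As a sketch, assuming hypothetical region names North, South and East, a named vector ties each color to a specific level:

```r
library(ggplot2)

# Hypothetical data; the region names are assumptions for illustration.
demo <- data.frame(
  Day        = 1:6,
  Home.Value = c(210, 215, 222, 219, 230, 236),
  region     = c("North", "South", "East", "North", "South", "East")
)

# Names must match the factor levels exactly.
region_colors <- c(North = "orange", South = "black", East = "green")

named <- ggplot(demo, aes(x = Day, y = Home.Value)) +
  geom_point(aes(color = region), size = 3) +
  scale_color_manual(values = region_colors)

# Colors actually assigned to each row, in row order.
assigned <- layer_data(named)$colour
```

This way the color of a region stays the same even if the set of regions in the data changes between refreshes.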

Changing shape or size of points

We’re sticking with the base plot p, but now changing the shape and size of the points.

First, comment out the previous p2, p3 and p4 assignments by selecting those lines and pressing Ctrl+Shift+C (the comment shortcut in RStudio).

# Start from the base plot p so the default points are not drawn underneath.
p2 <- p + geom_point(size = 5) # increase all points to size 5
p3 <- p + geom_point(aes(size = factor(Year))) # set point size by factor variable (ggplot2 warns that size is not advised for discrete variables)
p4 <- p + geom_point(aes(shape = factor(Year))) # set point shape by factor variable

grid.arrange(p2, p3, p4, nrow=1)

[Figure: scatterplot3]
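The shape aesthetic needs a discrete variable, which is why Year is wrapped in factor() above. The sketch below, on hypothetical data, also picks the shapes explicitly with scale_shape_manual() instead of relying on the defaults; shape 16 is a filled circle and 17 a filled triangle.

```r
library(ggplot2)

# Hypothetical data; Year is numeric here, as it often is after import.
years <- data.frame(
  Day        = rep(1:3, times = 2),
  Home.Value = c(210, 215, 222, 228, 231, 240),
  Year       = rep(c(2016, 2017), each = 3)
)

by_year <- ggplot(years, aes(x = Day, y = Home.Value)) +
  geom_point(aes(shape = factor(Year)), size = 3) +
  scale_shape_manual(values = c("2016" = 16, "2017" = 17), name = "Year")

# Shapes ggplot2 assigned to the points.
shapes_used <- unique(layer_data(by_year)$shape)
```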

Add lines to scatterplot

Next, you can add lines to the scatterplot, including smoothed curves, using the syntax below.

# two loess curves: a small span follows the data closely, a large span gives a smoother curve
p2 <- p + stat_smooth(method = "loess", colour = "green", span = 0.1) +
  stat_smooth(method = "loess", colour = "blue", span = 0.75)
p3 <- p + geom_point(color = "red") + geom_smooth(method = "loess", se = TRUE) # add loess smoother with confidence band
p4 <- p + geom_point() + geom_vline(xintercept = 15, color = "red") # add vertical line at Day 15

grid.arrange(p2, p3, p4, nrow=1)

[Figure: scatterplot4]
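To see what the span argument does, it can help to fit the same loess models directly. This is a rough sketch on simulated data (the wave data frame is hypothetical): span is the fraction of points used in each local fit, so a small span tracks the data closely and leaves smaller residuals, while a large span gives a smoother curve.

```r
library(ggplot2)

# Simulated data: a noisy wave, with the column names used in this post.
set.seed(1)
wave <- data.frame(Day = 1:100)
wave$Home.Value <- 10 * sin(wave$Day / 8) + rnorm(100)

# The same kind of model geom_smooth(method = "loess") fits, at two spans.
tight <- loess(Home.Value ~ Day, data = wave, span = 0.1)
loose <- loess(Home.Value ~ Day, data = wave, span = 0.75)

# Residual sum of squares: lower means the curve follows the points more closely.
rss <- c(tight = sum(residuals(tight)^2), loose = sum(residuals(loose)^2))

smooth_plot <- ggplot(wave, aes(x = Day, y = Home.Value)) +
  geom_point(color = "grey50") +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.1,  se = FALSE, color = "green") +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.75, se = FALSE, color = "blue")
```

A very small span can end up chasing noise, so it is worth trying a few values against your own data rather than treating either number here as a recommendation.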

That wraps up this post. An upcoming post will cover further formatting options for polishing this visualization.

Thanks for reading this post and I hope you have extended your visualization skills further using ggplot2 and R.