Power BI Desktop has native support for creating and rendering visuals with R through the R script visual, which can use the R packages installed on your machine. ggplot2 is one of these packages and one of the most widely used in R for building custom graphs and visuals.
ggplot2 is based on the grammar of graphics, the idea that you can build every graph from the same few components:
- A data set,
- A coordinate system, and
- A set of geoms: visual marks that represent data points.
To display data values, you map variables in the data set to aesthetic properties of the geom, such as size, color, and x and y position.
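To make these components concrete, here is a minimal sketch using R's built-in mtcars data set. It is chosen only for illustration and is not the real estate data used later; the comments mark which component each call supplies.

```r
library(ggplot2)

p <- ggplot(data = mtcars,                      # the data set
            aes(x = wt, y = mpg,                # map variables to x and y position
                colour = factor(cyl),           # ...to colour
                size = hp)) +                   # ...and to size
  geom_point() +                                # geom: one point mark per row
  coord_cartesian()                             # coordinate system (Cartesian is the default)

print(p)
```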
Layers for Building Visualizations
The grammar of graphics treats a data visualization as a stack of layers, and the ggplot2 library defines its visuals through six such layers.
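The original layer diagram is not reproduced here, so the sketch below is an illustration of my own rather than that list: it stacks one call per layer type commonly discussed for ggplot2, again on the built-in mtcars data. The labels in the comments are an assumption and may group things differently from the six-layer breakdown.

```r
library(ggplot2)

p <- ggplot(mtcars) +                               # data
  aes(x = hp, y = mpg) +                            # aesthetics
  geom_point() +                                    # geometry
  stat_smooth(method = "lm", formula = y ~ x) +     # statistics (linear fit per panel)
  facet_wrap(~ am) +                                # facets (one panel per transmission type)
  coord_cartesian(ylim = c(10, 35)) +               # coordinates (zoom without dropping data)
  theme_minimal()                                   # theme

print(p)
```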
We won't go into much depth on the overall philosophy of the grammar of graphics, because the best source for it is the creator of ggplot2, Hadley Wickham, who wrote a great paper on the topic that is well worth reading.
The quickest way to learn the grammar-of-graphics syntax of ggplot2 is through a few quick examples. In this post I'll show some histogram examples; in the following posts I'll cover specific plot types using qplot() and ggplot(), and then we'll round out our understanding with the final layers of the grammar of graphics.
Data and Set-up
Let's get started. You will need:
- RStudio: if you don't have RStudio installed, download and install it from the RStudio website. It is an open-source R development environment and is available free of charge.
- R or Microsoft R: download and install one of them; Power BI Desktop runs R visuals on your local R installation.
- ggplot2 package: install it by running install.packages("ggplot2") in the R console.
- Power BI Desktop
- Real Estate Data Set
The general syntax of a ggplot2 call looks like this:
ggplot(data = <default data set>,
       aes(x = <default x axis variable>,
           y = <default y axis variable>,
           ... <other default aesthetic mappings>),
       ... <other plot defaults>) +
  geom_<geom type>(aes(size = <size variable for this geom>,
                       ... <other aesthetic mappings>),
                   data = <data for this geom>,
                   stat = <statistic string or function>,
                   position = <position string or function>,
                   color = <"fixed color specification">,
                   <other arguments, possibly passed to the _stat_ function>) +
  scale_<aesthetic>_<type>(name = <"scale label">,
                           breaks = <where to put tick marks>,
                           labels = <labels for tick marks>,
                           ... <other options for the scale>) +
  theme(plot.background = element_rect(fill = "gray"),
        ... <other theme elements>)
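Here is one hypothetical way to fill in that template. The homes data frame and its columns are made up for illustration; the point is to see each slot (default data and mappings, a geom with its own mapping and fixed settings, a scale, and a theme element) as real code.

```r
library(ggplot2)

# Hypothetical example data, not the post's real estate data set
homes <- data.frame(
  sqft  = c(1200, 1500, 1800, 2100, 2500, 3000),
  price = c(180000, 210000, 260000, 300000, 360000, 450000),
  type  = c("Condo", "Condo", "House", "House", "House", "House")
)

p <- ggplot(data = homes, aes(x = sqft, y = price)) +     # default data and mappings
  geom_point(aes(colour = type),                          # geom-specific mapping
             size = 3,                                    # fixed (unmapped) setting
             stat = "identity", position = "identity") +
  scale_y_continuous(name = "Sale price (USD)",           # scale: label, breaks, tick labels
                     breaks = seq(200000, 400000, by = 100000),
                     labels = c("200k", "300k", "400k")) +
  theme(plot.background = element_rect(fill = "gray"))    # theme element from the template

print(p)
```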
We'll build up an understanding of this template piece by piece, but first we need data: we'll use the real estate data set provided with this post.
Quick Example with Histograms
First, bring the data set into Power BI using the Get Data dialog box.
Once the data is in Power BI, select the R script visual and drag the Home.Value column to the Values section of the Fields pane, as shown in the screenshot below.
Power BI then creates a data frame named dataset inside the R visual; you can assign it to a shorter variable, df, for easy reference in the rest of the R script.
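Outside Power BI there is no dataset variable, so to try the following steps in RStudio you need a stand-in. The sketch below builds one from synthetic, made-up home values. The commented lines approximate the preamble Power BI Desktop shows at the top of the R script editor; note that because of unique(), identical rows are dropped, so with only a Home.Value column, homes with exactly the same price are counted once.

```r
# Synthetic stand-in for the data frame Power BI passes to the R visual
# (values are made up; in Power BI, dataset already exists)
set.seed(1)
dataset <- data.frame(Home.Value = round(rlnorm(500, meanlog = log(250000), sdlog = 0.4)))

# Power BI generates a preamble along these lines (shown commented out):
# dataset <- data.frame(Home.Value)
# dataset <- unique(dataset)

# Short alias used in the rest of the script
df <- dataset
str(df)
```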
The goal is a histogram of house prices. To create it, I used qplot(), the "quick plot" function that ships with ggplot2. Passing the data frame to qplot() produces the histogram shown in Figure 1.
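Since the script from Figure 1 is not reproduced in the text, here is a minimal sketch of the same idea on synthetic data (made-up values standing in for Home.Value). Recent ggplot2 releases (3.4.0 and later) mark qplot() as deprecated and print a warning, so the equivalent ggplot() call is shown alongside it.

```r
library(ggplot2)

# Synthetic stand-in for df (made-up home values)
set.seed(1)
df <- data.frame(Home.Value = rlnorm(500, meanlog = log(250000), sdlog = 0.4))

# Quick histogram with qplot()
p_quick <- qplot(Home.Value, data = df, geom = "histogram")
print(p_quick)

# The same histogram written with ggplot()
p_full <- ggplot(df, aes(x = Home.Value)) +
  geom_histogram(bins = 30)
print(p_full)
```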
The nice thing about this R visual is that we can customize it to our requirements and still slice and dice it by other attributes in Power BI.
Next, we'll use the ggplot() function from the ggplot2 package to build a more sophisticated histogram that can be customized to business requirements.
The script is shown in the figure below; copy it into your R visual and run it again.
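Report slicers and filters change which rows reach the dataset data frame, and any extra field dragged into Values arrives as an extra column. A minimal sketch of using such a column inside the visual, assuming a hypothetical City field and synthetic data: facet_wrap() draws one histogram per city.

```r
library(ggplot2)

# Synthetic data: made-up Home.Value plus a hypothetical City field
set.seed(2)
df <- data.frame(
  Home.Value = rlnorm(600, meanlog = log(250000), sdlog = 0.4),
  City = sample(c("City A", "City B", "City C"), 600, replace = TRUE)
)

# One histogram panel per City value
p <- ggplot(df, aes(x = Home.Value)) +
  geom_histogram(bins = 25) +
  facet_wrap(~ City)

print(p)
```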
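The script in that figure is not available as text, so the following is a hypothetical stand-in rather than the original: a customized histogram on synthetic home values, with a fixed bin width, fill and outline colours, dollar-formatted axis labels via the scales package (installed as a ggplot2 dependency), titles, and a theme.

```r
library(ggplot2)

# Synthetic stand-in for df (made-up home values)
set.seed(1)
df <- data.frame(Home.Value = rlnorm(500, meanlog = log(250000), sdlog = 0.4))

p <- ggplot(df, aes(x = Home.Value)) +
  geom_histogram(binwidth = 25000, fill = "steelblue", colour = "white") +
  # Show axis values as thousands of dollars, e.g. "$250k"
  scale_x_continuous(labels = scales::label_dollar(scale = 1e-3, suffix = "k")) +
  labs(title = "Distribution of home values",
       x = "Home value", y = "Number of homes") +
  theme_minimal()

print(p)
```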
The grammar-of-graphics syntax is largely self-explanatory: you can work out how each component joined by the "+" sign contributes to the customized histogram. You can study each of these components in detail through Hadley Wickham's ggplot2 resources, linked at the start of this post.
The final result of our ggplot() call is shown in the picture below:
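To see how "+" composes a plot, this sketch builds one step by step on synthetic data (made-up values, not the post's data set). Only the geom call adds a layer; labs() and theme_bw() modify the plot without adding layers.

```r
library(ggplot2)

# Synthetic stand-in for df (made-up home values)
set.seed(1)
df <- data.frame(Home.Value = rlnorm(500, meanlog = log(250000), sdlog = 0.4))

p <- ggplot(df, aes(x = Home.Value))   # data and default mapping, no layers yet
p <- p + geom_histogram(bins = 30)     # adds a geom layer
p <- p + labs(x = "Home value")        # changes labels, not a layer
p <- p + theme_bw()                    # swaps the theme, not a layer

print(length(p$layers))                # only geom_histogram() added a layer
print(p)
```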
I have also added an extra layer, a kernel density estimate (a smoothed curve) over the histogram, which the standard Power BI visuals do not offer.
You can read more about kernel density estimation on Wikipedia.
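A common way to overlay a density curve in ggplot2 is to put the histogram on the density scale so both share one y axis. The sketch below assumes synthetic data and ggplot2 3.3.0 or later, where after_stat() is available; it is one possible approach, not necessarily the exact script in the figure.

```r
library(ggplot2)

# Synthetic stand-in for df (made-up home values)
set.seed(1)
df <- data.frame(Home.Value = rlnorm(500, meanlog = log(250000), sdlog = 0.4))

p <- ggplot(df, aes(x = Home.Value)) +
  # Histogram scaled to density so its bar areas sum to 1
  geom_histogram(aes(y = after_stat(density)), bins = 30,
                 fill = "grey80", colour = "white") +
  # Kernel density estimate (Gaussian kernel by default)
  geom_density(colour = "firebrick")

print(p)
```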
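For intuition, a Gaussian kernel density estimate at a point t is f_hat(t) = (1 / (n h)) * sum over i of K((t - x_i) / h), where K is the standard normal density and h is the bandwidth. The sketch below computes it by hand on synthetic data, checks that it integrates to about 1, and compares it with base R's density(), which uses the same "nrd0" bandwidth rule by default but bins the data, so the values agree only approximately.

```r
# Synthetic, made-up home values
set.seed(1)
x <- rlnorm(500, meanlog = log(250000), sdlog = 0.4)

# Hand-written Gaussian KDE evaluated at each point in t
kde <- function(t, x, h) {
  vapply(t, function(ti) mean(dnorm((ti - x) / h)) / h, numeric(1))
}

# Bandwidth from the rule density() uses by default
h <- bw.nrd0(x)

# Riemann sum over a wide grid: the estimate should integrate to about 1
grid <- seq(min(x) - 5 * h, max(x) + 5 * h, length.out = 2001)
f_hat <- kde(grid, x, h)
area <- sum(f_hat) * (grid[2] - grid[1])
print(area)

# Compare with density() at the sample median
d <- density(x, bw = h)
m <- median(x)
print(c(manual = kde(m, x, h), density = approx(d$x, d$y, xout = m)$y))
```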
That's all for now, this New Year's Day, on histograms in Power BI with R. ggplot2 has powerful customization capabilities, but it definitely takes time to get used to!
Let me know what you think; in coming posts I will cover other visuals in more detail using the ggplot2 and plotly libraries.
Until then, goodbye!