I feel rueful. I’ve learned Tableau. A proprietary tool [cue the horror-movie sound].
Nothing wrong with that, you say? Well, I felt like a traitor, given that my book, Computing Skills for Biologists, spends many pages on why to prefer open-source software over proprietary tools.
In academia, some people wear it as a badge of honor that they don’t have Microsoft Office installed. While the intentions are good (accessibility of open-source tools for everyone), the choice turns out to be less clear-cut in an industry setting.
When a recruiter suggested that I add Tableau to my analytics toolbox, I initially wasn’t fond of the idea. First, I thought it was expensive. Second, I didn’t see what advantage it could possibly bring over R, which for the past 15 years has served me well for all my analytics, visualization, and dashboarding needs.
Tableau Public is free and easy to use
When I learned that Tableau offers a free version of its Desktop software, I wanted to give it a try. Tableau Public isn’t open-source software by any definition, but at least it’s accessible to everyone. So I’m bending my own rules and calling it “showcasing adaptability.”
Learning Tableau was initially the reason I analyzed crime data during the COVID-19 pandemic. Once I started digging deeper, though, the story that unfolded kept me captivated: changes of 1,200% in a metric are quite exciting for a data geek. Unfortunately, we are talking about an increase in simple assault cases, which quickly dampens my excitement from a social standpoint.
Below is a graph that I generated from the Boston crime data in Tableau within a few minutes. Tableau definitely made it easy to connect and visualize the data. The interface is intuitive and allows the creation of impactful and clear plots quickly.
Both R and Tableau have an extensive community, but Tableau’s is friendlier
What excited me most, though, was the Tableau user community, which quickly jumped in to help out a fellow user. The tone among community members felt kinder than in the Stack Overflow community that I usually turn to. That’s a plus for Tableau. Their forum uses a point system to reward interaction, similar to Stack Overflow. Is it the Tableau Community Ambassadors, then, who make the difference and keep the tone friendly?
I created a similar visualization in R (using Chicago’s crime data), shown below. I’ll admit, it took me significantly longer to set up the plot. Connecting to the data (7M rows on BigQuery), aggregating it to the desired granularity, and finally setting all the details of the plot took me two hours. You can find the code in a GitHub repository.
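To make the “aggregating it to the desired granularity” step concrete, here is a minimal sketch of that kind of roll-up. It is written in Python with only the standard library rather than in R, and it is not the code from the repository. The field names `date` and `offense`, the timestamp format, and the sample rows are all assumptions made up for illustration.

```python
from collections import Counter
from datetime import datetime


def monthly_counts(records):
    """Count incidents per (year-month, offense type).

    `records` is an iterable of dicts with hypothetical keys
    "date" (format "YYYY-MM-DD HH:MM:SS") and "offense".
    """
    counts = Counter()
    for rec in records:
        ts = datetime.strptime(rec["date"], "%Y-%m-%d %H:%M:%S")
        # Collapse each timestamp to its month, the plotting granularity.
        counts[(ts.strftime("%Y-%m"), rec["offense"])] += 1
    return counts


# Made-up rows for illustration only, not real crime data.
sample = [
    {"date": "2020-03-01 08:15:00", "offense": "ASSAULT"},
    {"date": "2020-03-17 22:40:00", "offense": "ASSAULT"},
    {"date": "2020-03-20 13:05:00", "offense": "THEFT"},
    {"date": "2020-04-02 09:30:00", "offense": "ASSAULT"},
]

for (month, offense), n in sorted(monthly_counts(sample).items()):
    print(month, offense, n)
```

With millions of rows, it would likely make more sense to push this grouping into the database query itself and only pull the aggregated table into the script.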
I certainly enjoy the flexibility that ggplot provides, but it makes it easy to fall into the trap of fiddling until everything looks perfect. In brief, R’s flexibility can count as much for it as against it.
R is repeatable and shareable
So Tableau won in terms of efficiency on the first try, but that ease comes at a high cost. Processes in drag-and-drop software, such as Tableau, are harder to replicate. R is a scripted language, meaning that function calls in a text file lead to the results. A text document is repeatable, shareable, and easy to archive. When I open such an R script many months later, I can still easily understand what’s happening, especially when the file is well commented.
I have yet to find out how to achieve the same level of documentation in Tableau. If I have access to the workbook, I can see what went into the plot, but I have to individually retrace which set, filter, adjusted option, or table calculation brought me there. All drag-and-drop tools share this burden. Repeating the same analysis on a slightly different data set is tedious. Or am I missing some Tableau features? Please comment below if you know how to increase the repeatability of complex analyses in Tableau.
R goes beyond visualizing data
When visualizing data in Tableau, I often wondered whether the patterns I identified were actually meaningful. Just because something looks “interesting” doesn’t necessarily mean that it IS (statistically speaking) “interesting.” Tableau offers a small set of analytics settings, such as confidence bands and trend lines (that report R squared and p-values), but it felt unsatisfying.
Indeed, the majority of Tableau users do not seem fond of rigorous statistical testing in their visual reports. For instance, how can I tell whether groups in a bar chart are truly distinct without an indication of standard deviation? Whether or not something looks different might be determined by nothing more than the resolution of the axis.
Tableau’s strength certainly lies in visualization, not statistical analysis. I clearly missed R’s capability to plug the data into something as simple as a t-test or two-way ANOVA to truly understand what’s going on. That’s a big plus for R.
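As a rough illustration of the kind of check I mean, here is a minimal sketch of Welch’s two-sample t statistic, the unequal-variance test that R’s `t.test()` runs by default. It is written in Python with only the standard library, and the function name `welch_t` and the two samples are made up for illustration; they are not taken from the crime data.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and approximate degrees of freedom.

    `a` and `b` are two independent samples (each needs at least two
    values and non-zero combined variance). Sample variances are used.
    """
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Made-up weekly counts for two periods, for illustration only.
before = [12, 15, 11, 14, 13, 16]
during = [18, 22, 17, 21, 19, 24]

t_stat, dof = welch_t(before, during)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

Turning the statistic into a p-value requires the t distribution, which Python’s standard library does not provide. That gap is part of the point: in R, a single `t.test()` call reports the statistic, the degrees of freedom, and the p-value together.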
Interestingly, since last year, Tableau has allowed connecting to R, Python, and MATLAB through its “external services connectors.” Unfortunately, Tableau Public lacks that feature, but I imagine that it greatly expands what Tableau can be used for. Run your models in R, then enjoy Tableau’s interface to quickly and appealingly visualize the results.
I enjoyed learning and using Tableau. Given its extensive support, forums, and concise tutorials, I understand why companies opt for a proprietary business intelligence tool such as Tableau, over R. It is more accessible for analysts with little coding background and facilitates quick, ad hoc visual analysis.
When it comes to repeatability and statistical rigor, I prefer a scripted language, such as R, that is easier to integrate into comprehensive analytics pipelines. Tableau’s external service capability allows connecting to Python, R, or MATLAB, which makes it possible to leverage the best of each tool.