Blog

Linear Regression Basics

So, you think you know linear regression? I have likely run thousands of regression models as a researcher and data analyst. However, a recent job interview made me think about some finer details of linear regression. As a consequence, I wanted to brush up on the topic. The outcome is a series of posts on linear regression. This is the first part and provides the foundations of linear regression.

Publish to WordPress directly from RStudio

The following piece of automation made me fall in love with R all over again: publishing a WordPress post directly from an R Markdown file! What's so great about that? Usually, I develop my analysis and visualizations in an R Markdown file that I publish to GitHub (version control, code sharing, and all the other good reasons to use git).

GitHub is fantastic for sharing code, but it's not trivial to skim through a script and quickly understand what was done and how. Even with lots of comments, that takes dedication and time. A WordPress post about the analysis, by contrast, provides background, highlights only the interesting coding bits, shows visual results, and is much more likely to be found by other programmers.

Unfortunately, the above workflow separates the WordPress post from the underlying R script. Changing one part requires manually changing the other. The ability to generate both the raw code (posted on GitHub) and the WordPress post from a single source of truth (the R Markdown script) is a game-changer! Best of all, it's easy to set up, as we'll see below.

(Don’t) Blame the weather

This project uses time series analysis and forecasting techniques to explore crime data during the COVID-19 pandemic, while also considering the weather.

Did COVID-19 prevent homicides?

Admittedly, that header is a bit sensational, but when I analyzed Boston’s crime data during COVID-19, the drop in reported offenses was quite astonishing. While verbal dispute offenses skyrocketed, and more people set out to rob a bank, the overall number of crimes dropped by more than half. So indeed, the lockdown prevented harm not only by reducing the risk of catching the virus but also by decreasing the probability of getting hit by a car or bullet.

Tableau vs R

I feel rueful. I’ve learned Tableau. A proprietary tool [cue the horror-movie sound].

Nothing wrong with that, you say? Well, I felt like a traitor, given that my book, Computing Skills for Biologists, spends many pages explaining why to prefer open-source software over proprietary tools.

In academia, some people wear it like a badge of honor that they don’t have Microsoft Office installed. While the intentions are good (accessibility of open-source tools for everyone), the choice turns out to be less clear-cut in an industry setting.

Boston crime in times of COVID-19

I am writing this at a time when COVID-19 paralyzes the world. The fatalities, geographic patterns, and economic impact of the disease are the subject of fantastic visualizations elsewhere.

However, the lockdown in response to the disease has an impact on almost all aspects of our lives and, hence, creates striking patterns in otherwise consistent data.

Here, we’ll take a look at other (equally sad) graphs—the crime statistics in Boston. Did the number of reported offenses change during COVID-19? Did the occurrence of specific offenses vary in comparison to other periods?

Identifying SEM keywords from a technical text

Identifying keywords for Search Engine Marketing (SEM) or on-site Search Engine Optimisation (SEO) efforts is a basic task for many marketers. Usually, one can come up with keywords easily by asking “What is my text about?” or “How would a customer search for this text?”. However, identifying keywords gets tricky when the marketer is not a subject-matter expert in the content that he or she is advertising. This post introduces a text analysis tool to help non-experts extract meaningful keywords from a technical text.
