Module 1 | Lesson 1

Introduction

Productive R Workflow is a mini course helping R developers be more efficient at data analysis.

This page explains why I decided to create this project and how it is organized, and it sets up the foundation of the project we will work on.

Welcome 👋!

Over the past ten years, I've immersed myself in data science and software engineering, during which I've launched several online projects centered around these disciplines.

This experience has made my desk a go-to destination for friends and colleagues seeking guidance on their data analysis endeavors. Often, they come to me after some initial data exploration, equipped with an R script in hand, ready for refinement.

Here's a glimpse of what that might look like:

setwd("~/perso/tmp/project/penguins")

data <- read.csv("https://raw.githubusercontent.com/holtzy/R-graph-gallery/master/DATA/data_2.csv")

summary(data)

print(round(mean(subset(na.omit(data), species == "Adelie" & island == "Torgersen")$bill_length_mm),2))
print(round(mean(subset(na.omit(data), species == "Adelie" & island == "Biscoe")$bill_length_mm),2))
print(round(mean(subset(na.omit(data), species == "Adelie" & island == "Dream")$bill_length_mm),2))


# Plot
penguins_clean <- na.omit(   data  )
plot(penguins_clean$bill_length_mm, penguins_clean$bill_depth_mm, type='n', xlab='Bill Length (mm)', ylab='Bill Depth (mm)', main='Penguin Bill Dimensions')
points(
  penguins_clean$bill_length_mm[penguins_clean$species  ==  "Adelie"], penguins_clean$bill_depth_mm[penguins_clean$species == "Adelie"], col='red', pch=16)
points(penguins_clean$bill_length_mm[penguins_clean$species == "Chinstrap"], penguins_clean$bill_depth_mm[penguins_clean$species == "Chinstrap"], col='green', pch=17)
points(penguins_clean$bill_length_mm[penguins_clean$species == "Gentoo"],
       penguins_clean$bill_depth_mm[penguins_clean$species == "Gentoo"], col='blue', pch=18)
legend("topright", legend=unique(penguins_clean$species),
       col=c('red'
        , 'green',
       'blue'), pch=c(16, 17, 18))

It works! 🎉

Take a moment to read through the code and execute it. You're bound to uncover some intriguing findings!

However, it's riddled with issues that compromise its usability and maintainability.

The errors I often encounter are recurrent, yet they can be swiftly addressed with a few best practice tips.

→ A small investment of your time now can lead to a lifetime of improved coding.

If the pitfalls marked in this code snippet aren't immediately apparent to you, then this course will be particularly enlightening.
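To make one such pitfall concrete: the three near-identical `print(round(mean(...)))` lines repeat the same computation once per island. Below is a minimal base R sketch of one possible alternative, not necessarily the approach this course takes. The `toy` data frame and its values are made up for illustration so the example runs without a download; only the column names come from the script above.

```r
# Toy stand-in for the penguin data (made-up values, for illustration only)
toy <- data.frame(
  species = c("Adelie", "Adelie", "Adelie", "Adelie", "Gentoo"),
  island = c("Torgersen", "Torgersen", "Biscoe", "Dream", "Biscoe"),
  bill_length_mm = c(39.1, 40.3, 37.8, NA, 46.1)
)

# Drop incomplete rows once, then keep only the Adelie penguins
adelie <- subset(na.omit(toy), species == "Adelie")

# One grouped call replaces one near-identical line per island
island_means <- aggregate(bill_length_mm ~ island, data = adelie, FUN = mean)
island_means$bill_length_mm <- round(island_means$bill_length_mm, 2)

print(island_means)
```

Adding a fourth island would then require no new line of code, which is the kind of maintainability gain at stake here.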

But there's more to it than just code quality!

Code issues. Workflow issues.

Beyond the struggle with suboptimal code, my peers often found themselves in a bind, copying and pasting results into reports and rushing to email them to superiors, only to spot errors moments later.

Sharing these results, not to mention collaborating on code, was a constant hurdle. This scenario was all too familiar to me.

That's why this course exists: it's the compilation of essential tips and tools I wish I had when I first dived into data science ten years ago. It's designed to streamline and enhance your workflow from the ground up.

Solutions

In today's tech-rich environment, data analysts have a plethora of tools at their disposal to enhance efficiency. Choosing the right one can often be overwhelming.

This course is meticulously crafted to be as concise as possible, focusing exclusively on the essential technologies that you simply can't afford to overlook.

Structured into five distinct modules, it will guide you through the critical skills needed to elevate your data analysis.

Each module is divided into bite-sized lessons, allowing you to progress at a comfortable pace that suits your schedule.

Learn by doing

I firmly believe that active engagement is key to learning.

To truly grasp, remember, and apply new concepts, one must immerse themselves in practice. That's why this course is designed to be highly interactive, filled with a variety of exercises, quizzes, and interactive coding environments to enrich your learning experience.

Furthermore, we'll collaborate on a hands-on project that involves analyzing the well-known penguin dataset. You might recall the preliminary analysis code I shared at the beginning of this post. Our goal is to refine and enhance this code with each lesson. By the course's conclusion, not only will you have mastered the material, but we will also have transformed that initial snippet into a sophisticated, web-based report.

Let's dive into the first phase of our project without further ado!

1. Create a folder on your computer called productive-r-workflow.
2. Create a script called analysis.R in the folder. Paste the code from the top of this post.
3. Open and run the script in RStudio. At the end, you should get a scatterplot.
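If you prefer to do the first two steps from the R console rather than your file explorer, here is a minimal sketch. It assumes your current working directory is where you want the project folder to live; the variable names `project_dir` and `script_path` are just illustrative. You still need to paste the code into analysis.R yourself.

```r
# Assumption: run this from the directory that should contain the project
project_dir <- "productive-r-workflow"
script_path <- file.path(project_dir, "analysis.R")

# Step 1: create the project folder (no warning if it already exists)
dir.create(project_dir, showWarnings = FALSE)

# Step 2: create an empty analysis.R without overwriting an existing one
if (!file.exists(script_path)) {
  file.create(script_path)
}

# Quick check that both exist before opening the script in RStudio
stopifnot(dir.exists(project_dir), file.exists(script_path))
```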

Now that our working environment is ready, let's start improving our code and workflow, one step at a time!
