Why create a personal R package?
As a consulting data scientist, I write a lot of R code in a lot of different places – physically and virtually. Different computers, servers, environments, VPNs, operating systems, all of the above. Even when I have the luxury of working with the same client (and computing environment) long enough to work on several projects, things can get messy.
When is it worth it?
I often find myself facing a dilemma: do I keep project-specific code consolidated in one location, copying old general purpose functions around so they can be customized and developed further, at the expense of possible duplication later on? Or do I maintain the general purpose functions that may be called from several different projects in one location, at the expense of making customization and new functionality more of a headache?
I find there are pros and cons to each method:
- Decentralized: portable, customizable … but can be grossly duplicative and suffer from the curse of versionality.
- Centralized: organized, clean, efficient, scalable … but can be rigid, requires discipline and can break old programs if you’re not careful.
Surely centralization is the better solution after some tipping point. However, the unpredictable nature of my work sometimes makes it hard to know when (or if) that tipping point will occur – when the benefits of centralization begin to outweigh the costs of the portable, lightweight, decentralized approach.
Why not just a folder full of functions?
I’m not sure I have a good answer to this yet. This was my previous solution for maintaining general purpose R functions until building a package.
Some current thoughts:
- Organization: I’m finding it easier to organize my functions with structured documentation on parameters and examples, albeit with some upfront cost of actually writing this documentation.
- Sharing: A package makes it easier to share your code with others, since it's documented in a common tongue.
- Version Control: If you're using GitHub, you can always revert to previous versions.
- Tests & Checks: When building a package, your code is evaluated for errors and missing dependencies, including your examples.
- Shiny apps: I realized during this process that you can actually wrap Shiny apps up into functions and configure them to take arguments from your environment. This is useful for quick and dirty exploratory work: you don't have to worry about directories or rework ui.R and server.R every time you want to throw something new into a Shiny app (see the sketch after this list).
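For example, here's a minimal sketch of what such a function might look like (the function name and the histogram idea are my own, just for illustration):

```r
# Sketch: a Shiny app wrapped in a function, so it can take a data frame
# straight from your environment instead of relying on ui.R / server.R files.
quick_explore <- function(df) {
  num_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  shiny::shinyApp(
    ui = shiny::fluidPage(
      shiny::selectInput("var", "Variable", choices = num_cols),
      shiny::plotOutput("hist")
    ),
    server = function(input, output) {
      output$hist <- shiny::renderPlot({
        hist(df[[input$var]], main = input$var, xlab = input$var)
      })
    }
  )
}

# e.g. quick_explore(mtcars)
```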
How to start building
- I got started following Hilary Parker's post, Writing an R package from scratch. This gets you a minimal package on GitHub.
- I had to fill in with some steps from Steve Mosher's post on building in Windows. I needed to add R to my path and install MiKTeX.
- For anything else, you can probably find it in R guru Hadley Wickham's R Packages book, soon to be published (2015) by O'Reilly.
These made my life easier.
Workflow for using the package
Once you've built a package on GitHub, it's simple to pull it down wherever you are.
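For example, with devtools (the repository name below is a placeholder for your own GitHub username and package):

```r
# install.packages("devtools")  # one-time setup on a new machine
devtools::install_github("yourname/yourpackage")  # placeholder repo name
library(yourpackage)
```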
Workflow for adding to a package
- Clone the git repository from GitHub on whichever machine you're on.
- Add your function(s) to the R folder of your package.
- Good practice: add packageName:: before each function from an external package, so it's clear what your dependencies are for each function (see the sketch after this list).
- Update the DESCRIPTION file with package dependencies: Imports and Suggests.
- Check and Build in RStudio.
- Commit and push changes to GitHub.
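To illustrate the packageName:: convention and roxygen2 documentation, here's a hypothetical function you might drop into the R folder (summarize_col is made up for this example; stats would then be listed under Imports in DESCRIPTION):

```r
#' Summarize a numeric column
#'
#' @param df A data.frame.
#' @param col Name of a numeric column in df, as a string.
#' @return A named numeric vector with the mean, standard deviation, and n.
#' @export
summarize_col <- function(df, col) {
  x <- df[[col]]
  c(mean = mean(x, na.rm = TRUE),
    sd   = stats::sd(x, na.rm = TRUE),  # stats:: makes the dependency explicit
    n    = sum(!is.na(x)))
}
```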
Gotchas
If you're using RStudio and roxygen2, you might have to Configure Build Tools to allow Roxygen to generate the documentation you want.
In RStudio:
=> Build
=> Configure Build Tools
=> Check box for “Generate Documentation with Roxygen”
=> Click “Configure”
=> Probably want to check boxes for at least "Rd files" and "NAMESPACE".
If your NAMESPACE file (a simple list of your functions and dependencies) isn't updating and you don't want to use RStudio, try devtools::document().
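For example, from the package's root directory:

```r
# Regenerates NAMESPACE and the man/*.Rd files from your roxygen2 comments
devtools::document()
```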