This post is really about workflow: specifically a data-science workflow, although it should be relevant to others. It will probably resonate most (if at all) with those who have some (mostly positive) experience generating reports from Rmarkdown files with knitr, but who might have some gripes. Maybe not gripes, maybe just uncertainty over whether it makes sense to keep your hard work in an Rmarkdown file, an R script, or both.
Generate reports with Rmarkdown (Rmd) files
With Rmarkdown, you can generate these stylish reports with code like this.
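As a minimal sketch (the file name `report.Rmd` and its contents are hypothetical, and the rmarkdown package plus a pandoc installation are assumed), an Rmd file mixes a YAML header, markdown text and code chunks, and `rmarkdown::render()` compiles it:

```r
# Assumes the rmarkdown package and pandoc are installed.
# Write a tiny hypothetical Rmd file: YAML header, markdown text, one code chunk
rmd_file <- file.path(tempdir(), "report.Rmd")
writeLines(c(
  "---",
  "title: \"Example report\"",
  "output: html_document",
  "---",
  "",
  "Some narrative text about the `cars` data.",
  "",
  "```{r}",
  "summary(cars)",
  "```"
), rmd_file)

# Compile the Rmd file; render() returns the path of the output document
out <- rmarkdown::render(rmd_file, quiet = TRUE)
```

Changing `output: html_document` to `pdf_document` or `word_document` in the YAML header is how the same source yields PDF or Word output (PDF additionally needs a LaTeX installation).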
Generate reports directly from R scripts
One can also cut out the middleman (Rmd) and generate the exact same HTML, PDF and Word reports using native R scripts. This was news to me until this week. It’s a subtle difference, but one that I’ve found nimble and powerful in all the right places. Check this out for a quick intro.
How it works: Code as normal. Tweak the comments in your code to render the document text, headers, format, style, etc. of your report however you like. You can compile any old R script, regardless of its structure, but there are plenty of options at your disposal for formatting and prettifying, if that’s your thing. Then it’s a one-liner to compile into a report:
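A minimal sketch, assuming knitr's "spin" conventions (lines starting with `#'` become markdown text, and a `#+` line starts a new code chunk and can carry chunk options) and a hypothetical script called `analysis.R`. The script is written out from R only so the example is self-contained; in practice it would just be a file you edit:

```r
# Assumes the rmarkdown package and pandoc are installed.
# Hypothetical script using spin-style comments:
#   #' lines  -> markdown text (including a YAML header)
#   #+ lines  -> start a new chunk, optionally with chunk options
script_lines <- c(
  "#' ---",
  "#' title: \"Report compiled from an R script\"",
  "#' output: html_document",
  "#' ---",
  "#'",
  "#' # Exploring the cars data",
  "#' Plain R code below is run and its output is captured.",
  "summary(cars)",
  "",
  "#' ## A plot",
  "#+ speed-plot, echo=FALSE, fig.width=5",
  "plot(cars)"
)
script_file <- file.path(tempdir(), "analysis.R")
writeLines(script_lines, script_file)

# The one-liner: render() accepts an .R file directly
out <- rmarkdown::render(script_file, quiet = TRUE)
```

This is the same `render()` call used for .Rmd files; for an .R file it first converts the script with `knitr::spin()` and then proceeds as usual, so the output formats are the same.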
Rmarkdown vs R
Rmd != R: You can’t source an Rmarkdown file like you would an R script. I have no doubt there are tools that exist (or can be easily developed) to strip the code chunks from an Rmarkdown file, but this seems cumbersome.
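One such tool ships with knitr itself: `knitr::purl()` extracts the code chunks of an Rmd file into a plain R script, which can then be sourced. A minimal sketch, using a hypothetical `wrangle.Rmd`:

```r
# Write a small hypothetical Rmd file so the example is self-contained
rmd_file <- file.path(tempdir(), "wrangle.Rmd")
writeLines(c(
  "Some narrative that should not be sourced.",
  "",
  "```{r}",
  "speed_mean <- mean(cars$speed)",
  "```"
), rmd_file)

# purl() keeps only the code chunks and returns the path of the .R file
r_file <- knitr::purl(rmd_file, output = file.path(tempdir(), "wrangle.R"),
                      quiet = TRUE)

# The extracted script can now be sourced like any other
source(r_file)
```

It works, but it is an extra step every time the Rmd changes, and the Rmd file remains the single source of truth.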
Competing incentives: presentation vs. workflow: When you’ve got tons of code chunks with just a few lines each, it can be annoying to test your code without knitting (compiling) the entire document. I often purposely keep chunks big so I can run blocks of selected code interactively. This makes for smooth coding, but slightly harder-to-read documents. One strategy I’ve tried is to “Rmarkdownify” my code only after I’ve thoroughly developed and tested it… but then when it comes time to re-examine, change or pipe the code someplace else, you’ve got this Rmarkdown document to overhaul. And in my work (many more parts analysis than development), I’m rarely done, or even know when I’m done.
No need to duplicate Rmd and R scripts: Say you’re writing some data wrangling code that pulls from a handful of data sources, merges them all together, aggregates, scales and transforms them into an analytics-ready dataset. You want to document this process… but you also want to be able to pipe this piece of ETL code elsewhere. I’ve been tempted in the past to maintain both a bare-bones R script and a verbose, flowery Rmd file describing the process. This keeps both the developers (on your team or within yourself) and the consumers of your analysis happy… but it will probably drive you crazy maintaining two versions of more-or-less the same thing. With an R script formatted with markdown-style comments, you might be able to kill two birds with one stone.
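A minimal sketch of that idea (the data and column names are made up for illustration): a small ETL script whose narrative lives in `#'` comments. `source()` treats those lines as ordinary comments, so the script can be piped elsewhere unchanged, while `rmarkdown::render()` turns them into report text:

```r
#' # Building an analytics-ready dataset
#' Hypothetical example: two small sources are merged, aggregated and scaled.

#' ## Load the sources
sales <- data.frame(
  store = c("A", "A", "B", "B", "C"),
  revenue = c(100, 150, 80, 120, 200)
)
stores <- data.frame(
  store = c("A", "B", "C"),
  region = c("North", "South", "North")
)

#' ## Merge and aggregate
#' Attach each store's region, then total revenue per region.
merged <- merge(sales, stores, by = "store")
by_region <- aggregate(revenue ~ region, data = merged, FUN = sum)

#' ## Scale
#' Standardise regional revenue (mean 0, standard deviation 1).
by_region$revenue_z <- as.numeric(scale(by_region$revenue))
by_region
```

The same file can be run with `source("etl.R")` by a teammate or compiled with `rmarkdown::render("etl.R")` for the write-up (the file name is hypothetical).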
Run-time: This isn’t very well addressed by either method, but I certainly find it easier to work with bigger data, or anything computationally intensive, using native R scripts. When I knit a big Rmarkdown file, I often cross my fingers and hope it doesn’t fail 95% of the way through so that I have to start over. By default, knitting .Rmd files does not persist objects to the Global Environment, although I’d be surprised if there wasn’t a way to change this.
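Whether objects persist depends on how the document is rendered. `rmarkdown::render()` has an `envir` argument (its default is `parent.frame()`, the environment it is called from), while RStudio's Knit button, as far as I know, renders in a separate R session, which is why nothing shows up in the Global Environment afterwards. A minimal sketch, assuming a hypothetical script `heavy.R`, that asks for the Global Environment explicitly:

```r
# Assumes the rmarkdown package and pandoc are installed.
# Hypothetical script with one object worth keeping after the render
script_file <- file.path(tempdir(), "heavy.R")
writeLines(c(
  "#' Fit a model once and keep it around afterwards.",
  "fit <- lm(dist ~ speed, data = cars)"
), script_file)

# render() evaluates the code in `envir`; globalenv() leaves `fit`
# in the Global Environment once the report is built
rmarkdown::render(script_file, output_format = "html_document",
                  envir = globalenv(), quiet = TRUE)

# The fitted model is available for interactive follow-up work
coef(fit)
```

Passing `envir = new.env()` instead gives the opposite behaviour: a fresh, isolated environment for each render.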
All pros, no cons: If you’re working on a team that doesn’t want to use knitr and Rmarkdown, no matter. Your team members might gaze at seemingly strange comments in your R scripts, but they can run, read, edit and pipe your code as if it were their own. You can even compile their code into reports. This essentially just separates the code from the output and plots it would otherwise print to the console. It might not be the prettiest, but it sure beats saving off graphics and results and copy-pasting them into slides somewhere. And I find it’s easier to find your chart, finding, or what-have-you in a compiled document than within a script, where you have to run the code and its dependencies and will likely muddle up the environment you’re currently working in.
Rendered report in the flesh
All the features I’m used to using with Rmarkdown documents worked when embedded in native R scripts.
This is perhaps not a great example of how a typical R script would look. A typical R script/document would probably have significantly more code and fewer comments. However, I already know how code appears in a report; my purpose here is really to test the markdown functionality.