How to apply the MVVM pattern in reproducible research?
15 Apr 2015
The penalty for a mistake on a simple structure is only a little time and maybe some embarrassment.
More complicated structures require more careful planning.
-- Steve McConnell in Code Complete 2
If you are using Rmarkdown just to write a short set of instructions or a blog post, you don't need to worry about your code structure at all. Mistakes in small projects like these are easy to fix. However, if you are asking Rmarkdown to generate a 50-page paper with 10 tables and 5 figures, you had better think of a good way to organize your code, because you don't want to end up editing 200 different places throughout your markdown document just to fix one small mistake. This kind of revision is time-consuming, and every manual edit is another chance to introduce a new mistake that adds to your workload. A good software architecture also matters when you are collaborating with someone else: a clean code structure helps other people understand your code.
People in the field of software engineering have developed many design patterns for software architecture. I don't want to pretend I know a lot about them, but as far as I know, the most suitable design pattern for a long and complicated reproducible Rmarkdown report is something called Model/View/ViewModel (MVVM).
What is MVVM?
The concept of MVVM was first announced by John Gossman on his blog in 2005. It is a variation of Model/View/Controller (MVC), which is one of the most important designs in software architecture. The MVVM design is widely adopted in today's website design, where the Model is the database, the View is the webpage, and the View Model is the connecting part that packs up data from the database and passes it to the View. I'll provide an example of using these concepts in reproducible research in the next section, but you can always go to John's blog, which explains these concepts much better, for more information.
Also, I really like Ryan Nystrom's metaphor about MVC on Quora, even though I believe his painting metaphor fits the concepts of MVVM better. In his post, he said,
Paints are the model. They have unique and similar properties to other paints. You can mix them or use them as pure as possible.
The painter is the controller. The painter performs the task using the paint and easel. The painter takes the paint from the palette and decides how to apply it to the easel.
The easel is the view. It is agnostic of the painter and paint and doesn't care how it will be used. It does, however, have qualities like size and texture that affect the outcome of the painting.
How can we use the MVVM design pattern in reproducible research?
Suppose we are trying to write a reproducible report for a randomized clinical trial. This trial has 2 visits and 5 clinical measuring instruments (A1, A2, ..., A5). The results for each instrument are stored in separate files, and the randomization information is stored in a file called "random.txt". We are supposed to make 2 tables in the final report.
- Step 1. Model (A1.R, A2.R,..., A5.R files)
In this step, you combine the randomization file with the data file for each instrument, clean the data, and do whatever reformatting is needed so that tidy data are available for every instrument. You may want to export these tidy data if you need to share the cleaned database with someone else.
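As a rough illustration of the Model step, an A1.R file might look something like the sketch below. The column names (subject_id, arm, visit, score), the raw file name "A1.txt", and the helper build_a1() are all assumptions made for this example; the made-up inputs are only there so the sketch runs on its own.

```r
# A1.R (Model): a minimal sketch. File layout, column names and the
# build_a1() helper are hypothetical, not taken from a real trial.

# Read the randomization file and the raw A1 file, then return tidy data
# with one row per subject and visit.
build_a1 <- function(random_path, a1_path) {
  random <- read.table(random_path, header = TRUE, stringsAsFactors = FALSE)
  raw    <- read.table(a1_path, header = TRUE, stringsAsFactors = FALSE)

  # Attach the treatment arm to every measurement.
  tidy <- merge(raw, random, by = "subject_id")

  # Basic cleaning: drop rows with a missing score and label the visits.
  tidy <- tidy[!is.na(tidy$score), ]
  tidy$visit <- factor(tidy$visit, levels = c(1, 2),
                       labels = c("baseline", "follow-up"))
  tidy$arm <- factor(tidy$arm)
  tidy[order(tidy$subject_id, tidy$visit), ]
}

# Tiny made-up inputs written to a temporary folder.
tmp <- tempdir()
random_path <- file.path(tmp, "random.txt")
a1_path <- file.path(tmp, "A1.txt")
write.table(data.frame(subject_id = 1:4,
                       arm = c("placebo", "drug", "drug", "placebo")),
            random_path, row.names = FALSE)
write.table(data.frame(subject_id = rep(1:4, each = 2),
                       visit = rep(1:2, times = 4),
                       score = c(10, 12, 11, NA, 9, 14, 13, 13)),
            a1_path, row.names = FALSE)

# The tidy object that later steps would pick up after sourcing this file.
a1 <- build_a1(random_path, a1_path)
```

Each of A2.R to A5.R would presumably follow the same shape, so a cleaning rule for one instrument lives in exactly one file.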
- Step 2. View Models (table1.R, table2.R, figure1.R files)
In this step, you generate the tables and figures, which should be mostly ready to print in the paper. However, you don't need to worry about table/figure formatting (titles, footnotes, etc.) here. Those accessories can easily be added during the 3rd step. For example, if table 1 needs all the baseline visit data from tests A1, A2 and A3, you will need to source A1.R, A2.R and A3.R, merge/join these tidy data for baseline, and run the appropriate analysis to generate the table. I suggest putting all of the files from this step in a sub-folder with the name (or nickname) of the paper, for example, "primary analysis".
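A View Model file such as table1.R could be sketched roughly as follows. In a real project the tidy data would come from sourcing A1.R, A2.R and A3.R; here small made-up data frames stand in for them, and the column names and the helpers make_tidy() and baseline_scores() are assumptions for illustration only. The summary shown (mean baseline score per arm) is just a placeholder for whatever analysis the table actually needs.

```r
# table1.R (View Model): a minimal sketch with hypothetical data and names.
# In the real project these objects would come from
# source("A1.R"), source("A2.R") and source("A3.R").
make_tidy <- function(scores) {
  data.frame(subject_id = rep(1:4, each = 2),
             arm = rep(c("placebo", "drug", "drug", "placebo"), each = 2),
             visit = rep(c("baseline", "follow-up"), times = 4),
             score = scores)
}
a1 <- make_tidy(c(10, 12, 11, 15, 9, 14, 13, 13))
a2 <- make_tidy(c(50, 48, 55, 51, 53, 47, 52, 50))
a3 <- make_tidy(c(3, 4, 2, 5, 4, 6, 3, 3))

# Keep baseline rows only and rename the score column after the instrument.
baseline_scores <- function(tidy, name) {
  out <- tidy[tidy$visit == "baseline", c("subject_id", "arm", "score")]
  names(out)[names(out) == "score"] <- name
  out
}

# Join the three instruments into one baseline data frame.
baseline <- Reduce(
  function(x, y) merge(x, y, by = c("subject_id", "arm")),
  list(baseline_scores(a1, "A1"),
       baseline_scores(a2, "A2"),
       baseline_scores(a3, "A3"))
)

# Table 1 as a plain data frame: mean baseline score per arm.
# No title or footnotes here; those belong to the View.
table1 <- aggregate(cbind(A1, A2, A3) ~ arm, data = baseline, FUN = mean)
```

Keeping table1 as a plain data frame means the report file only has to decide how to present it, not how to compute it.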
- Step 3. View (primary.analysis.rmd file)
In this step, you should source all the tables/figures you generated in Step 2 at the very beginning of your .rmd file. Then you can go ahead and write the report as you usually do, printing the tables/figures where necessary.
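The R code inside the chunks of primary.analysis.rmd might then look roughly like the sketch below, written here as a plain script. The stand-in table1.R, its contents, and the caption text are made up so that the example runs on its own; using knitr::kable() for printing is one option among several, and the sketch falls back to print() if knitr is not installed.

```r
# Sketch of the R chunks in primary.analysis.rmd, shown as a plain script.
# A stand-in table1.R is written to a temporary folder so the example is
# self-contained; in the real project the file would already exist.
analysis_dir <- file.path(tempdir(), "primary analysis")
dir.create(analysis_dir, showWarnings = FALSE)
writeLines(
  'table1 <- data.frame(arm = c("drug", "placebo"), A1 = c(10, 11.5))',
  file.path(analysis_dir, "table1.R")
)

# First chunk of the report: source every View Model file up front.
view_models <- list.files(analysis_dir, pattern = "\\.R$", full.names = TRUE)
for (f in view_models) source(f)

# Later chunk: print the table where the text refers to it, adding the
# title (a hypothetical caption) only at this stage.
if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(table1, caption = "Table 1. Baseline scores by arm"))
} else {
  print(table1)
}
```

With this layout, fixing a number in the report means editing one Model or View Model file and re-knitting, rather than hunting through the prose.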
This is just an example showing how I would adopt the MVVM design in my code structure. Different people may understand the concepts of this design differently. Also, for projects of different scales, you might want to modify this structure as needed, as long as the ultimate goal of building a maintainable project is achieved.