22 Oct 2015
Last year, Lee Pang wrote an amazing tutorial on how to set up a desktop Shiny app with the portable versions of R and Chrome. That post helped me a lot this year when I was working on a big proof-of-concept project to introduce Shiny into the data management workflow at my workplace. Thanks to the success of that project, my institution is now moving towards the server version of Shiny. Before I say goodbye to the portable method, I would like to share some of the experience I gained along the way in my next few posts.
When do you need a portable Shiny App?
After testing out portable Shiny apps, the community version of Shiny Server and Shiny Server Pro, I feel I can share some opinions about these three options. Basically, you may want to consider a portable Shiny app when:
1) You or your workplace demands data security
Shiny is great! However, the community version of Shiny Server doesn't have SSL or authentication enabled. Sure, you can put NGINX in front of Shiny Server and enable SSL there. However, to my limited knowledge, there is no easy way to set up authentication on Shiny Server itself. (You may set up authentication within your Shiny app following this instruction. It might be a good way to go, but it makes me feel a little uncomfortable.) As a result, if you are working in an industry that demands high data security, you had better be cautious about putting any data on the community version of Shiny Server.
On the other hand, the portable version is actually pretty safe. Instead of sitting on a private network, where everyone in your company can access it, it sits on the workstations of specific people. Someone has to log into that computer in order to use the Shiny app. Generally, using the portable version of a Shiny app neither increases nor decreases the security level of that workstation.
2) You or your workplace/institution don't have enough budget to buy Shiny Server Pro
Shiny Server Pro offers solutions that address these security concerns but... man, it is expensive. As of the date this post was written (10-23-2015), it is free if you are using it for teaching, 5K per year if you are doing research, and 10K per year otherwise. Honestly speaking, this price is not bad at all compared with other products on the market, like the SAS one, which I heard costs 75K annually. However, it is still expensive enough that people need to think it over.
3) You have an IT-lockdown in your institution
Many of us have an IT department that holds the administrator rights on our workstations. As an analyst, I sure have a copy of R on my computer, but it would be a painful experience to help install R on our customers' (in many cases, our colleagues') computers, not to mention that someone then needs to maintain it and keep R and its packages at the same version you are using. In this case, using portable R with a portable Shiny app seems to be a straightforward solution. If you don't have an IT lockdown in your workplace, good for you. Then you can probably follow what is explained in Lee Pang's post to call the "real" R instead of the portable R on someone else's computer.
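The choice between the portable R and the "real" R can be sketched in a few lines of R. This is only an illustration, not the launcher from Lee Pang's post: the function names, the `R-Portable/...` path and the `shiny` folder name are assumptions made up for the example.

```r
# Pick which Rscript executable should start the app.
# `portable_rscript` is a hypothetical path to the Rscript bundled with portable R.
pick_rscript <- function(portable_rscript) {
  if (file.exists(portable_rscript)) {
    portable_rscript                      # locked-down machine: use the bundled R
  } else {
    file.path(R.home("bin"), "Rscript")   # otherwise fall back to the installed R
  }
}

# Build the command-line arguments that ask R to start the app in `app_dir`.
launch_args <- function(app_dir = "shiny") {
  c("-e", sprintf("shiny::runApp('%s', launch.browser = TRUE)", app_dir))
}

rscript <- pick_rscript("R-Portable/App/R-Portable/bin/Rscript.exe")
args <- launch_args("shiny")
# system2(rscript, args)  # uncomment to actually start the app
```

Keeping the decision in one small function means the same shortcut can be handed to colleagues with and without a local R installation.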
4) You prefer not to mess around with setting up a server
Setting up a Shiny server can be easy, but it can also get complicated if you want some customizations (e.g., you want to install it on Red Hat/CentOS). Working inside a terminal might be fine for nerds (like me), but I can see it scaring away a lot of people as well.
5) Your customers don't care too much about waiting
The portable version of Shiny uses the portable R, which is only available as a 32-bit version. Also, when you open the portable Shiny app, it first has to start portable R, which takes time as well. In practice, the lag is noticeable but still at an acceptable level. The app should be ready within 5 seconds after you double-click the shortcut.
6) You know how to update R and R packages on other people's computers
I put this point at the very end, but it is actually very important. It might sound a little ridiculous, yet it is a question you have to think about when you are designing an app for a long-term project. The first thing to keep in mind is: please don't expect other people to install or update R packages for you. I will explain the way I chose to solve this issue in another blog post.
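As a rough illustration of the problem (and not the solution promised for the follow-up post), an app could check for missing packages when it starts. The function names are hypothetical, and installing into the first library path with a fixed CRAN mirror is an assumption.

```r
# Return the packages from `required` that are not in `installed`.
missing_packages <- function(required,
                             installed = rownames(installed.packages())) {
  setdiff(required, installed)
}

# Install whatever is missing into the library that travels with the app.
# `lib` would typically be a folder inside the portable R directory (assumption).
ensure_packages <- function(required, lib = .libPaths()[1]) {
  todo <- missing_packages(required, rownames(installed.packages(lib.loc = lib)))
  if (length(todo) > 0) {
    install.packages(todo, lib = lib, repos = "https://cloud.r-project.org")
  }
  invisible(todo)
}

# Example with an explicit "installed" list, so nothing is downloaded:
missing_packages(c("shiny", "ggplot2"), installed = c("shiny", "stats"))
# "ggplot2"
```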
While having a working Shiny Server Pro is still the ultimate solution, a portable Shiny app may still work as "a solution" in many cases. It is free and safe. You don't need to worry about messing around with a server, and its speed is still acceptable. If you are just starting with Shiny and want to prove to your senior management team that this product is worth something, I would say you should definitely check out this portable solution.
15 Apr 2015
The penalty for a mistake on a simple structure is only a little time and maybe some embarrassment.
More complicated structures require more careful planning.
-- Steve McConnell in Code Complete 2
If you are using Rmarkdown just to build a simple instruction or a blog post, you don't need to worry about your code structure at all. Mistakes in this kind of small project are easy to fix. However, if you are asking Rmarkdown to generate a 50-page paper with 10 tables and 5 figures, you had better think of a good way to organize your code, because you don't want to end up editing 200 different places throughout your markdown document in order to fix one small mistake 200 times. This kind of code revision is time-consuming, and it also gives you a chance to make new mistakes that only increase your workload and make you feel worse. Having a good software architectural design is also very meaningful if you are collaborating with someone else. A clean code structure plays an important role in helping other people understand your code.
People in the field of software engineering have developed many design patterns for software architecture. I don't want to pretend I know a lot about them, but as far as I know, the most suitable design pattern for a long and complicated reproducible Rmarkdown report is something called Model/View/ViewModel (MVVM).
What is MVVM?
The concept of MVVM was first announced by John Gossman on his blog in 2005. It is a variation of Model/View/Controller (MVC), which is one of the most important designs in software architecture. The MVVM design is widely adopted in today's website design, where the Model is the database, the View is the webpage and the View Model is the connecting part that packs up data from the database and passes it to the View. I'll provide an example of using these concepts in reproducible research in the next section, but you can always go to John's blog, which explains these concepts much better, for more information.
Also, I really like Ryan Nystrom's metaphor about MVC on Quora, even though I believe his painting metaphor fits the concepts of MVVM better. In his post, he said,
Paints are the model. They have unique and similar properties to other paints. You can mix them or use them as pure as possible.
The painter is the controller. The painter performs the task using the paint and easel. The painter takes the paint from the palette and decides how to apply it to the easel.
The easel is the view. It is agnostic of the painter and paint and doesn't care how it will be used. It does, however, have qualities like size and texture that effect the outcome of the painting.
How can we use MVVM design pattern in reproducible research?
Suppose we are trying to write a reproducible report for a randomized clinical trial. This trial has 2 visits and 5 clinical measuring instruments (A1, A2, ..., A5). The results for each instrument are stored in separate files and the randomization information is stored in a file called "random.txt". We are supposed to make 2 tables in the final report.
- Step 1. Model (A1.R, A2.R,..., A5.R files)
In this step, you combine the randomization file with the data file for each instrument, clean the data, and do some necessary reformatting to make sure you can have tidy data available for all the tests. You may want to export these tidy data if you have the need to share the cleaned database with someone else.
- Step 2. View Models (table1.R, table2.R, figure1.R files)
In this step, you generate tables and figures that are mostly ready to be printed in the paper. However, you don't need to worry about the table/figure formatting (title, footnotes, etc.) here. Those accessories can easily be added during the 3rd step. Here, for example, if table 1 needs all the baseline visit data from tests A1, A2 and A3, you will need to source A1.R, A2.R and A3.R, merge/join the tidy baseline data, and run the appropriate analysis to generate the table. I suggest putting all of the files from this step in a sub-folder with the name (or nickname) of the paper, for example, "primary analysis".
- Step 3. View (primary.analysis.rmd file)
In this step, you should source all the tables/figures you generated in Step 2 at the very beginning of your .rmd file. Then you can go ahead and write the report like you usually do, printing the tables/figures where necessary.
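The three steps above can be sketched in base R as follows. Everything here is illustrative: the column names (`id`, `arm`, `visit`, `score`), the toy values and the function names are assumptions, and in a real project each part would live in its own file as described.

```r
# --- Model (e.g. A1.R): merge randomization with one instrument, tidy it ---
# Toy stand-ins for random.txt and the A1 data file (made-up values).
random <- data.frame(id = 1:4, arm = c("A", "B", "A", "B"))
a1_raw <- data.frame(id = rep(1:4, each = 2),
                     visit = rep(c(1, 2), times = 4),
                     score = c(10, 12, 11, 15, 9, 9, 14, 18))

make_a1 <- function(random, a1_raw) {
  merge(random, a1_raw, by = "id")
}

# --- View Model (e.g. table1.R): analysis that yields a print-ready table ---
make_table1 <- function(a1) {
  baseline <- a1[a1$visit == 1, ]
  aggregate(score ~ arm, data = baseline, FUN = mean)
}

# --- View (e.g. primary.analysis.rmd): only prints what was prepared ---
a1 <- make_a1(random, a1_raw)
table1 <- make_table1(a1)
print(table1)
```

With this split, a correction to the cleaning rules is made once in the Model and flows into every table that depends on it.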
This is just an example showing how I would adopt the MVVM design in my code structure. Different people may understand the concepts of this design differently. Also, for projects of different scales, you might want to modify this structure as needed, as long as the ultimate goal of constructing a maintainable project is achieved.
15 Apr 2015
(This blog post is largely built upon the inspiration I got from reading the 2nd chapter of Code Complete 2, which is a very practical book even for someone like me, who came into this field without a computer science background. I would highly recommend it to any data analysts or statisticians who do a lot of programming work.)
To some extent, the development of Rmarkdown (and knitr) is really revolutionary for scientific research. For hundreds of years, scientists have spent way too much of their valuable time fighting with formatting issues in their reports. In the old days, a last-minute change to the study data could be a real disaster that ruined a researcher's life, at least for a few hours. Now, with Rmarkdown, it's as if every scientist can hire his/her own copy editor to make sure the reports are presented in good shape. One of the key features of Rmarkdown is that the reports it generates can update themselves based on changes made to the database. However, the question is: for these "smart" reports, is it still appropriate to say we are "writing" them?
When we say we are "writing reports", it suggests that we are writing them like we are writing a letter or a novel. The practice of "writing" is relatively linear: you start from the beginning and finish at the end. If you want to add tables and figures, you stop your "writing" process, go ahead and make them, add them to your report, and restart the "writing" process. If you project this process onto writing an Rmarkdown file, you are expecting to see something like this:
I have two dogs.
The bigger one is a Border Collie.
This format might still be okay if you just want to write a short instruction or a blog post. However, imagine that you are writing a 50-page paper with 20+ tables and 10+ figures. What will your life be like when you want to re-read your source .rmd file if you threw everything into one basket? In such a case, the readability of the original .rmd file is so low that it is very difficult to maintain. Not to mention what happens if your analytical code produces an error message when you are trying to generate the file. It's going to be a nightmare to find out where the bug is and to fix it. After all, this goes against the design philosophy of the markdown language, which is to maximize the readability of the source. Text mixed with page-long analytical code is not really readable.
Now, let's take a lesson from the workflow of architects. When an architect is trying to build a house, first of all, (s)he draws the blueprint of the project so (s)he knows what is expected to be done. If (s)he wants to install cabinets in the kitchen, (s)he either hires someone to build them or buys them directly from the furniture store. Then, some day, some guys will move those cabinets into the house and put them up. If we adopt the architect's philosophy in our case, we will have a "blueprint" in mind, make the tables (cabinets) somewhere else, import them into our .rmd file (house) and only call those tables when we need them (final installation). Like this:
I have two dogs.
The bigger one is a Border Collie.
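A rough sketch of this second layout in R, since the difference lies in where the analysis code lives. The file name `table1.R` and the objects `dogs` and `table1` are hypothetical, and the script is written to a temporary folder only so that the example is self-contained.

```r
# Stand-in for a separate analysis script (the "cabinet" built elsewhere).
script <- file.path(tempdir(), "table1.R")
writeLines(c(
  "dogs <- data.frame(dog = c('bigger', 'smaller'),",
  "                   breed = c('Border Collie', NA))",
  "table1 <- dogs[dogs$dog == 'bigger', ]"
), script)

# First chunk of the .rmd: import everything that was built elsewhere.
source(script)

# Later chunks, placed between paragraphs of text, only print the result.
print(table1)
```

The .rmd file then contains little more than text and one-line print calls, which keeps the source readable.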
See, a lot better, right?
03 Apr 2015
REDCap, as it says on its website, is a "mature and secure web application for building and managing online surveys and databases". It is used in many clinical research studies as both the database and the database management tool. One of the benefits of combining REDCap and R is that REDCap has an API (application programming interface), which can be used to easily move data between REDCap and R. This reduces the burden of data transformation and the risk of human mistakes, and it allows us to streamline the whole process of data collection, data cleaning and data analysis. In fact, quite a few people have already practiced this methodology. There is an R package called REDCapR on CRAN and here is its github repo. Thanks to their efforts, importing data from the REDCap API is a lot easier than it used to be. There are some other interesting materials that might be helpful. I'm listing them here: (Slide) Using the API through R to automate Redcap exports, and R Tip – Directly Access the REDCap API from R.
What I'm trying to say in this blog post is that after we import the real-time-updated data from the REDCap API, we can use R Shiny to maximize the benefits of having a streamlined process. With R Shiny, we can build a project website (preferably an internal website for clinical research) that displays real-time study enrollment information and some basic demographic information as enrollment continues. REDCap itself has some basic data analytic tools, but those tools are preprogrammed and cannot fit every need. The combination of the REDCap API and R Shiny allows statistical programmers and statisticians to build customized display panels that show investigators exactly what they need to know during the study.
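A minimal sketch of this idea, assuming the REDCapR and shiny packages are installed. The helper names, the `enroll_date` column and the toy records are hypothetical; `uri` and `token` stand for your own project's API address and token.

```r
# Pull records from the REDCap API with REDCapR::redcap_read().
fetch_records <- function(uri, token) {
  REDCapR::redcap_read(redcap_uri = uri, token = token)$data
}

# Count enrolled participants per month. Assumes a hypothetical column
# `enroll_date` holding dates as "YYYY-MM-DD" text.
enrollment_by_month <- function(records) {
  month <- format(as.Date(records$enroll_date), "%Y-%m")
  counts <- table(month)
  data.frame(month = names(counts), enrolled = as.integer(counts),
             stringsAsFactors = FALSE)
}

# A very small Shiny app that shows the table; the data are fetched again
# each time the page is loaded.
make_app <- function(uri, token) {
  shiny::shinyApp(
    ui = shiny::fluidPage(shiny::tableOutput("enrollment")),
    server = function(input, output) {
      output$enrollment <- shiny::renderTable(
        enrollment_by_month(fetch_records(uri, token))
      )
    }
  )
}

# Toy records (made up) to show the summary step without a REDCap server:
toy <- data.frame(enroll_date = c("2015-03-02", "2015-03-20", "2015-04-01"))
enrollment_by_month(toy)
```

Keeping the summary in a plain function makes it easy to check on a small data frame before wiring it into the app.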
I'm going to pilot this method in a study I'm currently working on. I'll provide an update on this post later this year based on my experience.