R and Pharmaceutical Data Analysis: Top Packages for Clinical Trial Data and Predictive Modeling

Insights on R Package Quality and Validation for Clinical Trials

New Business Developer for Life Sciences at Appsilon

06 September 2023

Moving away from proprietary languages, Roche has made a notable decision to freeze its legacy macros library. The company now embraces R as the primary framework for evidence generation in late-stage clinical trials and remains open to exploring additional open-source languages. James Black, a senior member of Roche’s medical affairs department, highlights this pivotal moment and stresses its importance in supporting New Drug Application (NDA) submissions. Some NDA submissions that use open-source R code have already been publicly disclosed.

This is just one sign that the pharmaceutical industry is openly embracing open-source software in place of traditional, harder-to-access proprietary technology. In this post, we discuss the open-source R tools and packages currently available for quality assurance and validation.

Table of Contents:

Do We Have to Validate R Packages for Regulatory Submissions?
R Packages in Clinical Development

Do We Have to Validate R Packages for Regulatory Submissions?

FDA Statistical Software Clarifying Statement

Regarding the use of software for statistical analysis of clinical data, the FDA has clarified that they do not require use of any specific software and statistical software is not explicitly discussed in Title 21 of the Code of Federal Regulations [e.g., in 21CFR part 11]. However, the software package(s) used for statistical analyses should be fully documented in the submission, including version and build identification.

In addition, the FDA guidance E9 Statistical Principles for Clinical Trials notes that “The computer software used for data management and statistical analysis should be reliable, and documentation of appropriate software testing procedures should be available.” Sponsors are encouraged to consult with FDA review teams, and especially with FDA statisticians, about the choice and suitability of statistical software packages at an early stage in the product development process.
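The documentation expectation above (software identified by version and build) can be met in part with base R. The following is a minimal sketch, not a prescribed format: describe_software() is a hypothetical helper name, and the two packages passed to it are chosen only because they ship with every R installation.

```r
# Collect the R version, build platform, and package versions for a report.
# `describe_software()` is a hypothetical helper, not part of any package.
describe_software <- function(packages) {
  pkg_versions <- vapply(
    packages,
    function(pkg) as.character(utils::packageVersion(pkg)),
    character(1)
  )
  list(
    r_version = R.version.string,    # full R version string
    platform  = R.version$platform,  # build platform identifier
    packages  = data.frame(
      package = names(pkg_versions),
      version = unname(pkg_versions),
      stringsAsFactors = FALSE
    )
  )
}

# "stats" and "utils" are part of base R, so this runs on any installation.
software <- describe_software(c("stats", "utils"))
print(software$r_version)
print(software$packages)
```

In practice the package vector would list the packages actually used in the analysis, and the output would be stored alongside the submission documentation.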

A Guidance Document for the Use of R in Regulated Clinical Trial Environments from the R Foundation for Statistical Computing

The use of statistical software for the analysis and presentation of data collected in the course of these regulated activities is itself regulated, to varying levels. There are several documents that are relevant to this particular domain, which we present below.

Documents Collectively Referred to as GxP:

Principal Software Guidance Documents:

Principal Statistical Guideline Documents:

The overall purpose of the guidance document for the Use of R in Regulated Clinical Trial Environments is to demonstrate that R, when used in a qualified fashion, can support the appropriate regulatory requirements for validated systems, thus ensuring that resulting electronic records are trustworthy, reliable and generally equivalent to paper records.

The FDA recognizes that software used in regulated environments, such as in the medical device or pharmaceutical industry, must be developed, tested, and maintained in a controlled manner to ensure its reliability, safety, and effectiveness. Software validation is accomplished through a series of activities and tasks that are planned and executed at various stages of the Software Development Life Cycle (SDLC). The SDLC provides a systematic and structured approach to developing and maintaining software systems.

The SDLC encompasses various stages, which are:

Operational overview
Source code management
Testing and validation
Release cycles
Availability of current and historical archive versions
Maintenance, support and retirement
Qualified personnel
Physical and logical security
Disaster recovery

By following the SDLC, organizations can implement proper controls, documentation, and testing procedures throughout the software development process. This helps to identify and mitigate risks, ensure traceability, and facilitate the validation of software systems to meet regulatory requirements.

R Packages in Clinical Development

Available Public Packages

When it comes to public packages in a clinical setting, two key factors require attention.

Establishing stable repositories within the organization is crucial. By maintaining reliable and controlled repositories, healthcare institutions can ensure the availability and accessibility of approved packages for clinical operations. This helps in maintaining consistency and reliability in data analysis and reporting.
Users must diligently document the packages used for clinical reporting. Accurate and comprehensive documentation enables transparency, traceability, and reproducibility in clinical research, facilitating regulatory compliance.
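Both factors can be illustrated with base R. This is a sketch under stated assumptions: the repository URL is a placeholder for an organization’s internal repository, and the name INTERNAL is arbitrary.

```r
# Point this R session at a single, controlled repository.
# The URL is a placeholder (assumption), not a real repository.
options(repos = c(INTERNAL = "https://packages.example.org/validated/latest"))

# List every installed package with its version, e.g. to attach to a report.
installed <- as.data.frame(
  utils::installed.packages()[, c("Package", "Version"), drop = FALSE],
  stringsAsFactors = FALSE
)
rownames(installed) <- NULL

print(getOption("repos"))
print(head(installed))
```

Setting the repository in a site-wide Rprofile rather than per session is a common design choice, because it keeps individual users from installing packages from uncontrolled sources by accident.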

Posit Packages

In 2020, Posit, in collaboration with the R community and pharmaceutical organizations, took a significant step towards enhancing validation practices. They released comprehensive validation guidance documents specifically targeting popular packages like tidyverse, tidymodels, r-lib, gt, shiny, and rmarkdown.

Open Source Package Development & Maintenance

Maintaining packages as open source brings a number of advantages. One significant benefit is the transparency and accessibility of information, which is often concealed when dealing with proprietary software.

This openness proves to be particularly valuable for Quality Teams as they can delve into the inner workings of packages, gaining insights that aid in the assessment of quality and regulatory compliance.

For quality professionals, platforms such as GitHub are very useful for reviewing a package’s version history and the maintenance work done on it.

CRAN

Getting a package approved and added to the Comprehensive R Archive Network (CRAN) is an accomplishment that signifies its quality and adherence to stringent standards. CRAN serves as a central repository for R packages, ensuring their availability and reliability to users worldwide.

Automatic Tests

The process of getting a package added to CRAN involves passing various evaluations and tests. R packages need to undergo the R CMD check process, which runs a series of tests on their functionality and compatibility. This examination verifies that the package meets CRAN’s requirements and guidelines, covering aspects such as code quality, documentation, and adherence to best practices.

As an example, the CRAN package check results for ggplot2 are publicly available. The ggplot2 package has successfully passed all the necessary evaluations and tests, earning its place on CRAN.

Figure: automated CRAN check results for the ggplot2 package.

Author Tests

Package authors themselves often include their own tests to ensure the proper functioning of their packages. These custom tests serve as an extra layer of validation, ensuring that the package’s functions work as intended in various scenarios. These additional tests are part of the package source. More information can be found in Hadley Wickham and Jennifer Bryan’s R Packages (2e).

Guiding Validation Efforts: The Validation Hub’s Risk-Based Approach

Statistical programmers in pharmaceutical organizations can choose from an extensive range of packages. The choice of additional packages is often guided by the Risk-Based Approach provided by the R Validation Hub. This guidance helps align the selected packages with the risk management strategies and compliance requirements of the pharmaceutical industry. More can be found in their white paper.
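To make the idea of a risk-based assessment concrete, here is a purely illustrative sketch in base R. The metrics, weights, threshold, and package names are all hypothetical and are not taken from the R Validation Hub white paper; an organization would define its own criteria.

```r
# Hypothetical metrics for two packages, each scaled to [0, 1] (1 = best).
metrics <- data.frame(
  package       = c("pkgA", "pkgB"),
  has_tests     = c(1, 0),
  test_coverage = c(0.9, 0.2),
  maintained    = c(1, 1),
  downloads     = c(0.8, 0.1),
  stringsAsFactors = FALSE
)

# Illustrative weights (assumption); they sum to 1.
weights <- c(
  has_tests     = 0.3,
  test_coverage = 0.3,
  maintained    = 0.2,
  downloads     = 0.2
)

# Weighted quality score per package, then risk as its complement.
score <- as.vector(as.matrix(metrics[, names(weights)]) %*% weights)
metrics$risk <- 1 - score

# Hypothetical threshold: low-risk packages are accepted as-is,
# higher-risk packages are flagged for additional testing.
metrics$decision <- ifelse(
  metrics$risk <= 0.25,
  "accept",
  "needs additional testing"
)

print(metrics[, c("package", "risk", "decision")])
```

The point of the sketch is the workflow: collect evidence per package, combine it into a score, and let the score decide how much additional validation effort a package needs.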

Validated R Package Repository

The R Validation Hub is developing a validated R package repository. Progress on this initiative can be followed on their website.

The Pharmaverse

The Pharmaverse is a collaborative effort by professional statistical programmers and developers at pharmaceutical companies such as Roche, GSK, Johnson & Johnson, Merck, and others.

One notable aspect of the Pharmaverse is its publicly maintained packages, which undergo rigorous testing with tracked code coverage. More information can be found in this blog post. As an example, the test coverage and code for the admiral package are publicly available. This transparency helps users judge whether the packages are of high enough quality to be trusted for critical clinical reporting workflows.

Building a Secure and Controlled Internal Package Repository for Clinical Trials

The first step towards building an internal repository is establishing a baseline of packages that have undergone rigorous validation and meet the organization’s quality standards. The risk-based approach and the aforementioned open source repositories provide valuable guidance in selecting these validated packages.

Leveraging the Posit Package Manager

To streamline the process of curating and managing the internal repository, organizations can turn to the Posit Package Manager. It lets Quality and IT teams create an internal repository that contains only the packages they have specifically selected for clinical trials. The steps to achieve this are outlined in this GitHub repository.

Benefits of an Internal Repository

Having an internal repository of validated packages brings several advantages to pharmaceutical organizations engaged in clinical trials. Firstly, it provides a centralized and secure source for packages, eliminating the need to rely on external repositories that may introduce unforeseen risks or inconsistencies. Secondly, it enables seamless collaboration and knowledge sharing among statistical programmers and researchers within the organization.

Maintaining Control and Compliance

Establishing an internal repository goes hand in hand with maintaining control and compliance throughout the clinical trial process. The internal repository ensures that all packages used in the trials have undergone a comprehensive validation process and adhere to the organization’s quality standards.

Streamlining Package Management with renv for Statistical Programmers

The R package renv is a companion package for project-level package management that fits well with standard IT practices. By calling renv::init(), statistical programmers set up a project-specific library and create a file called renv.lock, which captures the state of the project’s library at that moment; renv::snapshot() updates the file as packages change, and renv::restore() reinstalls the recorded versions. The renv.lock file documents the repositories and package versions in use and can be versioned with Git for enhanced control and collaboration. As an example, here is the R FDA Pilot Submission renv.lock.
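The typical renv workflow can be sketched as three small wrappers. The helper names are hypothetical; the underlying calls (renv::init(), renv::snapshot(), renv::restore()) are renv functions, and the code assumes renv is installed. Nothing is executed until a helper is called.

```r
# Hypothetical helpers wrapping the usual renv workflow.
# Defining them has no side effects; each acts only when called.

# One-time setup: create a project library and an initial renv.lock.
start_project <- function(project = ".") {
  renv::init(project = project, restart = FALSE)
}

# After installing or updating packages: record the new state in renv.lock.
lock_project <- function(project = ".") {
  renv::snapshot(project = project, prompt = FALSE)
}

# On another machine or at a later date: reinstall the recorded versions.
restore_project <- function(project = ".") {
  renv::restore(project = project, prompt = FALSE)
}
```

A programmer would call start_project() once, lock_project() whenever dependencies change, and commit renv.lock to Git; a reviewer would then call restore_project() to rebuild the same library.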

Testing R Functions: Essential Tools for Internal Package Validation

In the previous sections, we discussed the importance of selecting validated packages and establishing an internal repository for clinical trials. Here, we will explore key tools that aid in the testing and validation of R functions within these internal packages.

thevalidatoR

One of the essential tools for testing R functions is thevalidatoR. This open-source tool, available on GitHub as a GitHub Action, provides a framework for validating R packages and generates an R Package Validation Report.

Figure: example R Package Validation Report generated by thevalidatoR.

valtools

Another valuable resource for testing R functions is valtools. Developed by the Pharmaceutical Users Software Exchange (PhUSE), valtools offers a collection of utilities and functions specifically designed for the validation of R packages. It is most commonly used to add the elements required for validation under the R Package Validation Framework and to facilitate the validation itself.

testthat

testthat is the most popular unit testing package for R and is used by thousands of CRAN packages. Developed by the Posit team, it supports a test-driven development approach, helping teams confirm that functions are thoroughly tested before they are integrated into internal packages.
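A minimal sketch of what such unit tests look like is given below. The function under test, pct_change_from_baseline(), is hypothetical and exists only for this illustration; test_that(), expect_equal(), expect_true(), and expect_error() are testthat functions, and the code assumes testthat is installed.

```r
library(testthat)

# Hypothetical function under test: percent change from a baseline value.
pct_change_from_baseline <- function(value, baseline) {
  stopifnot(is.numeric(value), is.numeric(baseline))
  out <- 100 * (value - baseline) / baseline
  # A zero baseline gives Inf or NaN; report it as missing instead.
  out[is.infinite(out) | is.nan(out)] <- NA_real_
  out
}

test_that("percent change is computed relative to baseline", {
  expect_equal(pct_change_from_baseline(110, 100), 10)
  expect_equal(pct_change_from_baseline(c(50, 150), 100), c(-50, 50))
})

test_that("a zero baseline yields NA instead of Inf", {
  expect_true(is.na(pct_change_from_baseline(5, 0)))
})

test_that("non-numeric input is rejected", {
  expect_error(pct_change_from_baseline("a", 100))
})
```

In a package, tests like these would normally live in files under tests/testthat/ so that they run automatically as part of the package checks.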

Ensuring Reproducibility in Shiny Applications

Shiny apps have revolutionized the way we develop interactive web applications for data visualization and analysis. However, with their increasing complexity, ensuring their testability and reproducibility becomes paramount.

In this section, we will explore a set of powerful tools and frameworks that facilitate testing and reproducibility in Shiny apps, enabling developers to build robust and reliable applications.

Rhino

Rhino, developed by Appsilon, is an opinionated framework with a focus on software engineering practices and development tools for building production-grade Shiny apps.

Rhino provides support in 3 main areas:

Clear code: scalable app architecture, modularization based on Box and Shiny modules.
Quality: unit tests, E2E tests with Cypress, logging and monitoring, linting.
Automation: project startup, CI with GitHub Actions, dependency management with renv, configuration management with config, Sass and JavaScript bundling with ES6 support via Node.js.

The latest version of Rhino (1.5.0) adds addins to your RStudio IDE for convenient and efficient R programming workflows.

shinytest

shinytest, developed by Posit, provides a convenient simulation of a Shiny app that you can control in order to automate testing.

shinyValidator

shinyValidator is an open-source package specifically designed for testing Shiny apps. It aims to automate the audit of a Shiny app project’s quality, which is particularly needed during a validation or qualification process. All results are gathered in an HTML report that can be published from any CI/CD platform or on Posit Connect, where it is available to everyone involved.

golem

golem is an opinionated framework for building production-grade Shiny applications. It simplifies the creation, development and deployment of a Shiny application as a package.

shinymeta

The shinymeta R package offers convenient tools to capture the underlying logic of a Shiny app and convert it into executable code outside of the Shiny environment, such as the R console. Additionally, it facilitates bundling the code and its associated results, making it easier to share with end users.

teal

teal is a Shiny-based interactive exploration framework for analyzing data.

By adopting these tools, developers can streamline their testing processes, identify and resolve issues, and build robust Shiny apps that meet the highest standards of quality and reproducibility.

Alternative Approaches

Alternatively, some companies may opt to purchase validation documentation for the available open source packages. There are some vendors who offer validation tests and documentation as well as validation support services.

An example is Atorus, which offers OpenVal, a subscription-based service that includes a repository of nearly 200 validated packages.

Concluding Remarks on R Packages for Clinical Trials

Ensuring the reliability, reproducibility, and validation of R packages and Shiny applications is paramount in the clinical research domain. By following a risk-based approach and utilizing resources like the R Validation Hub, Pharmaverse, and internal package repositories, organizations can establish a robust foundation of validated packages for their clinical trials.

Open-source tools such as thevalidatoR, testthat, and valtools contribute to R package validation, while packages such as Rhino, shinytest, and shinyValidator contribute to reproducibility of Shiny applications.

By adopting these best practices, pharmaceutical organizations can confidently leverage the power of R and Shiny for efficient and compliant clinical reporting workflows, ensuring the integrity and accuracy of their research findings.

Contact Us

Ismael Rodriguez

Life Sciences Innovation Lead
