Object-oriented programming (OOP) is a popular and widely embraced programming paradigm in software development. The concept of object-oriented programming in R has been previously featured in one of our blog posts, specifically within the context of R6 classes.
In this blog post, we will dive deeper into the world of object-oriented programming, look at why it’s a valuable approach worth adopting, and see how it is implemented in R.
Table of Contents
- What Was the Motivation Behind Introducing OOP in R?
- Why Do Developers Need OOP?
- What is OOP?
- OOP Systems in R
- Conclusion
What Was the Motivation Behind Introducing OOP in R?
The R Foundation describes R as a language and environment for statistical computing and graphics. It originated from the S language and environment, which was developed at Bell Laboratories.
The days of the S language are where we will start our tour of the history of OOP in R. S allowed users to work with different kinds of statistical models. Even though statistical models can be different, they share a common set of operations, such as printing, making predictions, plotting, or updating the model.
Uniform functions were introduced to make it easier for users to interact with those models. Good examples of such uniform functions are print(), predict(), plot(), and update(). These functions can be invoked on any model, irrespective of whether a linear regression model or a time series model like ARIMA (Autoregressive Integrated Moving Average) is working under the hood.
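As a small illustration of that uniform interface, the sketch below fits a linear regression and an ARIMA model with base R's stats package and calls the same functions on both. The datasets (mtcars, lh) and the model specifications are arbitrary choices made for this example.

```r
# Two very different models, both fitted with the built-in stats package.
# The datasets and model specifications are illustrative choices only.
fit_lm <- lm(mpg ~ wt, data = mtcars)
fit_arima <- arima(lh, order = c(1, 0, 0))

# The same function name works on both objects.
print(fit_lm)
print(fit_arima)

# predict() is shared as well, although each model takes its own arguments.
pred_lm <- predict(fit_lm, newdata = data.frame(wt = 3))
pred_arima <- predict(fit_arima, n.ahead = 3)

# The class attribute is what tells R which implementation to run.
class(fit_lm)
class(fit_arima)
```

Note that the user never has to call a model-specific function; the class of each object decides which implementation runs.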
Interested in elevating your coding with Functional Programming in R? Check out our introductory article, ‘Unlocking the Power of Functional Programming in R’.
Why Do Developers Need OOP?
A uniform interface simplifies interactions for users of code that employs OOP principles, but what drives our choice to utilize OOP in our own code?
Let’s go back to the example of having different models and the uniform print function. Without using OOP, we might implement this function as:
print <- function(x) {
  if (inherits(x, "lm")) {
    # print linear model
  } else if (inherits(x, "Arima")) {
    # print arima model
  }
}
While this might not look that bad, imagine how long that function would be if it supported printing every available model in R.
Another issue with this approach is that only the author of the function can add new types to it. This reduces flexibility, as developers who want print to support their own classes would need to reach out to the author of the print function.
Object-oriented programming allows us to have a separate implementation of the print function for each of our classes. Some of you might recognize that this example is similar to how it would look written in the S3 OOP system; we will be diving deeper into S3 in subsequent posts.
print <- function(x, ...) {
  # Generic function: dispatches to a method based on the class of x
  UseMethod("print")
}
print.lm <- function(x, ...) {
  # print linear model
}
print.Arima <- function(x, ...) {
  # print arima model
}
Much better! Our code is now:
- More modular – instead of one big function, we have multiple smaller functions, which improves readability and can make testing easier.
- Flexible – Potentially, other users can now add their own print functions without having to modify any existing ones.
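To make the flexibility point concrete, here is a minimal runnable sketch of the same pattern. It uses a hypothetical generic called describe() rather than redefining print(), and the my_model class is invented purely for illustration.

```r
# A hypothetical generic; UseMethod() picks a method based on class(x).
describe <- function(x, ...) {
  UseMethod("describe")
}

describe.lm <- function(x, ...) {
  "a linear model"
}

describe.Arima <- function(x, ...) {
  "an ARIMA model"
}

# Fallback used when no class-specific method exists.
describe.default <- function(x, ...) {
  "an object without a dedicated describe() method"
}

# Another developer can add support for their own (hypothetical) class
# without touching any of the functions above.
describe.my_model <- function(x, ...) {
  "a custom model"
}

custom <- structure(list(), class = "my_model")

describe(lm(mpg ~ wt, data = mtcars))
describe(arima(lh, order = c(1, 0, 0)))
describe(custom)
describe(42)
```

Each method can be read and tested on its own, and adding describe.my_model required no change to the generic or to the existing methods.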
This section was inspired by Hadley Wickham’s talk: An Introduction to R7.
What is OOP?
Now, we have an idea of why we might want to use object-oriented programming, but have not yet defined what it is:
Object-oriented programming is a programming paradigm where we identify the following principles: Encapsulation, Polymorphism, Abstraction, and Inheritance.
In this article, we have already had a chance to see encapsulation, polymorphism, and abstraction in action:
- Polymorphism allows us to perform the same action in different ways (call the print function but call it on different models).
- Encapsulation keeps an object's data together with the functions that operate on it, so we do not need to worry about how that data is stored internally when interacting with the object (e.g. how coefficients are stored in our linear model).
- Abstraction exposes what an object does while hiding how it does it (for example, which method is used for fitting the linear regression).
We will explore inheritance in more detail when diving into specific OOP systems in R, but for completeness:
- Inheritance – classes can reuse code from other classes by designing relationships (hierarchy) between them. For example, in R, the glm class inherits from the lm class.
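The glm example can be checked directly in base R. The sketch below also defines a hypothetical generic, model_family(), with a method for lm only, to show that a glm object falls back to it; the generic name and its return value are made up for this illustration.

```r
# A generalized linear model fitted with the default (gaussian) family.
fit_glm <- glm(mpg ~ wt, data = mtcars)

# The class vector lists "glm" first and "lm" second.
class(fit_glm)
inherits(fit_glm, "lm")

# A hypothetical generic with a method for "lm" only.
model_family <- function(x, ...) UseMethod("model_family")
model_family.lm <- function(x, ...) "linear-model family"

# No model_family.glm exists, so dispatch falls back to the "lm" method.
model_family(fit_glm)
```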
Additionally, there are a couple of terms that are often used when talking about OOP (we already used some of them!):
- Classes – user-defined data types that serve as blueprints for creating objects; they define what fields or data an instance of the class contains (for example, an instance of the lm class has a coefficients field which contains a named vector of coefficients).
- Objects – instances of individual classes; for example, each fitted linear regression model is an instance of the lm class, but the instances can differ from each other (for example, they can be trained on different data).
- Methods – functions associated with a given class; they describe what an object can do. For example, you can make predictions using a linear regression model.
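These three terms can be seen side by side with the lm class from base R. In the sketch below, the choice of dataset and the split on the am column are arbitrary and only serve to produce two different objects of the same class.

```r
# Two objects (instances) of the same class, trained on different data.
fit_all <- lm(mpg ~ wt, data = mtcars)
fit_manual <- lm(mpg ~ wt, data = subset(mtcars, am == 1))

class(fit_all)
class(fit_manual)

# Both have a coefficients field holding a named numeric vector,
# but the values differ because the training data differs.
fit_all$coefficients
fit_manual$coefficients

# Methods: the functions registered for the "lm" class.
methods(class = "lm")

# Calling one of those methods through its generic.
predict(fit_all, newdata = data.frame(wt = 3))
```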
OOP Systems in R
All right, so how do we do OOP in R? Turns out R provides different ways of doing OOP:
- S3
- S4
- Reference Classes (referred to as RC or sometimes R5)
- R6
Interested in experiencing R6 Classes in action while designing a video game in R Shiny? Check out our article, How to Build a Video Game in R Shiny with CSS, JavaScript, and R6 Classes.
On top of that, there is a new OOP system being developed called S7 (previously also called R7), and there are also other R packages that provide ways of doing OOP in R.
Some packages also defined their own OOP systems; for example, torch defined its own OOP system called R7 (not to be confused with R7 developed by the R Consortium, which is now called S7).
Each of those has its own advantages and disadvantages that we will be exploring in subsequent articles, so stay tuned.
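To give a first feel for how these systems differ, here is a rough sketch of the same tiny "person" class in S3, S4, and Reference Classes, which all ship with base R. R6 and S7 are left out because they require installing additional packages. All class, generic, and field names below are hypothetical and chosen only for this illustration.

```r
library(methods)  # S4 and Reference Classes live in the methods package

# S3: a class is just an attribute; methods are found by naming convention.
greet_s3 <- function(x, ...) UseMethod("greet_s3")
greet_s3.person_s3 <- function(x, ...) paste("Hello,", x$name)
alice_s3 <- structure(list(name = "Alice"), class = "person_s3")

# S4: formal class definitions with typed slots and explicit generics.
setClass("PersonS4", slots = c(name = "character"))
setGeneric("greet_s4", function(x, ...) standardGeneric("greet_s4"))
setMethod("greet_s4", "PersonS4", function(x, ...) paste("Hello,", x@name))
alice_s4 <- new("PersonS4", name = "Alice")

# Reference Classes: methods belong to the object, which is mutable.
PersonRC <- setRefClass(
  "PersonRC",
  fields = list(name = "character"),
  methods = list(
    greet = function() paste("Hello,", name),
    rename = function(new_name) {
      name <<- new_name
      invisible(.self)
    }
  )
)
alice_rc <- PersonRC$new(name = "Alice")

greet_s3(alice_s3)
greet_s4(alice_s4)
alice_rc$greet()

# Reference Class objects are modified in place.
alice_rc$rename("Alicia")
alice_rc$greet()
```

In S3 and S4 the method is looked up through a generic function, while in Reference Classes it is called on the object itself, and the object can change state without being reassigned.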
Conclusion
The first appearance of OOP in R comes from the S language. Object-oriented programming was used in S to provide a common set of functions for interacting with statistical models. OOP makes it easy to provide end users with a uniform interface to a family of different classes (e.g. different statistical models).
OOP provides developers with flexibility and allows their code to be more modular. There are multiple ways of doing OOP in R, and more are being developed. We’ll dive deeper into object-oriented programming in R; stay tuned for our next article in this series.
Have questions about Object-Oriented Programming (OOP) in R or need support with your enterprise R/Shiny project? Feel free to reach out to us for assistance!