
timm with fastai – The Largest Computer Vision Models Library


What is timm?

Timm stands for pyTorch IMage Models. Ross Wightman created this Python library in 2019 to collect state-of-the-art image classification models from the latest papers, implement them, and train them. The aim was to build the largest library of computer vision models, one that lets users quickly check and compare results in practice.

At the time of publishing this post, the timm library contains over 450 pre-trained model variants, and new ones are added almost every month.

You can install timm easily using pip:

pip install timm

Then you can list all available models using the list_models() function:

 

How to use timm in practice?

With access to such a large collection of computer vision models, you can easily be overwhelmed by the vast amount of research and work that went into creating them. But it is also a unique opportunity to take advantage of how accessible these models are and use them for your own purposes!

Which model architecture should you choose for your use case? There are often many good answers, and too many peculiarities to take into account. Timm offers a helping hand:

  1. This table comparing results and basic parameters can help you get a rough idea of what is worth trying
  2. Timm provides implemented, pre-trained models and makes it quick to switch between them once you have set up any one of them (even the newest Vision Transformer models!)

Another useful resource in this situation is the comparison table from Papers with Code.

Pre-trained model library

Timm’s huge pre-trained model library is a wonderful thing, and it can easily be integrated into fastai training code.

In fact, it lends itself to classic transfer learning: reusing a neural network pre-trained on one task for a new, related task. Typically, this is done by freezing the “body” of the pre-trained model and training only its last few layers, called the “head.” This way, after a much shorter training time than the time-consuming initial training, the model learns to perform well on the new task.

With fastai, you can quickly prepare the dataset, plan the training stage, and then simply swap timm model names in a customized learner. The learner replaces the model’s original “head” with a new one for your task and freezes the rest of the model’s “body.”

This means we don’t have to limit ourselves to one model: we can train several of them and test and compare them on a given task or dataset.

In the next section, I’ll show you how to do this quickly using the fastai module.

Transfer learning using timm and fastai

As a transfer learning example, I chose an image classification problem: the ‘Flower’ dataset from the fastai datasets library. The dataset contains 102 classes (English flower species), with around 10 images per class.

With a single line, you can download any dataset from the fastai library:

 

Here’s an example of the dataset images:

 

Adapting and fine-tuning

The next steps adapt a given pre-trained model and fine-tune it on your own task or dataset:

Step 1) Define the body of a neural network from a timm model.

 

Step 2) Define a timm model with a body and a head.

 

Step 3) Define a timm learner.

 

Step 4) Create the learner.

As an example, here we create a learner based on rexnet_100, with Neptune tracking. Stay tuned to the Appsilon blog for an article on Neptune.

 

Step 5) Train the model.

 

Step 6) Check the model learning process.

You can plot the loss from the training and validation stages:

 

The code used above is inspired by the Walk with fastai blog. I recommend checking the site for more useful content.

Testing PyTorch Image Models (timm)

At this stage, we are ready to make use of timm’s power. Since we have already created and trained one model from the library (ReXNet 100), we can now easily test others as well!

Let’s try two other models, also from 2020. I chose them because they are as light as possible (they have few parameters) and still gave decent results on ImageNet.

In total, we compare three models:

  • ReXNet 100 with 5 million parameters, 18.5 MB
  • RegNetY with 3 million parameters, 12 MB
  • TF EfficientNet Lite with 5 million parameters, 18 MB

I trained all three in the same manner: 12 epochs with the same learning rate values. The only thing I had to change was the architecture name passed to the learner:

learn_regnety = timm_learner(...)
learn_tf_efficient_lite = timm_learner(...)

Comparing trained models

Depending on the metric we want to optimize, we can now compare the trained models. Here, I compare their validation loss, error rate, and accuracy, each time highlighting the model with the highest value of the given metric.

 

In the plots above, the largest values are shown in the most intense colors. At first glance, the best results (the lowest loss and error rate, and the highest accuracy) were reached by the smallest yet most efficient architecture, RegNetY. In this way, timm’s pre-trained models let us easily test and compare multiple architectures and find the one best suited to a given computer vision problem.
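Picking the winner for each metric only requires knowing which direction is better for it: loss and error rate are better when lower, and accuracy is better when higher. Here is a small sketch, assuming the metrics are collected in a `{model_name: {metric_name: value}}` dictionary. That layout is our choice, not the post's, and `best_per_metric` is a hypothetical helper.

```python
# Metrics where a lower value is better; every other metric is treated as higher-is-better.
LOWER_IS_BETTER = {"valid_loss", "error_rate"}


def best_per_metric(results: dict) -> dict:
    """Map each metric name to the name of the model with the best value for it."""
    metrics = {m for scores in results.values() for m in scores}
    best = {}
    for metric in sorted(metrics):
        candidates = {name: s[metric] for name, s in results.items() if metric in s}
        pick = min if metric in LOWER_IS_BETTER else max
        best[metric] = pick(candidates, key=candidates.get)
    return best
```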

Conclusion

When you encounter a computer vision task that requires a fast but concrete solution, consider using timm’s pre-trained models. Check out their state-of-the-art implementations and solve more with timm!

Gists of code used in the blog: