AdapterHub: New Kid on the Block for Transfer Learning

nlp, transformers and adapter

Image by Наталия Когут from Pixabay


First, there was BERT; at least, that is where all of this started. It is a general-purpose language model that comes pre-trained for different languages and in different sizes.

If you want to use it for a specific task, you fine-tune it: you take the already trained and widely available BERT and train it further on a dataset that serves your purpose, such as Q&A, text classification, or NER. The result is a model that keeps the capabilities of the original BERT but works well on your specific task, even with a small amount of training data. That model is the same size as the original BERT, or just a little bigger.

Fine-tuning, in this form, means that you take the original BERT, freeze a significant part of it, and expose only the last layer(s) to the modifications that come with additional training on your data. This process is a form of transfer learning.
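To get a feeling for how much of the model stays frozen in such a setup, here is a rough parameter count in plain Python. It is only a sketch: the dimensions are the commonly published BERT Base configuration (12 layers, hidden size 768, vocabulary of 30,522 tokens), the choice to unfreeze just the last encoder layer and the pooler is a hypothetical example, and the small task-specific head is left out.

```python
def bert_param_counts(hidden=768, layers=12, intermediate=3072,
                      vocab=30522, max_positions=512, type_vocab=2):
    """Count weights and biases of a BERT-style encoder (no task head)."""
    layer_norm = 2 * hidden  # scale and shift vectors
    # Token, position and segment embeddings, followed by one LayerNorm.
    embeddings = (vocab + max_positions + type_vocab) * hidden + layer_norm
    # Query, key, value and output projections, followed by one LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Feed-forward block: up-projection, down-projection, one LayerNorm.
    feed_forward = (hidden * intermediate + intermediate
                    + intermediate * hidden + hidden
                    + layer_norm)
    per_layer = attention + feed_forward
    pooler = hidden * hidden + hidden
    total = embeddings + layers * per_layer + pooler
    return {"embeddings": embeddings, "per_layer": per_layer,
            "pooler": pooler, "total": total}


counts = bert_param_counts()

# Assumption for illustration: only the last encoder layer and the pooler
# are trainable; everything else is frozen.
trainable = counts["per_layer"] + counts["pooler"]
frozen = counts["total"] - trainable

print(f"total:     {counts['total']:,}")
print(f"trainable: {trainable:,} ({trainable / counts['total']:.1%})")
print(f"frozen:    {frozen:,}")
```

Under these assumptions only a small share of the roughly 110 million parameters is updated during training, which is why the fine-tuned model ends up so close to the original.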


HuggingFace entered the game here, establishing a platform for sharing NLP models pre-trained for a wide variety of use cases. For each use case, there are more than a dozen available models that you can use with the same API, merely by changing the model name to one listed on HuggingFace’s site.

Now, imagine a situation where you need several BERT-based models, each of them fine-tuned for a different task. Q&A, intent detection, text summarization, and Named Entity Recognition (NER) can easily all be needed within a single NLP project.

And the new kid on the block: AdapterHub

Why would you reserve a large amount of storage space for different models? BERT Large takes about 1.3 GB of disk space when stored as 32-bit weights. Remember that there is only a very tiny difference between the original BERT and the fine-tuned ones.

This is where the new kid on the NLP block, AdapterHub, enters the game.


The idea behind it is to keep the fine-tunings for different tasks, but to store only their differences from the original NLP model, BERT for example. Therefore, for various NLP tasks based on BERT, you need only one copy of the original BERT and several small files that represent the differences from the original after fine-tuning.
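A quick back-of-the-envelope calculation shows what this saves on disk. The sketch below rests on several assumptions that are not from this post: an adapter is modelled as one small bottleneck module per layer (a down-projection and an up-projection) with a reduction factor of 16, BERT Large is taken to have about 335 million encoder parameters, weights are stored as 32-bit floats, and task heads are ignored. Real adapter configurations and checkpoint files will differ somewhat.

```python
BYTES_PER_PARAM = 4  # assumption: weights stored as 32-bit floats


def adapter_params(hidden, layers, reduction_factor=16):
    """Parameters of one hypothetical bottleneck adapter per encoder layer."""
    bottleneck = hidden // reduction_factor
    down = hidden * bottleneck + bottleneck  # down-projection weights + bias
    up = bottleneck * hidden + hidden        # up-projection weights + bias
    return layers * (down + up)


# Assumption: BERT Large encoder (24 layers, hidden size 1024).
base_params = 335_141_888
per_adapter = adapter_params(hidden=1024, layers=24)

tasks = ["Q&A", "intent detection", "text summarization", "NER"]

# Option 1: one fully fine-tuned copy of the model per task.
full_copies = len(tasks) * base_params * BYTES_PER_PARAM
# Option 2: one shared base model plus one small adapter per task.
shared_base = (base_params + len(tasks) * per_adapter) * BYTES_PER_PARAM

print(f"one adapter:           {per_adapter * BYTES_PER_PARAM / 1e6:.1f} MB")
print(f"4 full copies:         {full_copies / 1e9:.2f} GB")
print(f"base model + adapters: {shared_base / 1e9:.2f} GB")
```

With these numbers, four tasks cost a bit more than one copy of BERT Large instead of four, and every additional task adds megabytes rather than gigabytes.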

The list of available fine-tunings can be found on the AdapterHub website.


And it is amazing 🙂

The next time I need a fine-tuned BERT model, this is the first place I’ll look for it.


