One Long Introduction
Currently I’m working on a question-answering system based on BERT.
Some online resources for it:
Since its announcement and open-sourcing, many optimizations and variations have appeared. There are smaller and faster versions optimized for memory footprint and speed on mobile devices. Two techniques are commonly used to produce them:
- Distillation, the process of training a smaller “student” model to reproduce the predictions of a larger “teacher” model, and
- Shrinking (quantization), the process of converting weights from 32-bit floats to 16-bit floats. Less memory consumption, but less accurate results 🙁
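The memory effect of moving from 32-bit to 16-bit floats is easy to see on a plain weight matrix. A minimal sketch with NumPy (the matrix here is random stand-in data, not real BERT weights):

```python
import numpy as np

# Stand-in "weight matrix" (random data, not real BERT weights)
weights_fp32 = np.random.rand(768, 768).astype(np.float32)

# Quantize to half precision: memory halves, some precision is lost
weights_fp16 = weights_fp32.astype(np.float16)

print(weights_fp32.nbytes)  # 2359296 bytes
print(weights_fp16.nbytes)  # 1179648 bytes

# The rounding error is small for values in [0, 1), but it is there
print(np.abs(weights_fp32 - weights_fp16.astype(np.float32)).max())
```

Halving each weight halves the model size, which is exactly the trade-off mentioned above: less memory, slightly less precise predictions.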
As a result, the following variations appeared, among others:
They differ widely in training method, chosen hyperparameters, vocabulary size, and separator tokens, among other things.
Another thing that distinguishes models is the dataset on which they are trained. They need to be trained for the specific task at hand: question answering, language translation, text summarization, and sentiment analysis are just a few of them.
For question answering, the default training dataset is SQuAD and, more recently, SQuAD 2.0, but there are also models trained with other QA datasets in addition to SQuAD:
On top of that, there are two versions of SQuAD:
- Version 1.1, which contains questions whose answers can be found in the provided text, and
- Version 2.0, which contains the same data as v1.1, augmented with questions labeled as having no answer in the provided text.
As a result, two different model behaviors exist. Models trained on SQuAD 1.1 provide good, human-verified answers in cases where an answer can be found. The side effect is that they will “find” an answer to any question in any provided text, even if only with a small probability attached. In contrast, models trained on SQuAD 2.0 are good at recognizing when there is no answer in the provided text, but the answers they do find are generally of lower quality than those found by models trained on SQuAD 1.1.
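The SQuAD 2.0 behavior can be tried out with the `handle_impossible_answer` flag of the HuggingFace `question-answering` pipeline. A sketch, where the model name is just one example of a SQuAD 2.0 model from the Hub:

```python
from transformers import pipeline

# A model fine-tuned on SQuAD 2.0 (one example from the HuggingFace Hub)
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

context = "BERT was open-sourced by Google in November 2018."

# Answerable question: the model extracts a span from the context
print(qa(question="Who open-sourced BERT?", context=context))

# Unanswerable question: with handle_impossible_answer=True the model
# may return an empty answer instead of forcing a bad span
print(qa(question="What is the capital of France?",
         context=context,
         handle_impossible_answer=True))
```

A SQuAD 1.1 model given the same unanswerable question would still pick some span from the context, just with a low score.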
So, there are a lot of choices to be made when using BERT for question answering. Reading about them is useful, but not sufficient when you try to apply them to your own text corpus that will be used to answer user-provided questions.
The best way to decide which model to use is, of course, to try all of them over your text corpus.
This is a serious OO and OPS situation. Looks very time- and resource-consuming, doesn’t it?
But today’s AI heroes of mine come to the rescue.
HuggingFace is a company with two great accomplishments for all of those who want to incorporate AI in their software solutions:
- It gives access to a wide variety of pre-trained models, and
- It provides a unified programming interface for all models of the same class. There can be 30 different models for QA based on BERT, for example; all of them can be used with the same code, changing only the name of the model. The model is automatically downloaded from the HuggingFace CDN and cached on your computer.
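Because the interface is uniform, comparing models is just a loop over their names. A minimal sketch (the candidate list is a hypothetical shortlist; both names exist on the Hub, and the question/context pair is made-up test data):

```python
from transformers import pipeline

# Hypothetical shortlist of QA models from the HuggingFace Hub
candidates = [
    "distilbert-base-cased-distilled-squad",
    "deepset/roberta-base-squad2",
]

question = "Where does Sarah live?"
context = "My name is Sarah and I live in London."

for name in candidates:
    # The model is downloaded from the HuggingFace CDN and cached on first use
    qa = pipeline("question-answering", model=name)
    result = qa(question=question, context=context)
    print(f"{name}: {result['answer']!r} (score={result['score']:.3f})")
```

The same loop scales to any number of models: only the list of names changes, never the code that runs them.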
An example of the available pre-trained BERT-based models for QA:
I tried to count them, and I didn’t have enough time 🙂
Now, how to do the analysis in the most efficient way?
The answer is my second AI hero.
Here is another post of mine about Streamlit:
When all of the necessary choices are made, putting them into production is a job for software engineers and UI experts. Your job is done. At least until somebody wants some improvements.