You simply couldn’t miss the GPT-3 hype. Everybody has loved it from the very first day it was “shown.” I put “shown” in quotes because the number of people able to play with it was, and still is, very limited. I gained access to it a couple of months ago.
Some Usability Analysis
In short, it is excellent. But, how useful is it?
Currently, the model is hosted on the Microsoft Azure cloud and licensed exclusively to Microsoft. Microsoft invested $1B in OpenAI. Do you think they know how to use it and make some money on it? I don’t think so.
One of the first “use cases” was automatic code generation. I watched a few demos that produced an HTML layout automatically from the user’s description of the requirements. The videos were impressive as a showcase of the current state of the art in AI. It would have looked like science fiction just a year ago, and technology is not advanced enough if it doesn’t look like magic to its users, is it?
So, the technology was magical and indeed still is.
But the $1M question is: how useful is it in reality?
Not much. GPT-3 actually has no idea what it is talking about. It doesn’t “understand.” All it does, and does so successfully, is produce the text that is most probable given the input and the training corpora it was exposed to. That’s all.
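To make “the most probable text” concrete, here is a toy sketch. It is not how GPT-3 works internally (GPT-3 is a huge neural network, while this is a tiny bigram counter), and the corpus and function names are made up for illustration. The principle it shows is the same, though: the continuation comes from statistics over the training text, not from any understanding of cats or sofas.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus):
    """Count, for every word, which words follow it in the corpus."""
    follows = defaultdict(Counter)
    words = corpus.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def continue_text(follows, prompt, length=3):
    """Greedily append the most frequent follower of the last word."""
    words = prompt.lower().split()
    for _ in range(length):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # this word was never followed by anything in the corpus
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


# Hypothetical miniature "training corpus".
corpus = "the cat sat on the mat . the cat ate the fish . the cat sat on the sofa ."
model = train_bigrams(corpus)

print(continue_text(model, "the", length=3))  # prints: the cat sat on
```

The output is fluent only because “the cat sat on” is the most frequent pattern in the corpus. Change the corpus and the “opinion” of the model changes with it.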
Now, can this most probable text, generated under the given circumstances, be relied on? The obvious answer is: NO.
Can anybody put software code written by GPT-3 into production without human supervision? The obvious answer is: NO.
The one, and still the only, valid use case for GPT-3 is helping content creators, a.k.a. writers, be more productive. They can do so by asking GPT-3 for several continuations of a given text, choosing the most suitable one, and polishing it. Several websites offering this functionality have appeared in the last couple of months. They are not free, some of them are not cheap either, and users pay for “credits”, i.e., for the number of automatically generated texts. Please keep in mind that writers will most probably have to ask for several inferences before they find one that fits the main idea of the text they are writing. Each generated text costs one credit, no matter whether it was helpful or not.
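A back-of-the-envelope sketch of what that credit model means for a writer. Every number here is hypothetical (the acceptance rate, the price per credit, the article length), and the function names are mine. I also assume that each generation is usable independently with the same probability, so the expected number of tries per kept text is 1 divided by that probability.

```python
def credits_per_accepted_text(acceptance_rate):
    """Expected generations until one is kept, assuming each generation
    is accepted independently with the same probability."""
    if not 0 < acceptance_rate <= 1:
        raise ValueError("acceptance_rate must be in (0, 1]")
    return 1 / acceptance_rate


def article_cost(paragraphs, acceptance_rate, price_per_credit):
    """Expected spend for an article where every paragraph needs one kept text."""
    return paragraphs * credits_per_accepted_text(acceptance_rate) * price_per_credit


# Hypothetical figures: 10 paragraphs, 1 in 4 generations is usable,
# and one credit costs 0.10 in whatever currency the site charges.
cost = article_cost(paragraphs=10, acceptance_rate=0.25, price_per_credit=0.10)
print(f"Expected cost: {cost:.2f}")
```

Under these assumptions the writer pays for 40 generations to keep 10 of them, so three quarters of the bill goes to text that is thrown away.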
Maybe I’m demanding too much from a technology that has generated so much hype. But, in my humble opinion, it doesn’t shine that brightly in reality, once we seriously try to build something useful with it. I wouldn’t like to see more cases like this in the future.
That was the summary of my impressions of GPT-3. Now, let’s move on to the point of the title.
The Analogy With the Electricity
Let’s suppose you have an idea for a new tool that you would like to offer to the public and, eventually, make some money from.
Will it use electricity? Of course! Everybody’s new tools use electricity these days. So, it will run on electricity. How will you power it, i.e., where will it get the electricity from? From the public power grid, of course. Why? Simply because you have no other option. If you decide that your tool will use electricity, there is only one reliable source of it, and that’s the public power grid. And that electricity is produced by BIG power plants of whatever kind: nuclear plants, giant wind turbines, thermal plants, etc. Can you supply that electricity yourself? Nope. You can think of offering a solar power supply from your own source, but it will never be as robust and reliable as the public power grid.
Starting to Explain the Disappointment
That’s exactly the situation with using text automatically generated by GPT-3 in your application. Only the big players, such as Google and Microsoft, can offer such a vast and powerful AI model to the public. You can’t, unless you own a company with a market capitalization of several hundred billion dollars. And, statistically, the odds are that you don’t.
Now, things get even worse in the case of GPT-3.
Do you have an instruction manual on how to use it? Nope. It’s a black box when you try to steer it in a planned direction.
And now, even worse: you can’t fine-tune it for a specific downstream NLP task that would serve precisely your needs. With BERT-like models and GPT-2, fine-tuning is not that easy either, since you still need a lot of computational power, and if you don’t have a cluster of GPUs with 32-48 GB of video RAM each, it could take several weeks. But it is not impossible at all.
Now, are you thinking of fine-tuning GPT-3, or any other AI model with 175B parameters? No way. Simply forget it.
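A rough calculation shows why. The sketch below assumes about 16 bytes of GPU memory per parameter, a commonly quoted figure for mixed-precision training with the Adam optimizer (fp16 weights and gradients, plus fp32 master weights, momentum, and variance). It counts model state only and ignores activations, so real requirements are higher. The parameter counts are the published sizes of the largest GPT-2 and of GPT-3; the function names are mine.

```python
import math

# Assumption: 2 (fp16 weights) + 2 (fp16 gradients)
# + 12 (fp32 master weights, Adam momentum, Adam variance) bytes per parameter.
BYTES_PER_PARAM_TRAINING = 16


def training_memory_gb(num_params, bytes_per_param=BYTES_PER_PARAM_TRAINING):
    """Model-state memory for training, in GB (activations not included)."""
    return num_params * bytes_per_param / 1e9


def min_gpus(num_params, gpu_memory_gb):
    """Lower bound on the number of GPUs needed just to hold the model state."""
    return math.ceil(training_memory_gb(num_params) / gpu_memory_gb)


models = {"GPT-2 (1.5B)": 1.5e9, "GPT-3 (175B)": 175e9}
for name, num_params in models.items():
    print(
        f"{name}: {training_memory_gb(num_params):.0f} GB of model state, "
        f"at least {min_gpus(num_params, 32)} GPUs with 32 GB "
        f"or {min_gpus(num_params, 48)} GPUs with 48 GB"
    )
```

Under these assumptions, the largest GPT-2 needs about 24 GB and squeezes onto a single high-end card, while GPT-3 needs about 2,800 GB, i.e., at least 88 GPUs with 32 GB each before a single activation is stored.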
Finally, the Point!
So, the final summary:
- You want some cool and trendy AI feature in your app. Everybody has one, they are your competition, and you don’t want to lag behind.
- You use GPT-3 as the most advanced NLP model out there. Using anything less means lagging behind the competition.
- You don’t know how to get what you want from the model. It’s a trial-and-error process.
- No model can offer zero-shot learning for everything you or somebody else might need.
- You can’t fine-tune it for a specific, proprietary downstream task. It’s impossible: you don’t have the model, and nobody can afford to fine-tune it and serve a separate version per use case.
Is this your idea of the direction AI development should pursue?