GPT-2 was a great success. Initially, OpenAI didn’t want to publish the largest and mightiest version, with 1.5B parameters, claiming they were afraid it would be misused for unethical purposes. Later, they stated that they had found no evidence of such misuse.
All of this is legitimate, considering the volume of fake “news” generated with it. And the truth is that it can be very successful at producing fake news and stories. I tried it, played with it for a while, and was, well, at least impressed.
Then GPT-3 came, with more than a hundred times as many parameters. And how did people react? It appeared that people’s expectations grew at least with O(N), where N is the number of parameters. Such a disappointment in people’s ability to figure out what is going on inside that black box called GPT-3.
There is absolutely no doubt that GPT-3 is a remarkable technical, engineering, and scientific achievement. Still, there is one large BUT here: the technology behind it. No matter how large the number of parameters is, the technology that drives those parameters is all that counts. And the truth is that this technology is nothing more than highly sophisticated statistics applied over an input sequence of words to produce the most probable next few (up to about a thousand) words. The parameters for those statistics are obtained by exposing GPT-3, as a tabula rasa, to the texts its creators revealed to it during the training process.
Therefore, ladies and gentlemen, the truth is that GPT-3 doesn’t understand what it is talking about. Again, it is all about producing the most probable continuation of a given sequence of words. And the important thing here is that it treats words simply as numbers and generates further numbers in return. Simply put: there is not a single shred of understanding. It’s all statistics applied over a massive amount of data, very quickly, and that’s it.
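To make the “words as numbers, statistics over sequences” point concrete, here is a deliberately tiny sketch. This is *not* GPT’s actual architecture (GPT-3 is a Transformer with billions of parameters), just a toy bigram model that captures the same principle: tokens are integers, and the model returns the statistically most probable next token.

```python
from collections import Counter, defaultdict

def train_bigram(corpus_tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus_tokens, corpus_tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def most_probable_next(counts, token):
    """Return the most frequent continuation of `token`, or None if unseen."""
    if token not in counts:
        return None
    return counts[token].most_common(1)[0][0]

# Tokens are just integers, as in a real tokenizer's vocabulary.
corpus = [1, 2, 3, 1, 2, 4, 1, 2, 3]
model = train_bigram(corpus)
print(most_probable_next(model, 2))  # 3 follows 2 twice, 4 only once -> 3
```

The model “continues” token 2 with token 3 not because it understands anything, but because 3 is the most frequent successor in its training data. Scale the corpus and the parameter count up by many orders of magnitude and you get the same idea dressed in a far more sophisticated statistical machine.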
Now, one more point to note: this massive amount of data is probably the most extensive text corpus ever collected by human beings.
I’ve been playing with GPT-2 for a while. From time to time, it speaks (ok, generates) Java and JavaScript code, standardized web server logs, and so on. Even pornography from time to time. Even base64-encoded text. That means the trainers exposed it to such data; it couldn’t generate such stuff without learning it from somewhere. As for GPT-3, its text corpus is probably more extensive by a whole scale. But, again, the technology inside that black box remains the same: pure statistics.
Consequently, the texts generated by both GPT-2 and GPT-3 are simply mirrors of the texts the trainers exposed them to during the training process. This fact has one remarkable consequence: both GPT-2 and GPT-3 mirror the digitalized content found here and there, all around the web. Consequently, all the human biases, misconceptions, misunderstandings, misbeliefs, and so on are simply injected into the neurons of the models. Now, let’s think about the volume of accurate, scientifically proven, validated content on the Internet versus all of the so-called “I know the truth” content. Consider this for a moment, and you’ll have a profound insight into what is inside GPT-2. And it is even worse for GPT-3. GPT-2 is already overpopulated with irrelevant text data (remember the Java code and web logs?), and GPT-3 is even more overpopulated with irrelevant texts, simply because the more you want to put in, the less ability you have to control what you are putting in. In other words: take literally everything you can find, put it into the number-crunching machine, and train the model.
Whatever nonsense you get from GPT-3 is there simply because there is much more nonsense than scientifically meaningful text. Hey, the number of scientists and professionals is a couple of orders of magnitude lower than the number of YouTube watchers who earned their “relevancy” by reading posts on Facebook and Twitter. Ok, there are scientists on Facebook and Twitter, and I follow several of them myself, but the volume of the available text is what counts here.
Several decades ago, humans believed that widespread ignorance was caused by quality information being hard to reach. These days, we know that it wasn’t. And GPT-3 simply reflects that. It is nothing more than a statistically averaged human being. It’s us, and we have to cope with that.
To be honest, I couldn’t believe it when I read how much “scientific” research is being conducted to prove that GPT-3 “doesn’t reason”. Such a perfect example of wasting precious Ph.D. hours. They could just learn how GPT-3 works, and they would quickly figure out what to expect from it.
All of this is well elaborated, in a politically correct voice, in the following Yannic Kilcher video:
Hey, there is still a chance to find some quality out there in the wild 🙂
Let’s cheer for that.