First, we created neural networks, with artificial neurons as building blocks, by analogy with the neurons in the human brain.
Then we started to think that we could figure out what is going on in the human brain by analyzing the behavior of artificial neural networks.
What arrogance: we are nowhere near reproducing the behavior of the neurons in the human brain, and eons away from figuring out what is going on inside it.
Neuroscience has been on a fast track in recent months:
AI researchers publish theory to explain how deep learning actually works – SiliconANGLE
Nevertheless, even our own artificial creations have given us several “surprises”. One of them is that a fully trained neural network can preserve its accuracy with only 50% of its neurons:
Poor Man’s BERT – Exploring layer pruning | dair.ai (medium.com)
In addition to this: “Neural network pruning techniques can reduce the parameter counts of trained networks by over 90%”:
[1803.03635] The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks (arxiv.org)
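To make the idea of pruning concrete, here is a minimal sketch of one common criterion, magnitude pruning: the weights with the smallest absolute values are set to zero. It is only an illustration on a toy list of random numbers, not the procedure of either article above; the function name `magnitude_prune` and the toy data are my own assumptions.

```python
import random


def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest absolute value."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by ascending magnitude; the first n_prune get removed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_prune = set(order[:n_prune])
    return [0.0 if i in to_prune else w for i, w in enumerate(weights)]


# Toy "layer": 1000 random weights standing in for a trained network (an assumption).
random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]

pruned = magnitude_prune(weights, 0.9)
remaining = sum(1 for w in pruned if w != 0.0)
print(f"{remaining} of {len(weights)} weights kept")
```

In a real network the interesting question is how much accuracy survives after such a step, and that can only be answered by evaluating (and usually retraining) the pruned model.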
But now, guess what: it’s official that we don’t understand in depth how our own creations work, as such “surprises” suggest.
Recently I read this article: AI researchers publish theory to explain how deep learning actually works – SiliconANGLE
“Artificial intelligence researchers from Facebook Inc., Princeton University, and the Massachusetts Institute of Technology have teamed up to publish a new manuscript that they say offers a theoretical framework describing for the first time how deep neural networks actually work.”
They compare neural networks to the steam engine of several centuries ago. People built the first steam engines without fully understanding how they worked. Consequently, the only way to make any adjustment was trial and error.
We are in the same situation today with neural networks.
There are a lot of hyperparameters that we have to adjust to train a model successfully and optimally.
I know this from my own experience of trying to train a BERT language model for Macedonian.
With a given training set, I successfully trained a smaller model, one with fewer neurons (fewer hidden layers, smaller layer sizes, fewer attention heads, and so on).
And, led by my own “intuition”, I thought that exposing the same training set to a bigger model would lead to a “smarter” model. What a misbelief. The bigger model, richer in neurons, simply didn’t want to learn, no matter the learning rate or the optimizer used. After more than 250 experiments in a grid-like hyperparameter search, I drew a fine border between the hyperparameter values of the models that agreed to learn and those of the bigger ones that refused to learn at all. The intriguing part is: why didn’t the bigger models want to learn? For now, I have no idea.
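For readers who have not run such a search, here is a rough sketch of what a grid-like hyperparameter experiment looks like. Everything in it is hypothetical: the grid values are not the ones I used, and `toy_model_learns` is a made-up placeholder rule that stands in for a real training run just so the script can be executed.

```python
from itertools import product

# Hypothetical grid; the values are illustrative, not the ones from my experiments.
GRID = {
    "hidden_size": [256, 512, 768],
    "num_layers": [4, 8, 12],
    "learning_rate": [1e-5, 5e-5, 1e-4],
}


def toy_model_learns(config):
    """Placeholder for a real training run.

    A real version would train the model and report whether the loss went down.
    The rule below is invented purely so the example runs.
    """
    capacity = config["hidden_size"] * config["num_layers"]
    return capacity * config["learning_rate"] < 0.2


def run_grid(grid, learns):
    """Try every combination in the grid and record whether the model learned."""
    names = list(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        results.append((config, learns(config)))
    return results


results = run_grid(GRID, toy_model_learns)
learned = [config for config, ok in results if ok]
failed = [config for config, ok in results if not ok]

print(f"{len(learned)} configurations learned, {len(failed)} did not")
for config in failed:
    print("did not learn:", config)
```

Sorting the recorded configurations into the two lists is what lets you see the border between the settings that learn and the ones that do not.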
Maybe I’ll know better after carefully reading the following article, which claims to explain the behavior of neural networks.