It seemed to me that Google, Facebook, and OpenAI were the only ones producing anything notable in AI.
Still, tech giants like Microsoft seem to be working on something interesting as well.
Microsoft recently unveiled its Turing-NLG model with 17B parameters. At the time of its announcement it was the largest language model, or at least the largest one announced publicly. That held until OpenAI’s GPT-3 appeared.
This puts small and medium vendors in a weak position, since they cannot produce anything comparable, let alone better.
However, Microsoft also released the parallel-training tools that it used to train Turing-NLG. So it now seems possible for other vendors to step into the game as well.
Zero Redundancy Optimizer (ZeRO) is an optimization module that maximizes both memory and scaling efficiency.
From an algorithmic perspective, ZeRO has three main stages that correspond to the partitioning of optimizer states, gradients, and parameters respectively.
1. Optimizer State Partitioning (P_os): 4x memory reduction, same communication volume as data parallelism
2. Add Gradient Partitioning (P_os+g): 8x memory reduction, same communication volume as data parallelism
3. Add Parameter Partitioning (P_os+g+p): Memory reduction is linear with data parallelism degree Nd. For example, splitting across 64 GPUs (Nd = 64) will yield a 64x memory reduction. There is a modest 50% increase in communication volume.
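To make these reduction factors concrete, here is a rough back-of-the-envelope sketch of per-GPU memory for the model states at each stage. It assumes mixed-precision training with Adam, where each parameter costs 2 bytes for fp16 weights, 2 bytes for fp16 gradients, and about 12 bytes of optimizer state (the accounting I recall from the ZeRO paper). The function name and the 7.5B-parameter example are mine, not part of any library, and activations and buffers are ignored.

```python
def zero_memory_gb(num_params, num_gpus, stage, optimizer_bytes=12):
    """Approximate per-GPU memory (in GB, 1 GB = 1e9 bytes) for model states.

    Assumption: mixed-precision Adam, i.e. 2 bytes/param for fp16 weights,
    2 bytes/param for fp16 gradients, and `optimizer_bytes` per param for
    the optimizer state (fp32 weight copy, momentum, variance).
    stage 0 is plain data parallelism; stages 1-3 are the ZeRO stages.
    """
    params = 2 * num_params
    grads = 2 * num_params
    optim = optimizer_bytes * num_params

    if stage >= 1:   # stage 1: partition optimizer states
        optim /= num_gpus
    if stage >= 2:   # stage 2: also partition gradients
        grads /= num_gpus
    if stage >= 3:   # stage 3: also partition parameters
        params /= num_gpus

    return (params + grads + optim) / 1e9


# Hypothetical example: a 7.5B-parameter model on 64 GPUs.
memory_by_stage = {stage: zero_memory_gb(7.5e9, 64, stage) for stage in range(4)}
for stage, gigabytes in memory_by_stage.items():
    print(f"stage {stage}: {gigabytes:.1f} GB per GPU")
```

Under these assumptions the numbers come out at roughly 120, 31.4, 16.6 and 1.9 GB per GPU. The 4x and 8x figures for the first two stages are the limits for a large Nd, which is why the ratios at Nd = 64 are slightly lower.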
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
10x Larger Models
10x Faster Training
Minimal Code Change
DeepSpeed can train DL models with over a hundred billion parameters on current generation of GPU clusters, while achieving over 10x in system performance compared to the state-of-art. Early adopters of DeepSpeed have already produced a language model (LM) with over 17B parameters called Turing-NLG, establishing a new SOTA in the LM category.
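As far as I can tell, the "minimal code change" mostly comes down to a JSON-style configuration plus a wrapper around the existing PyTorch model. Below is a minimal sketch that only builds such a configuration. The helper name make_ds_config and the default values are my own. The keys train_batch_size, fp16 and zero_optimization are DeepSpeed configuration keys, but the exact options may differ between versions, so check the documentation for the release you install.

```python
import json


def make_ds_config(zero_stage=2, train_batch_size=32, fp16=True):
    """Build a small DeepSpeed-style config dict (hypothetical helper)."""
    if zero_stage not in (0, 1, 2, 3):
        raise ValueError("ZeRO stage must be 0, 1, 2 or 3")
    return {
        "train_batch_size": train_batch_size,
        "fp16": {"enabled": fp16},
        "zero_optimization": {"stage": zero_stage},
    }


ds_config = make_ds_config(zero_stage=2)
config_text = json.dumps(ds_config, indent=2)
print(config_text)

# With DeepSpeed and PyTorch installed, the config would then be handed to
# the library roughly like this (not executed here; the signature may vary
# between versions):
#
#   import deepspeed
#   model_engine, optimizer, _, _ = deepspeed.initialize(
#       model=model,
#       model_parameters=model.parameters(),
#       config=ds_config,
#   )
#   loss = model_engine(batch)
#   model_engine.backward(loss)
#   model_engine.step()
```

The ZeRO stage is just one number in the config, so switching between the partitioning stages should not require touching the training loop.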
And link to the source code on GitHub:
It seems to me that Microsoft has finally given up on building its own deep learning framework.
This is not a research breakthrough, but it is definitely interesting and useful:
Hummingbird is a library for compiling trained traditional ML models into tensor computations. Hummingbird allows users to seamlessly leverage neural network frameworks (such as PyTorch) to accelerate traditional ML models. Thanks to Hummingbird, users can benefit from: (1) all the current and future optimizations implemented in neural network frameworks; (2) native hardware acceleration; (3) having a single platform that supports both traditional and neural network models; and have all of this (4) without having to re-engineer their models.
https://github.com/microsoft/hummingbird
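To get a feeling for what "compiling a traditional model into tensor computations" can mean, here is a toy sketch that evaluates a tiny hand-written decision tree using only matrix products and element-wise comparisons. This is my own simplified illustration of the general idea, not Hummingbird's code or API. The tree, the matrix names A to E and the matmul helper are all made up for the example, and plain Python lists stand in for tensors so that it runs without dependencies.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Toy tree with 2 features, 2 internal nodes and 3 leaves:
#   node 0: x[0] < 0.5 ? go to node 1 : leaf 2
#   node 1: x[1] < 2.0 ? leaf 0       : leaf 1
A = [[1, 0], [0, 1]]           # feature -> internal node that tests it
B = [0.5, 2.0]                 # threshold of each internal node
C = [[1, 1, -1], [1, -1, 0]]   # +1: leaf in left subtree, -1: right, 0: unrelated
D = [2, 1, 0]                  # number of "left turns" on the path to each leaf
E = [[10.0], [20.0], [30.0]]   # value stored in each leaf


def predict_tensor(X):
    """Evaluate the tree for a batch of rows without any branching on the data."""
    T = matmul(X, A)                                                      # pick tested features
    T = [[1 if v < thr else 0 for v, thr in zip(row, B)] for row in T]    # node decisions
    T = matmul(T, C)                                                      # score every leaf
    T = [[1 if v == d else 0 for v, d in zip(row, D)] for row in T]       # one-hot leaf
    return [row[0] for row in matmul(T, E)]                               # read leaf value


def predict_branching(x):
    """The same tree written as ordinary if/else code, for comparison."""
    if x[0] < 0.5:
        return 10.0 if x[1] < 2.0 else 20.0
    return 30.0


X = [[0.2, 1.0], [0.2, 3.0], [0.9, 1.0], [0.9, 3.0]]
tensor_preds = predict_tensor(X)
branch_preds = [predict_branching(x) for x in X]
print(tensor_preds, branch_preds)
```

Both functions give the same predictions for every row. The point is that the tensor version is just a fixed sequence of products and comparisons over the whole batch, which is exactly the kind of workload neural network frameworks and GPUs are optimized for.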