- dataset in only 2.2 milliseconds - well under the 10-ms processing threshold for many real-time applications, and a sharp improvement from over 40 milliseconds measured with highly optimized CPU code.
- Largest model: With a focus on developers' ever-increasing need for larger models, Nvidia Research built and trained the world's largest language model based on Transformers, the technology building block used for BERT and a growing number of other natural language AI models. Nvidia's custom model, with 8.3 billion parameters, is 24 times the size of BERT-Large.
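The 24x size comparison can be sanity-checked with a back-of-the-envelope parameter count. The sketch below is a rough estimate only; the layer and hidden-size values are assumed configurations for illustration (not official figures from this announcement), and the formula ignores biases, layer norms, and position embeddings.

```python
# Rough Transformer parameter estimate: each layer has ~4*H^2 attention
# weights plus ~8*H^2 feed-forward weights (H = hidden size, L = layers).
# Shapes below are assumptions for illustration, not official configs.

def approx_params(layers: int, hidden: int, vocab: int = 30522) -> int:
    per_layer = 12 * hidden ** 2      # attention (4H^2) + MLP (8H^2)
    embeddings = vocab * hidden       # token embedding table
    return layers * per_layer + embeddings

bert_large = approx_params(layers=24, hidden=1024)   # ~0.33 billion
big_model = approx_params(layers=72, hidden=3072)    # ~8.2 billion

print(f"BERT-Large ~{bert_large / 1e9:.2f}B parameters")
print(f"8.3B-class model ~{big_model / 1e9:.2f}B parameters")
print(f"size ratio ~{big_model / bert_large:.0f}x")
```

With these assumed shapes the estimate lands near the cited 8.3 billion parameters and a ratio in the neighborhood of the reported 24x.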
The company has made the software optimizations available to developers. Continuous optimizations that accelerate BERT and Transformer training on GPUs across multiple frameworks are freely available on NGC, the company’s hub for accelerated software. Nvidia’s BERT GitHub repository contains code today to reproduce the single-node training performance cited by the company, and will soon be updated with the scripts needed to reproduce the cited large-scale training performance numbers.
To see the Nvidia Research team’s natural language processing (NLP) code on Project Megatron, which the company launched to investigate billion-plus-parameter Transformer-based networks, visit the Megatron Language Model GitHub repository.