Do large Transformer models yield better performance compared to small ones?

Chris Staff asked 2 weeks ago
1 Answer
Best Answer
Chris Staff answered 2 weeks ago

Generally, yes. In their ablation over model size, the BERT authors report:
 
“We can see that larger models lead to a strict accuracy improvement across all (…) datasets, even for [small datasets]”.
 
Source:
 
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
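
To make "larger" concrete, here is a rough sketch that estimates the parameter count of a BERT-style encoder from its number of layers and hidden size. The function name `encoder_param_count` is made up for illustration. The defaults (a 30,522-token vocabulary, 512 positions, 2 segment types, a feed-forward width of 4x the hidden size, and a pooler layer) are assumptions based on the publicly released BERT checkpoints, not something stated in the quote above.

```python
def encoder_param_count(num_layers, hidden, vocab_size=30522,
                        max_positions=512, type_vocab=2):
    """Estimate the parameter count of a BERT-style encoder (assumed layout)."""
    layer_norm = 2 * hidden  # scale + bias

    # Token, position and segment embeddings, followed by one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm

    # Self-attention: query, key, value and output projections (weights + biases).
    # The number of heads does not change the count, only how hidden is split.
    attention = 4 * (hidden * hidden + hidden)

    # Feed-forward block: hidden -> 4*hidden -> hidden (weights + biases).
    ff = 4 * hidden
    feed_forward = (hidden * ff + ff) + (ff * hidden + hidden)

    # Each layer has two LayerNorms (after attention and after feed-forward).
    layer = attention + feed_forward + 2 * layer_norm

    # Pooler: one dense layer applied to the first token's representation.
    pooler = hidden * hidden + hidden

    return embeddings + num_layers * layer + pooler


if __name__ == "__main__":
    # Layer counts and hidden sizes as reported in the BERT paper.
    base = encoder_param_count(num_layers=12, hidden=768)
    large = encoder_param_count(num_layers=24, hidden=1024)
    print(f"BERT-base:  {base:,}")
    print(f"BERT-large: {large:,}")
    print(f"ratio:      {large / base:.2f}x")
```

Under these assumptions the estimate comes out at 109,482,240 parameters for the base configuration and 335,141,888 for the large one, which is in line with the roughly 110M and 340M that the paper reports. So the "larger model" in the quote is about three times the size of the smaller one.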