1 Answer
Best Answer
Generally, yes.
“We can see that larger models lead to a strict accuracy improvement across all (…) datasets, even for [small datasets]”.
Source:
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.