Can I use Transformers with small datasets?

Chris Staff asked 2 weeks ago
1 Answer
Best Answer
Chris Staff answered 2 weeks ago

Yes, Transformers can definitely be used when you have a small dataset!
 
The general recommendation in this case is to take a Transformer-based architecture (such as BERT or GPT) that has been pretrained on a large unlabeled corpus, and fine-tune it on your smaller dataset for the task you want to solve.
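As an illustration, here is a minimal PyTorch sketch of that fine-tuning recipe. Note the heavy hedging: the encoder below is a tiny, randomly initialised stand-in rather than a real pretrained BERT (in practice you would load actual pretrained weights, e.g. through the Hugging Face transformers library, as noted in the comment), and the dataset, dimensions, and hyperparameters are all made up for the example. The key idea it shows is freezing the large pretrained body and training only a small task-specific head, which helps avoid overfitting a small labelled dataset.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a pretrained Transformer encoder. With a real model you
# would instead load pretrained weights, e.g. via Hugging Face transformers:
#   from transformers import AutoModelForSequenceClassification
#   model = AutoModelForSequenceClassification.from_pretrained(
#       "bert-base-uncased", num_labels=2)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True),
    num_layers=2,
)
encoder.eval()                          # disable dropout in the frozen body
embed = nn.Embedding(100, 32)           # toy vocabulary of 100 token ids
head = nn.Linear(32, 2)                 # new task-specific classification head

# Freeze the "pretrained" body; only the small head is fine-tuned.
for p in list(encoder.parameters()) + list(embed.parameters()):
    p.requires_grad = False

# Tiny labelled dataset: 16 sequences of 8 token ids, binary labels.
x = torch.randint(0, 100, (16, 8))
y = torch.randint(0, 2, (16,))

opt = torch.optim.Adam(head.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for step in range(20):                  # short fine-tuning loop
    feats = encoder(embed(x)).mean(dim=1)   # mean-pool token features
    loss = loss_fn(head(feats), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final fine-tuning loss: {loss.item():.3f}")
```

With a genuinely pretrained encoder the pooled features are far more informative than this random stand-in's, which is why the approach works so well even with only a handful of labelled examples.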
 
This pretrain-then-fine-tune approach yields strong results, as reported by the authors of BERT in this paper:
 
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.