Can BERT be used in a feature-based pipeline?

Chris Staff asked 2 weeks ago
1 Answer
Chris Staff answered 2 weeks ago

Yes. You can train a separate Machine Learning model on top of BERT, using BERT’s outputs as the input features for that model.
The original BERT paper (Devlin et al., 2018) shows that this works well: on the CoNLL-2003 Named Entity Recognition task, the feature-based variants reach Dev F1 scores between 91.0 and 96.1, depending on which layers are used as features.
The feature-based approach also has a real benefit: the large, computationally intensive pretrained model is trained just once, while the lightweight models that consume its features are cheap to retrain as often as you need.
Still, fine-tuning works slightly better, with F1 scores of 96.4 to 96.6 on the same task.
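
The feature-based variants in the paper differ mainly in which hidden layers are turned into per-token features, for example concatenating the last four hidden layers or taking a weighted sum of them. Below is a minimal, pure-Python sketch of those two strategies. Everything in it is illustrative: the function names `concat_last_k` and `weighted_sum`, the toy hidden states, and the fixed weights are assumptions, not code from the paper. In a real pipeline the hidden states would come from a frozen BERT model, and the resulting features would be fed to whatever downstream model you train.

```python
from typing import List, Sequence

Vector = List[float]
Layer = List[Vector]          # one vector per token
HiddenStates = List[Layer]    # one entry per layer, lowest layer first


def concat_last_k(hidden_states: HiddenStates, k: int = 4) -> Layer:
    """Concatenate each token's vectors from the top k layers."""
    top = hidden_states[-k:]
    num_tokens = len(top[0])
    return [
        [value for layer in top for value in layer[t]]
        for t in range(num_tokens)
    ]


def weighted_sum(hidden_states: HiddenStates, weights: Sequence[float]) -> Layer:
    """Sum each token's vectors over the top len(weights) layers, scaled by weights."""
    top = hidden_states[-len(weights):]
    num_tokens = len(top[0])
    hidden_size = len(top[0][0])
    return [
        [
            sum(w * layer[t][d] for w, layer in zip(weights, top))
            for d in range(hidden_size)
        ]
        for t in range(num_tokens)
    ]


# Toy stand-in for frozen BERT outputs (hypothetical data):
# 5 layers, 2 tokens, hidden size 3.
toy_hidden_states: HiddenStates = [
    [[float(layer + token + dim) for dim in range(3)] for token in range(2)]
    for layer in range(5)
]

# Features a downstream tagger or classifier could be trained on.
concat_features = concat_last_k(toy_hidden_states, k=4)
summed_features = weighted_sum(toy_hidden_states, weights=[0.25, 0.25, 0.25, 0.25])

print(len(concat_features), len(concat_features[0]))  # 2 tokens, 4 * 3 = 12 values each
print(summed_features[0])                             # [2.5, 3.5, 4.5]
```

Because BERT stays frozen, features like these can be computed once, stored, and reused across many cheap downstream training runs.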
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.