In feature-based training in NLP, a pretrained language model (such as a Transformer like BERT) is used to generate features from the input tokens. These features are then used to train a smaller, and possibly architecturally different, model, such as an LSTM or ConvNet, for a specific language task.
This contrasts with the fine-tuning approach, where the pretrained model itself is used for the task and its weights are updated further on your task-specific dataset.
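To make the contrast concrete, here is a minimal sketch of the two regimes. It assumes the Hugging Face transformers library and the bert-base-uncased checkpoint; neither is prescribed by the paper, and any pretrained encoder would do.

```python
# Rough sketch of feature-based vs. fine-tuning setups (assumes the
# Hugging Face transformers library; "bert-base-uncased" is just an example).
import torch
from transformers import BertModel, BertForSequenceClassification

# Feature-based: the pretrained encoder is frozen and only produces features
# that a separate, smaller model is trained on.
encoder = BertModel.from_pretrained("bert-base-uncased")
for p in encoder.parameters():
    p.requires_grad = False  # BERT's weights are never updated

# Fine-tuning: a task-specific head is attached and the whole network,
# BERT included, is trained further on the downstream dataset.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # updates every parameter
```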
In BERT, this can for example be achieved by taking the final hidden state of the [CLS] token, denoted C in the paper, as a joint, sentence-level representation of all the input tokens. Because BERT (like any Transformer) processes tokens through self-attention, where every position can attend to every other position, information from all the tokens flows into C. For this reason, C can serve as a good sentence-level representation and can be used to generate sentence-level features for a different downstream model.
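As an illustration, the following sketch extracts that C vector as a fixed sentence feature and trains a small classifier on it. It assumes the Hugging Face transformers library, the bert-base-uncased checkpoint, toy data, and scikit-learn's logistic regression as a stand-in for the smaller downstream model (an LSTM or ConvNet would consume the features the same way); none of these specifics come from the paper.

```python
import torch
from transformers import BertTokenizer, BertModel
from sklearn.linear_model import LogisticRegression

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")
encoder.eval()

sentences = ["the movie was great", "the movie was terrible"]  # toy data
labels = [1, 0]

with torch.no_grad():  # features only: no gradients flow into BERT
    batch = tokenizer(sentences, padding=True, return_tensors="pt")
    outputs = encoder(**batch)
    # last_hidden_state[:, 0, :] is the final hidden state of [CLS], i.e. C
    cls_features = outputs.last_hidden_state[:, 0, :].numpy()

# Any small model can be trained on these fixed features; logistic regression
# is used here purely to keep the example short.
clf = LogisticRegression(max_iter=1000).fit(cls_features, labels)
print(clf.predict(cls_features))
```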
While the benefits of the feature-based approach lie mostly in training and inference speed (and hence lower computational cost), fine-tuning based approaches seem to work slightly better (according to Devlin et al., 2018).
Source:
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.