In the fine-tuning-based approach to training an NLP model (used mostly with Transformer architectures), training involves two steps:
- Pretraining a model on a large, unlabeled dataset. Self-supervised language tasks are designed for this, such as language modeling, next sentence prediction and masked language modeling, where the training targets are derived from the text itself rather than from human labels.
- Fine-tuning the pretrained model on a small- to medium-sized labeled dataset. This adapts a model that already has generic language understanding capabilities to your own dataset and task.
For example, you can pretrain a model on a large corpus such as CommonCrawl and then fine-tune it on your own data, such as a dataset tailored to answering engineering questions.
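To make the pretraining step more concrete: masked language modeling needs no human labels, because the prediction targets are the original tokens of the text. The following minimal sketch corrupts a token sequence and keeps the hidden tokens as targets. It is simplified under loudly stated assumptions: it operates on whitespace-split words rather than the subword token IDs real models use, it uses the string `[MASK]` as the mask token (BERT's convention), and it always substitutes the mask token, whereas BERT replaces a selected token with `[MASK]` 80% of the time, with a random token 10% of the time, and leaves it unchanged 10% of the time. `mask_tokens` is a hypothetical helper, not a library API.

```python
import random

MASK = "[MASK]"  # mask token string, following BERT's convention (assumption)

def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Simplified masked-language-modeling corruption (hypothetical helper).

    Each token is independently selected with probability mask_prob and
    replaced by MASK. Returns (inputs, labels): labels hold the original
    token at masked positions and None elsewhere, i.e. positions that the
    pretraining loss would ignore.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)   # the model sees the mask...
            labels.append(tok)    # ...and must predict the original token
        else:
            inputs.append(tok)
            labels.append(None)   # not a prediction target
    return inputs, labels

# A higher rate than BERT's 15% so that a short sentence is likely to contain a mask
tokens = "the model learns language from unlabeled text".split()
inputs, labels = mask_tokens(tokens, mask_prob=0.3, seed=0)
print(inputs)
print(labels)
```

The (inputs, labels) pairs produced this way are what the model trains on during pretraining; the labeled data only enters in the fine-tuning step.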
This approach contrasts with the feature-based approach to training NLP models, in which the pretrained model is kept frozen and used only to generate features (e.g., contextual embeddings), which are then fed into a smaller model that is easier to train.
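The practical difference between the two approaches is which parameters get updated during task training. The toy sketch below, in pure Python, assumes a "pretrained" encoder made of a single tanh layer (its weights are random numbers standing in for real pretrained weights) and a logistic-regression head as the small task model. With `freeze_encoder=False` it fine-tunes encoder and head together; with `freeze_encoder=True` it keeps the encoder fixed and trains only the head on the encoder's features, as in the feature-based approach. All names (`train`, `make_pretrained_encoder`, `DATA`) are hypothetical illustrations; a real setup would use a framework such as PyTorch with a Transformer encoder.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def forward(enc, head, x):
    """Encoder (one tanh layer, no bias) followed by a logistic-regression head."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in enc]
    z = sum(v * hj for v, hj in zip(head["w"], h)) + head["b"]
    return h, sigmoid(z)

def mean_loss(enc, head, data):
    """Average binary cross-entropy over the labeled dataset."""
    total = 0.0
    for x, y in data:
        _, p = forward(enc, head, x)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)

def train(enc, head, data, freeze_encoder, lr=0.1, epochs=500):
    """Per-example (stochastic) gradient descent; returns updated copies.

    freeze_encoder=False: fine-tuning, all parameters are updated.
    freeze_encoder=True:  feature-based, only the head is trained on the
                          frozen encoder's features.
    """
    enc = [row[:] for row in enc]
    head = {"w": head["w"][:], "b": head["b"]}
    for _ in range(epochs):
        for x, y in data:
            h, p = forward(enc, head, x)
            dz = p - y  # gradient of cross-entropy w.r.t. the logit
            if not freeze_encoder:
                # Backpropagate through tanh into the encoder, using the
                # head weights from before this step's head update.
                for j, row in enumerate(enc):
                    dh = dz * head["w"][j] * (1.0 - h[j] ** 2)
                    for k in range(len(row)):
                        row[k] -= lr * dh * x[k]
            for j, hj in enumerate(h):
                head["w"][j] -= lr * dz * hj
            head["b"] -= lr * dz
    return enc, head

def make_pretrained_encoder(seed=0, n_hidden=3, n_in=2):
    """Hypothetical stand-in for pretrained weights (random values here)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]

# Tiny hypothetical labeled dataset: the label equals the first input.
DATA = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([1.0, 1.0], 1), ([0.0, 0.0], 0)]

if __name__ == "__main__":
    enc0 = make_pretrained_encoder()
    head0 = {"w": [0.0, 0.0, 0.0], "b": 0.0}
    print("initial loss:", mean_loss(enc0, head0, DATA))
    for frozen in (False, True):
        enc, head = train(enc0, head0, DATA, freeze_encoder=frozen)
        name = "feature-based" if frozen else "fine-tuning"
        print(name, "loss:", round(mean_loss(enc, head, DATA), 4),
              "encoder changed:", enc != enc0)
```

Running it prints the training loss for both modes and whether the encoder weights moved. The feature-based run leaves the encoder untouched and does less work per step, since no gradients flow into the encoder; fine-tuning can additionally adapt the representations themselves to the task.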