Fine-tuning is one half of the pretraining/fine-tuning paradigm, which largely removes the need for very large labeled datasets. It works as follows:
- First, a very large but unlabeled (and hence easily collectable) dataset is fed to the model. The model is then asked to predict either the next token (causal language modeling, LM) or the tokens hidden behind masks (masked language modeling, MLM). In doing so, it builds up knowledge of language patterns.
- Then, a smaller, labeled dataset connects the pretrained model to a specific task by showing it the desired output for each input: for example, a summary for a longer input text, or a sentiment label for a piece of text. In this way, the pretrained model, already capable of detecting basic language patterns, adapts them to perform task-oriented behavior.
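The pretraining step above can be sketched with a toy model. This is a minimal illustration, not a real architecture: the tiny embedding-plus-linear backbone, vocabulary size, and batch shapes are all invented for the example. It shows the core of causal language modeling, shifting the token sequence by one position so the model predicts each next token, and computing a cross-entropy loss over the vocabulary:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

VOCAB_SIZE, EMBED_DIM = 100, 32

# Toy backbone standing in for a real transformer: embedding + feed-forward.
backbone = nn.Sequential(
    nn.Embedding(VOCAB_SIZE, EMBED_DIM),
    nn.Linear(EMBED_DIM, EMBED_DIM),
    nn.ReLU(),
)
lm_head = nn.Linear(EMBED_DIM, VOCAB_SIZE)  # predicts a distribution over next tokens

tokens = torch.randint(0, VOCAB_SIZE, (4, 16))   # a batch of (unlabeled) token ids
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # shift by one: predict the next token
logits = lm_head(backbone(inputs))               # shape (4, 15, VOCAB_SIZE)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, VOCAB_SIZE), targets.reshape(-1)
)
loss.backward()  # gradients that pretraining would use to update the weights
```

Note that the "labels" here are just the input shifted by one position, which is why no manual annotation is needed at this stage.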
In other words, fine-tuning a model means taking a pretrained model and training it further, but with a loss function tailored to a specific task, such as summarization, question answering, or natural language inference.
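The fine-tuning step can be sketched in the same toy setting. Again, the backbone, head, and data shapes are invented for illustration; in practice the backbone's weights would be loaded from the pretraining run rather than freshly initialized. The key change from the sketch above is the loss: a new, randomly initialized classification head replaces the LM head, and the cross-entropy is now computed against task labels (here, a hypothetical two-class sentiment task) supplied by the smaller labeled dataset:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

VOCAB_SIZE, EMBED_DIM, NUM_CLASSES = 100, 32, 2

# Stand-in for the pretrained backbone; in practice, load pretrained weights here.
backbone = nn.Sequential(
    nn.Embedding(VOCAB_SIZE, EMBED_DIM),
    nn.Linear(EMBED_DIM, EMBED_DIM),
    nn.ReLU(),
)
cls_head = nn.Linear(EMBED_DIM, NUM_CLASSES)  # new task-specific head

tokens = torch.randint(0, VOCAB_SIZE, (8, 16))  # small labeled batch of token ids
labels = torch.randint(0, NUM_CLASSES, (8,))    # e.g. sentiment labels (0/1)

features = backbone(tokens).mean(dim=1)         # pool per-token features into one vector
logits = cls_head(features)                     # shape (8, NUM_CLASSES)
loss = nn.functional.cross_entropy(logits, labels)  # task-specific loss
loss.backward()  # fine-tuning updates both the backbone and the new head
```

Only the head and the loss target changed; the backbone and optimization loop are the same machinery as in pretraining, which is what makes the transfer cheap.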