# Does adding Language Modeling as an objective help Transformer fine-tuning?

Chris Staff asked 11 months ago
Chris Staff answered 11 months ago

The answer is: yes, sometimes. The GPT paper, i.e. Radford et al. (2018), performed a set of experiments with an “auxiliary language modeling task added to fine-tuning”.

In other words, they used the following loss for optimization during fine-tuning:

$$L_{combined}(C) = L_{ft}(C) + \lambda \times L_{u}(C)$$

The combined loss is the sum of the task-specific fine-tuning loss $$L_{ft}(C)$$ and the unsupervised language modeling loss $$L_{u}(C)$$, the latter weighted by $$\lambda$$.
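As a minimal sketch of how this combination works in practice (the function name is hypothetical; in a real training loop both terms would be tensors produced by the model's task head and LM head), the weighting reduces to a simple sum:

```python
def combined_loss(task_loss: float, lm_loss: float, lam: float = 0.5) -> float:
    """Weighted sum of the task-specific loss and the auxiliary LM loss.

    Implements L_combined(C) = L_ft(C) + lambda * L_u(C).
    Radford et al. (2018) set lambda to 0.5, which `lam` mirrors here.
    """
    return task_loss + lam * lm_loss

# Example: task loss 1.0 and LM loss 2.0 combine to 1.0 + 0.5 * 2.0 = 2.0
print(combined_loss(1.0, 2.0))  # 2.0
```

Because both terms are differentiable, gradients from the language modeling objective flow through the shared Transformer body alongside the task gradients, which is where any regularization benefit comes from.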

Radford et al. (2018) shows that adding this additional Language Modeling loss function (and hence language modeling objective) during fine-tuning improves the results in some cases:

• On Natural Language Inference (textual entailment) tasks, the auxiliary objective improves performance in some cases.
• On Semantic Similarity (e.g., question-pair) tasks, it likewise improves performance in some cases.
• The benefit is especially apparent on larger datasets; on smaller datasets the auxiliary objective can actually hurt performance.

Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training.