How can language modeling be added as an additional task to fine-tuning?

Chris Staff asked 11 months ago
1 Answer
Chris Staff answered 11 months ago

Suppose that we have a loss function \(L_{ft}(C)\) which describes the loss of the fine-tuning task; in the case of binary classification, for example, this can be binary cross-entropy loss.

Fine-tuning happens on a model that was first pretrained on an unlabeled dataset, and hence in an unsupervised fashion, using some loss function \(L_u(C)\). This loss function is a language modeling loss function.

Radford et al. (2018) show that performance of the fine-tuning operation improves even further in some cases (specific fine-tuning tasks; large fine-tuning datasets) when \(L_u(C)\) is added as a weighted auxiliary objective to \(L_{ft}(C)\):

\(L_{combined}(C) = L_{ft}(C) + \lambda \times L_{u}(C)\)

Here, \(\lambda\) serves as a weight for the auxiliary language modeling loss.
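In code, combining the two objectives is a one-line weighted sum. A minimal sketch below, assuming the two loss values have already been computed as scalars for the current batch (the function name and \(\lambda = 0.5\) default are illustrative, not from the paper; Radford et al. use \(\lambda = 0.5\) in their experiments, but it is a tunable hyperparameter):

```python
def combined_loss(loss_ft, loss_u, lam=0.5):
    """Combined fine-tuning objective per Radford et al. (2018):
    task (fine-tuning) loss plus a weighted language modeling loss.

    loss_ft -- supervised fine-tuning loss, e.g. binary cross-entropy
    loss_u  -- unsupervised language modeling loss on the same batch
    lam     -- weight for the auxiliary language modeling term
    """
    return loss_ft + lam * loss_u


# Example: task loss 0.7, LM loss 2.0, lambda 0.5
# -> 0.7 + 0.5 * 2.0 = 1.7
print(combined_loss(0.7, 2.0, lam=0.5))
```

In a framework like PyTorch or TensorFlow, the same sum would be taken over the two loss tensors before calling the backward pass, so gradients flow through both objectives.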

Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training.
