Does pre-training a language model actually help?

Chris Staff asked 11 months ago
1 Answer
Best Answer
Chris Staff answered 9 months ago

Definitely. Pretraining objectives such as next-token (or masked-token) prediction effectively generate a supervision signal from unlabeled data: the text itself provides the labels. As a consequence, language models can learn general language patterns from unlabeled corpora – a form of self-supervised learning that happens entirely in the pretraining stage. We can subsequently fine-tune these pretrained models on labeled datasets to adapt them to specific language tasks.
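To make the "text labels itself" idea concrete, here is a minimal sketch (an illustrative example, not any particular library's API) of how an unlabeled token sequence yields supervised (context, target) pairs for next-token prediction:

```python
# Unlabeled text supplies its own labels for language-model
# pretraining: each position's target is simply the next token.

def make_pretraining_pairs(tokens):
    """Turn a token sequence into (context, target) training pairs."""
    pairs = []
    for i in range(1, len(tokens)):
        context = tokens[:i]   # everything seen so far
        target = tokens[i]     # the "label" is the next token
        pairs.append((context, target))
    return pairs

tokens = ["the", "cat", "sat", "on", "the", "mat"]
for context, target in make_pretraining_pairs(tokens):
    print(context, "->", target)
```

No human annotation is needed: any raw corpus can be converted into training pairs this way, which is what makes pretraining scale so well before task-specific fine-tuning.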
