What percentage of tokens is masked in BERT's MLM objective?

Chris Staff asked 11 months ago
1 Answer
Best Answer
Chris Staff answered 11 months ago

15 percent of the input tokens are selected at random for prediction. Each selected token is then handled as follows:
 

  • 80 percent of the selected tokens (12 percent of all tokens) are replaced with the [MASK] token.
  • 10 percent of the selected tokens (1.5 percent of all tokens) are replaced with another token picked at random.
  • 10 percent of the selected tokens (1.5 percent of all tokens) are left unchanged, but the model still has to predict them.
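
To make the split concrete, here is a minimal sketch of the selection and replacement rule in plain Python. It is not the code from the paper: the function name `mask_tokens`, its parameters, and the toy vocabulary are made up for illustration, and it selects each position independently with probability 0.15 instead of picking a fixed number of positions per sequence. It also ignores special tokens such as [CLS] and [SEP].

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, vocab, select_prob=0.15, seed=None):
    """Return (corrupted_tokens, labels) following the 80/10/10 rule.

    labels[i] holds the original token at every position selected for
    prediction and None everywhere else. (Hypothetical helper, for illustration.)
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    labels = [None] * len(tokens)

    for i, token in enumerate(tokens):
        # Assumption: each position is selected independently with p = 0.15.
        if rng.random() >= select_prob:
            continue
        labels[i] = token  # the model must predict the original token here

        roll = rng.random()
        if roll < 0.8:
            # 80 percent of selected positions: replace with [MASK].
            corrupted[i] = MASK_TOKEN
        elif roll < 0.9:
            # 10 percent: replace with a random vocabulary token
            # (which may, by chance, be the original token).
            corrupted[i] = rng.choice(vocab)
        # Remaining 10 percent: leave the token unchanged.

    return corrupted, labels


# Toy usage with a made-up sentence and vocabulary.
sentence = "the quick brown fox jumps over the lazy dog".split()
toy_vocab = sorted(set(sentence))
corrupted, labels = mask_tokens(sentence, toy_vocab, seed=0)
print(corrupted)
print(labels)
```

The loss would then be computed only at positions where `labels[i]` is not None, which is why the unchanged 10 percent still count as prediction targets.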

 
Source:
 
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
