How does bidirectionality perform compared to concatenated LTR/RTL?

Chris Staff asked 2 weeks ago
1 Answer
Best Answer
Chris Staff answered 2 weeks ago

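Better. The paper that introduced BERT (Devlin et al., 2018) shows empirically, in its ablation study, that a deep bidirectional model outperforms a left-to-right (LTR) one. For the alternative of training separate LTR and right-to-left (RTL) models and concatenating their representations, the authors give an argument against it: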
 
“[LTR/RTL] is strictly less powerful than a deep bidirectional model, since [the latter] can use both left and right context at every layer.”
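
To make "both left and right context at every layer" concrete, here is a minimal sketch of the three self-attention masks involved. It is an illustration only, not code from the BERT paper or its repository: the helper names `attention_masks` and `visible_positions` are hypothetical, and it assumes a single sequence of 4 tokens with no padding.

```python
import numpy as np

def attention_masks(seq_len):
    """Build boolean masks; mask[i, j] is True if token i may attend to token j."""
    ones = np.ones((seq_len, seq_len), dtype=bool)
    ltr = np.tril(ones)    # left-to-right: token i sees positions j <= i
    rtl = np.triu(ones)    # right-to-left: token i sees positions j >= i
    bidir = ones.copy()    # bidirectional: token i sees every position
    return ltr, rtl, bidir

def visible_positions(mask, i):
    """List the positions token i can attend to under the given mask."""
    return np.flatnonzero(mask[i]).tolist()

ltr, rtl, bidir = attention_masks(4)
for name, mask in (("LTR", ltr), ("RTL", rtl), ("bidirectional", bidir)):
    print(name, visible_positions(mask, 1))
# LTR [0, 1]
# RTL [1, 2, 3]
# bidirectional [0, 1, 2, 3]
```

In a concatenated LTR/RTL setup, each of the two models applies its one-sided mask at every layer, so the left and right views of token 1 only meet when the final outputs are joined. With the full mask, every layer already computes token 1's hidden state from both sides, which is the point the quote makes.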
 
Source:
 
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.