How do Transformers perform on language tasks compared to LSTMs?

Chris Staff asked 4 months ago
1 Answer
Best Answer
Chris Staff answered 1 month ago

Generally, much better. Transformers can be trained with bigger architectures and on bigger datasets, leading to large performance gains over LSTMs. For example, the largest models out there these days have billions of parameters, compared to a few hundred thousand to a few million for LSTMs.

The reason why Transformers perform so well compared to LSTMs is that they learn linguistic patterns differently. Whereas an LSTM performs a sequential operation over the input sequence, adapting its memory on the fly with forget, update, and output gates, a Transformer can process the whole sequence in parallel by means of self-attention.
More information here.
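To make the difference concrete, here is a minimal sketch of both processing styles. It assumes PyTorch (not mentioned in the original answer), and the layer sizes are arbitrary toy values:

```python
import torch
import torch.nn as nn

batch, seq_len, d_model = 2, 10, 64
x = torch.randn(batch, seq_len, d_model)  # a toy batch of embedded tokens

# LSTM: tokens are consumed one time step at a time; each hidden state
# depends on the previous one, so the recurrence cannot be parallelized.
lstm = nn.LSTM(input_size=d_model, hidden_size=d_model, batch_first=True)
lstm_out, _ = lstm(x)  # internally iterates over the 10 time steps

# Self-attention: every token attends to every other token in a single
# matrix operation, so the whole sequence is processed in parallel.
attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=8, batch_first=True)
attn_out, _ = attn(x, x, x)  # queries, keys and values all come from x

print(lstm_out.shape, attn_out.shape)  # both: torch.Size([2, 10, 64])
```

Both layers produce outputs of the same shape, but the attention layer computes them without any step-by-step recurrence, which is what lets Transformers scale to much larger models and datasets.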
