Main differences between transformer models

Chetan Ambi asked 4 weeks ago

Hi Chris,
If possible, I would like to know the important differences between transformer models. For example, Pegasus models have an input limit of 512 or 1024 tokens, while LED can process up to 16k tokens. Are there any other important differences with respect to their architectures? This would come in super handy.

