Data Scientist

John Mark Agosta asked 2 months ago

In your article "Introduction to Transformers in Machine Learning" you work through the basic encoder-decoder architecture, using a machine translation example. In machine translation, there is a source language (German) and a target language (English). In the encoder-decoder, there are the inputs, the outputs, and the pseudo-probabilities. Apparently the "outputs" are themselves an input: as you mentioned, they are shifted by one token relative to the inputs, and the pseudo-probabilities "predict the next token." It is not at all clear to me how this applies to your machine translation example.
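For concreteness, here is a minimal sketch of the shift being asked about, assuming the standard teacher-forcing setup during training; the token sequences and the `<bos>`/`<eos>` markers are hypothetical, not taken from the article:

```python
# Minimal sketch (plain Python, hypothetical tokens) of how the decoder
# "outputs" are a shifted copy of the target sentence during training
# (teacher forcing). The German source feeds the encoder; the shifted
# English target feeds the decoder; the model's pseudo-probabilities are
# scored against the unshifted target at each position.

source = ["Ich", "bin", "ein", "Student"]        # encoder input (German)
target = ["I", "am", "a", "student", "<eos>"]    # ground-truth translation

decoder_input = ["<bos>"] + target[:-1]          # the "outputs" fed back in,
                                                 # shifted right by one token
labels = target                                  # what the pseudo-probabilities
                                                 # should predict at each step

for step, (inp, lbl) in enumerate(zip(decoder_input, labels)):
    print(f"step {step}: decoder sees {inp!r:12} -> must predict {lbl!r}")
```

So at every position the decoder sees the target tokens up to (but not including) the current one, and the pseudo-probability distribution is trained to put its mass on the next target token.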
