What are unidirectional language models?

Chris Staff asked 2 weeks ago
1 Answer
Best Answer
Chris Staff answered 2 weeks ago

Language models process text as follows: the input is first tokenized (i.e. converted from text into integer IDs) and then passed to the model as a sequence. Depending on the setup, this leads to autoregressive, autoencoding, or Seq2Seq behavior within a model.
Unidirectional processing here means "processing in one direction".
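To make the tokenization step concrete, here is a minimal sketch. The vocabulary below is hypothetical and word-level for readability; real language models use learned subword vocabularies (e.g. BPE or WordPiece).

```python
# Hypothetical word-level vocabulary; real models learn subword vocabularies.
vocab = {"<unk>": 0, "language": 1, "models": 2, "process": 3, "text": 4}

def tokenize(sentence):
    """Map each whitespace-separated word to its integer ID (0 if unknown)."""
    return [vocab.get(word, vocab["<unk>"]) for word in sentence.lower().split()]

print(tokenize("Language models process text"))  # [1, 2, 3, 4]
```

The resulting list of integers is the "sequence" the model actually consumes.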
Text can be processed either left-to-right or right-to-left (unidirectional processing), or in both directions at once (bidirectional processing). BERT is a classic example of a bidirectional model, whereas the original OpenAI GPT model is a unidirectional one.
Unidirectionality in most attention-based models is implemented by "masking" future inputs in the attention computation. In the classic Transformer architecture, the decoder is therefore a unidirectional model, while the encoder is bidirectional.
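The masking trick can be sketched as follows. This is a minimal NumPy illustration of causal (unidirectional) attention weights, not any particular model's implementation; shapes and inputs are made up for the example.

```python
import numpy as np

def attention_weights(q, k, causal=False):
    """Compute (seq, seq) attention weights from queries q and keys k.

    If causal=True, positions in the "future" (above the diagonal) are
    masked out with -inf before the softmax, so each token can only
    attend to itself and earlier tokens -- i.e. unidirectional attention.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product scores
    if causal:
        future = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    # Numerically stable softmax over each row.
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))  # 4 tokens, 8-dimensional embeddings
w = attention_weights(q, q, causal=True)
```

With `causal=True` the weight matrix is lower-triangular: token 0 attends only to itself, token 1 to tokens 0 and 1, and so on. With `causal=False` every token attends to every other token, which is the bidirectional (encoder-style) behavior.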
(This also explains why BERT makes use of the Transformer encoder whereas GPT utilizes the decoder.)