Why are LSTMs limited to short-range language structures?

Chris Staff asked 11 months ago
1 Answer
Chris Staff answered 11 months ago

From our article about LSTMs, we know that they have a memory-cell-like structure with a memory component, shown in green in the image below:

[Image: LSTM memory cell, with the memory component highlighted in green]

This memory (the cell state) is passed along from cell to cell and is only adapted in a minor way at each step, through a forget gate, an input/update gate and an output gate.
 
The forget gate is responsible for removing aspects from memory, based on a Sigmoid-activated combination of the previous output and the current input.
 
The input/update gate is responsible for adding new (Tanh-normalized) aspects into memory, again based on a Sigmoid-activated combination of the previous output and the current input.
 
The output gate is responsible for producing the output, based on the current input, the previous output and the current contents of memory.
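To make the three gates a bit more concrete, here is a minimal NumPy sketch of a single LSTM cell step. It follows the standard formulation; the weight and bias names (W_f, W_i, W_c, W_o and so on) are illustrative placeholders rather than names from any particular library:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step: returns the new output and the new memory (cell) state."""
    W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o = params
    z = np.concatenate([h_prev, x_t])      # previous output + current input

    f_t = sigmoid(W_f @ z + b_f)           # forget gate: what to remove from memory
    i_t = sigmoid(W_i @ z + b_i)           # input/update gate: what to add to memory
    c_hat = np.tanh(W_c @ z + b_c)         # candidate values, normalized to [-1, 1]
    c_t = f_t * c_prev + i_t * c_hat       # updated memory (cell) state

    o_t = sigmoid(W_o @ z + b_o)           # output gate: what to expose as output
    h_t = o_t * np.tanh(c_t)               # new output / hidden state
    return h_t, c_t
```

Note how every change to the memory state c_t is decided purely from h_prev and x_t, i.e. from short-term information.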
 
As you can see, memory is only ever adapted based on short-term inputs (i.e. the previous output and the current input). Hence, while longer-term information does get passed on to downstream memory cells, it slowly fades along the way: at every step, the forget gate can scale it down a little further. So although LSTMs can in theory hold on to long-term information, in practice this remains problematic, especially with longer sequences.
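A quick back-of-the-envelope illustration of that fading: the contribution of information stored at step 0 is multiplied by the forget gate value at every subsequent step, so if the forget gate averages, say, 0.9 (an assumed value, purely for illustration), that contribution shrinks geometrically:

```python
import numpy as np

# Assumed average forget gate value, purely for illustration.
forget_gate = 0.9
steps = np.array([10, 50, 100])

# Remaining fraction of the step-0 contribution after this many steps.
print(forget_gate ** steps)  # approx. [0.35, 0.005, 0.00003]
```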
 
Transformers, which take a different approach based on self-attention rather than recurrence, solve this problem; they can handle long sequences relatively easily, because every position can attend to every other position directly.
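For contrast, here is a minimal sketch of the self-attention operation at the core of Transformers (simplified: the learned query/key/value projections and multiple heads are left out). Because every position attends to every other position directly, distance within the sequence no longer matters for how information is combined:

```python
import numpy as np

def self_attention(X):
    """X has shape (sequence_length, d); every position attends to every other."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                        # pairwise similarities between all positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over each row
    return weights @ X                                   # each output mixes information from all positions
```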
 
