Mechanistic View of Transformers: Patterns, Messages, Residual Stream… and LSTMs | Towards Data Science

What happens when you stop concatenating and start decomposing: a new way to think about attention.

By · · 1 min read
Mechanistic View of Transformers: Patterns, Messages, Residual Stream… and LSTMs | Towards Data Science

Source: Towards Data Science

What happens when you stop concatenating and start decomposing: a new way to think about attention.