The encoder-decoder architecture was the original transformer architecture proposed by Vaswani et al. It is now largely a legacy architecture, retained mainly for specific applications such as translation, having been supplanted by encoder-only models for understanding (i.e., embeddings) and, increasingly, decoder-only models for both understanding and generation.
It is also worth noting that encoder models are sometimes still preferred over decoders for embeddings: they are more transparent, and their bidirectional attention lets each token attend to the full context in both directions, which produces richer embeddings. That said, the largest decoder models can match or exceed encoder performance even on understanding tasks, although they are less transparent.
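The difference between bidirectional (encoder-style) and causal (decoder-style) attention comes down to the attention mask. A minimal NumPy sketch, purely for illustration, shows the two mask patterns: in an encoder, every token can attend to every other token; in a decoder, each token can attend only to itself and earlier positions.

```python
import numpy as np

seq_len = 4

# Bidirectional (encoder-style) mask: every token attends to every position.
bidirectional_mask = np.ones((seq_len, seq_len), dtype=bool)

# Causal (decoder-style) mask: token i attends only to positions <= i,
# i.e. a lower-triangular matrix.
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))

# The first token can "see" the final token under bidirectional attention...
print(bidirectional_mask[0, seq_len - 1])  # True
# ...but not under causal attention.
print(causal_mask[0, seq_len - 1])  # False
```

This is why an encoder's embedding for a token reflects the full surrounding context, while a decoder's hidden state at a position reflects only the prefix up to that position.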
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).