Decoder architectures

To complement our explanation of encoder architectures, we've undertaken a similarly detailed analysis of decoder-only models, focusing on GPT-2. Unlike MiniLM, which processes full sequences bidirectionally, GPT-2 is an auto-regressive model: it predicts the next token based solely on the tokens it has seen so far.
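
To make the auto-regressive loop concrete, here is a minimal sketch using the Hugging Face transformers library and the public gpt2 checkpoint (the notebook may structure this differently): each step runs the model on the current prefix, takes the most likely next token from the final position's logits, and appends it to the sequence.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

input_ids = tokenizer("Open-source LLMs", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(5):                               # generate five tokens greedily
        logits = model(input_ids).logits             # shape: (1, seq_len, vocab_size)
        next_id = logits[0, -1].argmax()             # the prediction depends only on the prefix
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))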

For the rebuild, we extract the necessary components from the open-source GPT-2 model. Continuing the “Open-source LLMs rock.” example, we rebuild the GPT-2 pipeline block by block until the final stage, where we reproduce the model's logits to within floating-point tolerance. The full description is coming soon; the notebook is available now.
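
As a taste of what the notebook does, below is a compressed sketch of such a rebuild, not the notebook's actual code: a hand-written GPT-2 forward pass that reads the weights out of the Hugging Face gpt2 checkpoint and compares its logits against the reference implementation with torch.allclose. The dimensions assumed here (12 layers, 12 heads of size 64, hidden size 768) are those of the smallest GPT-2; the tolerance may need slight loosening depending on hardware.

import math
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
reference = GPT2LMHeadModel.from_pretrained("gpt2").eval()
params = dict(reference.state_dict())

def layer_norm(x, w, b, eps=1e-5):
    mu = x.mean(-1, keepdim=True)
    var = x.var(-1, keepdim=True, unbiased=False)
    return (x - mu) / torch.sqrt(var + eps) * w + b

def gelu(x):
    # GPT-2 uses the tanh approximation of GELU
    return 0.5 * x * (1 + torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * x ** 3)))

def rebuilt_forward(ids):
    # ids: 1-D tensor of token ids
    T = ids.shape[0]
    x = params["transformer.wte.weight"][ids] + params["transformer.wpe.weight"][:T]
    causal = torch.tril(torch.ones(T, T)).bool()    # each position sees itself and the past
    for i in range(12):
        p = f"transformer.h.{i}."
        # multi-head causal self-attention with a residual connection
        h = layer_norm(x, params[p + "ln_1.weight"], params[p + "ln_1.bias"])
        qkv = h @ params[p + "attn.c_attn.weight"] + params[p + "attn.c_attn.bias"]
        q, k, v = (t.view(T, 12, 64).transpose(0, 1) for t in qkv.split(768, dim=-1))
        att = (q @ k.transpose(-2, -1)) / math.sqrt(64)
        att = att.masked_fill(~causal, float("-inf")).softmax(-1)
        h = (att @ v).transpose(0, 1).reshape(T, 768)
        x = x + h @ params[p + "attn.c_proj.weight"] + params[p + "attn.c_proj.bias"]
        # position-wise MLP with a residual connection
        h = layer_norm(x, params[p + "ln_2.weight"], params[p + "ln_2.bias"])
        h = gelu(h @ params[p + "mlp.c_fc.weight"] + params[p + "mlp.c_fc.bias"])
        x = x + h @ params[p + "mlp.c_proj.weight"] + params[p + "mlp.c_proj.bias"]
    x = layer_norm(x, params["transformer.ln_f.weight"], params["transformer.ln_f.bias"])
    return x @ params["transformer.wte.weight"].T   # output head is tied to the embeddings

ids = tokenizer("Open-source LLMs rock.", return_tensors="pt").input_ids[0]
with torch.no_grad():
    ours = rebuilt_forward(ids)
    theirs = reference(ids.unsqueeze(0)).logits[0]
print(torch.allclose(ours, theirs, atol=1e-4), (ours - theirs).abs().max())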

View the GPT-2 notebook →

Bug reports and corrections are welcome!
