To complement our explanation of encoder architectures, we've undertaken a similarly detailed analysis of decoder-only models, focusing on GPT-2. Unlike MiniLM, which processes full sequences bidirectionally, GPT-2 is an auto-regressive model: it predicts the next token based solely on the tokens it has seen so far.
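A minimal sketch of what "auto-regressive" means in practice, assuming the Hugging Face `transformers` package and the public "gpt2" checkpoint: the model produces a score for every vocabulary entry at each position, and the next token is read off the logits at the last position seen so far.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("Open-source LLMs", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits            # shape: (batch, seq_len, vocab_size)

# Greedy prediction: the most likely token at the final position,
# conditioned only on the tokens already in the prompt.
next_token_id = logits[0, -1].argmax().item()
print(tokenizer.decode(next_token_id))
```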
We extract the necessary components from the open-source GPT-2 model and, continuing the “Open-source LLMs rock.” example, rebuild the pipeline block by block until the final stage, where we reproduce the model's logits to within floating-point tolerance. The write-up will follow soon; the notebook is available now.
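To illustrate the kind of check used at that final stage, here is a sketch, again assuming `transformers` and the "gpt2" checkpoint, that recomputes the logits from extracted components (the final hidden states and GPT-2's tied token-embedding matrix) and confirms they match the library's own logits within floating-point tolerance:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("Open-source LLMs rock.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# GPT-2 ties its output projection to the input embedding matrix, so the logits
# are the final (post-layer-norm) hidden states times the transposed embeddings.
final_hidden = outputs.hidden_states[-1]                      # (1, seq_len, d_model)
rebuilt_logits = final_hidden @ model.transformer.wte.weight.T

print(torch.allclose(rebuilt_logits, outputs.logits, atol=1e-5))  # expect True
```

The notebook applies the same pattern to every block, not just the output head, but the comparison harness looks like this throughout.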
Bug reports and corrections are welcome!