- Open source LLMs are examinable
- Part 1. Download a local copy of the model
- Part 2. Understand the model’s architecture
- Part 3. Build and verify interim outputs
- Part 4. Traps to avoid
Open source LLMs are examinable
All the King’s horses and all the King’s men, Couldn’t put the LLM together again.
As this twist on the famous nursery rhyme suggests, knowing the parts of an LLM conceptually is one thing; disassembling and rebuilding one accurately is another altogether. This section discusses a strategy for doing so, focusing on the three primary transformer architectures: encoders, decoders, and encoder-decoders. In the next sections we put the strategy into action with applied reconstructions of widely used LLMs: the MiniLM sentence encoder, the GPT-2 decoder, and the T5-small encoder-decoder.
Encoders, like MiniLM, create contextualised embeddings that are used in later analyses such as prediction and classification. They use bidirectional self-attention, meaning they attend to tokens before and after the current token simultaneously during inference, which makes them suited to tasks that need full-sentence understanding. Decoders, like GPT-2, use autoregressive (causal, masked) self-attention, meaning each token only sees previous tokens, which makes them suited to generative tasks like producing text. Encoder-decoders, like T5, are good for sequence-to-sequence tasks like translation; they use bidirectional self-attention in the encoder and causal attention in the decoder.
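A minimal sketch of the difference in attention patterns, using plain PyTorch tensors (the sequence length is an arbitrary choice for illustration):

```python
import torch

# For a sequence of 5 tokens, an encoder (e.g., MiniLM) lets every position
# attend to every other position, while a decoder (e.g., GPT-2) masks out
# future positions so each token only sees itself and earlier tokens.
seq_len = 5

bidirectional_mask = torch.ones(seq_len, seq_len)        # encoder: all positions visible
causal_mask = torch.tril(torch.ones(seq_len, seq_len))   # decoder: lower-triangular mask

print(bidirectional_mask)
print(causal_mask)  # row i attends only to columns 0..i
```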
Large Language Models (LLMs) are transformers (i.e., an encoder, decoder, or encoder-decoder) that have been trained on vast quantities of natural language data. ‘Large’ refers here to the amount of language data used and the number of model parameters.
LLMs are sophisticated tools that are commonly described as black boxes. This usually means their internal mechanisms can’t be scrutinized. Commercial models do indeed prevent scrutiny of their internal designs (model parameters are kept server side), but open source LLMs do not. Open source LLMs are black boxes only in a functional sense: reverse engineering them is impractical, not impossible. If a model is open source, it can be disassembled and reassembled in a manner that is decoupled from the official model.
As an example, in the upcoming demonstrations we use PyTorch’s register_forward_hook on each layer of several LLMs to capture matrix projections. The motivation is primarily instructional, but the understanding gained may also prove useful should the parameters of LLMs permit psychological interpretation in future. Reverse engineering here means replicating the inner workings of the LLM at inference, rather than training the LLM from the ground up. Training and fine-tuning are addressed in subsequent sections.
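As a minimal sketch of this hook-based capture, assuming GPT-2 as the target (the dictionary key names are our own choice, not part of any API):

```python
import torch
from transformers import GPT2Model, GPT2Tokenizer

captured = {}

def make_hook(name):
    def hook(module, inputs, output):
        # Store a detached copy of the block's output hidden states.
        captured[name] = output[0].detach() if isinstance(output, tuple) else output.detach()
    return hook

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

# Register a forward hook on every transformer block.
handles = [block.register_forward_hook(make_hook(f"block_{i}"))
           for i, block in enumerate(model.h)]

with torch.no_grad():
    model(**tokenizer("The cat sat on the mat", return_tensors="pt"))

for h in handles:
    h.remove()

print({k: v.shape for k, v in captured.items()})
```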
Part 1. Download a local copy of the model
The first step in reverse engineering is choosing an open-source transformer; MiniLM, GPT-2, and T5-small are good examples. Hugging Face’s transformers package stores the model configuration in a config.json file and the weights in a pytorch_model.bin file. See Hugging Face’s save_pretrained / from_pretrained commands.
Our choices here are MiniLM as the encoder, GPT-2 as the decoder, and T5-small as the encoder-decoder. In each case, the important first step is to download an offline copy of the model and all its components, such as learned position encodings, token type embeddings where relevant, and the weight matrices Q_w, K_w, and V_w (see Vaswani et al., 2017).
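A hedged sketch of this step, using save_pretrained / from_pretrained as mentioned above (the local directory names and the specific MiniLM checkpoint id are our own assumptions, and newer versions of transformers may save weights as safetensors rather than pytorch_model.bin):

```python
from transformers import AutoModel, AutoTokenizer

# Save an offline copy of each model and its tokenizer.
for model_id, local_dir in [
    ("sentence-transformers/all-MiniLM-L6-v2", "./offline/minilm"),
    ("gpt2", "./offline/gpt2"),
    ("t5-small", "./offline/t5-small"),
]:
    AutoTokenizer.from_pretrained(model_id).save_pretrained(local_dir)
    AutoModel.from_pretrained(model_id).save_pretrained(local_dir)

# Later, load entirely from disk, decoupled from the Hugging Face Hub.
model = AutoModel.from_pretrained("./offline/gpt2")
```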
Part 2. Understand the model’s architecture
The next stage is to get a solid understanding of the architecture: how data flows through the model and how it transforms inputs into representations and responses. The architecture describes the format the LLM takes as input, its tokenization approach, any positional encodings, the feed-forward multilayer perceptron, the attention mechanisms it uses (self-attention or cross-attention), the number of layers and how they are configured, and the normalization strategies.
The model architecture also defines how outputs are produced, i.e., whether they take the form of numerical embeddings or token sequences for text generation. With this knowledge, we can begin the replication process one step at a time. Knowing where you are in the rebuild and how much remains to be mapped is reassuring. You will also likely discover that once you have the first transformer block coded correctly, the rest are identical clones.
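A minimal sketch of how the architecture can be read straight from the offline copy (the path assumes the directory layout from the earlier sketch):

```python
from transformers import AutoConfig, AutoModel

# The config holds the structural facts: number of layers, heads,
# hidden size, vocabulary size, positional encoding limits, and so on.
config = AutoConfig.from_pretrained("./offline/gpt2")
print(config)

# Walking the module tree shows how the blocks are actually arranged.
model = AutoModel.from_pretrained("./offline/gpt2")
for name, module in model.named_modules():
    if name.count(".") <= 1:   # keep the printout shallow
        print(name, type(module).__name__)
```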
Part 3. Build and verify interim outputs
From this point, the process involves running the official LLM and pausing it with a hook that captures interim outputs, such as the tokenized input text, the input tensor that enters the first transformer block, the projection of that tensor into different planes using the weight matrices, and so on.
The information to capture is the exact values in the matrices, as ground truth, and the dimensionality of every input matrix and every generated matrix. When you reimplement in Python using PyTorch, you’ll have checkpoints against which to compare your progress. This is helpful for debugging: if your output has a different dimensionality from the model’s at any checkpoint, you are likely further from the correct path than if only the weights differ while the matrix dimensions match.
If it’s working, you should expect agreement to at least 5 decimal places when making element-wise comparisons of floating-point values from the official and reimplemented blocks (noting that close matching requires identical hardware and software environments). Repeat this process for every block in your architecture, checking that your precision is acceptable at every stage, until you reach the end.
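A hedged sketch of such a checkpoint comparison: `captured["block_0"]` is the ground-truth tensor taken from the hook sketch above, and `my_block_0` is a hypothetical stand-in for the output of your own reimplementation (here it is just a clone so the sketch runs end to end).

```python
import torch

ground_truth = captured["block_0"]
my_block_0 = ground_truth.clone()  # replace with your reimplemented block's output

# Check the shape first, then the element-wise values to ~5 decimal places.
assert my_block_0.shape == ground_truth.shape, "dimension mismatch"
assert torch.allclose(my_block_0, ground_truth, atol=1e-5), "values diverge"
print("checkpoint matched:", tuple(ground_truth.shape))
```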
Part 4. Traps to avoid
I only realised a number of these steps part way through the process, so my reconstructions of MiniLM and GPT-2 do not follow every step. For example, I occasionally call the live model’s weight matrices instead of looking up their offline counterparts. This does not invalidate the reconstruction, but the decoupling in my notebooks would have been more complete had I used the offline versions of these inner components.
Another sticking point is that transformer models may depart from textbook methods. Perhaps they add positional embeddings or token type encodings learned during training instead of sinusoidal positional encodings or fixed type embeddings. Also, for a thoroughly decoupled reconstruction, you want to avoid accidentally dragging an interim model output matrix into the deconstructed pipeline, or accidentally comparing an interim output to itself during a ground-truth comparison.
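A minimal sketch of checking the first trap: if positional information is learned, it appears as a trained weight matrix in the parameters (in GPT-2, for example, `wpe` holds learned position embeddings alongside the `wte` token embeddings; MiniLM’s BERT-style encoder similarly stores `embeddings.position_embeddings` as trained parameters).

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("gpt2")
for name, param in model.named_parameters():
    # Learned positional embeddings show up here; sinusoidal encodings would not.
    if "wpe" in name or "position" in name:
        print(name, tuple(param.shape))
```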
Finally, in some instances it may be impossible to discern how the output from one stage leads to the next. In these cases your options are limited. (So far, in the demonstrations that follow, we have not encountered this.) Faced with such a situation, you may decide to carry the interim model output forward in your decoupled pipeline, particularly if the goal is primarily instructional.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).