Substitutability assumption

  • The power of transformers
  • Notable parallels between human ratings & embeddings
  • Representational and generative substitutability assumptions
  • References

The power of transformers

AI psychometrics is progressing rapidly because of transformer models. Transformers are pre-trained neural networks that excel at modeling sequence data, most notably the dependencies between tokens in natural language, a task at which they have been tremendously successful.

The same attention-based architecture has been successfully adapted to other modalities, such as images through Vision Transformers, which we explore later in this book. On this page we discuss transformer architectures in the context of language, but keep in mind that other data modalities are now common.

Transformers include encoder models that create embeddings for natural language understanding (NLU), decoder models used for generative AI text-generation tasks, and encoder-decoder models used for sequence-to-sequence tasks such as translation. Hussain et al. (2025) and Brickman et al. (2025) provide useful introductory overviews of transformers for behavioural science.
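To make the encoder/decoder distinction concrete, here is a minimal sketch using the Hugging Face transformers library. The model choices (bert-base-uncased, gpt2) and the example item are illustrative assumptions, not recommendations.

```python
from transformers import AutoTokenizer, AutoModel, pipeline

# Encoder (NLU): map an item statement to a fixed-length embedding.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
inputs = tokenizer("I am the life of the party.", return_tensors="pt")
hidden = encoder(**inputs).last_hidden_state   # shape: (1, n_tokens, 768)
item_embedding = hidden.mean(dim=1)            # mean-pool to (1, 768)

# Decoder (generative): continue a prompt token by token.
generator = pipeline("text-generation", model="gpt2")
print(generator("Someone who is the life of the party tends to", max_new_tokens=20))
```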

Researchers have shown that models usually estimated on empirical item response data (e.g., latent variable models such as IRT and factor analysis, as well as network models) can also be applied to item embeddings. The embeddings are collated into a matrix of embedding associations (e.g., cosine similarities) that is analyzed with conventional methods.

Interestingly, conventional psychometric approaches applied to embeddings yield measurement model parameters that are strongly related to parameters estimated on empirical data. Demonstrations so far have focused on the practical utility of these parameter correspondences; the precise reasons for them are not yet clear, and here we explore early insights.
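As an illustration of this workflow, the sketch below embeds a handful of items, builds a cosine similarity matrix in place of an item correlation matrix, and extracts principal-axis style loadings via eigendecomposition. The sentence-transformers model and the items are illustrative assumptions, not part of any published analysis.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

items = [
    "I am the life of the party.",
    "I feel comfortable around people.",
    "I get stressed out easily.",
    "I worry about things.",
]
model = SentenceTransformer("all-MiniLM-L6-v2")   # illustrative encoder
embeddings = model.encode(items)                  # shape: (n_items, dim)

# The cosine similarity matrix plays the role of an item correlation matrix.
unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
similarity = unit @ unit.T                        # shape: (n_items, n_items)

# Principal-axis style "pseudo" loadings via eigendecomposition (2 components).
values, vectors = np.linalg.eigh(similarity)      # eigenvalues in ascending order
top = np.argsort(values)[::-1][:2]
loadings = vectors[:, top] * np.sqrt(values[top])
print(np.round(loadings, 2))
```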

Figure: A representation of the way AI substitutes for tasks humans usually perform in the psychometric testing process.

💡 Consider this intentionally provocative analogy: just as in vitro biology can replicate biochemical patterns without replicating the full complexity of living systems, psychometric analysis of LLM-based item embeddings and responses can reproduce human-like factor structures without possessing the underlying psychological mechanisms that produce these patterns in human minds.

Notable parallels between human ratings & embeddings

Early discussion by Guenole et al. (2024) noted parallels between the processes that generate data from AI vector representations and from empirical item responses. Person responses to scale items are human encodings of the descriptiveness of those items, which are aggregated into a sample matrix of item response covariances or correlations for the test-taking population.

A pre-trained encoder model's item embedding representations are also encodings, used to form an association matrix. If test takers have read the scraped training content, it may directly influence their ratings; if the scraped content reflects cultural norms, it may indirectly reflect human test takers' knowledge and world views. Web content can therefore plausibly influence both human and machine encodings.

In the case of decoder models and generative AI, the substitution is more direct. Generative AI models have been used to produce human-like responses to assessments. When an AI responds like a human hundreds or even thousands of times, an item response dataset is produced that can be analyzed in conventional ways; that is, the AI directly substitutes for the test taker.
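A hedged sketch of this generative substitution follows: an LLM is prompted repeatedly to answer a single Likert-type item, and the replies accumulate into a response vector. The OpenAI client, model name, persona instruction, and the naive integer parsing are all illustrative assumptions.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
ITEM = "I am the life of the party."

responses = []
for _ in range(100):  # one simulated "respondent" per call
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "Act as a randomly sampled adult completing a "
                        "personality questionnaire. Answer with a single "
                        "integer from 1 (strongly disagree) to 5 (strongly agree)."},
            {"role": "user", "content": ITEM},
        ],
        temperature=1.0,
    )
    # Naive parsing: assumes the model returns only the integer.
    responses.append(int(reply.choices[0].message.content.strip()))

print(responses[:10])  # rows of such responses across items form the dataset
```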

Here we consider further parallels that may exist between human ratings of scale statements and LLM embeddings of those same statements, both prior to and following the encoding stage, as we expect these parallels may help explain the parameter correspondences across the two methods.

Table 1. Common stages in human item responding and encoder/decoder applications in psychometrics

| Process stage | Human ratings | LLM encodings |
| --- | --- | --- |
| Content authoring | Test designers write items or statements to measure constructs | Humans post text content on the Internet |
| Pre-processing | Items reviewed by experts for content and relevance | Text is cleaned and tokenized |
| Task framing | Test instructions written by developers shape test takers' perceptions | Pre-training objectives chosen by model designers shape model behaviour |
| World view source | Humans draw on self-knowledge from personal learning and cultural experience | Models acquire knowledge from shared cultural data via web scraping |
| Stimulus interpretation | Humans activate self-knowledge in response to test items and encode self-descriptiveness as ratings | Encoder-based models encode items using distributional knowledge of language stored in model weights |
| Matrix representation | Individual responses are collected and turned into a correlation matrix | Embeddings are collated and turned into a cosine similarity matrix |
| Matrix interpretation | Shared human self-knowledge is reflected in the correlational patterns | Shared corpus language usage is reflected in the cosine similarity patterns |
| Response generation & processing | Same process as the stimulus interpretation stage above | Generative AI models follow instructions to generate responses and rationales, or act as humans to form artificial crowds |
| Analysis | Psychometric analysis via latent variable models (e.g., factor analysis, IRT) or network models | Psychometric analysis via latent variable models (e.g., factor analysis, IRT) or network models |
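One way to quantify the correspondence implied by the matrix rows of Table 1 is to correlate the off-diagonal entries of the two association matrices. The sketch below uses random stand-in matrices purely to show the mechanics; in practice, human_corr and embed_sim would come from real item responses and real embeddings as described above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items = 10

# Stand-ins: in practice, human_corr comes from test takers' item responses
# and embed_sim from item embeddings, per Table 1.
human_corr = np.corrcoef(rng.normal(size=(200, n_items)), rowvar=False)
embed_sim = np.corrcoef(rng.normal(size=(50, n_items)), rowvar=False)

mask = ~np.eye(n_items, dtype=bool)   # compare off-diagonal entries only
r = np.corrcoef(human_corr[mask], embed_sim[mask])[0, 1]
print(f"Off-diagonal correspondence: r = {r:.2f}")
```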

Representational and generative substitutability assumptions

Guenole et al. (2025) proposed a heuristic explanation, the substitutability assumption, as a first step toward understanding why AI psychometrics works, something we currently know very little about. In the case of encoders that generate embeddings for representational AI, the substitutability assumption holds that item embedding vectors can substitute for empirical vectors of human item responses under certain yet-to-be-specified conditions.

In the case of decoder models and generative AI, the AI can directly substitute for the test taker to create datasets based on the responses of 'artificial crowds' (e.g., Wang et al., 2025). Alternatively, the LLM can act as a psychologist in zero- or few-shot prompting approaches to tasks such as allocating items to scales, or predicting traits from personality-relevant text produced by subjects (e.g., Maharjan et al., 2025).
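As a concrete instance of such tasks, the sketch below shows a simple embedding-based variant of item allocation, assigning each item to the scale label it is most similar to in embedding space; a prompting approach would replace the similarity step with an LLM call. The model, items, and scale labels are illustrative assumptions.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")   # illustrative encoder
scales = ["extraversion", "neuroticism"]          # hypothetical target scales
items = ["I am the life of the party.", "I worry about things."]

def unit(x):
    """Normalize rows to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Similarity of each item embedding to each scale-label embedding.
similarities = unit(model.encode(items)) @ unit(model.encode(scales)).T
for item, row in zip(items, similarities):
    print(f"{item!r} -> {scales[int(np.argmax(row))]}")
```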

The substitutability assumption relies heavily on analogy. It remains unclear what properties of embeddings let them capture trait-related variance, what the effects of training data composition are, and whether fine-tuning can bring training data closer into line with assessment populations. Despite not knowing the precise mechanisms through which the correspondences between embedding-based and empirical psychometrics arise, the parallels offer powerful psychometric capabilities.

While the transformer discussion here centers on language applications, the primary driver of progress in AI psychometrics to date, the architecture's flexibility extends to vision and beyond, as seen in models such as Vision Transformers.

References

Brickman, J., Gupta, M., & Oltmanns, J. R. (2025). Large language models for psychological assessment: A comprehensive overview. Advances in Methods and Practices in Psychological Science, 8(3), 25152459251343582.

Guenole, N., D'Urso, E. D., Samo, A., Sun, T., & Haslbeck, J. (2025). Enhancing scale development: Pseudo factor analysis of language embedding similarity matrices.

Hussain, Z., Binz, M., Mata, R., & Wulff, D. U. (2025). A tutorial on open-source large language models for behavioral science. Behavior Research Methods, 56(8), 8214-8237.

Maharjan, J., Jin, R., Zhu, J., & Kenne, D. (2025). Psychometric evaluation of large language model embeddings for personality trait prediction. Journal of Medical Internet Research, 27, e75347.

Wang, Y., Zhao, J., Ones, D. S., He, L., & Xu, X. (2025). Evaluating the ability of large language models to emulate personality. Scientific Reports, 15(1), 519.
