- Section 1: AI Psychometric foundations
- Section 2: Methods and tools
- Section 3: Psychometric analyses
- Section 4: Hybrid models
- Section 5: Critical reflections
Section 1: AI Psychometric foundations
Canagasuriam, D., & Lukacik, E. R. (2025). ChatGPT, can you take my job interview? Examining artificial intelligence cheating in the asynchronous video interview. International Journal of Selection and Assessment, 33(1), e12491.
Casabianca, J. M., McCaffrey, D. F., Johnson, M. S., Alper, N., & Zubenko, V. (2025). Validity Arguments For Constructed Response Scoring Using Generative Artificial Intelligence Applications. arXiv preprint arXiv:2501.02334.
Dai, W., Tsai, Y. S., Lin, J., Aldino, A., Jin, H., Li, T., ... & Chen, G. (2024). Assessing the proficiency of large language models in automatic feedback generation: An evaluation study. Computers and Education: Artificial Intelligence, 7, 100299.
Financial Times. (2025, May 14). Anthropic’s Claude: Character Training with Psychometric Principles. Financial Times.
Guenole, N., Samo, A., & Sun, T. (2024). Pseudo-Discrimination Parameters from Language Embeddings. OSF Preprint. https://osf.io/preprints/psyarxiv/9a4qx_v1
Guenole, N., D'Urso, E. D., Samo, A., Sun, T., & Haslbeck, J. (2025). Enhancing Scale Development: Pseudo Factor Analysis of Language Embedding Similarity Matrices. OSF Preprint. https://osf.io/preprints/psyarxiv/vf3se_v2
Hernandez, I., & Nie, W. (2023). The AI‐IP: Minimizing the guesswork of personality scale item development through artificial intelligence. Personnel Psychology, 76(4), 1011-1035.
Hickman, L., Bosch, N., Ng, V., Saef, R., Tay, L., & Woo, S. E. (2022). Automated video interview personality assessments: Reliability, validity, and generalizability investigations. Journal of Applied Psychology, 107(8), 1323.
Hommel, B. E., Wollang, F. J. M., Kotova, V., Zacher, H., & Schmukle, S. C. (2022). Transformer-based deep neural language modeling for construct-specific automatic item generation. Psychometrika, 87(2), 749-772.
Hussain, Z., Binz, M., Mata, R., & Wulff, D. U. (2024). A tutorial on open-source large language models for behavioral science. Behavior Research Methods, 56(8), 8214-8237.
Jung, J. Y., Tyack, L., & von Davier, M. (2024). Combining machine translation and automated scoring in international large-scale assessments. Large-scale Assessments in Education, 12(1), 10.
Laverghetta Jr, A., Nighojkar, A., Mirzakhalov, J., & Licato, J. (2021, July). Predicting human psychometric properties using computational language models. In The Annual Meeting of the Psychometric Society (pp. 151-169). Cham: Springer International Publishing.
Ormerod, C., Jafari, A., Lottridge, S., Patel, M., Harris, A., & van Wamelen, P. (2021). The effects of data size on Automated Essay Scoring engines. arXiv preprint arXiv:2108.13275.
Russell-Lasalandra, L. L., Christensen, A. P., & Golino, H. (2024, September). Generative Psychometrics via AI-GENIE: Automatic Item Generation and Validation via Network-Integrated Evaluation.
Shrestha, I., Tay, L., & Srinivasan, P. (2025). Robust Bias Detection in MLMs and its Application to Human Trait Ratings. arXiv preprint arXiv:2502.15600.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
von Davier, A. A. (2017). Computational psychometrics in support of collaborative educational assessments. Journal of Educational Measurement, 54(1), 3-11.
von Davier, M. (2019). Training Optimus prime, MD: Generating medical certification items by fine-tuning OpenAI's gpt2 transformer model. arXiv preprint arXiv:1908.08594.
Wulff, D. U., & Mata, R. (2025). Semantic embeddings reveal and address taxonomic incommensurability in psychological measurement. Nature Human Behaviour, 1-11.
Zhao, H., Chen, H., Yang, F., Liu, N., Deng, H., Cai, H., ... & Du, M. (2024). Explainability for large language models: A survey. ACM Transactions on Intelligent Systems and Technology, 15(2), 1-38.
Section 2: Methods and tools
Lee, P., Fyffe, S., Son, M., Jia, Z., & Yao, Z. (2023). A paradigm shift from “human writing” to “machine generation” in personality test development: An application of state-of-the-art natural language processing. Journal of Business and Psychology, 38(1), 163-190.
Section 3: Psychometric analyses
Arnulf, J. K., Larsen, K. R., Martinsen, Ø. L., & Nimon, K. F. (2021). Editorial: Semantic Algorithms in the Assessment of Attitudes and Personality. Frontiers in Psychology, 12, 720559. https://doi.org/10.3389/fpsyg.2021.720559
Fyffe, S., Lee, P., & Kaplan, S. (2024). “Transforming” personality scale development: Illustrating the potential of state-of-the-art natural language processing. Organizational Research Methods, 27(2), 265-300. https://journals.sagepub.com/doi/abs/10.1177/10944281231155771
Hernandez, I., & Nie, W. (2023). The AI‐IP: Minimizing the guesswork of personality scale item development through artificial intelligence. Personnel Psychology, 76(4), 1011-1035. https://onlinelibrary.wiley.com/doi/full/10.1111/peps.12543
Hickman, L., Liff, J., Rottman, C., & Calderwood, C. (2024). The Effects of the Training Sample Size, Ground Truth Reliability, and NLP Method on Language-Based Automatic Interview Scores’ Psychometric Properties. Organizational Research Methods, 10944281241264027. https://journals.sagepub.com/doi/abs/10.1177/10944281241264027
Hommel, B. E., Wollang, F. J. M., Kotova, V., Zacher, H., & Schmukle, S. C. (2022). Transformer-based deep neural language modeling for construct-specific automatic item generation. Psychometrika, 87(2), 749-772. https://www.cambridge.org/core/journals/psychometrika/article/transformerbased-deep-neural-language-modeling-for-constructspecific-automatic-item-generation
Hussain, Z., Binz, M., Mata, R., & Wulff, D. U. (2024). A tutorial on open-source large language models for behavioral science. Behavior Research Methods, 56(8), 8214-8237. https://link.springer.com/article/10.3758/s13428-024-02455-8
Russell-Lasalandra, L. L., Christensen, A. P., & Golino, H. (2024, September). Generative Psychometrics via AI-GENIE: Automatic Item Generation and Validation via Network-Integrated Evaluation. https://osf.io/preprints/psyarxiv/fgbj4_v1
von Davier, A. A., Mislevy, R. J., & Hao, J. (Eds.). (2022). Computational psychometrics: New methodologies for a new generation of digital learning and assessment: With examples in R and Python. Springer Nature.
Wulff, D. U., & Mata, R. (2025). Semantic embeddings reveal and address taxonomic incommensurability in psychological measurement. Nature Human Behaviour, 1-11.
Section 4: Hybrid models
Section 5: Critical reflections
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).