References

Section 1: AI psychometric foundations

Canagasuriam, D., & Lukacik, E. R. (2025). ChatGPT, can you take my job interview? Examining artificial intelligence cheating in the asynchronous video interview. International Journal of Selection and Assessment, 33(1), e12491.

Casabianca, J. M., McCaffrey, D. F., Johnson, M. S., Alper, N., & Zubenko, V. (2025). Validity Arguments For Constructed Response Scoring Using Generative Artificial Intelligence Applications. arXiv preprint arXiv:2501.02334.

Dai, W., Tsai, Y. S., Lin, J., Aldino, A., Jin, H., Li, T., ... & Chen, G. (2024). Assessing the proficiency of large language models in automatic feedback generation: An evaluation study. Computers and Education: Artificial Intelligence, 7, 100299.

Financial Times. (2025, May 14). Anthropic’s Claude: Character Training with Psychometric Principles. Financial Times. 

Guenole, N., Samo, A., & Sun, T. (2024). Pseudo-Discrimination Parameters from Language Embeddings. OSF Preprint. https://osf.io/preprints/psyarxiv/9a4qx_v1

Guenole, N., D'Urso, E. D., Samo, A., Sun, T., & Haslbeck, J. (2025). Enhancing Scale Development: Pseudo Factor Analysis of Language Embedding Similarity Matrices. OSF Preprint. https://osf.io/preprints/psyarxiv/vf3se_v2

Hernandez, I., & Nie, W. (2023). The AI‐IP: Minimizing the guesswork of personality scale item development through artificial intelligence. Personnel Psychology, 76(4), 1011-1035.

Hommel, B. E., Wollang, F. J. M., Kotova, V., Zacher, H., & Schmukle, S. C. (2022). Transformer-based deep neural language modeling for construct-specific automatic item generation. Psychometrika, 87(2), 749-772.

Hickman, L., Bosch, N., Ng, V., Saef, R., Tay, L., & Woo, S. E. (2022). Automated video interview personality assessments: Reliability, validity, and generalizability investigations. Journal of Applied Psychology, 107(8), 1323.

Hussain, Z., Binz, M., Mata, R., & Wulff, D. U. (2024). A tutorial on open-source large language models for behavioral science. Behavior Research Methods, 56(8), 8214-8237.

Jung, J. Y., Tyack, L., & von Davier, M. (2024). Combining machine translation and automated scoring in international large-scale assessments. Large-scale Assessments in Education, 12(1), 10.

Laverghetta Jr, A., Nighojkar, A., Mirzakhalov, J., & Licato, J. (2021, July). Predicting human psychometric properties using computational language models. In The Annual Meeting of the Psychometric Society (pp. 151-169). Cham: Springer International Publishing.

Ormerod, C., Jafari, A., Lottridge, S., Patel, M., Harris, A., & van Wamelen, P. (2021). The effects of data size on Automated Essay Scoring engines. arXiv preprint arXiv:2108.13275.

Russell-Lasalandra, L. L., Christensen, A. P., & Golino, H. (2024, September). Generative Psychometrics via AI-GENIE: Automatic Item Generation and Validation via Network-Integrated Evaluation.

Shrestha, I., Tay, L., & Srinivasan, P. (2025). Robust Bias Detection in MLMs and its Application to Human Trait Ratings. arXiv preprint arXiv:2502.15600.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.

von Davier, A. A. (2017). Computational psychometrics in support of collaborative educational assessments. Journal of Educational Measurement, 54(1), 3-11.

von Davier, M. (2019). Training Optimus prime, MD: Generating medical certification items by fine-tuning OpenAI's gpt2 transformer model. arXiv preprint arXiv:1908.08594.

Wulff, D. U., & Mata, R. (2025). Semantic embeddings reveal and address taxonomic incommensurability in psychological measurement. Nature Human Behaviour, 1-11.

Zhao, H., Chen, H., Yang, F., Liu, N., Deng, H., Cai, H., ... & Du, M. (2024). Explainability for large language models: A survey. ACM Transactions on Intelligent Systems and Technology, 15(2), 1-38.

Section 2: Methods and tools

Lee, P., Fyffe, S., Son, M., Jia, Z., & Yao, Z. (2023). A paradigm shift from “human writing” to “machine generation” in personality test development: An application of state-of-the-art natural language processing. Journal of Business and Psychology, 38(1), 163-190.

Section 3: Psychometric analyses

Arnulf, J. K., Larsen, K. R., Martinsen, Ø. L., & Nimon, K. F. (2021). Editorial: Semantic Algorithms in the Assessment of Attitudes and Personality. Frontiers in Psychology, 12, 720559. https://doi.org/10.3389/fpsyg.2021.720559

Fyffe, S., Lee, P., & Kaplan, S. (2024). “Transforming” personality scale development: Illustrating the potential of state-of-the-art natural language processing. Organizational Research Methods, 27(2), 265-300. https://journals.sagepub.com/doi/abs/10.1177/10944281231155771

Hernandez, I., & Nie, W. (2023). The AI‐IP: Minimizing the guesswork of personality scale item development through artificial intelligence. Personnel Psychology, 76(4), 1011-1035. https://onlinelibrary.wiley.com/doi/full/10.1111/peps.12543

Hickman, L., Liff, J., Rottman, C., & Calderwood, C. (2024). The Effects of the Training Sample Size, Ground Truth Reliability, and NLP Method on Language-Based Automatic Interview Scores’ Psychometric Properties. Organizational Research Methods, 10944281241264027. https://journals.sagepub.com/doi/abs/10.1177/10944281241264027

Hommel, B. E., Wollang, F. J. M., Kotova, V., Zacher, H., & Schmukle, S. C. (2022). Transformer-based deep neural language modeling for construct-specific automatic item generation. Psychometrika, 87(2), 749-772. https://www.cambridge.org/core/journals/psychometrika/article/transformerbased-deep-neural-language-modeling-for-constructspecific-automatic-item-generation

Hussain, Z., Binz, M., Mata, R., & Wulff, D. U. (2024). A tutorial on open-source large language models for behavioral science. Behavior Research Methods, 56(8), 8214-8237. https://link.springer.com/article/10.3758/s13428-024-02455-8

Russell-Lasalandra, L. L., Christensen, A. P., & Golino, H. (2024, September). Generative Psychometrics via AI-GENIE: Automatic Item Generation and Validation via Network-Integrated Evaluation. https://osf.io/preprints/psyarxiv/fgbj4_v1

von Davier, A. A., Mislevy, R. J., & Hao, J. (Eds.). (2022). Computational psychometrics: New methodologies for a new generation of digital learning and assessment: With examples in R and Python. Springer Nature.

Wulff, D. U., & Mata, R. (2025). Semantic embeddings reveal and address taxonomic incommensurability in psychological measurement. Nature Human Behaviour, 1-11.

Section 4: Hybrid models

Section 5: Critical reflections

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).