Testing standards

The objectives of psychological measurement, and the conventional means of achieving them, are set out in foundational journal articles in psychology (e.g., Borsboom et al., 2004; Clark & Watson, 1995; Drasgow, 1987; Loevinger, 1957) and in classic texts in the field (e.g., Bollen, 1989; Embretson & Reise, 2000; Lord & Novick, 1968; McDonald, 1999).

The emergence of AI does not change these ultimate objectives. In fact, many of the most promising applications of AI in measurement bridge the psychometric methods described in these sources and the techniques now available from computer science.

Many of the AI methods proposed in upcoming sections are novel and experimental. When they are applied in practice, they must demonstrate compliance with established standards for reliability, validity, and bias (AERA, APA, & NCME, 2014; EEOC, 1978; International Test Commission, 2014; SIOP, 2018).

AI-based assessments introduce additional transparency challenges regarding data provenance, both in model training data and in how scores are derived from inputs. These issues make compliance with privacy regulations and legal frameworks particularly important, including GDPR, the EU AI Act, and applicable US legislation governing automated assessment and decision-making.

Novel concepts, such as validity evidence based on semantic relatedness, represent extensions of conventional psychometric criteria rather than replacements for them. These new methods must be integrated into our foundational quality frameworks by showing how they contribute to achieving the same standards required of non-AI measurement.
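As a rough illustration of how semantic relatedness might be connected to conventional evidence, the sketch below compares the cosine similarity of item text embeddings with empirical inter-item correlations. This operationalization is an assumption made for illustration only; the embeddings, item names, and correlation values are all hypothetical, and in practice the embeddings would come from a language model and the correlations from candidate responses.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical 3-dimensional embeddings for three items (made-up numbers).
item_embeddings = {
    "item_a": [1.0, 0.0, 0.0],
    "item_b": [1.0, 1.0, 0.0],
    "item_c": [0.0, 0.0, 1.0],
}

# Hypothetical inter-item correlations estimated from candidate responses.
empirical_r = {
    ("item_a", "item_b"): 0.55,
    ("item_a", "item_c"): 0.05,
    ("item_b", "item_c"): 0.10,
}

pairs = sorted(empirical_r)
semantic = [
    cosine_similarity(item_embeddings[i], item_embeddings[j]) for i, j in pairs
]
observed = [empirical_r[pair] for pair in pairs]

# Agreement between semantic relatedness and observed item covariation.
convergence = pearson(semantic, observed)
print(f"semantic-empirical convergence: {convergence:.2f}")
```

The point of the comparison is that the semantic index is only treated as evidence to the extent that it agrees with what candidate responses show, which keeps the new criterion anchored to the conventional one.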

Ultimately, new AI methods must be judged against conventional psychometric standards using empirical candidate responses. They must demonstrate reliability, validity, and absence of bias, meet criteria for perceived fairness, and provide an acceptable candidate experience. Only evidence meeting these standards can establish their utility.
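The following is a minimal sketch of three conventional checks that could be run on scores from any assessment, AI-based or not: Cronbach's alpha as one reliability estimate, a criterion correlation as one form of validity evidence, and the selection-rate ratio behind the four-fifths rule of thumb in the Uniform Guidelines (EEOC, 1978). All data, function names, and group labels are hypothetical, and these statistics are only a small subset of the evidence the cited standards call for.

```python
import math
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha; `responses` is a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def pearson(x, y):
    """Pearson correlation, used here as a criterion-related validity coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def impact_ratio(selected_focal, n_focal, selected_reference, n_reference):
    """Focal-group selection rate divided by reference-group selection rate."""
    return (selected_focal / n_focal) / (selected_reference / n_reference)


# Hypothetical item responses: five candidates, three items each.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 4, 5],
    [3, 3, 4],
]
# Hypothetical criterion scores (e.g., later performance ratings) for the same candidates.
criterion = [2.0, 4.5, 1.5, 4.0, 3.0]

alpha = cronbach_alpha(responses)
total_scores = [sum(row) for row in responses]
validity = pearson(total_scores, criterion)

# Hypothetical selection counts: 30 of 100 focal-group and 50 of 100 reference-group candidates.
ratio = impact_ratio(30, 100, 50, 100)
flagged = ratio < 0.8  # below four-fifths: conventionally treated as a signal for closer review

print(f"alpha={alpha:.3f}, validity r={validity:.3f}, impact ratio={ratio:.2f}, flagged={flagged}")
```

A sample this small is for illustration only; real evaluations need adequate sample sizes, uncertainty estimates, and the broader sources of evidence described in the cited standards.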

References

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. American Educational Research Association.

Bollen, K. A. (1989). Structural equations with latent variables. John Wiley & Sons.

Borsboom, D., Mellenbergh, G. J., & Van Heerden, J. (2004). The concept of validity. Psychological Review, 111(4), 1061.

Clark, L. A., & Watson, D. (1995). Constructing validity: Basic issues in objective scale development. Psychological Assessment, 7(3), 309-319. https://doi.org/10.1037/1040-3590.7.3.309

Drasgow, F. (1987). Study of the measurement bias of two standardized psychological tests. Journal of Applied Psychology, 72(1), 19.

Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Psychology Press.

Equal Employment Opportunity Commission (EEOC), Civil Service Commission, Department of Labor, & Department of Justice. (1978). Uniform guidelines on employee selection procedures. Federal Register, 43, 38290–38315.

Evers, A., Muñiz, J., Hagemeister, C., Høstmælingen, A., Lindley, P., Sjöberg, A., & Bartram, D. (2013). Assessing the quality of tests: Revision of the EFPA review model. Psicothema, 25(3), 283-291. https://doi.org/10.7334/psicothema2013.97

International Test Commission. (2014). ITC guidelines on quality control in scoring, test analysis, and reporting of test scores. International Journal of Testing, 14(3), 195-217. https://doi.org/10.1080/15305058.2014.918040

Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3(3), 635-694.

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. IAP.

McDonald, R. P. (1999). Test theory: A unified treatment. Lawrence Erlbaum Associates.

Society for Industrial and Organizational Psychology. (2018). Principles for the validation and use of personnel selection procedures (5th ed.). Industrial and Organizational Psychology, 11(S1), 1-97. https://doi.org/10.1017/iop.2018.195

Psychometrics.ai