Generating items via an API

Step 1. Get your OpenAI account

An account with an AI model provider such as OpenAI or Anthropic is required so that you can access the language models via an API. You can sign up for an account with your email address at https://platform.openai.com/signup.

Step 2. Get your OpenAI API key

We’ll need an API key, a code that lets you access OpenAI’s models securely and tracks your usage for billing. You can generate this API key at https://platform.openai.com/account/api-keys by selecting ‘Create new secret key’.

You’ll be asked to give the key a name, and you will only see the key once, so save it securely for use in your Python scripts. This key must not be disclosed: anyone who has it can use your account and incur charges. If it is exposed or forgotten, you can delete the key.

Step 3. Set up a payment method

We’ll need to set up a payment method. To do so, go to Settings, select Billing, and add a payment method. Then choose an amount to add by clicking ‘Buy credits’. A few US dollars is enough for small projects and to start seeing how much different model tasks cost. It is possible to add a line of code to print the number of tokens consumed; see ‘token usage’ in the script.

Input (prompt-processing) tokens are cheaper than output (generation) tokens, and the costs are astonishingly low. To generate 1,200 items using the script on this page, 2,274 tokens were used to process the prompt and 12,795 tokens were consumed generating items, for a total of 15,069 tokens. The OpenAI app showed the total cost of executing this script was US$0.41 (May 2025).
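As a sketch, the cost of a run can be estimated from the token counts the API reports. The per-1,000-token prices below are placeholders, not current OpenAI rates; check OpenAI’s pricing page for your model’s actual rates.

```python
# Estimate run cost from token counts.
# NOTE: the default prices are illustrative placeholders, not real rates.
def estimate_cost(prompt_tokens, completion_tokens,
                  input_price_per_1k=0.03, output_price_per_1k=0.06):
    """Return (total_tokens, estimated_cost_usd)."""
    total = prompt_tokens + completion_tokens
    cost = (prompt_tokens / 1000) * input_price_per_1k \
         + (completion_tokens / 1000) * output_price_per_1k
    return total, round(cost, 2)

# Token counts from the run described above
total, cost = estimate_cost(2274, 12795)
print(total)  # 15069
```

The actual per-token rates differ by model and change over time, so treat the dollar figure as a rough estimate only; the token total is what the API reports in `res.usage`.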

Step 4. Choose between local or cloud-based coding

The script on this page for generating 1,200 items runs on a local central processing unit (CPU) because the local processing is minimal: it just sends and receives text, while the item generation itself happens remotely on OpenAI’s servers. If we were running a local LLM instead of using OpenAI’s servers, or if we were fine-tuning an LLM, we might use a local graphics processing unit (GPU) for faster processing, or even a cloud-based computing option like Google Colab to manage the heavier computing demands.

Step 5. Install the OpenAI Python library

To install the OpenAI Python library (only required once), execute the following command in a terminal window. The library lets us send requests from our Python code to the OpenAI API and manages authentication.

# Run as shell command to install OpenAI Python library

pip install openai

We also suggest setting your API key as an environment variable, to avoid exposing the key in code, by running export OPENAI_API_KEY="your key goes here" in the terminal; this will need resetting each session.

# Set your API key as an environment variable to avoid exposing it in code

export OPENAI_API_KEY="your key goes here"
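As a quick sanity check before running the scripts below, you can confirm from Python that the variable is visible without printing the full key. The helper function here is our own illustration, not part of the OpenAI library.

```python
import os

def check_api_key(env="OPENAI_API_KEY"):
    """Return a masked confirmation string, or None if the key is missing."""
    key = os.getenv(env)
    return f"key ends in ...{key[-4:]}" if key else None

print(check_api_key() or "OPENAI_API_KEY is not set in this session")
```

If this prints the “not set” message, re-run the export command in the same terminal session you will launch Jupyter from, since the notebook inherits its environment from that shell.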

Step 6. Open a Jupyter notebook

Now navigate in the terminal to the folder where you are storing your work and run the command ‘jupyter notebook’, which will launch the Jupyter notebook interface on localhost.

jupyter notebook

Once we select ‘New notebook’, we are ready to code.

Step 7. Generate items

We’ll need to import the Python os module so we can interact with the operating system, import the OpenAI class from the openai library for API access, and create a client object to connect to OpenAI’s services.

# import Python OS module
import os

# import OpenAI class 
from openai import OpenAI

# Create OpenAI client with API key from environment
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

Now we’re ready to generate items. Let’s make 12 API calls to systematically generate items for each moral foundation: for each of the six foundations, we will generate 100 positively keyed items and 100 reverse-scored items. In the example here we demonstrate guided item generation without sample items, but sample items can easily be incorporated when they are available. With more detailed instructions, it is also possible to generate items reflecting levels between low and high on the latent trait.

While 200 items per construct might sound like a lot for an assessment that will ultimately be only around 50 or 60 items long, generating this many items gives us the flexibility we need to choose the items most likely to perform well in empirical testing, using pseudo-discrimination (i.e., semantic alignment indices) and pseudo-factor analysis.

import os, csv, time
from openai import OpenAI

# Moral foundation definitions

foundation_definitions = {
    "Care/Harm": """Caring and compassion toward others' suffering versus indifference or acceptance of harm if it is in pursuit of business goals.""",
    "Fairness/Cheating": """Commitment to fairness and proportionality in dealings with others; contrasted with comfort with exploitation or uneven application of standards.""",
    "Loyalty/Betrayal": """Loyalty, patriotism, and self-sacrifice toward one's group; contrasted with indifference to group ties or little concern with disloyalty.""",
    "Authority/Subversion": """Obedience and deference toward legitimate authority and traditions; contrasted with skepticism of authority or comfort with challenging established norms.""",
    "Sanctity/Degradation": """Respect for what is considered pure or noble; contrasted with indifference or disregard toward those standards.""",
    "Liberty/Oppression": """Respect for autonomy and aversion toward excessive control; contrasted with willingness to dominate others and restrict their legitimate freedoms."""
}

foundations = [
    "Care/Harm", "Fairness/Cheating", "Loyalty/Betrayal",
    "Authority/Subversion", "Sanctity/Degradation", "Liberty/Oppression"
]

with open("moral_foundations_items.csv", "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(["Foundation", "Item", "Keyed"])
    
    for f in foundations:
        for k, label in [("positively keyed", "Pos"), ("reverse-scored", "Rev")]:
            print(f"Generating {label} items for {f}...")
            prompt = f"""
You are a psychological assessment expert. You write work-related, clear, Likert-style items for measuring moral values in executive leaders.
Generate 100 {k} items for the {f} moral foundation.
Foundation definition:
{foundation_definitions[f]}
Each item must:
- Reflect **attitudes**, not behaviors
- Be relevant to **leadership or workplace contexts**
- Be **no more than 10 words long**
- Make **reversed items subtle so they are not socially undesirable**
- Avoid **double-barreled statements**
- Be written at a **simple reading level** (grade 6–8)
- Use **plain, concise language**
- Avoid **repetitive item phrasing**
Likert scale:
1 = Strongly Disagree ... 5 = Strongly Agree
"""
            try:
                res = client.chat.completions.create(
                    model="gpt-4",  # model used for generation
                    messages=[{"role": "user", "content": prompt}],
                    max_tokens=3000,
                    timeout=60
                )
                # print("Token usage:", res.usage)  # uncomment to track token usage
                items = res.choices[0].message.content.strip().split("\n")
                for line in items:
                    writer.writerow([f, line.strip(), label])
                print(f"✓ Done: {len(items)} items")
            except Exception as e:
                print(f"❌ Failed for {f} ({label}):", e)
            time.sleep(2)  # Pause between requests

Step 8. Item post processing

Initial results can often have slight departures from what humans might expect. The generative AI model might not, for example, create the exact number of items you request; it creates an approximate number. The model may inconsistently intersperse headings between items for different constructs. It may use numbering that we will need to strip before subsequent encoding.

These minor formatting issues can be addressed by further editing the prompt with constraints and checks or, given the modest number of items generated here (1,200), by manually editing the output into the format required for the pseudo-discrimination analyses in the next step: a plain-text .csv with a column for the parent construct, a column for the item, and a column for the key.
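As a sketch of the scripted alternative, a few lines of Python can strip leading numbering and drop stray heading lines before analysis. The regex here is an assumption about the list formats the model typically produces (e.g. ‘1. ’, ‘23)’, or a dash) and may need adjusting to your actual output.

```python
import re

def clean_item(line):
    """Strip leading numbering like '1. ', '23)' or '- ' from a generated item."""
    line = line.strip()
    # Remove leading list markers the model sometimes adds
    return re.sub(r"^\s*(\d+[\.\)]|[-*])\s*", "", line)

# Example raw output lines (illustrative, not from a real run)
raw = [
    "1. I care about employee wellbeing.",
    "23) Rules exist to be followed.",
    "Care/Harm items:",  # stray heading line
]
# Drop heading lines (here assumed to end with ':') and clean the rest
items = [clean_item(x) for x in raw if not x.strip().endswith(":")]
```

The same cleaning function can be applied to each line inside the generation loop, before writing rows to the .csv file.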

Human item reviews are highly recommended

In all cases, we strongly recommend human reviews of the content of all items proposed for empirical trials, and we stress the need for these empirical evaluations to obtain indications of actual item discrimination. Human reviews allow evaluation of whether items are too similar, but it is always empirical discrimination that is the ultimate parameter of interest in item analysis.


This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).