Michael John Ilagan, PhD
milagan@ualberta.ca
University of Alberta
Carl F. Falk, PhD
carl.falk@mcgill.ca
Department of Psychology, McGill University
- Step 1. Install Ollama and start it
- Step 2. Download the local LLM
- Step 3. Install the Ollama Python library
- Step 4. Open a Jupyter notebook
- Step 5. Generate items
- Steps 6 & 7. Item post-processing and human item reviews
In the previous section, we demonstrated AI item generation using an LLM via API access to a cloud-based system. Under the hood, this required sending prompts to OpenAI's servers and receiving replies from them. At the time of writing, there is more than one way to run an LLM locally (i.e., on your own or your organization's computer) instead of using a model in the cloud. Major options include Hugging Face's transformers library and Ollama.
There are generally security, speed/performance, and cost tradeoffs involved in deciding to run a model locally. Processing everything locally means there is no need to pay an AI provider (e.g., OpenAI), and no potentially sensitive data is transmitted over the internet. On the other hand, there are overhead costs: one may need to invest in a powerful local machine and GPU in order to run large models locally and obtain results in a reasonable amount of time.
In this chapter, we generate items using Ollama and an open-source model (e.g., gpt-oss) of your choice. The present chapter closely follows the chapter "Generating items via an API", though the steps differ somewhat, as some (e.g., obtaining an API key) do not apply when working locally. Some small prompt changes were also implemented. You can download the accompanying Jupyter notebook to run these analyses on your local machine.
Step 1. Install Ollama and start it
Ollama runs models in an environment independent of Python, yet they can be accessed (entirely locally) via an API. One can even start a graphical user interface to chat with Ollama models, entirely locally. Ollama is currently available for the major operating systems (Windows, macOS, Linux). To get started, download and install Ollama from here: https://ollama.com/
Once Ollama is downloaded and installed, you may have to start the service. On Linux/macOS, executing the following command in a terminal allows you to send requests to it in later steps. Note that the ampersand is important: it indicates that Ollama is to be run as a background process. This command can't be run in a Jupyter cell because Jupyter does not support background processes.
# Run as shell command to start Ollama
ollama serve &
If using Windows, Ollama may already be open as a background process by default. Look for the Ollama icon (a llama) in the system tray. Alternatively, the process can be started at the command-line terminal with the same command as above (omitting the ampersand), or by finding and launching Ollama via the Start menu/search bar.
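In either case, you can verify that the service is up by sending it a request from a terminal; the server listens on port 11434 by default and should reply with "Ollama is running".
# Run as shell command to verify Ollama is responding
curl http://localhost:11434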
Step 2. Download the local LLM
To download a local LLM for use with Ollama (only required once), execute the following command in a terminal window, or from within Jupyter by prefixing it with ! (Jupyter is fine here, as the command does not involve a background process).
In this example, we use the model gpt-oss:20b (i.e., OpenAI's open-weight reasoning model with 20 billion parameters), which you can substitute with the model of your choice.
Please note that gpt-oss:20b takes about 13 GB and can take around 16 minutes to download. If you don't have enough memory for this model, you may get the error "model requires more system memory than is available".
# Run as shell command to download an LLM, first time only
!ollama pull gpt-oss:20b
To see the possible models you can use, go to Ollama's model catalog (https://ollama.com/library).
Of course, it is possible to have multiple models downloaded. When you send a request to Ollama in a later step, you will specify which model to use. To see a list of models you have downloaded, execute the following command in a terminal window.
# Run as shell command to see list of downloaded LLMs
!ollama list
Step 3. Install the Ollama Python library
To install the Ollama Python library (only required once), execute the following command in a terminal window (or in Jupyter with the ! prefix). Doing so lets us send requests to Ollama from our Python code.
# Run as shell command to install Ollama Python library
# !pip install ollama
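Once installed, a quick check from any Python session (assuming the Ollama service from Step 1 is still running) confirms that the library can reach the local server:
# Check that the Ollama Python library can reach the local server
import ollama
print(ollama.list())  # lists the models downloaded in Step 2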
Step 4. Open a Jupyter notebook
To execute Python commands in a Jupyter notebook, navigate in the terminal to the folder where you are storing your work and run the command jupyter notebook. This will launch the Jupyter notebook interface on the local host.
# Run as shell command to start Jupyter
# jupyter notebook
Now, once we select 'New Notebook', we are ready to code.
Step 5. Generate items
Requests to Ollama have multiple parameters or arguments. Most important is the model handle, which is the name of the model you previously downloaded and now want to use.
In addition, there are options, such as the maximum number of tokens to generate (called num_predict in Ollama) and the reproducibility seed (called seed in Ollama). We also set keep_alive to a large value (here, 1 hour) so that the model stays loaded in memory between requests and repeated calls do not stall while the model reloads.
In Python, we store this information for use in later code.
# ollama arguments
llm_handle = 'gpt-oss:20b'
ollama_options = {'seed' : 781, 'num_predict' : 3000, 'keep_alive' : "1h"}
Now we're ready to generate items. We will make 12 API calls to systematically generate items for the moral foundations: for each of the six foundations, we generate 25 positively keyed items and 25 reverse-keyed items. Although one may attempt to generate a larger number of items, as in the "Generating items via an API" chapter, testing and debugging local LLMs may be easier with smaller batches of items.
Without more powerful hardware (use of GPUs is not demonstrated here) or a model with far fewer parameters, responses from the LLM will be slower. In the example here, we again demonstrate guided item generation without sample items, but sample items can easily be incorporated when they are available. A minimal sketch of what a single request might look like appears below.
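The accompanying notebook contains the full generation loop; as a minimal sketch, one call via the Ollama Python library could be structured as follows. The prompt wording here is illustrative rather than the chapter's actual prompt, and note that in the Python library, keep_alive is passed as its own argument rather than inside options.
# Minimal sketch of one generation request (illustrative prompt wording)
import ollama

foundation = 'Care/Harm'
prompt = (f"Generate 25 positively keyed Likert-type items measuring the "
          f"{foundation} moral foundation. Return one item per line, "
          f"with no numbering or extra commentary.")

response = ollama.chat(
    model=llm_handle,  # 'gpt-oss:20b', stored above
    messages=[{'role': 'user', 'content': prompt}],
    options={'seed': 781, 'num_predict': 3000},
    keep_alive='1h'  # keep the model loaded in memory between calls
)

# The reply arrives as a single string; split it into one item per line
items = [line.strip() for line in response['message']['content'].splitlines() if line.strip()]
print(f"Done: {len(items)} items")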
If the model runs correctly, you can expect to see the following output, with a .csv file of items saved to your current working directory:
Generating Pos items for Care/Harm...
✓ Done: 25 items
Generating Rev items for Care/Harm...
✓ Done: 26 items
Generating Pos items for Fairness/Cheating...
✓ Done: 25 items
Generating Rev items for Fairness/Cheating...
✓ Done: 25 items
Generating Pos items for Loyalty/Betrayal...
✓ Done: 25 items
Generating Rev items for Loyalty/Betrayal...
✓ Done: 25 items
Generating Pos items for Authority/Subversion...
✓ Done: 25 items
Generating Rev items for Authority/Subversion...
✓ Done: 25 items
Generating Pos items for Sanctity/Degradation...
✓ Done: 25 items
Generating Rev items for Sanctity/Degradation...
✓ Done: 25 items
Generating Pos items for Liberty/Oppression...
✓ Done: 25 items
Generating Rev items for Liberty/Oppression...
✓ Done: 25 items
Steps 6 & 7. Item post-processing and human item reviews
From this point forward, Ollama is no longer used. The remaining steps are the same as in the chapter "Generating items via an API". As a recap, the generative AI model might not always perform exactly as expected. It may generate a slightly different number of items than requested (note the 26 items for Care/Harm above), and the formatting may not be quite ready for pilot/field testing with humans due to extra numbering or added explanatory text. Some of these issues can sometimes be addressed with additional prompt engineering, which we have attempted in this example.
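As one illustration (a minimal sketch, not the chapter's actual post-processing code), leading numbering or bullets that the model sometimes adds can be stripped with a regular expression:
# Minimal post-processing sketch: strip leading numbering or bullets from generated items
import re
raw_items = ['1. I feel compassion for those who are suffering.',
             '- People should care for the vulnerable.']
cleaned = [re.sub(r'^\s*(?:\d+[.)]|[-*])\s*', '', item) for item in raw_items]
print(cleaned)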
In addition, not all items may follow the prompt exactly. In some test cases with smaller models, we noticed more frequent problems with the model understanding positively versus negatively keyed items and getting the wording of the keying direction correct. If more items are desired, consider changing the random number seed and the output file name, then re-running the code.
In all cases, we strongly recommend human review of the content of all items proposed for empirical trials, and we also stress the need for these empirical evaluations to obtain indications of actual item discrimination. Human reviews allow evaluation of whether items are too similar to one another, but it is always empirical discrimination that is the ultimate parameter of interest in item analysis.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).