- The importance of prompt quality
- What is prompt engineering?
- Criteria for inclusion on the list
- Challenges compiling the list
- Conclusion
- References
The importance of prompt quality
Throughout this book I show examples of written instructions in prompts that allow LLMs to undertake psychometric tasks. The tasks range from generating items, to checking the assignment of items to scales, to scoring free-text passages. See, for example, the section on zero-shot and few-shot item generation prompts.
In this section, I emphasize the need to think carefully about prompts and to experiment with, and carefully document, all instructions used in psychological assessment. I also give a brief overview of helpful practices for creating high-quality instructions, known as prompt engineering, with references for further reading.
What is prompt engineering?
Prompt engineering refers to structuring your instructions to an LLM in a way that is likely to lead to useful and accurate task performance. The internet is crowded with prompt engineering guides. Some of the best are available from the major model providers, including OpenAI, Anthropic, and Google.
While many of the recommendations resemble good writing advice (e.g., clarity, specificity), they can rest on LLM-specific mechanisms. For example, in earlier models, placing instructions at the end of the prompt helped leverage later representations in the attention layers, although this is less necessary with newer models, where system prompts and format prefixes (e.g., JSON or XML tags) can have bigger effects.
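To make this concrete, here is a minimal sketch of a prompt that puts role and task framing in a system message, restates the output requirement at the end of the user message, and uses a format prefix. It assumes the openai Python package; the model name and the conscientiousness scoring task are placeholders, not recommendations.

```python
# A minimal sketch of instruction placement and format prefixes, assuming
# the `openai` Python package. The model name and the scoring task are
# placeholders, not recommendations.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; substitute any model you have access to
    messages=[
        # System message: role and task framing live here.
        {"role": "system",
         "content": "You score free-text responses for conscientiousness on a 1-5 scale."},
        # User message: context first, with the output requirement restated
        # at the end and a format prefix ("JSON:") to anchor the structure.
        {"role": "user",
         "content": ('Response: "I always double-check my work before submitting it."\n\n'
                     'Return only the score, e.g. JSON: {"score": 4}')},
    ],
    temperature=0,  # low temperature suits factual or scoring tasks
)
print(response.choices[0].message.content)
```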
Nonetheless, given that many of the frequently advocated tips reflect good writing principles, a case might be made that a prompt engineering overview is not required for this book. In practice, however, engineering high-quality prompts is the primary way that psychologists are likely to interact with LLMs to improve their output on psychometric tasks. We are more likely to carefully craft a prompt than to pre-train a model, or even to fine-tune one.
Criteria for inclusion on the list
To avoid a bland list that does not stretch beyond a good writing guide, I set the following criteria for including a recommendation.
- The recommendation needed to be specific to AI, going beyond what we might suggest for human-to-human written communication, or to have an LLM architectural basis, such as the instruction placement discussed earlier.
- The recommendation needed to be supported by documented evidence from a key model vendor, on the assumption that those building the models have the most insight into how they work.
Challenges compiling the list
I encountered some challenges in compiling the list. Including only recommendations that go beyond common sense was a demanding criterion, because what is common sense to one person may be a revelation to another. Next, the recommendations offer no guarantee of portability: what works for one model might not work on a second, or even for the same model on a subsequent occasion. Finally, there were occasional nuances.
Assigning personas is recommended in some guidance for priming domain knowledge, while other guidance warns that it consumes your limited context budget. Another nuance concerns the widely known 'think step-by-step' instruction. Some treat this as a core technique, while others argue that any instruction that elicits longer output works equally well. In addition, some newer models with extended thinking modes respond to thinking keywords by allocating additional compute.
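To illustrate the second nuance, here is a hedged sketch of two prompt variants for the same item-classification task; the item is illustrative, and whether the step-by-step variant actually helps is an empirical question for your particular model and task.

```python
# Two variants of the same prompt; which performs better is an empirical
# question that may differ across models. The item is illustrative only.
item = "I enjoy being the centre of attention."

plain_prompt = (
    f"Does the item '{item}' belong on an extraversion scale? "
    "Answer yes or no."
)

step_by_step_prompt = (
    f"Does the item '{item}' belong on an extraversion scale? "
    "Think step by step about the construct the item measures, "
    "then answer yes or no on the final line."
)
```

With those caveats in place, the table below summarizes the recommendations that met the criteria.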
| Tip # | Prompt engineering suggestion | Source | Description |
|---|---|---|---|
| 1 | Assign LLMs a role. | Anthropic | Role prompting assigns a specific persona (e.g., "experienced data scientist") via a system prompt for enhanced accuracy, tailored tone, and improved focus. |
| 2 | Prompting often out-performs fine-tuning. | Anthropic | Fine-tuning risks the model losing general knowledge during retraining and requires expensive GPU resources and days of time, while prompt engineering is instant, cheaper, and preserves base capabilities. |
| 3 | Avoid overspecified logic. | Anthropic | Encoding brittle if-else logic directly in prompts creates maintenance complexity and breaks easily on edge cases; it is better to provide high-level guidance with examples in system prompts. |
| 4 | Use low temperature for factual responses. | OpenAI | Factual extraction and Q&A generally call for temperature 0, while creative writing varies by need; there is no one-size-fits-all setting. |
| 5 | Too much context causes context rot. | Anthropic | The more information you give a model at once, the worse it gets at recalling details from earlier in the conversation. |
| 6 | Instruction placement at the start or end improves performance. | OpenAI/Anthropic | OpenAI recommends placing instructions at the start and encasing the context in triple quotes; Anthropic recommends placing instructions at the end of the prompt. |
| 7 | Chain-of-thought prompting improves performance. | Anthropic/OpenAI | Encouraging LLMs to think carefully before beginning and to give their reasoning can improve performance on complex tasks. |
| 8 | Use XML tagging to separate parts of prompts. | Anthropic | Anthropic recommends using XML tags to structure prompts with multiple components, which helps Claude parse prompts more accurately. |
| 9 | Few-shot prompting is important for pattern learning. | Google | Showing the model examples of what you want is more effective than just describing it, as it helps achieve better formatting, accuracy, and pattern matching. |
| 10 | Lead outputs with partial completions. | Google | You can guide output formatting by starting the response structure yourself (e.g., "Outline: I. Introduction *") and letting the model complete the pattern. |
| 11 | Use prefixes for prompt sections. | Google | Use prefixes like "Text:" for inputs and "JSON:" for outputs, and labels in examples, to signal semantically meaningful parts of your prompt and guide the model's response format. |
| 12 | Chain prompts for sequential tasks. | Google | Break complex tasks into sequential prompts where each prompt's output becomes the next prompt's input, with the final prompt producing the end result. |
| 13 | Experiment with sampling parameters. | Google | Experiment with parameters like max tokens (output length), temperature (randomness), topK/topP (token selection), and stop sequences to optimize model responses for your specific task. |
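To show how several of these suggestions combine in a single psychometric prompt, here is a hedged sketch that assigns a role via the system prompt (tip 1), sets a low temperature (tip 4), separates prompt parts with XML tags (tip 8), includes few-shot examples (tip 9), and ends with a prefix for the output (tip 11). It assumes the anthropic Python package; the model name and items are placeholders.

```python
# A hedged sketch combining tips 1, 4, 8, 9, and 11 from the table above,
# assuming the `anthropic` Python package. The model name and items are
# placeholders, not recommendations.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

prompt = """<task>
Classify each questionnaire item as measuring extraversion or neuroticism.
</task>

<examples>
Item: I make friends easily.
Label: extraversion

Item: I worry about things.
Label: neuroticism
</examples>

<item>
I feel comfortable around people.
</item>

Label:"""

message = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder; substitute a current model
    max_tokens=10,
    temperature=0,  # tip 4: low temperature for a factual classification task
    system="You are an experienced psychometrician.",  # tip 1: role prompting
    messages=[{"role": "user", "content": prompt}],
)
print(message.content[0].text)
```

Prompt chaining (tip 12) follows the same pattern: the label printed above would simply be fed into the next prompt in the sequence, with the final prompt producing the end result.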
Conclusion
In short, the appealing idea of a perfect prompt is not only elusive but imaginary. This is not to undervalue prompt enhancements, because LLMs are sensitive to the way prompts are written. It is simply to say that there are many ways to write prompts that achieve high-quality psychometric outcomes. Readers should experiment with different versions of prompts using the ideas in the table above and document their prompts thoroughly for inclusion in technical documentation.
References
Anthropic. (n.d.). Chain of thought prompting. https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/chain-of-thought
Anthropic. (n.d.). Effective context engineering for AI agents. https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
Anthropic. (n.d.). Give Claude a role (prompt engineering). https://docs.anthropic.com/claude/docs/give-claude-a-role
Anthropic. (n.d.). Prompt engineering overview. https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview
Anthropic. (n.d.). Use XML tags. https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags
Google. (n.d.). Prompt design strategies. Google AI for Developers. Retrieved November 10, 2025, from https://ai.google.dev/gemini-api/docs/prompting-strategies
OpenAI. (n.d.). Best practices for prompt engineering with the OpenAI API. https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-the-openai-api
OpenAI. (n.d.). Prompt engineering. OpenAI Platform Documentation. https://platform.openai.com/docs/guides/prompt-engineering
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).