Prompt design enables users who are new to machine learning (ML) to control model output with minimal overhead. By carefully crafting prompts, you can nudge the model to generate a desired result. Prompt design is an efficient way to experiment with adapting a language model for a specific use case.
Language models, especially large language models (LLMs), are trained on vast amounts of text data to learn the patterns and relationships between words. When given some text (the prompt), language models can predict what is likely to come next, like a sophisticated autocompletion tool. Therefore, when designing prompts, consider the different factors that can influence what a model predicts comes next.
While there's no right or wrong way to design a prompt, there are common strategies that you can use to affect the model's responses. This section introduces you to some common prompt design strategies.
Give clear instructions
Giving the model instructions means telling the model what to do. This strategy can be an effective way to customize model behavior. Ensure that the instructions you give are clear and concise.
The following prompt provides a block of text and tells the model to summarize it:
The model provided a concise summary, but maybe you want the summary to be written in a way that's easier to understand. For example, the following prompt includes an instruction to write a summary that's simple enough for a fifth grader to understand:
The instruction to write the summary so a fifth grader can understand it resulted in a response that's easier to understand.
- Give the model instructions to customize its behavior.
- Make each instruction clear and concise.
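As a sketch of this strategy, a prompt with a clear, concise instruction can be assembled programmatically. The helper name and prompt wording below are illustrative, not part of any SDK:

```python
from typing import Optional

def build_summary_prompt(text: str, audience: Optional[str] = None) -> str:
    """Build a summarization prompt with a clear, concise instruction.

    `audience` optionally tightens the instruction, for example
    "a fifth grader". (Hypothetical helper, not a Vertex AI API.)
    """
    instruction = "Summarize the following text"
    if audience:
        instruction += f" so that {audience} can understand it"
    return f"{instruction}:\n\n{text}"

# Tightening the instruction changes the model's target audience.
prompt = build_summary_prompt(
    "Quantum computers store information in qubits.",
    audience="a fifth grader",
)
```

Sending `prompt` to the model should then yield a summary pitched at the requested audience.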
You can include examples in the prompt that show the model what getting it right looks like. The model attempts to identify patterns and relationships from the examples and apply them to form a response. Prompts that contain a few examples are called few-shot prompts, while prompts that provide no examples are called zero-shot prompts. Few-shot prompts are often used to regulate the formatting, phrasing, scoping, or general patterning of model responses.
Zero-shot vs few-shot prompts
The following zero-shot prompt asks the model to choose the best explanation.
If your use case requires the model to produce concise responses, you can include examples in the prompt that give preference to concise responses.
The following prompt provides two examples that show a preference for shorter explanations. In the response, you can see that the examples guided the model to choose the shorter explanation (Explanation2) instead of the longer explanation (Explanation1) as it did previously.
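A few-shot prompt like the ones above can be assembled from example pairs. This is an illustrative sketch (the helper and labels are hypothetical, not an SDK API):

```python
def build_few_shot_prompt(examples, new_input,
                          input_label="Question", output_label="Answer"):
    """Assemble a few-shot prompt from (input, output) example pairs.

    The labels double as example prefixes so the model can see the
    pattern it should continue. (Hypothetical helper for illustration.)
    """
    parts = []
    for example_input, example_output in examples:
        parts.append(f"{input_label}: {example_input}")
        parts.append(f"{output_label}: {example_output}")
    parts.append(f"{input_label}: {new_input}")
    parts.append(f"{output_label}:")  # trailing prefix invites completion
    return "\n".join(parts)
```

Because the prompt ends with the output label, the model continues the established pattern rather than inventing its own format.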
Find the optimal number of examples
Experiment with the number of examples provided in the prompt to get the most desirable results. Models like PaLM can often pick up on patterns from a few examples, though you may need to experiment with which number of examples leads to the desired results. For simpler models like BERT, you may need more examples. At the same time, if you include too many examples, the model may start to overfit the response to the examples.
Use examples to show patterns instead of antipatterns
Using examples to show the model a pattern to follow is more effective than using examples to show the model an antipattern to avoid.
⛔ Negative pattern:
✅ Positive pattern:
- Including prompt-response examples in the prompt helps the model learn how to respond.
- Give the model examples of the patterns to follow instead of examples of patterns to avoid.
- Experiment with the number of examples to include. Depending on the model, too few examples are ineffective at changing model behavior, while too many examples cause the model to overfit.
Let the model complete partial input
Generative language models work like an advanced autocompletion tool. When you provide partial content, the model responds with the rest of that content or with what it considers a continuation of that content. If the prompt includes examples or context, the model can take them into account while completing it.
The following example provides a prompt with an instruction and an entity input:
While the model did as prompted, writing out the instructions in natural language can sometimes be challenging. In this case, you can give an example and a response prefix and let the model complete it:
Notice how "waffles" was excluded from the output because it wasn't listed in the context as a valid field.
Prompt the model to format its response
The completion strategy can also help format the response. The following example prompts the model to create an essay outline:
The prompt didn't specify the format for the outline and the model chose a format for you. To get the model to return an outline in a specific format, you can add text that represents the start of the outline and let the model complete it based on the pattern that you initiated.
- If you give the model a partial input, the model completes that input based on any available examples or context in the prompt.
- Having the model complete an input may sometimes be easier than describing the task in natural language.
- Adding a partial answer to a prompt can guide the model to follow a desired pattern or format.
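A minimal sketch of the completion strategy, assuming a hypothetical helper (not an SDK API) that appends the start of the desired answer so the model continues the pattern:

```python
def build_completion_prompt(task, partial_response):
    """Append the start of the desired answer so the model continues
    the pattern instead of choosing its own format.
    (Hypothetical helper for illustration.)"""
    return f"{task}\n\n{partial_response}"

# The partial outline fixes both the numbering style and the structure.
prompt = build_completion_prompt(
    "Create an outline for an essay about hummingbirds.",
    "I. Introduction\n   A.",
)
```

Because the prompt ends mid-outline, the model's most natural continuation is the next outline entry in the same format.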
Add contextual information
Instead of assuming that the model has all of the required information, include in the prompt the instructions and information that the model needs to solve the problem.
The following example asks the model to give troubleshooting guidance for a router:
- Include information (context) in the prompt that you want the model to use when generating a response.
- Give the model instructions on what to do.
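The steps above can be sketched with a hypothetical helper (not an SDK API) that embeds reference material, such as a router manual excerpt, directly in the prompt:

```python
def build_context_prompt(context, question):
    """Embed reference material in the prompt rather than assuming the
    model already knows it. (Hypothetical helper for illustration.)"""
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Illustrative context; the manual text is invented for this sketch.
prompt = build_context_prompt(
    "A blinking yellow light on the router means the firmware is updating.",
    "What does a blinking yellow light on my router mean?",
)
```

Grounding the question in supplied context steers the model toward the provided facts instead of its general knowledge.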
A prefix is a word or phrase that you add to the prompt content that can serve several purposes, depending on where you put the prefix:
- Input prefix: Adding a prefix to the input signals semantically meaningful parts of the input to the model. For example, the prefixes "English:" and "French:" demarcate two different languages.
- Output prefix: Even though the output is generated by the model, you can add a prefix for the output in the prompt. The output prefix gives the model information about what's expected as a response. For example, the output prefix "JSON:" signals to the model that the output should be in JSON format.
- Example prefix: In few-shot prompts, adding prefixes to the examples provides labels that the model can use when generating the output, which makes it easier to parse output content.
In the following example, "Text:" is the input prefix and "The answer is:" is the output prefix.
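The three prefix types can be combined in one prompt. This sketch uses the "English:"/"French:" example from above; the loop and variable names are illustrative:

```python
# Input prefix "English:" and output prefix "French:"; within the
# few-shot examples, the same strings act as example prefixes (labels).
examples = [("hello", "bonjour"), ("thank you", "merci")]
lines = []
for english, french in examples:
    lines.append(f"English: {english}")
    lines.append(f"French: {french}")
lines.append("English: goodbye")
lines.append("French:")  # output prefix: the model fills in the rest
prompt = "\n".join(lines)
```

The trailing output prefix tells the model both where its answer should start and what language it should be in.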
Experiment with different parameter values
Each call that you send to a model includes parameter values that control how the model generates a response. The model can generate different results for different parameter values. Experiment with different parameter values to get the best values for the task. The parameters available for different models may differ. The most common parameters are the following:
- Max output tokens: The maximum number of tokens that can be generated in the response. A token is approximately four characters; 100 tokens correspond to roughly 60-80 words. Specify a lower value for shorter responses and a higher value for longer responses.
- Temperature: The temperature is used for sampling during response generation, which occurs when top-P and top-K are applied. Temperature controls the degree of randomness in token selection. Lower temperatures are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperatures can lead to more diverse or creative results. A temperature of 0 is deterministic, meaning that the highest-probability response is always selected. For most use cases, try starting with a low temperature. If the model returns a response that's too generic, too short, or a fallback response, try increasing the temperature.
- Top-K: Top-K changes how the model selects tokens for output. A top-K of 1 means that the next selected token is the most probable among all tokens in the model's vocabulary (also called greedy decoding), while a top-K of 3 means that the next token is selected from among the three most probable tokens by using temperature. For each token selection step, the top-K tokens with the highest probabilities are sampled, then tokens are further filtered based on top-P, with the final token selected using temperature sampling. Specify a lower value for less random responses and a higher value for more random responses.
- Top-P: Top-P changes how the model selects tokens for output. Tokens are selected from the most (see top-K) to least probable until the sum of their probabilities equals the top-P value. For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1 and the top-P value is 0.5, then the model selects either A or B as the next token by using temperature and excludes C as a candidate. Specify a lower value for less random responses and a higher value for more random responses.
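The interaction between temperature, top-K, and top-P can be sketched in plain Python. This is a simplified model of one decoding step, not the actual serving implementation, and the default values are arbitrary:

```python
import random

def sample_next_token(probs, temperature=1.0, top_k=40, top_p=0.95, rng=None):
    """Pick the next token: keep the top-K most probable tokens, filter
    them by cumulative probability top-P, then sample with temperature.
    Simplified sketch of the decoding step, for illustration only."""
    rng = rng or random.Random()
    # 1. Top-K: keep only the K most probable tokens.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # 2. Top-P: keep tokens until cumulative probability reaches top_p.
    kept, total = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        total += p
        if total >= top_p:
            break
    # 3. Temperature: 0 is greedy; otherwise rescale weights and sample.
    if temperature == 0:
        return kept[0][0]
    weights = [p ** (1.0 / temperature) for _, p in kept]
    return rng.choices([t for t, _ in kept], weights=weights)[0]
```

With `probs = {"A": 0.3, "B": 0.2, "C": 0.1}` and `top_p=0.5`, only A and B survive the filter, matching the top-P example above; `temperature=0` or `top_k=1` always returns the most probable token.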
Prompt iteration strategies
Prompt design is an iterative process that often requires a few iterations before you get the desired response consistently. This section provides guidance on some things you can try when iterating on your prompts.
Use different phrasing
Using different words or phrasing in your prompts often yields different responses from the model even though the phrasings mean the same thing. If you're not getting the expected results from your prompt, try rephrasing it.
Switch to an analogous task
If you can't get the model to follow your instructions for a task, try giving it instructions for an analogous task that achieves the same result.
This prompt tells the model to categorize a book by using predefined categories.
The response is correct, but the model didn't stay within the bounds of the options. You also want the model to respond with just one of the options instead of a full sentence. In this case, you can rephrase the instructions as a multiple-choice question and ask the model to choose an option.
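The multiple-choice rephrasing can be sketched as follows; the helper, the book, and the category list are illustrative, not taken from the original example:

```python
def build_multiple_choice_prompt(instruction, item, options):
    """Recast an open-ended classification task as multiple choice so
    the model answers with one of the listed options.
    (Hypothetical helper for illustration.)"""
    lines = [f"{instruction} Respond with the letter of one option only.",
             "", item, ""]
    for letter, option in zip("ABCDEFG", options):
        lines.append(f"{letter}) {option}")
    lines.append("Answer:")
    return "\n".join(lines)

# Illustrative usage with invented categories.
prompt = build_multiple_choice_prompt(
    "Choose the best category for this book.",
    "The Odyssey",
    ["thriller", "sci-fi", "mythology", "biography"],
)
```

Enumerating the options and ending with "Answer:" constrains the model to a single-letter response instead of a free-form sentence.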
Change the order of prompt content
The order of the content in the prompt can sometimes affect the response. Try changing the content order and see how that affects the response.
- Version 1: [examples] [context] [input]
- Version 2: [input] [examples] [context]
- Version 3: [examples] [input] [context]
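The three orderings can be generated from the same components with a small helper (hypothetical, not an SDK API), which makes side-by-side comparison easy:

```python
def assemble_prompt(order, **parts):
    """Join named prompt components in the given order so different
    orderings can be compared. (Hypothetical helper for illustration.)"""
    return "\n\n".join(parts[name] for name in order)

# Placeholder components; real prompts would carry full text here.
parts = dict(
    examples="Q: example question\nA: example answer",
    context="Context: background information",
    input="Q: the actual question",
)
v1 = assemble_prompt(["examples", "context", "input"], **parts)
v2 = assemble_prompt(["input", "examples", "context"], **parts)
v3 = assemble_prompt(["examples", "input", "context"], **parts)
```

Running each version against the model shows whether content order changes the response for your task.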
Fallback responses
A fallback response is a response returned by the model when either the prompt or the response triggers a safety filter. An example of a fallback response is "I'm not able to help with that, as I'm only a language model."
If the model responds with a fallback response, try increasing the temperature.
Things to avoid
- Avoid relying on models to generate factual information.
- Use with care on math and logic problems.
This page provides general prompt design guidance. For task-specific guidance on common use cases, see the following pages:
- Text prompts
- Chat prompts
- Code generation prompts
- Code chat prompts
- Code completion prompts
- Image generation and editing prompts
- See some sample prompts.
- Learn how to test prompts.
- Learn about responsible AI best practices and Vertex AI's safety filters.
- Learn how to tune a model.