Tuning a foundation model can improve its performance. Foundation models are pretrained for general use, and you can often adapt them to a task with prompt design strategies, such as few-shot prompting. Sometimes, though, a pretrained model doesn't perform a task as well as you'd like. This might be because the task is a specialized one that's difficult to teach a model with prompt design alone. In those cases, you can use model tuning to improve the performance of the model on specific tasks. Model tuning can also help a model adhere to specific output requirements when instructions alone aren't sufficient. This page provides an overview of model tuning, describes the tuning options available on Vertex AI, and helps you decide when to use each tuning option.
Model tuning overview
Model tuning works by providing a model with a training dataset that contains many examples of a unique task. For unique, niche tasks, you can get significant improvements in model performance by tuning the model on a modest number of examples. After you tune a model, fewer examples are required in its prompts.
Vertex AI supports the following methods to tune language models:
- Supervised tuning - Supervised tuning of a text model is a good option when the output of your model isn't complex and is relatively easy to define. Supervised tuning is recommended for classification, sentiment analysis, entity extraction, summarization of content that's not complex, and writing domain-specific queries. For code models, supervised tuning is the only option.
- Reinforcement learning from human feedback (RLHF) tuning - RLHF tuning is a good option when the output of your model is complex. RLHF tuning works well on model objectives that aren't easily differentiated through supervised tuning. RLHF tuning is recommended for question answering, summarization of complex content, and content creation, such as a rewrite. RLHF tuning isn't supported by code models.
Supervised tuning
Supervised tuning improves the performance of a model by teaching it a new skill. Data that contains hundreds of labeled examples is used to teach the model to mimic a desired behavior or task. Each labeled example demonstrates what you want the model to output during inference.
When you run a supervised tuning job, the model learns additional parameters that help it encode the necessary information to perform the desired task or learn the desired behavior. These parameters are used during inference. The output of the tuning job is a new model that combines the newly learned parameters with the original model.
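For example, each labeled example is a JSON Lines record that pairs an input with the output you want the model to learn. The following is a minimal sketch in Python, assuming the `input_text` and `output_text` field names used by the text-model tuning data format; verify the schema for your model before use:

```python
import json

# Each labeled example pairs a prompt (input_text) with the response
# the tuned model should produce (output_text).
examples = [
    {
        "input_text": "Classify the sentiment of this review: "
                      "'The battery dies within an hour.'",
        "output_text": "negative",
    },
    {
        "input_text": "Classify the sentiment of this review: "
                      "'Setup took two minutes and it just works.'",
        "output_text": "positive",
    },
]

# Write the dataset as JSON Lines, one example per line.
with open("tuning_data.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```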
Models that support supervised tuning
The following text foundation models support supervised tuning:
- Text generation - `text-bison@001`
- Text chat - `chat-bison@001`
- Code generation - `code-bison@001` (preview)
- Code chat - `codechat-bison@001` (preview)
- Text embeddings - `textembedding-gecko@001` (preview)
Workflow for supervised model tuning
The supervised model tuning workflow on Vertex AI includes the following steps:
- Prepare your model tuning dataset.
- Upload the model tuning dataset to a Cloud Storage bucket.
- Create a supervised model tuning job (see the sketch after this list).
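The following is a minimal sketch of step 3 using the Vertex AI SDK for Python. The project ID, bucket path, and step count are placeholders, and the `tune_model` arguments reflect the SDK's text-model tuning interface at the time of writing, so check the SDK reference for the current module path and signature:

```python
import vertexai
from vertexai.language_models import TextGenerationModel

# Placeholder project and region; replace with your own values.
vertexai.init(project="your-project-id", location="us-central1")

model = TextGenerationModel.from_pretrained("text-bison@001")

# Start a supervised tuning job on the JSON Lines dataset that was
# uploaded to Cloud Storage in step 2.
model.tune_model(
    training_data="gs://your-bucket/tuning_data.jsonl",
    train_steps=100,
    tuning_job_location="europe-west4",
    tuned_model_location="us-central1",
)
```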
After model tuning completes, the tuned model is deployed to a Vertex AI endpoint. The name of the endpoint is the same as the name of the tuned model. Tuned models are available to select in Generative AI Studio when you want to create a new prompt.
To learn about tuning a text model with supervised tuning, see Tune text models by using supervised tuning. To learn about tuning a code model with supervised tuning, see Tune code models.
RLHF model tuning
Reinforcement learning from human feedback (RLHF) uses preferences specified by humans to optimize a language model. By using human feedback to tune your models, you can make the models better align with human preferences and reduce undesired outcomes in scenarios where people have complex intuitions about a task. For example, RLHF can help with an ambiguous task, such as how to write a poem about the ocean, by offering a human two poems about the ocean and letting that person choose their preferred one.
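For example, each record in a human preference dataset pairs a prompt with two candidate responses and the rater's choice between them. The following is a minimal sketch, assuming the `candidate_0`, `candidate_1`, and `choice` field names; confirm the exact schema in the RLHF dataset documentation:

```python
import json

# One human preference example: a prompt, two candidate completions,
# and the index of the completion the human rater preferred.
preference_example = {
    "input_text": "Write a short poem about the ocean.",
    "candidate_0": "Gray waves roll in, patient and cold, reshaping the shore.",
    "candidate_1": "The ocean is big and wet. The end.",
    "choice": 0,  # the rater preferred candidate_0
}

with open("preference_data.jsonl", "a") as f:
    f.write(json.dumps(preference_example) + "\n")
```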
Models that support RLHF tuning
The following text models support RLHF tuning:
- The text generation foundation model, `text-bison@001`. For more information, see Text generation model.
- The `t5-small`, `t5-large`, `t5-xl`, and `t5-xxl` Flan text-to-text transfer transformer (Flan-T5) models. Flan-T5 models can be fine-tuned to perform tasks such as text classification, language translation, and question answering. For more information, see Flan-T5 checkpoints.
Code models don't support RLHF tuning.
Workflow for RLHF model tuning
The RLHF model tuning workflow on Vertex AI includes the following steps:
- Prepare your human preference dataset.
- Prepare your prompt dataset.
- Upload your datasets to a Cloud Storage bucket. The datasets don't need to be in the same bucket.
- Create an RLHF model tuning job (see the sketch after this list).
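On Vertex AI, RLHF tuning runs as a pipeline job. The following sketch submits one with the Vertex AI SDK for Python; the template URL and parameter names are illustrative assumptions based on the RLHF tuning guide, so confirm them against the current pipeline template before use:

```python
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="europe-west4")

# Submit the RLHF tuning pipeline. The template URL, pipeline root, and
# parameter names below are placeholders; see the RLHF tuning guide for
# the exact values.
job = aiplatform.PipelineJob(
    display_name="rlhf-tuning-job",
    template_path=(
        "https://us-kfp.pkg.dev/ml-pipeline/"
        "large-language-model-pipelines/rlhf-train-template/latest"
    ),
    pipeline_root="gs://your-bucket/pipeline_root",
    parameter_values={
        "prompt_dataset": "gs://your-bucket/prompt_data.jsonl",
        "preference_dataset": "gs://your-bucket/preference_data.jsonl",
        "large_model_reference": "text-bison@001",
        "model_display_name": "my-rlhf-tuned-model",
    },
)
job.run()
```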
After model tuning completes, the tuned model is deployed to a Vertex AI endpoint. The name of the endpoint is the same as the name of the tuned model. Tuned models are available to select in Generative AI Studio when you want to create a new prompt.
To learn about tuning a text model by using RLHF tuning, see Tune text models by using RLHF tuning.
RLHF tuning region settings
RLHF tuning supports the following two regions:
- `us-central1` - If you choose this region, 8 Nvidia A100 80GB GPUs are used.
- `europe-west4` - If you choose this region, 64 cores of a TPU v3 pod are used.
The region you choose is where Vertex AI tunes the model and then uploads the tuned model.
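In the pipeline sketch shown earlier, this choice corresponds to the location that you initialize the SDK with (continuing the same hypothetical setup):

```python
from google.cloud import aiplatform

# europe-west4 schedules RLHF tuning on 64 TPU v3 pod cores;
# us-central1 would schedule it on 8 Nvidia A100 80GB GPUs instead.
aiplatform.init(project="your-project-id", location="europe-west4")
```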
Quota
Each Google Cloud project should have enough default quota to run one tuning job. However, if your project doesn't have enough quota for one tuning job, or if you want to run multiple concurrent tuning jobs in your project, you need to request additional quota.
The following table shows the type and amount of quota to request, depending on the region you specified for tuning:
Region | Resource quota | Amount per concurrent job
---|---|---
`us-central1` | Restricted image training Nvidia A100 80GB GPUs per region | 8
`us-central1` | Restricted image training CPUs for A2 CPU types per region | 96
`europe-west4` | Restricted image training TPU V3 pod cores per region | 64
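For example, to run two concurrent tuning jobs in `us-central1`, request quota for 16 Nvidia A100 80GB GPUs and 192 A2 CPUs.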
Pricing
When you tune a foundation model, you pay the cost to run the tuning pipeline. When you deploy a tuned foundation model to a Vertex AI endpoint, you aren't charged for hosting. When you serve predictions, you pay the same price as you would with the untuned foundation model. To learn which foundation models can be tuned, see Foundation models. For pricing details, see Pricing for Generative AI on Vertex AI.
What's next
- Learn how to tune a foundation model using supervised tuning.
- Learn how to tune a foundation model using RLHF tuning.
- Learn how to tune a code model.
- Learn how to evaluate a tuned model.