Generate grounded answers

As part of your RAG experience in Vertex AI Agent Builder, you can generate grounded answers to prompts based on the following grounding sources:

  • Google Search: Use grounding with Google Search if you want to connect the model with world knowledge, a wide range of topics, or up-to-date information on the internet.
  • Inline text: Use grounding with inline text to ground the answer in pieces of text called fact text that are provided in the request. A fact text is a user-provided statement that is considered to be factual for a given request. The model doesn't check the authenticity of the fact text.
  • Vertex AI Search data stores: Use grounding with Vertex AI Search if you want to connect the model to your enterprise documents from Vertex AI Search data stores.

This page describes how to generate grounded answers based on these grounding sources using the following approaches:

  • Single-turn answer generation
  • Multi-turn answer generation

Additionally, you can choose to stream the answers from the model. Generating a grounded answer by streaming is an Experimental feature.

Terminology

Before you use the grounded answer generation method, it helps to understand the inputs and outputs and how to structure your request.

Input data

The grounded answer generation method requires the following inputs in the request:

  • Role: The sender of a given text that's either a user (user) or a model (model).

  • Text: When the role is user, the text is a prompt and when the role is model, the text is a grounded answer. How you specify the role and text in a request is determined as follows:

    • For a single-turn answer generation, the user sends the prompt text in the request and the model sends the answer text in the response.
    • For a multi-turn answer generation, the request contains the prompt-answer pair for all the previous turns and the prompt text from the user for the current turn. Therefore, in such a request, the role is user for a prompt text and it is model for the answer text.
  • Grounding source: The source in which the answer is grounded. It can be one or more of the following:

    • Google Search: Ground the answers in Google Search results.
    • Inline text: Ground the answer in fact text that is provided in the request. A fact text is a user-provided statement that is considered to be factual for a given request. The model doesn't check the authenticity of the fact text. You can provide a maximum of 100 fact texts in each inline text source. The fact texts can be supplemented with meta attributes, such as title, author, and URI. These meta attributes are returned in the response when quoting the chunks that support the answer.
    • Vertex AI Search data stores: Ground the answer in the documents from Vertex AI Search data stores.

    In a given request, you can provide both an inline text source and a Vertex AI Search data store source. You cannot combine Google Search with either of these sources. Therefore, if you want to ground your answers in Google Search results, you must send a separate request specifying Google Search as the only grounding source.

    You can provide a maximum of 10 grounding sources in any order. For example, suppose that you provide grounding sources with the following counts, in the following order, to obtain a total of 10 grounding sources:

    • Three inline text sources, each of which can contain a maximum of 100 fact texts
    • Six Vertex AI Search data stores
    • One inline text source, which can contain a maximum of 100 fact texts

    Each source is assigned an index in the order in which it is specified in the request. For example, if you have specified a combination of sources in your request, then the source index is assigned as illustrated in the following table:

    Grounding source                 Index
    Inline text #1                   0
    Inline text #2                   1
    Vertex AI Search data store #1   2
    Inline text #3                   3
    Vertex AI Search data store #2   4

    This index is cited in the response and is helpful when tracing the provenance.

  • Generation specifications: The specifications for model configuration that consist of the following information:

    • Model ID: Specifies the Vertex AI Gemini model to use for answer generation. For a list of models that you can use to generate grounded answers, see Supported models.
    • Model parameters: Specify the parameters that you can set for the model that you choose to use. These parameters are: language, temperature, top-P, and top-K. For details about these parameters, see Gemini model parameters.
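To illustrate how request order determines each source's index, the following Python sketch labels a groundingSources list the way the API does. The field names mirror the REST samples later on this page; this is an illustration, not part of any official client library.

```python
def index_grounding_sources(grounding_sources):
    """Label each grounding source with the index the API assigns:
    its zero-based position in the request's groundingSources list."""
    kinds = {
        "inlineSource": "inline text",
        "searchSource": "Vertex AI Search data store",
        "googleSearchSource": "Google Search",
    }
    labels = []
    for i, source in enumerate(grounding_sources):
        # Each source object carries exactly one of the known keys.
        kind = next((name for key, name in kinds.items() if key in source),
                    "unknown")
        labels.append((i, kind))
    return labels

sources = [
    {"inlineSource": {"groundingFacts": []}},
    {"inlineSource": {"groundingFacts": []}},
    {"searchSource": {"servingConfig": "projects/.../servingConfigs/default_search"}},
]
print(index_grounding_sources(sources))
```

The index returned for each source is the one cited back in the response's grounding metadata.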

Output data

The response that the model generates is called a candidate and it contains the following:

  • Role: The sender of the grounded answer. Because the response always contains the grounded answer text, the role in a response is always model.

  • Text: A grounded answer.

  • Grounding score: A float value between 0 and 1 that indicates how well an answer is grounded in the given sources.

  • Grounding metadata: Metadata about the grounding source. Grounding metadata contains the following information:

    • Support chunks: A list of chunks that support the answer. Each support chunk is assigned a support chunk index that is helpful when tracing the provenance. Each support chunk contains the following:

      • Chunk text: A portion of text quoted verbatim from the source from which the answer or a part of the answer (called the claim text) is extracted. This might not always be present in the response.
      • Source: An index assigned to the source in the request.
      • Source metadata: Metadata about the chunk. Depending on the source, the source metadata can be any of the following:
        • For an inline source, it can be the additional details that were specified in the request such as title, author, or URI.
        • For the Vertex AI Search data store, it can be the document ID, document title, the URI (Cloud Storage location), or the page number.
    • Grounding support: Grounding information for a claim in the answer. Grounding support contains the following information:

      • Claim text: The answer or a part of the answer that is substantiated with the support chunk text.
      • Support chunk index: An index assigned to the support chunk in the order in which the chunk appears in the list of support chunks.
      • Web search queries: The suggested search queries for the Google Search entry point.
      • Search entry point: A Google Search entry point that you must include in the rendered results for your search deployment. For more information, see Use Google Search entry point.
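The output fields above can be tied together when processing a response. The following Python sketch pairs each claim with the support chunks that substantiate it, for provenance tracing. The camelCase field names, and in particular supportChunkIndices, are assumptions based on the descriptions above, so verify them against an actual response.

```python
def claims_with_support(candidate):
    """Pair each claim in a candidate's grounding metadata with the
    support chunks that substantiate it."""
    metadata = candidate.get("groundingMetadata", {})
    chunks = metadata.get("supportChunks", [])
    pairs = []
    for support in metadata.get("groundingSupport", []):
        # Resolve support chunk indices back to the quoted chunks.
        supporting = [chunks[i] for i in support.get("supportChunkIndices", [])]
        pairs.append((support.get("claimText"), supporting))
    return pairs

# Hypothetical candidate, shaped after the output fields described above.
candidate = {
    "groundingMetadata": {
        "supportChunks": [
            {"chunkText": "BigQuery docs are at cloud.google.com.", "source": "0"}
        ],
        "groundingSupport": [
            {"claimText": "The BigQuery documentation is online.",
             "supportChunkIndices": [0]}
        ],
    }
}
for claim, support_chunks in claims_with_support(candidate):
    print(claim, "->", [c["source"] for c in support_chunks])
```

Each resolved chunk's source field is the grounding source index assigned in the request, which lets you trace a claim back to the inline text or data store it came from.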

Generate a grounded answer in a single turn

This section describes how to generate answers grounded in the following sources:

  • Inline text and Vertex AI Search data stores
  • Google Search

Ground the answer in inline text and Vertex AI Search data store

The following sample shows how to send prompt text by specifying an inline text and a Vertex AI Search data store as the grounding source. This sample uses the generateGroundedContent method.

REST

  1. Send the prompt in the following curl request.

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "http://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global:generateGroundedContent" \
    -d '
    {
    "contents": [
     {
       "role": "user",
       "parts": [
         {
           "text": "PROMPT_TEXT"
         }
       ]
     }
    ],
    "groundingSpec": {
     "groundingSources": [
       {
         "inlineSource": {
           "groundingFacts": [
             {
               "factText": "FACT_TEXT_1",
               "attributes": {
                 "title": "TITLE_1",
                 "uri": "URI_1",
                 "author": "AUTHOR_1"
               }
             }
           ]
         }
       },
       {
         "inlineSource": {
           "groundingFacts": [
             {
               "factText": "FACT_TEXT_2",
               "attributes": {
                 "title": "TITLE_2",
                 "uri": "URI_2"
               }
             },
             {
               "factText": "FACT_TEXT_3",
               "attributes": {
                 "title": "TITLE_3",
                 "uri": "URI_3"
               }
             }
           ]
         }
       },
       {
         "searchSource": {
           "servingConfig": "projects/PROJECT_ID/locations/global/collections/default_collection/engines/APP_ID_1/servingConfigs/default_search"
         }
       },
       {
         "searchSource": {
           "servingConfig": "projects/PROJECT_ID/locations/global/collections/default_collection/engines/APP_ID_2/servingConfigs/default_search"
         }
       }
      ]
    },
    "generationSpec": {
    "modelId": "MODEL_ID",
    "temperature": TEMPERATURE,
    "topP": TOP_P,
    "topK": TOP_K
    }
    }'
    

    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project.
    • PROMPT_TEXT: the prompt from the user.
    • FACT_TEXT_N: the inline text to ground the answer. You can provide a maximum of 100 fact texts.
    • TITLE_N: the title meta attribute for the inline text.
    • URI_N: the uri meta attribute for the inline text.
    • AUTHOR_N: the author meta attribute for the inline text.
    • APP_ID_N: the ID of the Vertex AI Search app.
    • MODEL_ID: the model ID of the Gemini model that you'd like to use to generate the grounded answer. For a list of available model IDs, see Supported models.
    • TEMPERATURE: an optional field to set the temperature used for sampling. Google recommends a temperature of 0.0. For more information, see Gemini model parameters.
    • TOP_P: an optional field to set the top-P value for the model. For more information, see Gemini model parameters.
    • TOP_K: an optional field to set the top-K value for the model. For more information, see Gemini model parameters.
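Because the request body is plain JSON, it can help to assemble it programmatically before sending the curl request above. The following Python sketch builds that body from a prompt, a list of fact dicts, and serving-config paths; build_grounded_request is a hypothetical helper, not part of any client library.

```python
def build_grounded_request(prompt, facts, serving_configs,
                           model_id="MODEL_ID", temperature=0.0):
    """Assemble a generateGroundedContent request body that grounds the
    answer in inline fact texts plus Vertex AI Search data stores."""
    sources = []
    if facts:
        # A single inline source can hold up to 100 fact texts.
        sources.append({"inlineSource": {"groundingFacts": facts}})
    for config in serving_configs:
        sources.append({"searchSource": {"servingConfig": config}})
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "groundingSpec": {"groundingSources": sources},
        "generationSpec": {"modelId": model_id, "temperature": temperature},
    }

body = build_grounded_request(
    "Where can I find the BigQuery docs?",
    facts=[{"factText": "The BigQuery docs are online.",
            "attributes": {"title": "BigQuery Overview"}}],
    serving_configs=["projects/PROJECT_ID/locations/global/collections/"
                     "default_collection/engines/APP_ID/servingConfigs/default_search"],
)
```

The resulting dict can be serialized with json.dumps and sent as the -d payload of the curl request.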

Example for single-turn answer generation grounded in inline text and Vertex AI Search

In the following example, the request specifies the following grounding sources: one inline text fact and one Vertex AI Search data store. This sample uses the generateGroundedContent method.

REST

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"http://discoveryengine.googleapis.com/v1/projects/{project_id}/locations/global:generateGroundedContent" \
-d '
{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "How did google do in 2020? Where can I find Bigquery docs?"
        }
      ]
    }
  ],
  "groundingSpec": {
    "groundingSources": [
      {
        "inline_source": {
          "grounding_facts": [
            {
              "fact_text": "The BigQuery documentation can be found at http://cloud.go888ogle.com.fqhub.com/bigquery/docs/introduction",
              "attributes": {
                "title": "BigQuery Overview",
                "uri": "http://cloud.go888ogle.com.fqhub.com/bigquery/docs/introduction"
              }
            }
          ]
        }
      },
      {
        "searchSource": {
          "servingConfig": "projects/{project_id}/locations/global/collections/default_collection/engines/{app_id}/servingConfigs/default_search"
        }
      }
    ]
  },
  "generationSpec": {
    "modelId": "gemini-1.5-flash"
  }
}'

Ground the answer in Google Search

The following sample shows how to send prompt text by specifying Google Search as the grounding source.

This sample uses the generateGroundedContent method.

REST

  1. Send the prompt in the following curl request.

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "http://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global:generateGroundedContent" \
    -d '
    {
    "contents": [
     {
       "role": "user",
       "parts": [
         {
           "text": "PROMPT_TEXT"
         }
       ]
     }
    ],
    "groundingSpec": {
     "groundingSources": [
       {
         "googleSearchSource": {}
       }
     ]
    },
    "generationSpec": {
     "modelId": "MODEL_ID",
     "temperature": TEMPERATURE,
     "topP": TOP_P,
     "topK": TOP_K
    }
    }'
    

    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project.
    • PROMPT_TEXT: the prompt from the user.
    • MODEL_ID: the model ID of the Gemini model that you'd like to use to generate the grounded answer. For a list of available model IDs, see Supported models.
    • TEMPERATURE: an optional field to set the temperature used for sampling. Google recommends a temperature of 0.0. For more information, see Gemini model parameters.
    • TOP_P: an optional field to set the top-P value for the model. For more information, see Gemini model parameters.
    • TOP_K: an optional field to set the top-K value for the model. For more information, see Gemini model parameters.

In the following example, the request specifies Google Search as the grounding source. This sample uses the generateGroundedContent method.

REST

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"http://staging-discoveryengine.sandbox.googleapis.com/v1alpha/projects/189082903055/locations/global:generateGroundedContent" \
-d '
{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "What is vertex ai agent builder?"
        }
      ]
    }
  ],
  "groundingSpec": {
    "groundingSources": [
      {
        "googleSearchSource": {}
      }
    ]
  },
  "generationSpec": {
    "modelId": "gemini-1.5-flash"
  }
}'

Generate a grounded answer in multiple turns

In multi-turn answer generation, in each request you must send all the text exchanged between the user and the model in all the previous turns. This ensures continuity and maintains context to generate the answer for the latest prompt.

To obtain a grounded answer by multi-turn answer generation, do the following:

REST

The following samples show how to send follow-up prompt text over multiple turns. These samples use the generateGroundedContent method and ground the answers in Google Search. You can use similar steps to generate grounded answers using other grounding sources.

  1. Send the first prompt in the following curl request.

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "http://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global:generateGroundedContent" \
    -d '
    {
    "contents": [
     {
       "role": "user",
       "parts": [
         {
           "text": "PROMPT_TEXT_TURN_1"
         }
       ]
     }
    ],
    "groundingSpec": {
     "groundingSources": [
       {
         "googleSearchSource": {}
       }
     ]
    },
    "generationSpec": {
     "modelId": "MODEL_ID",
     "temperature": TEMPERATURE,
     "topP": TOP_P,
     "topK": TOP_K
    }
    }'
    

    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project.
    • PROMPT_TEXT_TURN_1: the prompt text from the user in the first turn.
    • MODEL_ID: the model ID of the Gemini model that you'd like to use to generate the grounded answer. For a list of available model IDs, see Supported models.
    • TEMPERATURE: an optional field to set the temperature used for sampling. Google recommends a temperature of 0.0. For more information, see Gemini model parameters.
    • TOP_P: an optional field to set the top-P value for the model. For more information, see Gemini model parameters.
    • TOP_K: an optional field to set the top-K value for the model. For more information, see Gemini model parameters.
  2. Send the second prompt as a follow-up. Add the first prompt from the user followed by its corresponding answer from the model for context.

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "http://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global:generateGroundedContent" \
    -d '
    {
    "contents": [
     {
       "role": "user",
       "parts": [
         {
           "text": "PROMPT_TEXT_TURN_1"
         }
       ]
     },
     {
       "role": "model",
       "parts": [
         {
           "text": "ANSWER_TEXT_TURN_1"
         }
       ]
     },
     {
       "role": "user",
       "parts": [
         {
           "text": "PROMPT_TEXT_TURN_2"
         }
       ]
     }
    ],
    "groundingSpec": {
     "groundingSources": [
       {
         "googleSearchSource": {}
       }
     ]
    },
    "generationSpec": {
     "modelId": "MODEL_ID",
     "temperature": TEMPERATURE,
     "topP": TOP_P,
     "topK": TOP_K
    }
    }'
    

    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project.
    • PROMPT_TEXT_TURN_1: the prompt text from the user in the first turn.
    • ANSWER_TEXT_TURN_1: the answer text from the model in the first turn.
    • PROMPT_TEXT_TURN_2: the prompt text from the user in the second turn.
    • MODEL_ID: the model ID of the Gemini model that you'd like to use to generate the grounded answer. For a list of available model IDs, see Supported models.
    • TEMPERATURE: an optional field to set the temperature used for sampling. Google recommends a temperature of 0.0. For more information, see Gemini model parameters.
    • TOP_P: an optional field to set the top-P value for the model. For more information, see Gemini model parameters.
    • TOP_K: an optional field to set the top-K value for the model. For more information, see Gemini model parameters.
  3. Repeat this process to get further follow-up answers. In each turn, add all the previous prompts from the user followed by their corresponding answers from the model.
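The history bookkeeping in the steps above can be sketched in Python. This hypothetical helper accumulates the alternating user and model turns that every request's contents field must resend; it does not call the API itself.

```python
class GroundedConversation:
    """Track the prompt-answer history that multi-turn grounded answer
    generation requires you to resend on every request."""

    def __init__(self):
        self.contents = []

    def add_user_prompt(self, text):
        """Append the current prompt; the returned list is the contents
        field for the next generateGroundedContent request."""
        self.contents.append({"role": "user", "parts": [{"text": text}]})
        return list(self.contents)

    def add_model_answer(self, text):
        """Record the model's answer so later turns keep their context."""
        self.contents.append({"role": "model", "parts": [{"text": text}]})

conversation = GroundedConversation()
turn_1 = conversation.add_user_prompt("Summarize what happened in 2023.")
# After receiving the response, record its answer text before turn 2.
conversation.add_model_answer("In 2023, ...")
turn_2 = conversation.add_user_prompt("Rephrase the answer as a list.")
```

Each call to add_user_prompt yields the full history in order, matching the contents arrays in the curl requests above.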

Example for multi-turn answer generation

In the following example, the request specifies three inline text sources, each containing one fact text, as the grounding sources to generate answers over two turns. This sample uses the generateGroundedContent method.

REST

  1. Send the first prompt in the following curl request.

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "http://discoveryengine.googleapis.com/v1/projects/201636599127/locations/global:generateGroundedContent" \
    -d '
    {
    "contents": [
     {
       "role": "user",
       "parts": [
         {
           "text": "Summarize what happened in 2023 in one paragraph."
         }
       ]
     }
    ],
    "grounding_spec": {
     "grounding_sources": [
       {
         "inline_source": {
           "grounding_facts": [
             {
               "fact_text": "In 2023, the world population surpassed 8 billion. This milestone marked a significant moment in human history, highlighting both the rapid growth of our species and the challenges of resource management and sustainability in the years to come.",
               "attributes": {
                 "title": "title_1",
                 "uri": "some-uri-1"
               }
             }
           ]
         }
       },
       {
         "inline_source": {
           "grounding_facts": [
             {
               "fact_text": "In 2023, global e-commerce sales reached an estimated $5.7 trillion. The continued rise of online shopping solidified its position as a dominant force in retail, with major implications for traditional brick-and-mortar stores and the logistics networks supporting worldwide deliveries.",
               "attributes": {
                 "title": "title_2",
                 "uri": "some-uri-2"
               }
             }
           ]
         }
       },
       {
         "inline_source": {
           "grounding_facts": [
             {
               "fact_text": "In 2023, the global average surface temperature was approximately 0.2 degrees Celsius higher than the 20th-century average. This continued the worrying trend of global warming, underscoring the urgency of worldwide climate initiatives, carbon reduction efforts, and investment in renewable energy sources.",
               "attributes": {
                 "title": "title_3",
                 "uri": "some-uri-3"
               }
             }
           ]
         }
       }
     ]
    },
    "generationSpec": {
     "modelId": "gemini-1.5-flash"
    }
    }'
    
  2. Send the second prompt as a follow-up. Add the first prompt from the user followed by its corresponding answer from the model for context.

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "http://discoveryengine.googleapis.com/v1/projects/201636599127/locations/global:generateGroundedContent" \
    -d '
    {
    "contents": [
    {
    "role": "user",
    "parts": [
    {
      "text": "Summarize what happened in 2023 in one paragraph."
    }
    ]
    },
    {
    "role": "model",
    "parts": [
    {
      "text": "In 2023, the global average surface temperature increased, highlighting the urgency of climate action. The world population surpassed 8 billion, emphasizing the need for sustainable resource management. Global e-commerce sales reached $5.7 trillion, solidifying online shopping as a dominant force in retail. \n"
    }
    ]
    },
    {
    "role": "user",
    "parts": [
    {
      "text": "Rephrase the answer in an abstracted list."
    }
    ]
    }
    ],
    "grounding_spec": {
    "grounding_sources": [
    {
    "inline_source": {
      "grounding_facts": [
        {
          "fact_text": "In 2023, the world population surpassed 8 billion. This milestone marked a significant moment in human history, highlighting both the rapid growth of our species and the challenges of resource management and sustainability in the years to come.",
          "attributes": {
            "title": "title_1",
            "uri": "some-uri-1"
          }
        }
      ]
    }
    },
    {
    "inline_source": {
      "grounding_facts": [
        {
          "fact_text": "In 2023, global e-commerce sales reached an estimated $5.7 trillion. The continued rise of online shopping solidified its position as a dominant force in retail, with major implications for traditional brick-and-mortar stores and the logistics networks supporting worldwide deliveries.",
          "attributes": {
            "title": "title_2",
            "uri": "some-uri-2"
          }
        }
      ]
    }
    },
    {
    "inline_source": {
      "grounding_facts": [
        {
          "fact_text": "In 2023, the global average surface temperature was approximately 0.2 degrees Celsius higher than the 20th-century average. This continued the worrying trend of global warming, underscoring the urgency of worldwide climate initiatives, carbon reduction efforts, and investment in renewable energy sources.",
          "attributes": {
            "title": "title_3",
            "uri": "some-uri-3"
          }
        }
      ]
    }
    }
    ]
    },
    "generationSpec": {
    "modelId": "gemini-1.5-flash"
    }
    }
    '
    

Stream grounded answers

You can choose to stream the answers from the model. This is useful when the answer is especially long and sending the entire response at once would cause a significant delay. Streaming breaks the response into an array of several candidates that contain sequential parts of the answer text.

To obtain a streamed, grounded answer, do the following:

REST

The following sample shows how to stream a grounded answer. This sample uses the streamGenerateGroundedContent method and grounds the answer in Google Search. You can use similar steps to generate grounded answers using other grounding sources.

  1. Send the prompt in the following curl request.

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "http://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global:streamGenerateGroundedContent" \
    -d '
    [
    {
     "contents": [
       {
         "role": "user",
         "parts": [
           {
             "text": "PROMPT_TEXT"
           }
         ]
       }
     ],
     "groundingSpec": {
       "groundingSources": [
         {
           "googleSearchSource": {}
         }
       ]
     },
    "generationSpec": {
     "modelId": "MODEL_ID",
     "temperature": TEMPERATURE,
     "topP": TOP_P,
     "topK": TOP_K
    }
    }
    ]'
    

    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project.
    • PROMPT_TEXT: the prompt from the user.
    • MODEL_ID: the model ID of the Gemini model that you'd like to use to generate the grounded answer. For a list of available model IDs, see Supported models.
    • TEMPERATURE: an optional field to set the temperature used for sampling. Google recommends a temperature of 0.0. For more information, see Gemini model parameters.
    • TOP_P: an optional field to set the top-P value for the model. For more information, see Gemini model parameters.
    • TOP_K: an optional field to set the top-K value for the model. For more information, see Gemini model parameters.
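Once the streamed responses arrive, the sequential fragments have to be stitched back together. The following Python sketch concatenates the answer text from a list of already-parsed response objects; the exact response shape (candidates with content.parts) is an assumption based on the output description above, so verify it against a real stream.

```python
def join_streamed_answer(responses):
    """Concatenate the sequential answer fragments from a streamed
    streamGenerateGroundedContent response."""
    pieces = []
    for response in responses:
        for candidate in response.get("candidates", []):
            # Each candidate carries a sequential slice of the answer.
            for part in candidate.get("content", {}).get("parts", []):
                pieces.append(part.get("text", ""))
    return "".join(pieces)

# Hypothetical, already-parsed stream of two partial responses.
stream = [
    {"candidates": [{"content": {"parts": [{"text": "To delete a data store, "}]}}]},
    {"candidates": [{"content": {"parts": [{"text": "open the Data Stores page."}]}}]},
]
print(join_streamed_answer(stream))
```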

Example for streaming grounded answers

In the following example, the request specifies Google Search as the grounding source to stream an answer. The streamed answer is distributed over several response candidates. This sample uses the streamGenerateGroundedContent method.

REST

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"http://discoveryengine.googleapis.com/v1alpha/projects/{project_id}/locations/global:streamGenerateGroundedContent" \
-d '
[
{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "Summarize How to delete a data store in Vertex AI Agent Builder?"
        }
      ]
    }
  ],
  "groundingSpec": {
    "groundingSources": [
      {
        "googleSearchSource": {}
      }
    ]
  },
  "generationSpec": {
    "modelId": "gemini-1.5-flash"
  }
}
]'

Supported models

The following models support grounding:

  • Gemini 1.5 Pro with text input only
  • Gemini 1.5 Flash with text input only
  • Gemini 1.0 Pro with text input only

To learn more about these Gemini models, see Gemini model versions and lifecycle.

When you call the generateGroundedContent method, you can use the following model IDs:

Model ID               Auto-updated
default                Yes
gemini-1.0-pro         Yes
gemini-1.0-pro-001     No
gemini-1.0-pro-002     No
gemini-1.5-flash       Yes
gemini-1.5-flash-001   No
gemini-1.5-pro         Yes
gemini-1.5-pro-001     No

What's next

Learn how to generate grounded answers from unstructured data by integrating the standalone APIs for RAG.