Batch predictions are a way to efficiently send multiple multimodal prompt requests that are not latency sensitive. Unlike online prediction, where you are limited to one input request at a time, you can send a large number of multimodal requests in a single batch request. In a batch prediction workflow, you determine your output location and add your input requests (in JSON). The responses are then populated asynchronously in your BigQuery output location.
After you submit a batch request to a model and review its results, you can tune the model to return more precise results. You can then submit the tuned model for batch generation as usual. To learn more about tuning models, see Overview of model tuning for Gemini.
Multimodal models that support batch predictions
The following multimodal models support batch predictions.
- gemini-1.5-flash-001
- gemini-1.5-pro-001
- gemini-1.0-pro-002
- gemini-1.0-pro-001
Prepare your inputs
Batch requests for multimodal models accept only BigQuery tables as input sources. To learn more, see Overview of BigQuery storage.
BigQuery input format details
- The content in the request column must be valid JSON.
- The JSON content must match the structure of a GenerateContentRequest.
- Information about models or endpoints included in the request is ignored.
- You can add more columns to the table. Added columns are ignored for content generation, but they are attached to the results after the job completes.
- The system reserves two column names: response and status. These columns provide information about the outcome of the model request.
- Batch prediction doesn't support the fileData field for Gemini.
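Before loading rows into BigQuery, it can help to check each request cell locally. The following is a minimal sketch, not an official validator: the function name validate_request_cell is hypothetical, and it checks only three of the rules above (valid JSON, a non-empty contents list, and no fileData parts). It does not validate the full GenerateContentRequest schema.

```python
import json


def validate_request_cell(cell: str) -> list:
    """Check one `request` cell against a few of the input rules.

    Returns a list of human-readable problems; an empty list means no problem
    was found. This is not a full GenerateContentRequest schema check.
    """
    try:
        request = json.loads(cell)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(request, dict):
        return ["top-level JSON value must be an object"]

    problems = []
    contents = request.get("contents")
    if not isinstance(contents, list) or not contents:
        problems.append("'contents' must be a non-empty list")
        contents = []

    for content in contents:
        parts = content.get("parts", []) if isinstance(content, dict) else []
        if isinstance(parts, dict):  # tolerate a single part written as an object
            parts = [parts]
        for part in parts:
            # Batch prediction doesn't support fileData; check both JSON spellings.
            if isinstance(part, dict) and ("fileData" in part or "file_data" in part):
                problems.append("fileData is not supported by batch prediction")
    return problems
```

Running this over every row before submitting the job surfaces malformed requests early instead of as per-row failures in the output table.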
BigQuery input example
request |
---|
{ "contents": [ { "role": "user", "parts": [ { "text": "Give me a recipe for banana bread." } ] } ], "system_instruction": { "parts": [ { "text": "You are a chef." } ] }, "generation_config": { "top_k": 5 } } |
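When the input table is generated from a list of prompts, each request cell can be serialized in code rather than written by hand. This is a minimal sketch assuming a hypothetical helper, build_request_cell, that reproduces the example row above; it writes parts as a JSON array, which is how GenerateContentRequest defines that field.

```python
import json
from typing import Optional


def build_request_cell(prompt: str,
                       system_text: Optional[str] = None,
                       top_k: Optional[int] = None) -> str:
    """Serialize one GenerateContentRequest as a JSON string for the request column."""
    request = {"contents": [{"role": "user", "parts": [{"text": prompt}]}]}
    if system_text is not None:
        request["system_instruction"] = {"parts": [{"text": system_text}]}
    if top_k is not None:
        request["generation_config"] = {"top_k": top_k}
    return json.dumps(request)


# The example row from the table above.
cell = build_request_cell(
    "Give me a recipe for banana bread.", system_text="You are a chef.", top_k=5
)
```

Optional fields are omitted entirely rather than sent as null, so a prompt with no system instruction produces a cell with only contents.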
BigQuery output example
request | response | status |
---|---|---|
'{"contents":[{...}]}' | { "candidates": [ { "content": { "role": "model", "parts": [ { "text": "In a medium bowl, whisk together the flour, baking soda, baking powder." } ] }, "finishReason": "STOP", "safetyRatings": [ { "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "probability": "NEGLIGIBLE", "probabilityScore": 0.14057204, "severity": "HARM_SEVERITY_NEGLIGIBLE", "severityScore": 0.14270912 } ] } ], "usageMetadata": { "promptTokenCount": 8, "candidatesTokenCount": 396, "totalTokenCount": 404 } } |
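Each response cell is a full generateContent response, so most consumers only need a few fields from it. The following sketch uses a hypothetical function, summarize_response, that reads the field names shown in the example row (candidates, content.parts, finishReason, usageMetadata) and takes only the first candidate. That choice is an assumption suitable when one candidate is requested.

```python
import json


def summarize_response(cell: str) -> dict:
    """Pull the first candidate's text and the token counts out of a `response` cell."""
    response = json.loads(cell)
    candidates = response.get("candidates", [])
    first = candidates[0] if candidates else {}
    parts = first.get("content", {}).get("parts", [])
    usage = response.get("usageMetadata", {})
    return {
        "text": "".join(part.get("text", "") for part in parts),
        "finish_reason": first.get("finishReason"),
        "prompt_tokens": usage.get("promptTokenCount"),
        "output_tokens": usage.get("candidatesTokenCount"),
        "total_tokens": usage.get("totalTokenCount"),
    }


# A trimmed copy of the response cell from the output example above.
sample = json.dumps({
    "candidates": [{
        "content": {
            "role": "model",
            "parts": [{"text": "In a medium bowl, whisk together the flour, baking soda, baking powder."}],
        },
        "finishReason": "STOP",
    }],
    "usageMetadata": {"promptTokenCount": 8, "candidatesTokenCount": 396, "totalTokenCount": 404},
})
summary = summarize_response(sample)
```

In the example, totalTokenCount (404) is the sum of promptTokenCount (8) and candidatesTokenCount (396), which makes the usage fields a convenient basis for cost accounting per row.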
Request a batch response
Depending on the number of input items that you submitted, a batch generation task can take some time to complete.
REST
To create a batch prediction job by using the Vertex AI API, send a POST request to the batchPredictionJobs endpoint.
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your Google Cloud project ID.
- BP_JOB_NAME: The job name.
- INPUT_URI: The BigQuery URI of the input table, for example bq://PROJECT_ID.DATASET.TABLE.
- OUTPUT_URI: The BigQuery URI of the output table, for example bq://PROJECT_ID.DATASET.TABLE.
HTTP method and URL:
POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs
Request JSON body:
{
  "name": "BP_JOB_NAME",
  "displayName": "BP_JOB_NAME",
  "model": "publishers/google/models/gemini-1.0-pro-001",
  "inputConfig": {
    "instancesFormat": "bigquery",
    "bigquerySource": {
      "inputUri": "INPUT_URI"
    }
  },
  "outputConfig": {
    "predictionsFormat": "bigquery",
    "bigqueryDestination": {
      "outputUri": "OUTPUT_URI"
    }
  }
}
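If you create jobs from a script, the same request body can be assembled in code. This is a minimal sketch with a hypothetical helper, build_batch_job_body; the table names in the usage are placeholders. It rejects anything that isn't a bq:// URI because this page describes BigQuery as the only supported input. It sets displayName but not name, on the assumption that the service assigns the resource name, as the sample response further down suggests.

```python
import json


def build_batch_job_body(display_name: str, model: str,
                         input_uri: str, output_uri: str) -> dict:
    """Assemble a batchPredictionJobs request body with BigQuery input and output."""
    for uri in (input_uri, output_uri):
        if not uri.startswith("bq://"):
            raise ValueError(f"expected a bq:// URI, got {uri!r}")
    return {
        "displayName": display_name,
        "model": model,
        "inputConfig": {
            "instancesFormat": "bigquery",
            "bigquerySource": {"inputUri": input_uri},
        },
        "outputConfig": {
            "predictionsFormat": "bigquery",
            "bigqueryDestination": {"outputUri": output_uri},
        },
    }


body = build_batch_job_body(
    display_name="BP_JOB_NAME",
    model="publishers/google/models/gemini-1.0-pro-001",
    input_uri="bq://PROJECT_ID.DATASET.INPUT_TABLE",    # hypothetical table names
    output_uri="bq://PROJECT_ID.DATASET.OUTPUT_TABLE",
)
request_json = json.dumps(body, indent=2)  # text to save as request.json
```

Writing request_json to request.json gives you the file that the curl and PowerShell commands read.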
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs" | Select-Object -Expand Content
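The same call can also be made from Python with only the standard library. This is a sketch, not an official client: the helper names are hypothetical, the region is fixed to us-central1 to match the commands above, and the token comes from the same gcloud auth print-access-token command. The request is only built here, not sent, so it can be inspected first.

```python
import json
import subprocess
import urllib.request

REGION = "us-central1"  # matches the endpoint used in the curl example


def gcloud_access_token() -> str:
    """Fetch an access token the same way the curl example does."""
    result = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def make_create_job_request(project_id: str, body: dict,
                            access_token: str) -> urllib.request.Request:
    """Build (but do not send) the POST that creates a batch prediction job."""
    url = (f"https://{REGION}-aiplatform.googleapis.com/v1/projects/{project_id}"
           f"/locations/{REGION}/batchPredictionJobs")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends it, and json.load on the reply yields the job resource. Keeping construction separate from sending makes the URL and headers easy to check without network access.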
You should receive a JSON response similar to the following:
{ "name": "projects/{PROJECT_ID}/locations/us-central1/batchPredictionJobs/{BATCH_JOB_ID}", "displayName": "BP_sample_publisher_BQ_20230712_134650", "model": "projects/{PROJECT_ID}/locations/us-central1/models/gemini-1.0-pro-001", "inputConfig": { "instancesFormat": "bigquery", "bigquerySource": { "inputUri": "bq://sample.text_input" } }, "modelParameters": {}, "outputConfig": { "predictionsFormat": "bigquery", "bigqueryDestination": { "outputUri": "bq://sample.llm_dataset.embedding_out_BP_sample_publisher_BQ_20230712_134650" } }, "state": "JOB_STATE_PENDING", "createTime": "2023-07-12T20:46:52.148717Z", "updateTime": "2023-07-12T20:46:52.148717Z", "labels": { "owner": "sample_owner", "product": "llm" }, "modelVersionId": "1", "modelMonitoringStatus": {} }
The response includes a unique identifier for the batch job: BATCH_JOB_ID is the last segment of the name field. You can poll for the status of the batch job by using the BATCH_JOB_ID until the job state is JOB_STATE_SUCCEEDED. For example:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs/BATCH_JOB_ID"
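A polling loop should also stop when the job ends in a failure state, not only on JOB_STATE_SUCCEEDED, or it can wait indefinitely. The following sketch makes two assumptions: the set of terminal states listed below, and get_job, a caller-supplied function that performs the GET request shown above and returns the parsed job. The function names are hypothetical.

```python
import time
from typing import Callable

# Assumption: the terminal values of the Vertex AI JobState enum.
TERMINAL_STATES = frozenset({
    "JOB_STATE_SUCCEEDED",
    "JOB_STATE_PARTIALLY_SUCCEEDED",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
    "JOB_STATE_EXPIRED",
})


def batch_job_id(job_name: str) -> str:
    """Return BATCH_JOB_ID, the last path segment of the job's `name`."""
    return job_name.rsplit("/", 1)[-1]


def wait_for_job(get_job: Callable[[], dict],
                 interval_s: float = 60.0,
                 timeout_s: float = 24 * 3600,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call get_job() until the returned job is in a terminal state.

    get_job performs the GET request for the job and returns the parsed
    job resource; sleep is injectable so the loop can be tested.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        job = get_job()
        if job.get("state") in TERMINAL_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job.get('state')} after {timeout_s} s")
        sleep(interval_s)
```

The caller then checks whether the returned state is JOB_STATE_SUCCEEDED before reading the output table. A long default interval fits batch jobs, which are not latency sensitive.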
Retrieve batch output
When a batch prediction task completes, the output is stored in the BigQuery table that you specified in your request.
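To read the results, you query the output table named by OUTPUT_URI. The following sketch turns the bq:// URI into a standard SQL query over the request, response, and status columns shown in the output example. It also separates rows by status, under the assumption (not stated on this page) that an empty status means the request succeeded and a non-empty one holds an error. Both function names are hypothetical.

```python
def output_query(output_uri: str, limit: int = 10) -> str:
    """Turn a bq://PROJECT.DATASET.TABLE URI into a query over the result columns."""
    prefix = "bq://"
    if not output_uri.startswith(prefix):
        raise ValueError(f"expected a bq:// URI, got {output_uri!r}")
    table = output_uri[len(prefix):]
    return f"SELECT request, response, status FROM `{table}` LIMIT {int(limit)}"


def split_rows(rows):
    """Separate result rows by the status column.

    Assumption: an empty status means the request succeeded; any non-empty
    value is treated as an error message.
    """
    succeeded, failed = [], []
    for row in rows:
        (failed if row.get("status") else succeeded).append(row)
    return succeeded, failed
```

With the google-cloud-bigquery library, the query string can be passed to Client.query, and the rows from its result() should work with split_rows because they support get by column name.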
What's next
- Learn how to tune a Gemini model in Overview of model tuning for Gemini