Multimodal Embeddings API

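The Multimodal Embeddings API generates vectors from the input you provide, which can include a combination of image, text, and video data. You can then use the embedding vectors for downstream tasks such as image classification or video content moderation.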
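
The sketch below shows one downstream task, zero-shot image classification. It picks the text label whose embedding has the highest cosine similarity to an image embedding. It assumes you already have an image embedding and one text embedding per candidate label from this API, and that the vectors are comparable because the model returns all modalities in the same 1408-dimensional space. `cosine_similarity` and `classify_image` are hypothetical helper names. The short toy vectors stand in for real embeddings.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def classify_image(
    image_embedding: Sequence[float],
    label_embeddings: Dict[str, Sequence[float]],
) -> str:
    """Return the label whose text embedding is closest to the image embedding."""
    if not label_embeddings:
        raise ValueError("at least one label is required")
    return max(
        label_embeddings,
        key=lambda label: cosine_similarity(image_embedding, label_embeddings[label]),
    )


# Toy 3-dimensional vectors stand in for real 1408-dimensional embeddings.
labels = {"cat": [0.9, 0.1, 0.0], "dog": [0.1, 0.9, 0.0]}
print(classify_image([0.8, 0.2, 0.1], labels))  # prints: cat
```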

For additional conceptual information, see Multimodal embeddings.

Supported models:

Model	Code
Embeddings for Multimodal	`multimodalembedding@001`

Example syntax

Syntax to send a multimodal embeddings API request.

curl

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:predict \
-d '{
"instances": [
  ...
]
}'
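
The endpoint URL in the syntax above is built from `LOCATION`, `PROJECT_ID`, and `MODEL_ID`. The following sketch assembles it in Python over `https://`. `predict_url` is a hypothetical helper, and its default model ID is the one listed in the supported-models table.

```python
def predict_url(
    project_id: str, location: str, model_id: str = "multimodalembedding@001"
) -> str:
    """Build the :predict endpoint URL for a Google-published model."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/"
        f"publishers/google/models/{model_id}:predict"
    )


print(predict_url("my-project", "us-central1"))
```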

Python

from vertexai.vision_models import MultiModalEmbeddingModel

model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding")
model.get_embeddings(...)

Parameter list

For implementation details, see Examples.

Request body

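Parameters
`image`	Optional: `Image`. The image to generate embeddings for.
`text`	Optional: `String`. The text to generate embeddings for.
`video`	Optional: `Video`. The video segment to generate embeddings for.
`dimension`	Optional: `Int`. The dimension of the embeddings included in the response. Only applies to text and image input. Accepted values: `128`, `256`, `512`, or `1408`.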
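
The following is a minimal sketch of assembling a single-instance request body from the parameters above. `build_request_body` is a hypothetical helper. The rule that at least one of `text`, `image`, or `video` must be set is this sketch's own check. The table does not say where `dimension` goes in the JSON, so placing it in a top-level `parameters` object is an assumption to verify against the REST reference.

```python
import json
from typing import Any, Dict, Optional

ACCEPTED_DIMENSIONS = {128, 256, 512, 1408}


def build_request_body(
    text: Optional[str] = None,
    image: Optional[Dict[str, Any]] = None,
    video: Optional[Dict[str, Any]] = None,
    dimension: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble a single-instance request body for the :predict method."""
    instance: Dict[str, Any] = {}
    if text is not None:
        instance["text"] = text
    if image is not None:
        instance["image"] = image
    if video is not None:
        instance["video"] = video
    if not instance:
        raise ValueError("provide at least one of text, image, or video")

    body: Dict[str, Any] = {"instances": [instance]}
    if dimension is not None:
        if dimension not in ACCEPTED_DIMENSIONS:
            raise ValueError(f"dimension must be one of {sorted(ACCEPTED_DIMENSIONS)}")
        # Assumption: dimension is sent in a top-level "parameters" object.
        body["parameters"] = {"dimension": dimension}
    return body


print(json.dumps(build_request_body(text="a cat", dimension=256)))
```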

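Image

Parameters
`bytesBase64Encoded`	Optional: `String`. The image bytes, encoded as a base64 string. Must be one of `bytesBase64Encoded` or `gcsUri`.
`gcsUri`	Optional: `String`. The Cloud Storage location of the image to generate embeddings for. Must be one of `bytesBase64Encoded` or `gcsUri`.
`mimeType`	Optional: `String`. The MIME type of the image content. Supported values: `image/jpeg` and `image/png`.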
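
This sketch assumes you build request bodies by hand rather than through an SDK. `make_image` is a hypothetical helper that enforces the one-of rule between `bytesBase64Encoded` and `gcsUri`, and it checks `mimeType` against the two supported values listed above.

```python
import base64
from typing import Dict, Optional

SUPPORTED_IMAGE_MIME_TYPES = {"image/jpeg", "image/png"}


def make_image(
    image_bytes: Optional[bytes] = None,
    gcs_uri: Optional[str] = None,
    mime_type: Optional[str] = None,
) -> Dict[str, str]:
    """Return an Image object with exactly one source field set."""
    if (image_bytes is None) == (gcs_uri is None):
        raise ValueError("set exactly one of image_bytes or gcs_uri")
    image: Dict[str, str] = {}
    if image_bytes is not None:
        # Raw bytes are base64-encoded into an ASCII string.
        image["bytesBase64Encoded"] = base64.b64encode(image_bytes).decode("ascii")
    else:
        image["gcsUri"] = gcs_uri
    if mime_type is not None:
        if mime_type not in SUPPORTED_IMAGE_MIME_TYPES:
            raise ValueError(f"unsupported mimeType: {mime_type}")
        image["mimeType"] = mime_type
    return image
```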

VideoSegmentConfig

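Parameters
`startOffsetSec`	Optional: `Int`. The start offset of the video segment, in seconds. If not specified, it is calculated as `max(0, endOffsetSec - 120)`.
`endOffsetSec`	Optional: `Int`. The end offset of the video segment, in seconds. If not specified, it is calculated as `min(video length, startOffsetSec + 120)`. If both `startOffsetSec` and `endOffsetSec` are specified, `endOffsetSec` is adjusted to `min(startOffsetSec + 120, endOffsetSec)`.
`intervalSec`	Optional: `Int`. The interval, in seconds, at which embeddings are generated for the video. The minimum value of `intervalSec` is 4. If the interval is less than `4`, an `InvalidArgumentError` is returned. There is no limit on the maximum value of the interval. However, if the interval is larger than `min(video length, 120s)`, the quality of the generated embeddings is affected. Default value: `16`.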
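
The defaulting rules above can be expressed as a small function. This sketch illustrates the documented rules and is not the service's implementation. `effective_segment` is a hypothetical helper, and it raises a local `ValueError` where the service returns `InvalidArgumentError`. When neither offset is set, it assumes a start of 0, which matches the defaults given in the video example.

```python
from typing import Optional, Tuple

MAX_SEGMENT_SEC = 120
MIN_INTERVAL_SEC = 4


def effective_segment(
    video_length_sec: int,
    start_offset_sec: Optional[int] = None,
    end_offset_sec: Optional[int] = None,
    interval_sec: int = 16,
) -> Tuple[int, int, int]:
    """Apply the documented defaulting rules to a VideoSegmentConfig."""
    if interval_sec < MIN_INTERVAL_SEC:
        # Stand-in for the InvalidArgumentError the service returns.
        raise ValueError("intervalSec must be at least 4")
    if start_offset_sec is None and end_offset_sec is None:
        start_offset_sec = 0  # assumption: start defaults to 0 when neither is set
    if start_offset_sec is None:
        start_offset_sec = max(0, end_offset_sec - MAX_SEGMENT_SEC)
    elif end_offset_sec is None:
        end_offset_sec = min(video_length_sec, start_offset_sec + MAX_SEGMENT_SEC)
    else:
        end_offset_sec = min(start_offset_sec + MAX_SEGMENT_SEC, end_offset_sec)
    return start_offset_sec, end_offset_sec, interval_sec
```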

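Video

Parameters
`bytesBase64Encoded`	Optional: `String`. The video bytes, encoded as a base64 string. Must be one of `bytesBase64Encoded` or `gcsUri`.
`gcsUri`	Optional: `String`. The Cloud Storage location of the video to generate embeddings for. Must be one of `bytesBase64Encoded` or `gcsUri`.
`videoSegmentConfig`	Optional: `VideoSegmentConfig`. The video segment configuration.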
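
The following sketch checks a hand-built Video object against the constraints described in the tables above before you send it. `validate_video` is a hypothetical helper. Its `gs://` prefix check is this sketch's own sanity check, based on the URI format used in the examples on this page.

```python
from typing import Any, Dict, List

SOURCE_FIELDS = ("bytesBase64Encoded", "gcsUri")
SEGMENT_FIELDS = {"startOffsetSec", "endOffsetSec", "intervalSec"}


def validate_video(video: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a Video object (empty if none)."""
    problems: List[str] = []
    sources = [field for field in SOURCE_FIELDS if field in video]
    if len(sources) != 1:
        problems.append("set exactly one of bytesBase64Encoded or gcsUri")
    if "gcsUri" in video and not str(video["gcsUri"]).startswith("gs://"):
        problems.append("gcsUri should be a gs:// URI")
    config = video.get("videoSegmentConfig", {})
    unknown = set(config) - SEGMENT_FIELDS
    if unknown:
        problems.append(f"unknown videoSegmentConfig fields: {sorted(unknown)}")
    # The documented default interval is 16 seconds; the minimum is 4.
    if config.get("intervalSec", 16) < 4:
        problems.append("intervalSec must be at least 4")
    return problems
```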

Examples

Generate embeddings from an image

Use the following sample to generate embeddings for an image.

REST

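Before using any of the request data, make the following replacements:

LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see Generative AI on Vertex AI locations.
PROJECT_ID: Your Google Cloud project ID.
TEXT: The target text to get embeddings for. For example: a cat.
B64_ENCODED_IMG: The target image to get embeddings for. The image must be specified as a base64-encoded byte string.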

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict

Request JSON body:

{
  "instances": [
    {
      "text": "TEXT",
      "image": {
        "bytesBase64Encoded": "B64_ENCODED_IMG"
      }
    }
  ]
}
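
To produce request.json for the commands that follow, you could base64-encode a local image and write the body shown above with a sketch like this. `write_image_text_request` and the file names are hypothetical.

```python
import base64
import json
from pathlib import Path
from typing import Any, Dict


def write_image_text_request(
    text: str, image_path: str, out_path: str = "request.json"
) -> Dict[str, Any]:
    """Write the image-plus-text request body shown above to a JSON file."""
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    body = {
        "instances": [
            {
                "text": text,
                "image": {"bytesBase64Encoded": encoded},
            }
        ]
    }
    Path(out_path).write_text(json.dumps(body), encoding="utf-8")
    return body
```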

To send your request, choose one of these options:

curl

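Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you in to the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: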

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict"
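
If you prefer Python's standard library to curl, the following sketch builds an equivalent POST request with `urllib.request`. It sends the same headers and JSON body and takes its token from `gcloud auth print-access-token`. The helper names are hypothetical, and the sketch does not handle errors or retries.

```python
import json
import subprocess
import urllib.request
from typing import Any, Dict


def gcloud_access_token() -> str:
    """Return an access token from the gcloud CLI (requires a logged-in account)."""
    result = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def build_predict_request(
    url: str, access_token: str, body: Dict[str, Any]
) -> urllib.request.Request:
    """Build, but do not send, a POST request equivalent to the curl command."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


def send_predict_request(request: urllib.request.Request) -> Dict[str, Any]:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `send_predict_request(build_predict_request(url, gcloud_access_token(), body))` sends `body` to `url`, where `url` is the HTTPS endpoint shown above with its placeholders replaced.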

PowerShell

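Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: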

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict" | Select-Object -Expand Content

The model returns each embedding as a vector of 1408 floats. The following sample response is shortened for space.

{
  "predictions": [
    {
      "textEmbedding": [
        0.010477379,
        -0.00399621,
        0.00576670747,
        [...]
        -0.00823613815,
        -0.0169572588,
        -0.00472954148
      ],
      "imageEmbedding": [
        0.00262696808,
        -0.00198890246,
        0.0152047109,
        -0.0103145819,
        [...]
        0.0324628279,
        0.0284924973,
        0.011650892,
        -0.00452344026
      ]
    }
  ],
  "deployedModelId": "DEPLOYED_MODEL_ID"
}
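
After parsing a response like the one above as JSON, you might pull out the vectors and check their length. This sketch uses a hypothetical `extract_embeddings` helper. It defaults to the 1408-value length described above, so pass a different `expected_dim` if you requested a smaller `dimension`.

```python
from typing import Any, Dict, List

EXPECTED_DIMENSION = 1408


def extract_embeddings(
    response: Dict[str, Any], expected_dim: int = EXPECTED_DIMENSION
) -> Dict[str, List[float]]:
    """Collect textEmbedding and imageEmbedding vectors from the first prediction."""
    prediction = response["predictions"][0]
    vectors: Dict[str, List[float]] = {}
    for key in ("textEmbedding", "imageEmbedding"):
        if key in prediction:
            vector = prediction[key]
            if len(vector) != expected_dim:
                raise ValueError(f"{key} has {len(vector)} values, expected {expected_dim}")
            vectors[key] = vector
    return vectors
```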

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

import vertexai
from vertexai.vision_models import Image, MultiModalEmbeddingModel

# TODO(developer): Update the following values before running the sample.
project_id = "PROJECT_ID"
image_path = "path/to/image.png"
contextual_text = "a cat"

vertexai.init(project=project_id, location="us-central1")

model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding")
image = Image.load_from_file(image_path)

embeddings = model.get_embeddings(
    image=image,
    contextual_text=contextual_text,
)
print(f"Image Embedding: {embeddings.image_embedding}")
print(f"Text Embedding: {embeddings.text_embedding}")

Node.js

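Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.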

/**
 * TODO(developer): Replace these values before running the sample.
 */
const project = 'YOUR_PROJECT_ID';
const location = 'YOUR_PROJECT_LOCATION';
const baseImagePath = 'YOUR_BASE_IMAGE_PATH';
const textPrompt = 'YOUR_TEXT_PROMPT';
const aiplatform = require('@google-cloud/aiplatform');

// Imports the Google Cloud Prediction service client
const {PredictionServiceClient} = aiplatform.v1;

// Import the helper module for converting arbitrary protobuf.Value objects.
const {helpers} = aiplatform;

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: `${location}-aiplatform.googleapis.com`,
};
const publisher = 'google';
const model = 'multimodalembedding@001';

// Instantiates a client
const predictionServiceClient = new PredictionServiceClient(clientOptions);

async function predictImageFromImageAndText() {
  // Configure the parent resource
  const endpoint = `projects/${project}/locations/${location}/publishers/${publisher}/models/${model}`;

  const fs = require('fs');
  const imageFile = fs.readFileSync(baseImagePath);

  // Convert the image data to a Buffer and base64 encode it.
  const encodedImage = Buffer.from(imageFile).toString('base64');

  const prompt = {
    text: textPrompt,
    image: {
      bytesBase64Encoded: encodedImage,
    },
  };
  const instanceValue = helpers.toValue(prompt);
  const instances = [instanceValue];

  const parameter = {
    sampleCount: 1,
  };
  const parameters = helpers.toValue(parameter);

  const request = {
    endpoint,
    instances,
    parameters,
  };

  // Predict request
  const [response] = await predictionServiceClient.predict(request);
  console.log('Get image embedding response');
  const predictions = response.predictions;
  console.log('\tPredictions :');
  for (const prediction of predictions) {
    console.log(`\t\tPrediction : ${JSON.stringify(prediction)}`);
  }
}

predictImageFromImageAndText().catch(err => {
  console.error(err);
  process.exitCode = 1;
});

Java

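Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.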


import com.google.cloud.aiplatform.v1beta1.EndpointName;
import com.google.cloud.aiplatform.v1beta1.PredictResponse;
import com.google.cloud.aiplatform.v1beta1.PredictionServiceClient;
import com.google.cloud.aiplatform.v1beta1.PredictionServiceSettings;
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import com.google.protobuf.InvalidProtocolBufferException;
import com.google.protobuf.Value;
import com.google.protobuf.util.JsonFormat;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PredictImageFromImageAndTextSample {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace this variable before running the sample.
    String project = "YOUR_PROJECT_ID";
    String textPrompt = "YOUR_TEXT_PROMPT";
    String baseImagePath = "YOUR_BASE_IMAGE_PATH";

    // Optional request parameters sent with the predict call.
    Map<String, Object> parameters = new HashMap<String, Object>();
    parameters.put("sampleCount", 1);

    String location = "us-central1";
    String publisher = "google";
    String model = "multimodalembedding@001";

    predictImageFromImageAndText(
        project, location, publisher, model, textPrompt, baseImagePath, parameters);
  }

  // Generate embeddings for an image and its contextual text.
  public static void predictImageFromImageAndText(
      String project,
      String location,
      String publisher,
      String model,
      String textPrompt,
      String baseImagePath,
      Map<String, Object> parameters)
      throws IOException {
    final String endpoint = String.format("%s-aiplatform.googleapis.com:443", location);
    final PredictionServiceSettings predictionServiceSettings =
        PredictionServiceSettings.newBuilder().setEndpoint(endpoint).build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    try (PredictionServiceClient predictionServiceClient =
        PredictionServiceClient.create(predictionServiceSettings)) {
      final EndpointName endpointName =
          EndpointName.ofProjectLocationPublisherModelName(project, location, publisher, model);

      // Convert the image to Base64
      byte[] imageData = Base64.getEncoder().encode(Files.readAllBytes(Paths.get(baseImagePath)));
      String encodedImage = new String(imageData, StandardCharsets.UTF_8);

      JsonObject jsonInstance = new JsonObject();
      jsonInstance.addProperty("text", textPrompt);
      JsonObject jsonImage = new JsonObject();
      jsonImage.addProperty("bytesBase64Encoded", encodedImage);
      jsonInstance.add("image", jsonImage);

      Value instanceValue = stringToValue(jsonInstance.toString());
      List<Value> instances = new ArrayList<>();
      instances.add(instanceValue);

      Gson gson = new Gson();
      String gsonString = gson.toJson(parameters);
      Value parameterValue = stringToValue(gsonString);

      PredictResponse predictResponse =
          predictionServiceClient.predict(endpointName, instances, parameterValue);
      System.out.println("Predict Response");
      System.out.println(predictResponse);
      for (Value prediction : predictResponse.getPredictionsList()) {
        System.out.format("\tPrediction: %s\n", prediction);
      }
    }
  }

  // Convert a Json string to a protobuf.Value
  static Value stringToValue(String value) throws InvalidProtocolBufferException {
    Value.Builder builder = Value.newBuilder();
    JsonFormat.parser().merge(value, builder);
    return builder.build();
  }
}

Generate embeddings from a video

Use the following sample to generate embeddings for video content.

REST

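The following sample uses a video located in Cloud Storage. You can also use the video.bytesBase64Encoded field to provide a base64-encoded string representation of the video.

Before using any of the request data, make the following replacements:

LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see Generative AI on Vertex AI locations.
PROJECT_ID: Your Google Cloud project ID.
VIDEO_URI: The Cloud Storage URI of the target video to get embeddings for. For example, gs://my-bucket/embeddings/supermarket-video.mp4.
You can also provide the video as a base64-encoded byte string: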
```
[...]
"video": {
  "bytesBase64Encoded": "B64_ENCODED_VIDEO"
}
[...]
```
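videoSegmentConfig (START_SECOND, END_SECOND, INTERVAL_SECONDS). Optional. The specific video segments (in seconds) that embeddings are generated for.
The value you set for videoSegmentConfig.intervalSec affects the pricing tier you are charged at. For more information, see the video embedding modes section and the pricing page.

For example: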
```
[...]
"videoSegmentConfig": {
  "startOffsetSec": 10,
  "endOffsetSec": 60,
  "intervalSec": 10
}
[...]
```
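This configuration specifies video data from 10 seconds to 60 seconds and generates embeddings for the following 10-second video intervals: [10, 20), [20, 30), [30, 40), [40, 50), [50, 60). This video interval ("intervalSec": 10) falls in the Standard video embedding mode, and you are charged at the Standard mode pricing rate.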
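
The interval list above can be reproduced with the following sketch. It illustrates the documented behavior and is not the service's implementation. `segment_intervals` is a hypothetical helper. Clamping the final interval to the video length is inferred from the 59-second sample response on this page, where the last segment ends at 59.

```python
from typing import List, Optional, Tuple


def segment_intervals(
    start_offset_sec: int,
    end_offset_sec: int,
    interval_sec: int,
    video_length_sec: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """List the half-open [start, end) intervals that embeddings are generated for."""
    if interval_sec <= 0:
        raise ValueError("interval_sec must be positive")
    if video_length_sec is not None:
        # Assumption: intervals never extend past the end of the video.
        end_offset_sec = min(end_offset_sec, video_length_sec)
    intervals = []
    start = start_offset_sec
    while start < end_offset_sec:
        intervals.append((start, min(start + interval_sec, end_offset_sec)))
        start += interval_sec
    return intervals


print(segment_intervals(10, 60, 10))
# [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
```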

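If you omit videoSegmentConfig, the service uses the following default values: "videoSegmentConfig": { "startOffsetSec": 0, "endOffsetSec": 120, "intervalSec": 16 }. This video interval ("intervalSec": 16) falls in the Essential video embedding mode, and you are charged at the Essential mode pricing rate.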

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict

Request JSON body:

{
  "instances": [
    {
      "video": {
        "gcsUri": "VIDEO_URI",
        "videoSegmentConfig": {
          "startOffsetSec": START_SECOND,
          "endOffsetSec": END_SECOND,
          "intervalSec": INTERVAL_SECONDS
        }
      }
    }
  ]
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict" | Select-Object -Expand Content

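The model returns each embedding as a vector of 1408 floats. The following sample responses are shortened for space.

Response (7-second video, no videoSegmentConfig specified):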

{
  "predictions": [
    {
      "videoEmbeddings": [
        {
          "endOffsetSec": 7,
          "embedding": [
            -0.0045467657,
            0.0258095954,
            0.0146885719,
            0.00945400633,
            [...]
            -0.0023291884,
            -0.00493789,
            0.00975185353,
            0.0168156829
          ],
          "startOffsetSec": 0
        }
      ]
    }
  ],
  "deployedModelId": "DEPLOYED_MODEL_ID"
}

Response (59-second video, with the following video segment config: "videoSegmentConfig": { "startOffsetSec": 0, "endOffsetSec": 60, "intervalSec": 10 }):

{
  "predictions": [
    {
      "videoEmbeddings": [
        {
          "endOffsetSec": 10,
          "startOffsetSec": 0,
          "embedding": [
            -0.00683252793,
            0.0390476175,
            [...]
            0.00657121744,
            0.013023301
          ]
        },
        {
          "startOffsetSec": 10,
          "endOffsetSec": 20,
          "embedding": [
            -0.0104404651,
            0.0357737206,
            [...]
            0.00509833824,
            0.0131902946
          ]
        },
        {
          "startOffsetSec": 20,
          "embedding": [
            -0.0113538112,
            0.0305239167,
            [...]
            -0.00195809244,
            0.00941874553
          ],
          "endOffsetSec": 30
        },
        {
          "embedding": [
            -0.00299320649,
            0.0322436653,
            [...]
            -0.00993082579,
            0.00968887936
          ],
          "startOffsetSec": 30,
          "endOffsetSec": 40
        },
        {
          "endOffsetSec": 50,
          "startOffsetSec": 40,
          "embedding": [
            -0.00591270532,
            0.0368893594,
            [...]
            -0.00219071587,
            0.0042470959
          ]
        },
        {
          "embedding": [
            -0.00458270218,
            0.0368121453,
            [...]
            -0.00317760976,
            0.00595594104
          ],
          "endOffsetSec": 59,
          "startOffsetSec": 50
        }
      ]
    }
  ],
  "deployedModelId": "DEPLOYED_MODEL_ID"
}
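
The fields of each videoEmbeddings entry appear in varying order in the responses above. A parser should therefore read them by key and sort segments by start offset rather than rely on position. This is a sketch with hypothetical names. The default of 0 for a missing `startOffsetSec` is a defensive assumption, since the samples always include the field.

```python
from typing import Any, Dict, List, NamedTuple


class VideoSegmentEmbedding(NamedTuple):
    start_offset_sec: int
    end_offset_sec: int
    embedding: List[float]


def parse_video_embeddings(response: Dict[str, Any]) -> List[VideoSegmentEmbedding]:
    """Return per-segment embeddings from a predict response, sorted by start offset."""
    segments = [
        VideoSegmentEmbedding(
            start_offset_sec=item.get("startOffsetSec", 0),  # defensive default
            end_offset_sec=item["endOffsetSec"],
            embedding=item["embedding"],
        )
        for prediction in response["predictions"]
        for item in prediction.get("videoEmbeddings", [])
    ]
    return sorted(segments, key=lambda seg: seg.start_offset_sec)
```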

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

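import vertexai

from vertexai.vision_models import MultiModalEmbeddingModel, Video, VideoSegmentConfig

# TODO(developer): Update the following values before running the sample.
project_id = "PROJECT_ID"
video_path = "path/to/video.mp4"
contextual_text = "a cat"
dimension = 1408
video_segment_config = VideoSegmentConfig(start_offset_sec=0, end_offset_sec=60, interval_sec=10)

vertexai.init(project=project_id, location="us-central1")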

model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding")
video = Video.load_from_file(video_path)

embeddings = model.get_embeddings(
    video=video,
    video_segment_config=video_segment_config,
    contextual_text=contextual_text,
    dimension=dimension,
)

# Video Embeddings are segmented based on the video_segment_config.
print("Video Embeddings:")
for video_embedding in embeddings.video_embeddings:
    print(
        f"Video Segment: {video_embedding.start_offset_sec} - {video_embedding.end_offset_sec}"
    )
    print(f"Embedding: {video_embedding.embedding}")

print(f"Text Embedding: {embeddings.text_embedding}")
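
Because the sample above returns a text embedding alongside per-segment video embeddings, one possible use is finding the segment that best matches the contextual text. This sketch assumes, as in the image classification example, that text and video embeddings are comparable by cosine similarity. `rank_segments` is a hypothetical helper.

```python
import math
from typing import List, Sequence, Tuple

Segment = Tuple[float, float, Sequence[float]]


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; rejects zero vectors."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cannot compare a zero vector")
    return sum(x * y for x, y in zip(a, b)) / norm


def rank_segments(
    text_embedding: Sequence[float], segments: List[Segment]
) -> List[Tuple[float, float, float]]:
    """Return (start, end, score) tuples sorted from most to least similar."""
    scored = [(start, end, _cosine(text_embedding, emb)) for start, end, emb in segments]
    return sorted(scored, key=lambda item: item[2], reverse=True)
```

With the SDK sample above, you could call `rank_segments(embeddings.text_embedding, [(v.start_offset_sec, v.end_offset_sec, v.embedding) for v in embeddings.video_embeddings])`.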

Advanced use case

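Use the following sample to get embeddings for video, text, and image content.

For video embeddings, you can specify the video segment and the embedding density.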

REST

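The following sample uses image, text, and video data. You can use any combination of these data types in your request body.

This sample uses a video located in Cloud Storage. You can also use the video.bytesBase64Encoded field to provide a base64-encoded string representation of the video.

Before using any of the request data, make the following replacements:

LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see Generative AI on Vertex AI locations.
PROJECT_ID: Your Google Cloud project ID.
TEXT: The target text to get embeddings for. For example: a cat.
IMAGE_URI: The Cloud Storage URI of the target image to get embeddings for. For example, gs://my-bucket/embeddings/supermarket-img.png.
You can also provide the image as a base64-encoded byte string: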
```
[...]
"image": {
  "bytesBase64Encoded": "B64_ENCODED_IMAGE"
}
[...]
```
VIDEO_URI: The Cloud Storage URI of the target video to get embeddings for. For example, gs://my-bucket/embeddings/supermarket-video.mp4.
You can also provide the video as a base64-encoded byte string:
```
[...]
"video": {
  "bytesBase64Encoded": "B64_ENCODED_VIDEO"
}
[...]
```
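videoSegmentConfig (START_SECOND, END_SECOND, INTERVAL_SECONDS). Optional. The specific video segments (in seconds) that embeddings are generated for.
The value you set for videoSegmentConfig.intervalSec affects the pricing tier you are charged at. For more information, see the video embedding modes section and the pricing page.

For example: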
```
[...]
"videoSegmentConfig": {
  "startOffsetSec": 10,
  "endOffsetSec": 60,
  "intervalSec": 10
}
[...]
```
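This configuration specifies video data from 10 seconds to 60 seconds and generates embeddings for the following 10-second video intervals: [10, 20), [20, 30), [30, 40), [40, 50), [50, 60). This video interval ("intervalSec": 10) falls in the Standard video embedding mode, and you are charged at the Standard mode pricing rate.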

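If you omit videoSegmentConfig, the service uses the following default values: "videoSegmentConfig": { "startOffsetSec": 0, "endOffsetSec": 120, "intervalSec": 16 }. This video interval ("intervalSec": 16) falls in the Essential video embedding mode, and you are charged at the Essential mode pricing rate.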

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict

Request JSON body:

{
  "instances": [
    {
      "text": "TEXT",
      "image": {
        "gcsUri": "IMAGE_URI"
      },
      "video": {
        "gcsUri": "VIDEO_URI",
        "videoSegmentConfig": {
          "startOffsetSec": START_SECOND,
          "endOffsetSec": END_SECOND,
          "intervalSec": INTERVAL_SECONDS
        }
      }
    }
  ]
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict" | Select-Object -Expand Content

The model returns each embedding as a vector of 1408 floats. The following sample response is shortened for space.

{
  "predictions": [
    {
      "textEmbedding": [
        0.0105433334,
        -0.00302835181,
        0.00656806398,
        0.00603460241,
        [...]
        0.00445805816,
        0.0139605571,
        -0.00170318608,
        -0.00490092579
      ],
      "videoEmbeddings": [
        {
          "startOffsetSec": 0,
          "endOffsetSec": 7,
          "embedding": [
            -0.00673126569,
            0.0248149596,
            0.0128901172,
            0.0107588246,
            [...]
            -0.00180952181,
            -0.0054573305,
            0.0117037306,
            0.0169312079
          ]
        }
      ],
      "imageEmbedding": [
        -0.00728622358,
        0.031021487,
        -0.00206603738,
        0.0273937676,
        [...]
        -0.00204976718,
        0.00321615417,
        0.0121978866,
        0.0193375275
      ]
    }
  ],
  "deployedModelId": "DEPLOYED_MODEL_ID"
}

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

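import vertexai

from vertexai.vision_models import Image, MultiModalEmbeddingModel, Video, VideoSegmentConfig

# TODO(developer): Update the following values before running the sample.
project_id = "PROJECT_ID"
image_path = "path/to/image.png"
video_path = "path/to/video.mp4"
contextual_text = "a cat"
dimension = 1408
video_segment_config = VideoSegmentConfig(start_offset_sec=0, end_offset_sec=120, interval_sec=16)

vertexai.init(project=project_id, location="us-central1")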

model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding")
image = Image.load_from_file(image_path)
video = Video.load_from_file(video_path)

embeddings = model.get_embeddings(
    image=image,
    video=video,
    video_segment_config=video_segment_config,
    contextual_text=contextual_text,
    dimension=dimension,
)

print(f"Image Embedding: {embeddings.image_embedding}")

# Video Embeddings are segmented based on the video_segment_config.
print("Video Embeddings:")
for video_embedding in embeddings.video_embeddings:
    print(
        f"Video Segment: {video_embedding.start_offset_sec} - {video_embedding.end_offset_sec}"
    )
    print(f"Embedding: {video_embedding.embedding}")

print(f"Text Embedding: {embeddings.text_embedding}")

Next steps

For detailed documentation, see the following:

Get multimodal embeddings