Supported models:
- multimodalembedding@001
Syntax
- PROJECT_ID = PROJECT_ID
- REGION = us-central1
curl
    curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      https://${REGION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${REGION}/publishers/google/models/${MODEL_ID}:predict \
      -d '{
        "instances": [
          ...
        ]
      }'
Python
    from vertexai.vision_models import MultiModalEmbeddingModel

    model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding")
    model.get_embeddings(...)
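Parameter list
Request body
Parameter | Description |
---|---|
text | Optional. The text to generate an embedding for. |
image | Optional. The image to generate an embedding for. |
video | Optional. The video segment to generate embeddings for. |
dimension | Optional. Accepts one of the values 128, 256, 512, or 1408. The response contains an embedding of that dimension. Applies to text and image inputs only. |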
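The request body described above can be assembled and checked locally before it is sent. The following is a minimal sketch, not part of any SDK: `build_request_body` is a hypothetical helper, and the rule that at least one of text, image, or video must be present is taken from the Python client later on this page.

```python
import json
from typing import Any, Dict, Optional

# Dimension values listed as accepted in the request body table.
ALLOWED_DIMENSIONS = (128, 256, 512, 1408)


def build_request_body(
    text: Optional[str] = None,
    image: Optional[Dict[str, Any]] = None,
    video: Optional[Dict[str, Any]] = None,
    dimension: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build a predict request body from optional fields."""
    if dimension is not None and dimension not in ALLOWED_DIMENSIONS:
        raise ValueError(
            f"dimension must be one of {ALLOWED_DIMENSIONS}, got {dimension}"
        )

    # Include only the fields the caller actually supplied.
    instance: Dict[str, Any] = {}
    if text is not None:
        instance["text"] = text
    if image is not None:
        instance["image"] = image
    if video is not None:
        instance["video"] = video
    if not instance:
        raise ValueError("At least one of text, image, or video must be specified.")

    body: Dict[str, Any] = {"instances": [instance]}
    if dimension is not None:
        body["parameters"] = {"dimension": dimension}
    return body


body = build_request_body(
    text="white shoes",
    image={"gcsUri": "gs://your-public-uri-test/flower.jpg"},
    dimension=128,
)
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to the `-d` option of a curl request. Validating the dimension on the client side is a design choice in this sketch; the service performs its own validation.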
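Image
Parameter | Description |
---|---|
bytesBase64Encoded | Optional. The image bytes, encoded as a base64 string. |
gcsUri | Optional. The Cloud Storage location of the image to embed. |
mimeType | Optional. The MIME type of the image content. |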
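An image can be supplied either inline as base64-encoded bytes or as a Cloud Storage location. The sketch below shows one way to build each form. The helper names are hypothetical, and the field names `bytesBase64Encoded` and `mimeType` are assumed from the Vertex AI REST reference; only `gcsUri` appears in the request examples on this page.

```python
import base64
from typing import Dict


def image_instance_from_bytes(data: bytes, mime_type: str = "image/jpeg") -> Dict[str, str]:
    """Hypothetical helper: inline image payload.

    Field names bytesBase64Encoded and mimeType are assumptions.
    """
    return {
        # JSON cannot carry raw bytes, so encode them as an ASCII base64 string.
        "bytesBase64Encoded": base64.b64encode(data).decode("ascii"),
        "mimeType": mime_type,
    }


def image_instance_from_gcs(uri: str) -> Dict[str, str]:
    """Hypothetical helper: reference an image stored in Cloud Storage."""
    if not uri.startswith("gs://"):
        raise ValueError(f"expected a gs:// URI, got {uri!r}")
    return {"gcsUri": uri}


# The eight-byte PNG signature stands in for real image data here.
png_signature = b"\x89PNG\r\n\x1a\n"
inline_image = image_instance_from_bytes(png_signature, mime_type="image/png")
remote_image = image_instance_from_gcs("gs://your-public-uri-test/flower.jpg")
print(inline_image)
print(remote_image)
```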
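VideoSegmentConfig
Parameter | Description |
---|---|
startOffsetSec | Optional. The start offset of the video segment, in seconds. If no start offset is specified, ... |
endOffsetSec | Optional. The end offset of the video segment, in seconds. If no end offset is specified, ... |
intervalSec | Optional. The interval of the video over which each embedding is generated. |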
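To reason about how many video embeddings a given configuration might produce, it can help to enumerate the windows it describes. The sketch below is an illustration only: it assumes the segment is cut into consecutive windows of `intervalSec` seconds, with a shorter final window if the segment length is not a multiple of the interval. The service's actual behavior may differ, and `segment_windows` is a hypothetical helper.

```python
from typing import List, Tuple


def segment_windows(
    start_offset_sec: int, end_offset_sec: int, interval_sec: int
) -> List[Tuple[int, int]]:
    """Hypothetical helper: list (start, end) windows inside a video segment.

    Assumption: one window per interval, laid end to end from the start offset.
    """
    if interval_sec <= 0:
        raise ValueError("interval_sec must be positive")
    if end_offset_sec <= start_offset_sec:
        raise ValueError("end_offset_sec must be greater than start_offset_sec")

    windows = []
    cursor = start_offset_sec
    while cursor < end_offset_sec:
        # Clamp the last window so it never runs past the segment end.
        windows.append((cursor, min(cursor + interval_sec, end_offset_sec)))
        cursor += interval_sec
    return windows


windows = segment_windows(start_offset_sec=10, end_offset_sec=60, interval_sec=10)
print(windows)
# [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
```

Under this assumption, a smaller interval yields more windows over the same segment, and therefore more embeddings.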
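Video
Parameter | Description |
---|---|
bytesBase64Encoded | Optional. The video bytes, encoded as a base64 string. |
gcsUri | Optional. The Cloud Storage location of the video to embed. |
videoSegmentConfig | Optional. The video segment configuration. |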
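Because every field of the segment configuration is optional, a request builder can omit whichever ones the caller leaves unset. A minimal sketch, with `video_instance` as a hypothetical helper and the field names taken from the request examples on this page:

```python
from typing import Any, Dict, Optional


def video_instance(
    gcs_uri: str,
    start_offset_sec: Optional[int] = None,
    end_offset_sec: Optional[int] = None,
    interval_sec: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build the "video" object of a request instance."""
    instance: Dict[str, Any] = {"gcsUri": gcs_uri}

    # Keep only the segment settings that were explicitly provided.
    candidates = {
        "startOffsetSec": start_offset_sec,
        "endOffsetSec": end_offset_sec,
        "intervalSec": interval_sec,
    }
    config = {name: value for name, value in candidates.items() if value is not None}
    if config:
        instance["videoSegmentConfig"] = config
    return instance


full = video_instance(
    "gs://your-public-uri-test/Okabashi.mp4",
    start_offset_sec=10,
    end_offset_sec=60,
    interval_sec=10,
)
bare = video_instance("gs://your-public-uri-test/Okabashi.mp4")
print(full)
print(bare)
```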
Examples
- PROJECT_ID = PROJECT_ID
- REGION = us-central1
- MODEL_ID = multimodalembedding@001
Basic use case
The multimodal embedding model generates vectors from the input you provide, which can be any combination of image, text, and video data.
curl
    curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      https://${REGION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${REGION}/publishers/google/models/${MODEL_ID}:predict \
      -d '{
        "instances": [
          {
            "image": {
              "gcsUri": "gs://your-public-uri-test/flower.jpg"
            },
            "text": "white shoes",
            "video": {
              "gcsUri": "gs://your-public-uri-test/Okabashi.mp4"
            }
          }
        ]
      }'
Python
    # Client for multimodal embedding
    import time
    import typing
    from dataclasses import dataclass

    # Requires: pip install google-cloud-aiplatform
    # Also run: gcloud auth application-default login
    from google.cloud import aiplatform
    from google.protobuf import struct_pb2

    PROJECT_ID = "PROJECT_ID"  # Replace with your project ID.
    IMAGE_URI = "gs://your-public-uri-test/flower.jpg"
    TEXT = "white shoes"
    VIDEO_URI = "gs://your-public-uri-test/Okabashi.mp4"
    VIDEO_START_OFFSET_SEC = 0
    VIDEO_END_OFFSET_SEC = 120
    VIDEO_EMBEDDING_INTERVAL_SEC = 16


    # Inspired from http://stackoverflow.com/questions/34269772/type-hints-in-namedtuple.
    class EmbeddingResponse(typing.NamedTuple):
        @dataclass
        class VideoEmbedding:
            start_offset_sec: int
            end_offset_sec: int
            embedding: typing.Sequence[float]

        text_embedding: typing.Sequence[float]
        image_embedding: typing.Sequence[float]
        video_embeddings: typing.Sequence[VideoEmbedding]


    class EmbeddingPredictionClient:
        """Wrapper around Prediction Service Client."""

        def __init__(
            self,
            project: str,
            location: str = "us-central1",
            api_regional_endpoint: str = "us-central1-aiplatform.googleapis.com",
        ):
            client_options = {"api_endpoint": api_regional_endpoint}
            # Initialize the client that creates and sends requests.
            # Create it once and reuse it for multiple requests.
            self.client = aiplatform.gapic.PredictionServiceClient(
                client_options=client_options
            )
            self.location = location
            self.project = project

        def get_embedding(
            self,
            text: typing.Optional[str] = None,
            image_uri: typing.Optional[str] = None,
            video_uri: typing.Optional[str] = None,
            start_offset_sec: int = 0,
            end_offset_sec: int = 120,
            interval_sec: int = 16,
        ):
            if not text and not image_uri and not video_uri:
                raise ValueError(
                    "At least one of text or image_uri or video_uri must be specified."
                )

            # Build the request instance from whichever inputs were provided.
            instance = struct_pb2.Struct()
            if text:
                instance.fields["text"].string_value = text

            if image_uri:
                image_struct = instance.fields["image"].struct_value
                image_struct.fields["gcsUri"].string_value = image_uri

            if video_uri:
                video_struct = instance.fields["video"].struct_value
                video_struct.fields["gcsUri"].string_value = video_uri
                video_config_struct = video_struct.fields["videoSegmentConfig"].struct_value
                video_config_struct.fields["startOffsetSec"].number_value = start_offset_sec
                video_config_struct.fields["endOffsetSec"].number_value = end_offset_sec
                video_config_struct.fields["intervalSec"].number_value = interval_sec

            instances = [instance]
            endpoint = (
                f"projects/{self.project}/locations/{self.location}"
                "/publishers/google/models/multimodalembedding@001"
            )
            response = self.client.predict(endpoint=endpoint, instances=instances)

            # Unpack only the embeddings that correspond to the supplied inputs.
            text_embedding = None
            if text:
                text_embedding = list(response.predictions[0]["textEmbedding"])

            image_embedding = None
            if image_uri:
                image_embedding = list(response.predictions[0]["imageEmbedding"])

            video_embeddings = None
            if video_uri:
                video_emb_values = response.predictions[0]["videoEmbeddings"]
                video_embeddings = [
                    EmbeddingResponse.VideoEmbedding(
                        start_offset_sec=v["startOffsetSec"],
                        end_offset_sec=v["endOffsetSec"],
                        embedding=list(v["embedding"]),
                    )
                    for v in video_emb_values
                ]

            return EmbeddingResponse(
                text_embedding=text_embedding,
                image_embedding=image_embedding,
                video_embeddings=video_embeddings,
            )


    # The client can be reused.
    client = EmbeddingPredictionClient(project=PROJECT_ID)
    start = time.time()
    response = client.get_embedding(
        text=TEXT,
        image_uri=IMAGE_URI,
        video_uri=VIDEO_URI,
        start_offset_sec=VIDEO_START_OFFSET_SEC,
        end_offset_sec=VIDEO_END_OFFSET_SEC,
        interval_sec=VIDEO_EMBEDDING_INTERVAL_SEC,
    )
    end = time.time()

    print(response)
    print("Time taken: ", end - start)
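The same unpacking logic applies to the JSON returned by the curl request. The sketch below parses one prediction object without any client library. The field names (`textEmbedding`, `imageEmbedding`, `videoEmbeddings`, `startOffsetSec`, `endOffsetSec`, `embedding`) are the ones the Python client above reads; `parse_prediction` and its dataclasses are hypothetical, and the numbers in `sample_prediction` are made up for illustration rather than real model output.

```python
from dataclasses import dataclass, field
from typing import Any, List, Mapping, Optional


@dataclass
class VideoSegmentEmbedding:
    start_offset_sec: int
    end_offset_sec: int
    embedding: List[float]


@dataclass
class ParsedPrediction:
    text_embedding: Optional[List[float]] = None
    image_embedding: Optional[List[float]] = None
    video_embeddings: List[VideoSegmentEmbedding] = field(default_factory=list)


def parse_prediction(prediction: Mapping[str, Any]) -> ParsedPrediction:
    """Hypothetical helper: convert one prediction object into typed values."""
    text = prediction.get("textEmbedding")
    image = prediction.get("imageEmbedding")
    return ParsedPrediction(
        # A missing key means that input type was not part of the request.
        text_embedding=list(text) if text is not None else None,
        image_embedding=list(image) if image is not None else None,
        video_embeddings=[
            VideoSegmentEmbedding(
                start_offset_sec=v["startOffsetSec"],
                end_offset_sec=v["endOffsetSec"],
                embedding=list(v["embedding"]),
            )
            for v in prediction.get("videoEmbeddings", [])
        ],
    )


# Made-up values; real embeddings are much longer.
sample_prediction = {
    "textEmbedding": [0.1, 0.2],
    "imageEmbedding": [0.3, 0.4],
    "videoEmbeddings": [
        {"startOffsetSec": 0, "endOffsetSec": 16, "embedding": [0.5, 0.6]}
    ],
}
parsed = parse_prediction(sample_prediction)
print(parsed)
```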
Advanced use case
You can specify the dimension of text and image embeddings. For video embeddings, you can specify the video segment and the embedding density.
curl - image
    curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      https://${REGION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${REGION}/publishers/google/models/${MODEL_ID}:predict \
      -d '{
        "instances": [
          {
            "image": {
              "gcsUri": "gs://your-public-uri-test/flower.jpg"
            },
            "text": "white shoes"
          }
        ],
        "parameters": {
          "dimension": 128
        }
      }'
curl - video
    curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      https://${REGION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${REGION}/publishers/google/models/${MODEL_ID}:predict \
      -d '{
        "instances": [
          {
            "video": {
              "gcsUri": "gs://your-public-uri-test/Okabashi.mp4",
              "videoSegmentConfig": {
                "startOffsetSec": 10,
                "endOffsetSec": 60,
                "intervalSec": 10
              }
            }
          }
        ]
      }'
Python
    # Client for multimodal embedding
    import time
    import typing
    from dataclasses import dataclass

    # Requires: pip install google-cloud-aiplatform
    # Also run: gcloud auth application-default login
    from google.cloud import aiplatform
    from google.protobuf import struct_pb2

    PROJECT_ID = "PROJECT_ID"  # Replace with your project ID.
    IMAGE_URI = "gs://your-public-uri-test/flower.jpg"
    TEXT = "white shoes"
    VIDEO_URI = "gs://your-public-uri-test/brahms.mp4"
    VIDEO_START_OFFSET_SEC = 10
    VIDEO_END_OFFSET_SEC = 60
    VIDEO_EMBEDDING_INTERVAL_SEC = 10
    DIMENSION = 128


    # Inspired from http://stackoverflow.com/questions/34269772/type-hints-in-namedtuple.
    class EmbeddingResponse(typing.NamedTuple):
        @dataclass
        class VideoEmbedding:
            start_offset_sec: int
            end_offset_sec: int
            embedding: typing.Sequence[float]

        text_embedding: typing.Sequence[float]
        image_embedding: typing.Sequence[float]
        video_embeddings: typing.Sequence[VideoEmbedding]


    class EmbeddingPredictionClient:
        """Wrapper around Prediction Service Client."""

        def __init__(
            self,
            project: str,
            location: str = "us-central1",
            api_regional_endpoint: str = "us-central1-aiplatform.googleapis.com",
        ):
            client_options = {"api_endpoint": api_regional_endpoint}
            # Initialize the client that creates and sends requests.
            # Create it once and reuse it for multiple requests.
            self.client = aiplatform.gapic.PredictionServiceClient(
                client_options=client_options
            )
            self.location = location
            self.project = project

        def get_embedding(
            self,
            text: typing.Optional[str] = None,
            image_uri: typing.Optional[str] = None,
            video_uri: typing.Optional[str] = None,
            start_offset_sec: int = 0,
            end_offset_sec: int = 120,
            interval_sec: int = 16,
            dimension: int = 1408,
        ):
            if not text and not image_uri and not video_uri:
                raise ValueError(
                    "At least one of text or image_uri or video_uri must be specified."
                )

            # Build the request instance from whichever inputs were provided.
            instance = struct_pb2.Struct()
            if text:
                instance.fields["text"].string_value = text

            if image_uri:
                image_struct = instance.fields["image"].struct_value
                image_struct.fields["gcsUri"].string_value = image_uri

            if video_uri:
                video_struct = instance.fields["video"].struct_value
                video_struct.fields["gcsUri"].string_value = video_uri
                video_config_struct = video_struct.fields["videoSegmentConfig"].struct_value
                video_config_struct.fields["startOffsetSec"].number_value = start_offset_sec
                video_config_struct.fields["endOffsetSec"].number_value = end_offset_sec
                video_config_struct.fields["intervalSec"].number_value = interval_sec

            # The embedding dimension is sent as a request-level parameter.
            parameters = struct_pb2.Struct()
            parameters.fields["dimension"].number_value = dimension

            instances = [instance]
            endpoint = (
                f"projects/{self.project}/locations/{self.location}"
                "/publishers/google/models/multimodalembedding@001"
            )
            response = self.client.predict(
                endpoint=endpoint, instances=instances, parameters=parameters
            )

            # Unpack only the embeddings that correspond to the supplied inputs.
            text_embedding = None
            if text:
                text_embedding = list(response.predictions[0]["textEmbedding"])

            image_embedding = None
            if image_uri:
                image_embedding = list(response.predictions[0]["imageEmbedding"])

            video_embeddings = None
            if video_uri:
                video_emb_values = response.predictions[0]["videoEmbeddings"]
                video_embeddings = [
                    EmbeddingResponse.VideoEmbedding(
                        start_offset_sec=v["startOffsetSec"],
                        end_offset_sec=v["endOffsetSec"],
                        embedding=list(v["embedding"]),
                    )
                    for v in video_emb_values
                ]

            return EmbeddingResponse(
                text_embedding=text_embedding,
                image_embedding=image_embedding,
                video_embeddings=video_embeddings,
            )


    # The client can be reused.
    client = EmbeddingPredictionClient(project=PROJECT_ID)
    start = time.time()
    response = client.get_embedding(
        text=TEXT,
        image_uri=IMAGE_URI,
        video_uri=VIDEO_URI,
        start_offset_sec=VIDEO_START_OFFSET_SEC,
        end_offset_sec=VIDEO_END_OFFSET_SEC,
        interval_sec=VIDEO_EMBEDDING_INTERVAL_SEC,
        dimension=DIMENSION,
    )
    end = time.time()

    print(response)
    print("Time taken: ", end - start)
Learn more
For detailed documentation, see the following: