Fine-tune an image classification model with custom data on Vertex AI Pipelines

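This tutorial shows how to use Vertex AI Pipelines to run an end-to-end ML workflow that includes the following tasks:

Import and transform data.
Fine-tune an image classification model from TFHub using the transformed data.
Import the trained model into Vertex AI Model Registry.
Optional: Deploy the model for online serving with Vertex AI Prediction.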

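Before you begin

Make sure that you have completed steps 1-3 in Set up a Google Cloud project and development environment.
Create an isolated Python environment and install the Vertex AI SDK for Python.

Install the Kubeflow Pipelines SDK: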

python3 -m pip install "kfp<2.0.0" "google-cloud-aiplatform>=1.16.0" --upgrade --quiet
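
To confirm that the environment satisfies the version pins in the install command, a quick check along the following lines can help. This is a minimal sketch: the helper names (parse_version, installed_version, meets_tutorial_pins) are hypothetical, and the simplified parser only handles plain numeric versions, not the full PEP 440 grammar.

```python
from importlib import metadata


def parse_version(text):
    """Turn '1.8.22' into (1, 8, 22), padding to three parts.

    Simplified on purpose: stops at the first non-numeric character,
    so '2.0.0b1' is read as (2, 0, 0).
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def meets_tutorial_pins(kfp_version, aiplatform_version):
    """True when kfp < 2.0.0 and google-cloud-aiplatform >= 1.16.0."""
    if kfp_version is None or aiplatform_version is None:
        return False
    return (parse_version(kfp_version) < (2, 0, 0)
            and parse_version(aiplatform_version) >= (1, 16, 0))


if __name__ == "__main__":
    ok = meets_tutorial_pins(installed_version("kfp"),
                             installed_version("google-cloud-aiplatform"))
    print("environment OK" if ok else "re-run the pip install command")
```

The version strings are passed in as arguments so that the comparison logic can be exercised without either package installed.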

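Run the ML model training pipeline

The sample code does the following:

Loads components from a component repository to use as pipeline building blocks.
Composes a pipeline by creating component tasks and passing data between them through arguments.
Submits the pipeline for execution on Vertex AI Pipelines. See Vertex AI Pipelines pricing.

Copy the following sample code into your development environment and run it.

Image classification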

# python3 -m pip install "kfp<2.0.0" "google-cloud-aiplatform>=1.16.0" --upgrade --quiet
from kfp import components
from kfp.v2 import dsl

# %% Loading components
upload_Tensorflow_model_to_Google_Cloud_Vertex_AI_op = components.load_component_from_url('https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/google-cloud/Vertex_AI/Models/Upload_Tensorflow_model/component.yaml')
deploy_model_to_endpoint_op = components.load_component_from_url('https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/google-cloud/Vertex_AI/Models/Deploy_to_endpoint/component.yaml')
transcode_imagedataset_tfrecord_from_csv_op = components.load_component_from_url('https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/main/community-content/pipeline_components/image_ml_model_training/transcode_tfrecord_image_dataset_from_csv/component.yaml')
load_image_classification_model_from_tfhub_op = components.load_component_from_url('https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/b5b65198a6c2ffe8c0fa2aa70127e3325752df68/community-content/pipeline_components/image_ml_model_training/load_image_classification_model/component.yaml')
preprocess_image_data_op = components.load_component_from_url('https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/main/community-content/pipeline_components/image_ml_model_training/preprocess_image_data/component.yaml')
train_tensorflow_image_classification_model_op = components.load_component_from_url('https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/main/community-content/pipeline_components/image_ml_model_training/train_image_classification_model/component.yaml')

# %% Pipeline definition
def image_classification_pipeline():
    class_names = ['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']
    csv_image_data_path = 'gs://cloud-samples-data/ai-platform/flowers/flowers.csv'
    deploy_model = False

    image_data = dsl.importer(
        artifact_uri=csv_image_data_path, artifact_class=dsl.Dataset).output

    image_tfrecord_data = transcode_imagedataset_tfrecord_from_csv_op(
        csv_image_data_path=image_data,
        class_names=class_names
    ).outputs['tfrecord_image_data_path']

    loaded_model_outputs = load_image_classification_model_from_tfhub_op(
        class_names=class_names,
    ).outputs

    preprocessed_data = preprocess_image_data_op(
        image_tfrecord_data,
        height_width_path=loaded_model_outputs['image_size_path'],
    ).outputs

    trained_model = (train_tensorflow_image_classification_model_op(
        preprocessed_training_data_path = preprocessed_data['preprocessed_training_data_path'],
        preprocessed_validation_data_path = preprocessed_data['preprocessed_validation_data_path'],
        model_path=loaded_model_outputs['loaded_model_path']).
                   set_cpu_limit('96').
                   set_memory_limit('128G').
                   add_node_selector_constraint('cloud.google.com/gke-accelerator', 'NVIDIA_TESLA_A100').
                   set_gpu_limit('8').
                   outputs['trained_model_path'])

    vertex_model_name = upload_Tensorflow_model_to_Google_Cloud_Vertex_AI_op(
        model=trained_model,
    ).outputs['model_name']

    # Deploying the model might incur additional costs over time
    if deploy_model:
        vertex_endpoint_name = deploy_model_to_endpoint_op(
            model_name=vertex_model_name,
        ).outputs['endpoint_name']

pipeline_func = image_classification_pipeline

# %% Pipeline submission
if __name__ == '__main__':
    from google.cloud import aiplatform
    aiplatform.PipelineJob.from_pipeline_func(pipeline_func=pipeline_func).submit()
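
The submission step in the sample relies on whatever project, region, and staging location the SDK picks up by default. If you prefer to set them explicitly, one possible approach is sketched below. The helper names (build_init_kwargs, submit_pipeline) and the placeholder values "my-project", "us-central1", and "my-staging-bucket" are assumptions for illustration, not part of the original sample.

```python
def build_init_kwargs(project_id, region, bucket_name):
    """Assemble keyword arguments for aiplatform.init().

    All three values are placeholders that you replace with your own.
    """
    return {
        "project": project_id,
        "location": region,
        "staging_bucket": f"gs://{bucket_name}",
    }


def submit_pipeline(pipeline_func, init_kwargs):
    """Initialize the SDK with explicit settings, then submit the pipeline."""
    # Imported lazily so build_init_kwargs can be used without the SDK installed.
    from google.cloud import aiplatform

    aiplatform.init(**init_kwargs)
    job = aiplatform.PipelineJob.from_pipeline_func(pipeline_func=pipeline_func)
    job.submit()
    return job


# Hypothetical placeholder values; nothing is submitted by this line.
init_kwargs = build_init_kwargs("my-project", "us-central1", "my-staging-bucket")
```

Calling submit_pipeline(pipeline_func, init_kwargs) would then take the place of the one-line submission in the sample.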

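Note the following about the sample pipeline code provided:

A Kubeflow pipeline is defined as a Python function.
The pipeline's workflow steps are created from Kubeflow pipeline components. Using the outputs of one component as the inputs of another defines the pipeline's workflow as a graph. For example, the preprocess_image_data_op component task depends on the tfrecord_image_data_path output of the transcode_imagedataset_tfrecord_from_csv_op component task.
You create a pipeline run on Vertex AI Pipelines by using the Vertex AI SDK for Python.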
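
The way data dependencies turn a pipeline into a graph can be pictured with a small standard-library sketch. It only illustrates how "task B consumes an output of task A" implies an execution order. It is not how Vertex AI Pipelines schedules tasks internally. The task labels are derived from the sample's component variable names, the optional deployment step is left out, and graphlib requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of upstream tasks whose outputs it consumes,
# mirroring the argument passing in the sample pipeline.
dependencies = {
    "importer": set(),
    "transcode_imagedataset_tfrecord_from_csv": {"importer"},
    "load_image_classification_model_from_tfhub": set(),
    "preprocess_image_data": {
        "transcode_imagedataset_tfrecord_from_csv",
        "load_image_classification_model_from_tfhub",
    },
    "train_tensorflow_image_classification_model": {
        "preprocess_image_data",
        "load_image_classification_model_from_tfhub",
    },
    "upload_Tensorflow_model_to_Google_Cloud_Vertex_AI": {
        "train_tensorflow_image_classification_model",
    },
}


def execution_order(deps):
    """Return one valid order in which every task runs after its inputs exist."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
for step, task in enumerate(order, start=1):
    print(step, task)
```

Tasks with no dependency on each other, such as the importer and the TFHub model loader, may appear in either order, which reflects that they could run in parallel.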

Monitor the pipeline

In the Vertex AI section of the Google Cloud console, go to the Pipelines page and open the Runs tab.


What's next

To learn more about Vertex AI Pipelines, see Introduction to Vertex AI Pipelines.