Training classes

The Vertex AI SDK includes several classes that you use when you train your model. Most of the training classes are used to create, train, and return your model. Use the HyperparameterTuningJob class to tune the training job's hyperparameters. Use the PipelineJob class to manage your machine learning (ML) workflow so you can automate and monitor your ML systems.

The following topics provide a high-level description of each training-related class in the Vertex AI SDK.

AutoML training classes for structured data

The Vertex AI SDK includes the following classes that you use to train an AutoML model on structured data.

AutoMLForecastingTrainingJob

The AutoMLForecastingTrainingJob class uses the AutoML training method to train and run a forecasting model. The AutoML training method is a good choice for most forecasting use cases. If your use case doesn't benefit from the Seq2seq+ or the Temporal Fusion Transformer training method offered by the SequenceToSequencePlusForecastingTrainingJob and TemporalFusionTransformerForecastingTrainingJob classes, respectively, then AutoML is likely the best training method for your forecasts.

For sample code that shows you how to use AutoMLForecastingTrainingJob, see the Create a training pipeline forecasting sample on GitHub.
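
The following snippet is a minimal sketch of how you might create and run a forecasting training job with the Vertex AI SDK. The dataset ID and the column names (date, store_id, sales) are placeholder values for this sketch, and the other forecasting classes described below are expected to accept a similar run() call:

from google.cloud import aiplatform

# Load an existing time series dataset by its resource name.
dataset = aiplatform.TimeSeriesDataset(
    'projects/my-project/locations/us-central1/datasets/{DATASET_ID}')

job = aiplatform.AutoMLForecastingTrainingJob(
    display_name="train-forecasting",
    optimization_objective="minimize-rmse",
)

# Train a model that forecasts 30 days of 'sales' for each 'store_id'.
model = job.run(
    dataset=dataset,
    target_column="sales",
    time_column="date",
    time_series_identifier_column="store_id",
    available_at_forecast_columns=["date"],
    unavailable_at_forecast_columns=["sales"],
    forecast_horizon=30,
    context_window=30,
    data_granularity_unit="day",
    data_granularity_count=1,
    budget_milli_node_hours=1000,
    model_display_name="my-forecasting-model",
)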

AutoMLTabularTrainingJob

The AutoMLTabularTrainingJob class represents a job that creates, trains, and returns an AutoML tabular model. For more information about training tabular models with Vertex AI, see Tabular data and Tabular data overview.

The following sample code snippet shows how you might use the Vertex AI SDK to create and run an AutoML tabular model:

from google.cloud import aiplatform

# Load an existing tabular dataset by its resource name.
dataset = aiplatform.TabularDataset(
    'projects/my-project/locations/us-central1/datasets/{DATASET_ID}')

# Define the AutoML tabular training job.
job = aiplatform.AutoMLTabularTrainingJob(
    display_name="train-automl",
    optimization_prediction_type="regression",
    optimization_objective="minimize-rmse",
)

# Run the training job and return the trained model.
model = job.run(
    dataset=dataset,
    target_column="target_column_name",
    training_fraction_split=0.6,
    validation_fraction_split=0.2,
    test_fraction_split=0.2,
    budget_milli_node_hours=1000,
    model_display_name="my-automl-model",
    disable_early_stopping=False,
)

SequenceToSequencePlusForecastingTrainingJob

The SequenceToSequencePlusForecastingTrainingJob class uses the Seq2seq+ training method to train and run a forecasting model. The Seq2seq+ training method is a good choice for experimentation. Its algorithm is simpler and uses a smaller search space than the AutoML option. Seq2seq+ is a good option if you want fast results and your datasets are smaller than 1 GB.

For sample code that shows you how to use SequenceToSequencePlusForecastingTrainingJob, see the Create a training pipeline forecasting Seq2seq sample on GitHub.

TemporalFusionTransformerForecastingTrainingJob

The TemporalFusionTransformerForecastingTrainingJob class uses the Temporal Fusion Transformer (TFT) training method to train and run a forecasting model. The TFT training method implements an attention-based deep neural network (DNN) model that uses a multi-horizon forecasting task to produce predictions.

For sample code that shows you how to use TemporalFusionTransformerForecastingTrainingJob, see the Create a training pipeline forecasting temporal fusion transformer sample on GitHub.

TimeSeriesDenseEncoderForecastingTrainingJob

The TimeSeriesDenseEncoderForecastingTrainingJob class uses the Time-series Dense Encoder (TiDE) training method to train and run a forecasting model. TiDE uses a multi-layer perceptron (MLP) to combine the speed of linear forecasting models with support for covariates and non-linear dependencies. For more information about TiDE, see Recent advances in deep long-horizon forecasting and this TiDE blog post.

AutoML training classes for unstructured data

The Vertex AI SDK includes the following classes that you use to train models on unstructured image, text, and video data:

AutoMLImageTrainingJob

Use the AutoMLImageTrainingJob class to create, train, and return an image model. For more information about working with image data models in Vertex AI, see Image data.

For an example of how to use the AutoMLImageTrainingJob class, see the tutorial in the AutoML image classification notebook.
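
As a minimal sketch, an AutoML image classification job might look like the following; the dataset ID and display names are placeholders:

from google.cloud import aiplatform

# Load an existing image dataset by its resource name.
dataset = aiplatform.ImageDataset(
    'projects/my-project/locations/us-central1/datasets/{DATASET_ID}')

job = aiplatform.AutoMLImageTrainingJob(
    display_name="train-image-classification",
    prediction_type="classification",
    multi_label=False,
)

model = job.run(
    dataset=dataset,
    model_display_name="my-image-model",
    training_fraction_split=0.8,
    validation_fraction_split=0.1,
    test_fraction_split=0.1,
    budget_milli_node_hours=8000,
)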

AutoMLTextTrainingJob

Use the AutoMLTextTrainingJob class to create, train, and return a text model. For more information about working with text data models in Vertex AI, see Text data.

For an example of how to use the AutoMLTextTrainingJob class, see the tutorial in the AutoML training text entity extraction model for online prediction notebook.
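
A text job follows the same pattern; the following sketch assumes a single-label classification dataset, and the dataset ID and display names are placeholders:

from google.cloud import aiplatform

# Load an existing text dataset by its resource name.
dataset = aiplatform.TextDataset(
    'projects/my-project/locations/us-central1/datasets/{DATASET_ID}')

job = aiplatform.AutoMLTextTrainingJob(
    display_name="train-text-classification",
    prediction_type="classification",
    multi_label=False,
)

model = job.run(
    dataset=dataset,
    model_display_name="my-text-model",
    training_fraction_split=0.8,
    validation_fraction_split=0.1,
    test_fraction_split=0.1,
)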

AutoMLVideoTrainingJob

Use the AutoMLVideoTrainingJob class to create, train, and return a video model. For more information about working with video data models in Vertex AI, see Video data.

For an example of how to use the AutoMLVideoTrainingJob class, see the tutorial in the AutoML training video action recognition model for batch prediction notebook.
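
The following sketch shows one way a video action recognition job might look; the dataset ID and display names are placeholders, and video datasets are assumed to use only training and test splits:

from google.cloud import aiplatform

# Load an existing video dataset by its resource name.
dataset = aiplatform.VideoDataset(
    'projects/my-project/locations/us-central1/datasets/{DATASET_ID}')

job = aiplatform.AutoMLVideoTrainingJob(
    display_name="train-video-action-recognition",
    prediction_type="action_recognition",
)

model = job.run(
    dataset=dataset,
    model_display_name="my-video-model",
    training_fraction_split=0.8,
    test_fraction_split=0.2,
)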

Custom data training classes

You can use the Vertex AI SDK to automate a custom training workflow. For information about using Vertex AI to run custom training applications, see Custom training overview.

The Vertex AI SDK includes three classes that create a custom training pipeline. A training pipeline accepts an input Vertex AI managed dataset that it uses to train a model. Next, it returns the model after the training job completes. Each of the three custom training pipeline classes creates a training pipeline differently. CustomTrainingJob uses a Python script, CustomContainerTrainingJob uses a custom container, and CustomPythonPackageTrainingJob uses a Python package and a prebuilt container.

The CustomJob class creates a custom training job but is not a pipeline. Unlike a custom training pipeline, the CustomJob class can use a dataset that's not a Vertex AI managed dataset to train a model, and it doesn't return the trained model. Because the class accepts different types of datasets and doesn't return a trained model, it's less automated and more flexible than a custom training pipeline.

CustomContainerTrainingJob

Use the CustomContainerTrainingJob class to launch a custom training pipeline in Vertex AI with a custom container.

For an example of how to use the CustomContainerTrainingJob class, see the tutorial in the PyTorch Image Classification Multi-Node Distributed Data Parallel Training on GPU using Vertex AI Training with Custom Container notebook.
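
A minimal sketch, assuming you've already built and pushed a training container image, might look like the following; the container URIs, dataset ID, and bucket name are placeholders:

from google.cloud import aiplatform

# Define a training pipeline that runs your custom training container and
# serves the resulting model with a serving image (placeholder URIs).
job = aiplatform.CustomContainerTrainingJob(
    display_name="train-custom-container",
    container_uri="us-docker.pkg.dev/my-project/my-repo/trainer:latest",
    model_serving_container_image_uri="us-docker.pkg.dev/my-project/my-repo/serving:latest",
    staging_bucket="gs://my-staging-bucket",
)

# Run the pipeline on a managed dataset and return the trained model.
model = job.run(
    dataset=aiplatform.TabularDataset(
        'projects/my-project/locations/us-central1/datasets/{DATASET_ID}'),
    model_display_name="my-custom-container-model",
    replica_count=1,
    machine_type="n1-standard-4",
)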

CustomJob

Use the CustomJob class to launch a custom training job in Vertex AI with a script.

A training job is more flexible than a training pipeline because you aren't restricted to loading your data in a Vertex AI managed dataset and a reference to your model isn't registered after the training job completes. For example, you might want to use the CustomJob class, its from_local_script method, and a script to load a dataset from scikit-learn or TensorFlow. Or, you might want to analyze or test your trained model before it's registered to Vertex AI.
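
For example, a minimal from_local_script sketch might look like the following; the task.py script, container image, and bucket name are placeholders:

from google.cloud import aiplatform

# Package a local training script into a custom job. The script itself is
# responsible for loading data and exporting any model artifacts.
job = aiplatform.CustomJob.from_local_script(
    display_name="custom-job-from-script",
    script_path="task.py",
    container_uri="us-docker.pkg.dev/my-project/my-repo/training-base:latest",
    requirements=["scikit-learn"],
    machine_type="n1-standard-4",
    staging_bucket="gs://my-staging-bucket",
)

# Run the job. CustomJob.run() doesn't return a trained model.
job.run()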

For more information about custom training jobs, including requirements before submitting a custom training job, what a custom job includes, and a Python code sample, see Create custom training jobs.

Because CustomJob.run() doesn't return the trained model, your training script needs to write the model artifacts to a location, such as a Cloud Storage bucket. For more information, see Export a trained ML model.

The following sample code demonstrates how to create and run a custom job using a sample worker pool specification. The code writes the trained model to a Cloud Storage bucket named artifact-bucket.

from google.cloud import aiplatform

# Create a worker pool spec that specifies a TensorFlow cassava dataset and
# includes the machine type and Docker image. The Google Cloud project ID
# is 'project-id'.
worker_pool_specs = [
    {
        "replica_count": 1,
        "machine_spec": {
            "machine_type": "n1-standard-8",
            "accelerator_type": "NVIDIA_TESLA_V100",
            "accelerator_count": 1,
        },
        "container_spec": {"image_uri": "gcr.io/{project-id}/multiworker:cassava"},
    },
    {
        "replica_count": 1,
        "machine_spec": {
            "machine_type": "n1-standard-8",
            "accelerator_type": "NVIDIA_TESLA_V100",
            "accelerator_count": 1,
        },
        "container_spec": {"image_uri": "gcr.io/{project-id}/multiworker:cassava"},
    },
]

# Use the worker pool spec to create a custom training job. The custom training
# job artifacts are stored in the Cloud Storage bucket named 'artifact-bucket'.
my_multiworker_job = aiplatform.CustomJob(
    display_name='multiworker-cassava-sdk',
    worker_pool_specs=worker_pool_specs,
    staging_bucket='gs://artifact-bucket',
)

# Run the training job. This method doesn't return the trained model.
my_multiworker_job.run()

CustomPythonPackageTrainingJob

Use the CustomPythonPackageTrainingJob class to launch a custom training pipeline in Vertex AI with a Python package.

For an example of how to use the CustomPythonPackageTrainingJob class, see the tutorial in the Custom training using Python package, managed text dataset, and TensorFlow serving container notebook.
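
The constructor is where this class differs from the other custom training pipeline classes: you point it at a Python source distribution in Cloud Storage and a training container. A minimal sketch with placeholder URIs follows; the run() call uses the same pattern as the CustomContainerTrainingJob example earlier on this page:

from google.cloud import aiplatform

# Train from a Python package that's already uploaded to Cloud Storage.
# The package URI, module name, and container image are placeholders.
job = aiplatform.CustomPythonPackageTrainingJob(
    display_name="train-python-package",
    python_package_gcs_uri="gs://my-staging-bucket/trainer-0.1.tar.gz",
    python_module_name="trainer.task",
    container_uri="us-docker.pkg.dev/my-project/my-repo/training-base:latest",
    staging_bucket="gs://my-staging-bucket",
)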

CustomTrainingJob

Use the CustomTrainingJob class to launch a custom training pipeline in Vertex AI with a script.

For an example of how to use the CustomTrainingJob class, see the tutorial in the Custom training image classification model for online prediction with explainability notebook.
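
A minimal sketch of the constructor follows; the task.py script, container image, and bucket name are placeholders, and the run() call uses the same pattern as the CustomContainerTrainingJob example earlier on this page:

from google.cloud import aiplatform

# Train from a local Python script that the SDK packages and uploads for you.
job = aiplatform.CustomTrainingJob(
    display_name="train-custom-script",
    script_path="task.py",
    container_uri="us-docker.pkg.dev/my-project/my-repo/training-base:latest",
    requirements=["pandas"],
    staging_bucket="gs://my-staging-bucket",
)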

Hyperparameter training class

The Vertex AI SDK includes a class for hyperparameter tuning. Hyperparameter tuning maximizes your model's predictive accuracy by optimizing variables (known as hyperparameters) that govern the training process. For more information, see Overview of hyperparameter tuning.

HyperparameterTuningJob

Use the HyperparameterTuningJob class to automate hyperparameter tuning on a training application.

To learn how to use the HyperparameterTuningJob class to create and tune a custom trained model, see the Hyperparameter tuning tutorial on GitHub.

To learn how to use the HyperparameterTuningJob class to run a Vertex AI hyperparameter tuning job for a TensorFlow model, see the Run hyperparameter tuning for a TensorFlow model tutorial on GitHub.
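
A minimal sketch might look like the following. It assumes your training code reports a metric named accuracy (typically with the cloudml-hypertune library) and reads learning_rate and batch_size as command-line arguments; the script, container image, and bucket name are placeholders:

from google.cloud import aiplatform
from google.cloud.aiplatform import hyperparameter_tuning as hpt

# The custom job that runs one trial of your training code.
trial_job = aiplatform.CustomJob.from_local_script(
    display_name="trial-job",
    script_path="task.py",
    container_uri="us-docker.pkg.dev/my-project/my-repo/training-base:latest",
    staging_bucket="gs://my-staging-bucket",
)

# Search the hyperparameter space over several parallel trials.
hpt_job = aiplatform.HyperparameterTuningJob(
    display_name="tune-my-model",
    custom_job=trial_job,
    metric_spec={"accuracy": "maximize"},
    parameter_spec={
        "learning_rate": hpt.DoubleParameterSpec(min=0.0001, max=0.1, scale="log"),
        "batch_size": hpt.DiscreteParameterSpec(values=[16, 32, 64], scale="linear"),
    },
    max_trial_count=16,
    parallel_trial_count=4,
)

hpt_job.run()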

Pipeline training class

A pipeline orchestrates your ML workflow in Vertex AI. You can use a pipeline to automate, monitor, and govern your machine learning systems. To learn more about pipelines in Vertex AI, see Introduction to Vertex AI pipelines.

PipelineJob

An instance of the PipelineJob class represents a Vertex AI pipeline.
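
A minimal sketch, assuming a pipeline definition already compiled to pipeline.json (for example, with the Kubeflow Pipelines SDK), might look like the following; the template path, bucket name, and parameter values are placeholders:

from google.cloud import aiplatform

# Create a pipeline run from a compiled pipeline definition.
pipeline_job = aiplatform.PipelineJob(
    display_name="my-pipeline",
    template_path="pipeline.json",
    pipeline_root="gs://my-pipeline-root-bucket",
    parameter_values={"project_id": "my-project"},
    enable_caching=True,
)

# Submit the run; use run() instead to block until the pipeline finishes.
pipeline_job.submit()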

Several tutorial notebooks demonstrate how to use the PipelineJob class. For these and other tutorial notebooks, see Vertex AI notebook tutorials.

What's next