【Question title】: How to trigger an Azure ML pipeline on file change?
【Posted】: 2022-11-16 13:54:46
【Question】:

I am new to Azure ML, and I want to trigger a training pipeline whenever new data is added to a dataset.

Here is the training code, which works fine:

prep_train_step = PythonScriptStep(
    name=PREPROCESS_TRAIN_PIPELINE_STEP_NAME,
    script_name=PREPROCESS_TRAIN_PIPELINE_SCRIPT_NAME, 
    compute_target=train_compute_instance, 
    source_directory=PREPROCESS_TRAIN_PIPELINE_SCRIPT_SOURCE_DIR,
    runconfig=train_run_config,
    allow_reuse=False,
    arguments=['--classifier-type', "xgBoost", "--train", train_dataset.as_mount(), "--test", test_dataset.as_mount()]
    )

print("Classification model preprocessing and training step created")

pipeline = Pipeline(workspace=ws, steps=[prep_train_step])
print("Pipeline is built")

# Submit the pipeline to be run once
experiment_name = PREPROCESS_TRAIN_EXPERIMENT_NAME
pipeline_run1 = Experiment(ws, experiment_name).submit(pipeline)
pipeline_run1.wait_for_completion(show_output=True)

Now for the scheduling part, which I took from the documentation. First I publish the pipeline:

published_pipeline = pipeline.publish(name='training_pipeline',
                                      description='Model training pipeline mock',
                                      version='1.0')

Check the REST endpoint of the published pipeline:

rest_endpoint = published_pipeline.endpoint
print(rest_endpoint)

So far so good: this prints the endpoint URL.

Now the last part, where I have to schedule the pipeline:

from azureml.pipeline.core import Schedule

reactive_schedule = Schedule.create(ws, name='MyReactiveScheduleTraining',
                                    description='trains based on input file change.',
                                    pipeline_id=published_pipeline.id,
                                    experiment_name='retraining_Pipeline_data_changes',
                                    datastore=blob_storage,
                                    path_on_datastore='./toy_data/train1')

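When I upload anything to ./toy_data/train1, the pipeline is not triggered, and I don't know why.

Even if I change path_on_datastore and upload the data to the new location, still nothing happens.

Any helpful ideas?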

【Comments】:

    Tags: azure-pipelines azure-machine-learning-service azure-ml-pipelines


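    【Answer 1】:

    The scenario is: [file] => [datastore] -> trigger (AML pipeline with an input data parameter) -> [output file]. For more detail on how a pipeline can be triggered, see the Schedule class documentation (https://learn.microsoft.com/en-us/python/api/azureml-pipeline-core/azureml.pipeline.core.schedule(class)?view=azure-ml-py). It describes two trigger types: a time interval, or an added or modified blob. The following code creates a time-based (recurrence) schedule: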

    import azureml.core
    from azureml.core import Workspace
    from azureml.pipeline.core import Pipeline, PublishedPipeline
    from azureml.pipeline.core.schedule import ScheduleRecurrence, Schedule
    from azureml.core.experiment import Experiment
    
    ws = Workspace.from_config()
    
    pipeline_id = ""  # Retrieve from GetPublishedPipelines script
    experiment_name = ""
    recurrence = ScheduleRecurrence(
        frequency="Day", interval=1, time_of_day="08:00"
    )  # time_of_day is UTC
    recurring_schedule = Schedule.create(
        ws,
        name=experiment_name + "_RecurringJob",
        description="Based on time",
        pipeline_id=pipeline_id,
        experiment_name=experiment_name,
        recurrence=recurrence,
    )
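For the blob-triggered case the question asks about, here is a minimal sketch under stated assumptions, not a confirmed fix. As far as the Schedule class documentation describes, `path_on_datastore` is interpreted relative to the datastore's container, changes in subfolders of that path are not monitored, and the datastore is polled every `polling_interval` minutes (default 5). A leading `./` in the path is therefore one possible reason the trigger never fires, though this is an assumption. `normalize_datastore_path` and `create_reactive_schedule` are hypothetical helper names introduced for illustration; `ws`, `datastore`, and `pipeline_id` are expected to come from the question's own code.

```python
import posixpath


def normalize_datastore_path(path):
    """Hypothetical helper: turn './toy_data/train1' into 'toy_data/train1'.

    Returns None for an empty or root path; with path_on_datastore=None
    the schedule monitors the datastore container itself.
    """
    cleaned = posixpath.normpath(path.strip()).strip("/")
    return None if cleaned in ("", ".") else cleaned


def create_reactive_schedule(ws, datastore, pipeline_id, experiment_name,
                             path, polling_minutes=5):
    """Create a schedule that fires when a blob is added or modified."""
    # Imported lazily so the helper above can be used without the SDK installed.
    from azureml.pipeline.core import Schedule

    return Schedule.create(
        ws,
        name="MyReactiveScheduleTraining",
        description="Retrain when blobs under the monitored path change.",
        pipeline_id=pipeline_id,
        experiment_name=experiment_name,
        datastore=datastore,  # assumption: an Azure Blob datastore
        path_on_datastore=normalize_datastore_path(path),
        polling_interval=polling_minutes,  # minutes between checks
    )
```

With the names from the question, a call might look like `create_reactive_schedule(ws, blob_storage, published_pipeline.id, 'retraining_Pipeline_data_changes', './toy_data/train1')`. Because the datastore is polled rather than pushed to, a run can start several minutes after the upload, and files should be uploaded directly into the monitored folder rather than into a subfolder of it.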
    

    【Comments】:
