【Question title】: How can we create a Dataproc cluster through the REST API or an HTTP request?
【Posted】: 2019-12-17 04:45:43
【Question description】:

I'm new to Python, and I want to create a Dataproc cluster using an HTTP request. I'm following the REST API section of the Dataproc documentation: https://cloud.google.com/dataproc/docs/guides/create-cluster#creating_a_cloud_dataproc_cluster

Here is the code I'm trying:

import requests

Endpoint_URL = "https://dataproc.googleapis.com/v1/projects/*******/regions/us-central1-b/clusters"
data = {
    "projectId": "*****",
    "clusterName": "cluster-1",
    "config": {
        "configBucket": "",
        "gceClusterConfig": {
            "subnetworkUri": "default",
            "zoneUri": "us-central1-b"
        },
        "masterConfig": {
            "numInstances": 1,
            "machineTypeUri": "n1-standard-1",
            "diskConfig": {
                "bootDiskSizeGb": 500,
                "numLocalSsds": 0
            }
        },
        "workerConfig": {
            "numInstances": 2,
            "machineTypeUri": "n1-standard-1",
            "diskConfig": {
                "bootDiskSizeGb": 100,
                "numLocalSsds": 0
            }
        }
    }
}
r = requests.post(url=Endpoint_URL, data=data)
op_url = r.text
print("Response: %s" % op_url)

I don't know whether this is correct; with this code I can't create the cluster. What is the solution? Thanks.

【Question discussion】:

    Tags: python python-3.x airflow google-cloud-dataproc


    【Solution 1】:

    You should use the Python client library for easier access to the Dataproc API. If you must hand-write the HTTP calls, send the POST body as JSON. The following HTTP request will work:

        uri: https://dataproc.googleapis.com/v1/projects/<project>/regions/<region>/clusters?alt=json
        method: POST
    
        # Headers
        Authorization: Bearer <oauth token>
        accept: application/json
        accept-encoding: gzip, deflate
        content-length: <length>
        content-type: application/json
    
        # Body
        {
           "clusterName": "<cluster-name>",
           "config": {
               "gceClusterConfig": {...},
               "masterConfig": {...},
               "softwareConfig": {...},
               "workerConfig": {...}
           },
           "projectId": "<project_id>"
        }
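A minimal Python sketch of the request above, assuming the `google-auth` package is installed and Application Default Credentials are configured. The helper names `build_create_cluster_request`, `make_authorized_session` and `create_cluster_via_rest` are hypothetical, and the machine types and disk sizes are copied from the question. Two differences from the question's code matter: the body is sent with `json=` rather than `data=`, and the URL path takes a region such as `us-central1`, while the zone `us-central1-b` goes in `gceClusterConfig.zoneUri`.

```python
DATAPROC_API = "https://dataproc.googleapis.com/v1"


def build_create_cluster_request(project_id, region, zone, cluster_name):
    """Return (url, body) for the Dataproc clusters.create REST call.

    The URL path takes a region such as "us-central1"; a zone such as
    "us-central1-b" belongs in gceClusterConfig.zoneUri instead.
    """
    url = f"{DATAPROC_API}/projects/{project_id}/regions/{region}/clusters"
    body = {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": {
            "gceClusterConfig": {"zoneUri": zone},
            "masterConfig": {
                "numInstances": 1,
                "machineTypeUri": "n1-standard-1",
                "diskConfig": {"bootDiskSizeGb": 500, "numLocalSsds": 0},
            },
            "workerConfig": {
                "numInstances": 2,
                "machineTypeUri": "n1-standard-1",
                "diskConfig": {"bootDiskSizeGb": 100, "numLocalSsds": 0},
            },
        },
    }
    return url, body


def make_authorized_session():
    """Return a requests-compatible session that adds the OAuth header.

    Assumes google-auth is installed and Application Default Credentials
    are configured (e.g. via `gcloud auth application-default login`).
    """
    import google.auth
    from google.auth.transport.requests import AuthorizedSession

    credentials, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
    return AuthorizedSession(credentials)


def create_cluster_via_rest(session, project_id, region, zone, cluster_name):
    url, body = build_create_cluster_request(project_id, region, zone, cluster_name)
    # json= serializes the body and sets Content-Type: application/json;
    # data= with a dict (as in the question) sends form-encoded fields instead.
    response = session.post(url, json=body)
    response.raise_for_status()
    # The API answers with a long-running Operation resource, not the cluster.
    return response.json()
```

Usage would look like `create_cluster_via_rest(make_authorized_session(), "my-project", "us-central1", "us-central1-b", "cluster-1")`, where the project ID is a placeholder. `AuthorizedSession` fills in the `Authorization: Bearer …` header, so no token has to be copied by hand.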
    

    If you have the gcloud SDK installed, you can see the details of the HTTP requests the client makes by adding the --log-http flag. For example:

    gcloud dataproc clusters create <cluster-name> --log-http
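For the Python client library recommended at the start of this answer, a minimal sketch follows. It assumes the `google-cloud-dataproc` package is installed and Application Default Credentials are configured, and it uses the `request={...}` call style of the 2.x and later releases of that library (older 1.x releases took positional arguments, so treat the exact call shape as an assumption for your installed version). The helper names `build_cluster_spec` and `create_cluster_with_client` are hypothetical; the field values mirror the question's config, with the snake_case names the client library uses in place of the REST API's camelCase.

```python
def build_cluster_spec(project_id, cluster_name):
    """Return a Cluster message as a dict, mirroring the question's config.

    Field names are snake_case, as the Python client expects, rather than
    the camelCase used in the REST JSON body.
    """
    return {
        "project_id": project_id,
        "cluster_name": cluster_name,
        "config": {
            "master_config": {
                "num_instances": 1,
                "machine_type_uri": "n1-standard-1",
                "disk_config": {"boot_disk_size_gb": 500},
            },
            "worker_config": {
                "num_instances": 2,
                "machine_type_uri": "n1-standard-1",
                "disk_config": {"boot_disk_size_gb": 100},
            },
        },
    }


def create_cluster_with_client(project_id, region, cluster_name):
    # Assumes google-cloud-dataproc (2.x or later) is installed and
    # Application Default Credentials are configured.
    from google.cloud import dataproc_v1

    client = dataproc_v1.ClusterControllerClient(
        # Regional endpoint so the request is served by the chosen region.
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )
    operation = client.create_cluster(
        request={
            "project_id": project_id,
            "region": region,
            "cluster": build_cluster_spec(project_id, cluster_name),
        }
    )
    # create_cluster returns a long-running operation; result() blocks
    # until the cluster is created and raises if creation fails.
    return operation.result()
```

Compared with hand-written HTTP, the client handles authentication, serialization and polling of the long-running operation for you.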
    

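    【Discussion】:

    • The REST API is more flexible.
    【Solution 2】:

    I recommend using DataprocClusterCreateOperator (https://airflow.apache.org/_api/airflow/contrib/operators/dataproc_operator/index.html#module-airflow.contrib.operators.dataproc_operator). It already implements the required HTTP requests and handles logging, retries and so on, so you don't have to do it yourself.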

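    import datetime

    from airflow import models
    from airflow.contrib.operators import dataproc_operator

    # DAG id and start date are illustrative placeholders.
    with models.DAG(
            'create_dataproc_cluster_example',
            schedule_interval=datetime.timedelta(days=1),
            start_date=datetime.datetime(2019, 12, 1)) as dag:

        create_dataproc_cluster = dataproc_operator.DataprocClusterCreateOperator(
            task_id='create_dataproc_cluster',
            # Give the cluster a unique name by appending the date scheduled.
            # See https://airflow.apache.org/code.html#default-variables
            cluster_name='hadoop-cluster-{{ ds_nodash }}',
            project_id='<project_id>',
            num_workers=2,
            zone='europe-west1-b',
            master_machine_type='n1-standard-1',
            worker_machine_type='n1-standard-1')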
    

    The operator makes the HTTP requests to Google Cloud on your behalf.
