Convert array of JSON objects to string in pyspark: answers

【Question title】: Convert array of JSON objects to string in pyspark
【Posted】: 2021-04-02 03:07:26
【Question description】:

I have a requirement to build custom JSON from the columns returned by a PySpark DataFrame. So I wrote a UDF like the one below, which returns a JSON-formatted string for each row.

The parameter "entities" is an array in JSON format.

import json

def halResponse(entities, admantx, copilot_id):
  json_resp = "{\"analyzedContent\": {"+json.dumps(entities)+"}}"
  return json_resp

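But the response is not proper JSON: instead of key: value pairs, I get only the values, without the keys (the actual values are replaced with * for security reasons).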

Here is a sample response:

  "analyzedContents": [
    {
      "entities": [
        [
          "******",
          *,
          *********,
          [
            [
              "***********",
              "***********",
              "***********",
              [
                "*****************"
              ],
              **********
            ]
          ],
          "**************"
        ]
      ]
    }
  ]
}

Please help me resolve this issue. After the fix, I should get a response like the following sample:

  "analyzedContents": [
    {
      "entities": [
        [
          "key":******",
          "key":*,
          "key":*********,
          [
            [
              "key":"***********",
              "key":"***********",
              "key":"***********",
              [
                "key":"*****************"
              ],
              "key":**********
            ]
          ],
          "key":"**************"
        ]
      ]
    }
  ]
}
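
A likely reason for the values-only output above: PySpark passes struct values to a Python UDF as `Row` objects, and `Row` is a tuple subclass, so `json.dumps` writes each one as a JSON array and drops the field names. The sketch below reproduces the effect in plain Python, using a `namedtuple` as a stand-in for `Row` (the `Entity` type and its fields are hypothetical, not taken from the question). With a real `Row`, `asDict(recursive=True)` would play the role that `_asdict()` plays here.

```python
import json
from collections import namedtuple

# Hypothetical stand-in for pyspark.sql.Row, which is also a tuple subclass.
Entity = namedtuple("Entity", ["name", "score"])

entities = [Entity("spark", 0.9), Entity("json", 0.7)]

# Tuples serialize as JSON arrays, so the field names are lost.
values_only = json.dumps(entities)

# Converting each record to a dict first keeps the key: value pairs.
with_keys = json.dumps([e._asdict() for e in entities])

print(values_only)
print(with_keys)
```

The first line printed contains only nested arrays of values, which matches the shape of the sample response in the question; the second contains objects with keys.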

【Question comments】:

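Try using F.to_json: spark.apache.org/docs/latest/api/python/…
And how do I convert the JSON to a string?
It is already a string; no further conversion is needed.
But when I concatenate it in the UDF, I get this: 'TypeError: can only concatenate str (not "NoneType") to str'
Could you edit your question and show the UDF and the code you used (with F.to_json)?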
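
The `TypeError` quoted in the comments means one operand of `+` was `None`. In a UDF, one plausible cause is that the input column is null for some row, since nulls arrive in Python as `None`. Below is a minimal sketch of a null-safe wrapper, assuming the UDF receives the string produced by `F.to_json`. The function name `wrap_entities` is hypothetical, and the wrapper omits the inner braces from the question's code so that the result parses as JSON.

```python
from typing import Optional

def wrap_entities(entities_json: Optional[str]) -> Optional[str]:
    """Wrap an already-serialized JSON array in an outer object."""
    # A null column value arrives in a Python UDF as None; return None
    # instead of concatenating, which would raise TypeError.
    if entities_json is None:
        return None
    return '{"analyzedContent": ' + entities_json + '}'

print(wrap_entities('[{"key": "value"}]'))
print(wrap_entities(None))
```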

Tags: json apache-spark pyspark apache-spark-sql

【Solution 1】:

Try this without using a UDF:

import pyspark.sql.functions as F

df2 = df.withColumn(
    'response',
    F.concat(
        F.lit("{\"analyzedContent\": {"),
        F.to_json(F.col("entities")),
        F.lit("}}")
    )
)
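
One caveat worth checking: `F.to_json` on an array column yields a string that starts with `[`, so wrapping it in `{` and `}` as above produces `{"analyzedContent": {[...]}}`, which is not valid JSON. The plain-Python check below illustrates this with a hypothetical sample value standing in for the `to_json` output; dropping the inner braces from the two literals gives a string that parses.

```python
import json

# Hypothetical example of what F.to_json returns for an array of structs.
entities_json = '[{"key": "value"}]'

# Same literals as in the answer: an array placed directly inside braces.
wrapped_as_answer = '{"analyzedContent": {' + entities_json + '}}'
# Variant without the inner braces: the array becomes the value of the key.
wrapped_as_array = '{"analyzedContent": ' + entities_json + '}'

def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(is_valid_json(wrapped_as_answer))
print(is_valid_json(wrapped_as_array))
```

In the DataFrame version this would correspond to using `F.lit("{\"analyzedContent\": ")` and `F.lit("}")` as the outer literals. Note also that `F.concat` returns null when any of its inputs is null, so rows with a null `entities` column get a null `response`.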

【Comments】: