[Question title]: Read JSON files using dataset schema
[Posted]: 2022-09-27 17:18:11
[Question]:

Here is a JSON file that was added to a Foundry dataset:

[
  {
    "name": "Tim",
    "born": "2000 01 01",
    "location": {"country": "UK", "city": "London"},
    "scores": [
      {"date": "2022 02 01", "score": 4},
      {"date": "2022 03 01", "score": 4}
    ]
  },
  {
    "name": "Kim",
    "born": "1999 12 31",
    "location": {"country": "LT", "city": "Vilnius"},
    "scores": [
      {"date": "2022 02 01", "score": 3},
      {"date": "2022 03 01", "score": 5}
    ]
  }
]

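The dataset currently has no schema, so the preview shows only the files.

How can I add a schema so that we can preview the contents of the JSON file?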

Data types:
"name": string
"born": date
"location": map
"scores": array of structs ("date": date, "score": integer)

    Tags: json types dataset schema palantir-foundry


    [Solution 1]:

    To read a JSON file, the "multiline" option must be set to true.
    (For a JSONL file, the "multiline" option is not needed, i.e. it stays false.)
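
The difference between the two layouts can be illustrated with a minimal sketch in plain Python. This is not Foundry or Spark code; it only assumes that a line-oriented reader parses each line as its own JSON value, and the helper names are made up for the illustration.

```python
import json

# One JSON document spread over several lines, like the file in the question
# (shortened to two fields per record).
multiline_doc = """[
  {"name": "Tim", "born": "2000 01 01"},
  {"name": "Kim", "born": "1999 12 31"}
]"""

# The same records as JSON Lines: one complete JSON object per line.
jsonl_doc = (
    '{"name": "Tim", "born": "2000 01 01"}\n'
    '{"name": "Kim", "born": "1999 12 31"}'
)


def parse_line_by_line(text):
    """Parse every non-empty line as its own JSON value (the JSONL way)."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def parse_whole_document(text):
    """Parse the whole text as a single JSON value (the multi-line way)."""
    return json.loads(text)


records_from_jsonl = parse_line_by_line(jsonl_doc)
records_from_multiline = parse_whole_document(multiline_doc)

# A line-oriented parser fails on the multi-line file, because its first
# line ("[") is not a complete JSON value on its own.
try:
    parse_line_by_line(multiline_doc)
    line_by_line_ok = True
except json.JSONDecodeError:
    line_by_line_ok = False
```

Both layouts carry the same records, but only the whole-document parse can handle the array that spans several lines, which is the situation the "multiline" option addresses.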

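    For a map, "mapKeyType" and "mapValueType" must be filled in.
    For an array, "arraySubtype" must be filled in.
    For a struct, "subSchemas" must be filled in.
    For a date, the "dateFormat" option may be needed if the format is not "yyyy-MM-dd".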
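
Because every field entry repeats the same keys, the nesting rules above are easier to see when the schema is generated rather than typed by hand. The following is a rough sketch in plain Python; the `field` helper is hypothetical (it is not part of any Foundry API) and only reproduces the key names that appear in this answer's schema.

```python
import json


def field(type_, name=None, **overrides):
    """Build one field entry with all keys present; unused keys default to null."""
    entry = {
        "type": type_,
        "name": name,
        "nullable": None,
        "userDefinedTypeClass": None,
        "customMetadata": {},
        "arraySubtype": None,
        "precision": None,
        "scale": None,
        "mapKeyType": None,
        "mapValueType": None,
        "subSchemas": None,
    }
    entry.update(overrides)  # fill in the keys that the given type requires
    return entry


fields = [
    field("STRING", "name"),
    field("DATE", "born"),
    # A map needs both its key type and its value type.
    field("MAP", "location",
          mapKeyType=field("STRING"),
          mapValueType=field("STRING")),
    # An array needs its element type; here the element is a struct,
    # which in turn needs its sub-schemas.
    field("ARRAY", "scores",
          arraySubtype=field("STRUCT", subSchemas=[
              field("DATE", "date"),
              field("INTEGER", "score"),
          ])),
]

schema = {
    "fieldSchemaList": fields,
    "primaryKey": None,
    "dataFrameReaderClass": "com.palantir.foundry.spark.input.DataSourceDataFrameReader",
    "customMetadata": {
        "format": "json",
        "options": {"multiline": True, "dateFormat": "yyyy MM dd"},
    },
}

# JSON text that could be compared against a hand-written schema.
schema_json = json.dumps(schema, indent=2)
```

Printing `schema_json` gives JSON text with Python's `None` and `True` rendered as `null` and `true`.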

    Setting everything up correctly results in this preview:

    Schema used:

    {
      "fieldSchemaList": [
        {
          "type": "STRING",
          "name": "name",
          "nullable": null,
          "userDefinedTypeClass": null,
          "customMetadata": {},
          "arraySubtype": null,
          "precision": null,
          "scale": null,
          "mapKeyType": null,
          "mapValueType": null,
          "subSchemas": null
        },
        {
          "type": "DATE",
          "name": "born",
          "nullable": null,
          "userDefinedTypeClass": null,
          "customMetadata": {},
          "arraySubtype": null,
          "precision": null,
          "scale": null,
          "mapKeyType": null,
          "mapValueType": null,
          "subSchemas": null
        },
        {
          "type": "MAP",
          "name": "location",
          "nullable": null,
          "userDefinedTypeClass": null,
          "customMetadata": {},
          "arraySubtype": null,
          "precision": null,
          "scale": null,
          "mapKeyType": {
            "type": "STRING",
            "name": null,
            "nullable": null,
            "userDefinedTypeClass": null,
            "customMetadata": {},
            "arraySubtype": null,
            "precision": null,
            "scale": null,
            "mapKeyType": null,
            "mapValueType": null,
            "subSchemas": null
          },
          "mapValueType": {
            "type": "STRING",
            "name": null,
            "nullable": null,
            "userDefinedTypeClass": null,
            "customMetadata": {},
            "arraySubtype": null,
            "precision": null,
            "scale": null,
            "mapKeyType": null,
            "mapValueType": null,
            "subSchemas": null
          },
          "subSchemas": null
        },
        {
          "type": "ARRAY",
          "name": "scores",
          "nullable": null,
          "userDefinedTypeClass": null,
          "customMetadata": {},
          "arraySubtype": {
            "type": "STRUCT",
            "name": null,
            "nullable": null,
            "userDefinedTypeClass": null,
            "customMetadata": {},
            "arraySubtype": null,
            "precision": null,
            "scale": null,
            "mapKeyType": null,
            "mapValueType": null,
            "subSchemas": [
              {
                "type": "DATE",
                "name": "date",
                "nullable": null,
                "userDefinedTypeClass": null,
                "customMetadata": {},
                "arraySubtype": null,
                "precision": null,
                "scale": null,
                "mapKeyType": null,
                "mapValueType": null,
                "subSchemas": null
              },
              {
                "type": "INTEGER",
                "name": "score",
                "nullable": null,
                "userDefinedTypeClass": null,
                "customMetadata": {},
                "arraySubtype": null,
                "precision": null,
                "scale": null,
                "mapKeyType": null,
                "mapValueType": null,
                "subSchemas": null
              }
            ]
          },
          "precision": null,
          "scale": null,
          "mapKeyType": null,
          "mapValueType": null,
          "subSchemas": null
        }
      ],
      "primaryKey": null,
      "dataFrameReaderClass": "com.palantir.foundry.spark.input.DataSourceDataFrameReader",
      "customMetadata": {
        "format": "json",
        "options": {
          "multiline": true,
          "dateFormat": "yyyy MM dd"
        }
      }
    }
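
To make the effect of the field types and of "dateFormat" concrete, here is a small sketch in plain Python that applies the same types to the sample records by hand. It does not use Foundry or Spark; the strptime pattern "%Y %m %d" is my own hand translation of "yyyy MM dd", and the helper names are hypothetical.

```python
import json
from datetime import date, datetime

sample = """[
  {"name": "Tim", "born": "2000 01 01",
   "location": {"country": "UK", "city": "London"},
   "scores": [{"date": "2022 02 01", "score": 4},
              {"date": "2022 03 01", "score": 4}]},
  {"name": "Kim", "born": "1999 12 31",
   "location": {"country": "LT", "city": "Vilnius"},
   "scores": [{"date": "2022 02 01", "score": 3},
              {"date": "2022 03 01", "score": 5}]}
]"""

# Assumed Python equivalent of the schema's "yyyy MM dd" date format.
PY_DATE_FORMAT = "%Y %m %d"


def parses_with(text, fmt):
    """Return True if text matches the given strptime format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


def to_date(text):
    """Parse a date string written as year, month, day separated by spaces."""
    return datetime.strptime(text, PY_DATE_FORMAT).date()


def typed_row(record):
    """Convert one raw record into the types named in the schema."""
    return {
        "name": str(record["name"]),                      # STRING
        "born": to_date(record["born"]),                  # DATE
        "location": {str(k): str(v)                       # MAP of STRING to STRING
                     for k, v in record["location"].items()},
        "scores": [{"date": to_date(s["date"]),           # ARRAY of STRUCT
                    "score": int(s["score"])}
                   for s in record["scores"]],
    }


rows = [typed_row(r) for r in json.loads(sample)]

# The dates in the file use spaces, so the hyphenated pattern does not match.
default_pattern_matches = parses_with("2000 01 01", "%Y-%m-%d")
```

Here `default_pattern_matches` is False, which mirrors why the answer sets "dateFormat" explicitly instead of relying on "yyyy-MM-dd".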
    

    The available file-reading options, including those for reading JSON files, can be found here.

