[Posted]: 2021-11-14 22:14:57
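[Question]:
I can build the structured JSON in the expected layout, but extra backslashes appear in the JSON records, and each record is written as a string rather than as an object.
Please explain how to fix this, or tell me what I am missing or whether there is another way to get the expected result.
My current result: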
{
"awsservices":[
"{"\key":\"string_value"\, \"key":\numeric_value, "\key":\"amazon\web/services"}",
"{"\key":\"string_value"\, \"key":\numeric_value, "\key":\"amazon\web/services"}",
"{"\key":\"string_value"\, \"key":\numeric_value, "\key":\"amazon\web/services"}",
"{"\key":\"string_value"\, \"key":\numeric_value, "\key":\"amazon\web/services"}"
]
}
Expected result:
{
"awsservices":[
{"key":"string_value", "key":numeric_value, "key":"amazon web services"},
{"key":"string_value", "key":numeric_value, "key":"amazon web services"},
{"key":"string_value", "key":numeric_value, "key":"amazon web services"},
{"key":"string_value", "key":numeric_value, "key":"amazon\web/services"}
]
}
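The difference between the two outputs above can be reproduced with nothing but the standard library. The sketch below is only an illustration of double encoding, which is the likely cause of the backslashes; the field names `name`, `count` and `vendor` are made up for the example and do not come from the original data.

```python
import json

# Hypothetical sample rows; the real column names are not shown in the question.
rows = [
    {"name": "s3", "count": 3, "vendor": "amazon web services"},
    {"name": "ec2", "count": 7, "vendor": "amazon web services"},
]

# Encoding each row first and then encoding the list again treats every
# row as a plain string, so its inner quotes get backslash-escaped.
double_encoded = json.dumps({"awsservices": [json.dumps(r) for r in rows]})

# Encoding the nested structure once keeps each row as a JSON object.
single_encoded = json.dumps({"awsservices": rows})

print(double_encoded)
print(single_encoded)
```

The first line printed has the same shape as the current result (an array of escaped strings), and the second has the shape of the expected result (an array of objects).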
My code:
from awsglue.dynamicframe import DynamicFrame
from pyspark.sql.functions import col, collect_list, spark_partition_id, struct, to_json

# glueContext is assumed to have been created earlier in the Glue job.
SourceDataDYF = glueContext.create_dynamic_frame.from_options(
    format_options = {"quoteChar": '"', "escaper": "", "withHeader": True, "separator": "|", "inferSchema": "false"},
    connection_type = "s3",
    format = "csv",
    connection_options = {"paths": ["s3://bucket_name/csv_file_path/"], "recurse": True},
    transformation_ctx = "SourceDataDYF"
)
StageDataDF = SourceDataDYF.toDF()
print("*******************************: WRITE JSON :*******************************")
# Each row is serialized to a JSON string, collected into a list, and the list is cast to a string.
PreStageDataDF1 = StageDataDF.select(to_json(struct(*StageDataDF.columns)).alias("json")) \
    .groupBy(spark_partition_id()) \
    .agg(collect_list("json").alias("awsservices")) \
    .select(col("awsservices").cast("string")).coalesce(1)
targetDataDYF = DynamicFrame.fromDF(PreStageDataDF1, glueContext, "PreStageDataDF1")
targetDataJSON = glueContext.write_dynamic_frame.from_options(
    frame = targetDataDYF,
    connection_type = "s3",
    connection_options = {"path": "s3://result_bucket_name/folder_path/", "partitionKeys": []},
    format = "json",
    transformation_ctx = "targetDataJSON"
)
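One possible way to get objects instead of strings is to collect the rows as structs and let the JSON writer serialize them once. This is a minimal sketch rather than a tested Glue job: `nest_rows` is a hypothetical helper name, and it assumes the data is small enough to be gathered into a single row.

```python
def nest_rows(df, field="awsservices"):
    """Return a one-row DataFrame whose `field` column is an array of structs."""
    # Imported inside the function so this sketch can be loaded without a
    # Spark installation; in a Glue job these imports normally sit at the top.
    from pyspark.sql.functions import collect_list, struct

    # struct(...) keeps each record as a nested object. Wrapping it in
    # to_json(...) or casting the result to "string" would turn the records
    # back into text, which the JSON writer then escapes.
    return df.agg(collect_list(struct(*df.columns)).alias(field))
```

In the job above, `nest_rows(StageDataDF)` would take the place of the `PreStageDataDF1` expression, and the result would be passed to `DynamicFrame.fromDF` as before. Because the CSV is read with `"inferSchema": "false"`, every column arrives as a string, so any column that should be written as a JSON number would presumably need an explicit cast (for example `col(...).cast("int")`) before nesting.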
[Comments]:
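-
Since the data is basically correct, except that the "values" are a list of strings rather than dicts, what happens if you leave out .cast("string")?
-
to_json should be enough; you don't need to stringify the result before writing it.
-
@JonSG, I tried this in pandas using the Pandas.converted logic, and it works: I get the expected JSON format.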
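For files small enough to process outside Spark, a similar result can be sketched with the standard library alone. This is not the pandas code mentioned in the comment above, which is not shown; `csv_to_nested_json`, its `numeric` parameter and the sample column names are all hypothetical.

```python
import csv
import io
import json


def csv_to_nested_json(text, field="awsservices", numeric=()):
    """Convert pipe-delimited CSV text into {"<field>": [ {row}, ... ]}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="|", quotechar='"')
    rows = []
    for row in reader:
        for name in numeric:
            # csv yields every value as str; convert the columns that
            # should be emitted as JSON numbers.
            row[name] = int(row[name])
        rows.append(row)
    # The nested structure is encoded exactly once, so no escaping appears.
    return json.dumps({field: rows})


# Hypothetical sample data using the same "|" separator as the Glue job.
sample = "name|count|vendor\ns3|3|amazon web services\nec2|7|amazon web services\n"
print(csv_to_nested_json(sample, numeric=("count",)))
```

The rows stay as dictionaries until the final `json.dumps` call, which is what keeps them as JSON objects in the output.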
Tags: python arrays json apache-spark pyspark