【Posted】: 2016-11-16 04:19:47
【Question】:
I have a JSON file stored in HDFS, and I am trying to read it in my Spark context. The JSON file has the following format:
{"Request": {"TrancheList": {"Tranche": [{"Id": "123","OwnedAmt": "26500000", "Currency": "USD" }, { "Id": "456", "OwnedAmt": "41000000","Currency": "USD"}]},"FxRatesList": {"FxRatesContract": [{"Currency": "CHF","FxRate": "0.97919983706115"},{"Currency": "AUD", "FxRate": "1.2966804979253"},{ "Currency": "USD","FxRate": "1"},{"Currency": "SEK","FxRate": "8.1561012531034"},{"Currency": "NOK", "FxRate": "8.2454981641398"}]},"isExcludeDeals": "true","baseCurrency": "USD"}}
val inputdf = spark.read.json("hdfs://localhost/user/xyz/request.json")
inputdf.printSchema
printSchema shows the following output:
root
|-- Request: struct (nullable = true)
| |-- FxRatesList: struct (nullable = true)
| | |-- FxRatesContract: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- Currency: string (nullable = true)
| | | | |-- FxRate: string (nullable = true)
| |-- TrancheList: struct (nullable = true)
| | |-- Tranche: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- Currency: string (nullable = true)
| | | | |-- OwnedAmt: string (nullable = true)
| | | | |-- Id: string (nullable = true)
| |-- baseCurrency: string (nullable = true)
| |-- isExcludeDeals: string (nullable = true)
What would be the best way to create a DataFrame/RDD from the TrancheList part of the JSON, so that it gives me a distinct list of Ids with their OwnedAmt and Currency, as in the table below?
Id OwnedAmt Currency
123 26500000 USD
456 41000000 USD
Any help would be great. Thanks.
【Comments】:
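One possible approach, offered as a sketch rather than a confirmed answer: explode the Request.TrancheList.Tranche array so that each element becomes a row, then select the struct fields and de-duplicate. The object name TrancheExample, the helper extractTranches, and the inline sampleJson (a trimmed copy of the request from the question) are hypothetical names used only for illustration. The sketch assumes Spark 2.2 or later, because it reads JSON from a Dataset[String] so that it can run without HDFS.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object TrancheExample {
  // Trimmed copy of the request from the question (assumption: only TrancheList is needed here).
  val sampleJson: String =
    """{"Request": {"TrancheList": {"Tranche": [{"Id": "123","OwnedAmt": "26500000","Currency": "USD"},{"Id": "456","OwnedAmt": "41000000","Currency": "USD"}]},"isExcludeDeals": "true","baseCurrency": "USD"}}"""

  // Hypothetical helper: turns the nested Tranche array into one row per tranche.
  def extractTranches(df: DataFrame): DataFrame =
    df
      // explode emits one output row per element of the array column
      .select(explode(col("Request.TrancheList.Tranche")).as("t"))
      // pull the struct fields up into top-level columns
      .select(col("t.Id"), col("t.OwnedAmt"), col("t.Currency"))
      // drop duplicate (Id, OwnedAmt, Currency) rows
      .distinct()
      // sort so the result order is deterministic
      .orderBy(col("Id"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TrancheExample")
      .master("local[*]") // assumption: local run for illustration
      .getOrCreate()
    import spark.implicits._

    // With the real file this would be:
    //   spark.read.json("hdfs://localhost/user/xyz/request.json")
    val inputdf = spark.read.json(Seq(sampleJson).toDS())

    extractTranches(inputdf).show()
    spark.stop()
  }
}
```

In spark-shell, where spark and inputdf already exist as in the question, the body of extractTranches can be applied to inputdf directly. If an RDD is needed instead of a DataFrame, calling .rdd on the result should give an RDD[Row].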
Tags: apache-spark dataframe rdd