【Posted】: 2018-02-08 04:34:51
【Question】:
{
"city": "Tempe",
"state": "AZ",
...
"attributes": [
"BikeParking: True",
"BusinessAcceptsBitcoin: False",
"BusinessAcceptsCreditCards: True",
"BusinessParking: {'garage': False, 'street': False, 'validated': False, 'lot': True, 'valet': False}",
"DogsAllowed: False",
"RestaurantsPriceRange2: 2",
"WheelchairAccessible: True"
],
...
}
Hello, I'm using PySpark and I'm trying to output tuples of (state, BusinessAcceptsBitcoin). At the moment I'm doing:
csr = (dataset
.filter(lambda e:"city" in e and "BusinessAcceptsBitcoin" in e)
.map(lambda e: (e["city"],e["BusinessAcceptsBitcoin"]))
.collect()
)
But this command fails. How can I get the "BusinessAcceptsBitcoin" and "city" fields?
【Discussion】:
- Sorry, I can't use DataFrames. It has to be RDDs only!
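One likely cause, judging from the sample record: "BusinessAcceptsBitcoin" is not a top-level key. It sits inside the strings of the "attributes" list, so `e["BusinessAcceptsBitcoin"]` cannot find it. A minimal RDD-only sketch follows. It assumes each element of `dataset` is either a parsed dict or a raw JSON string shaped like the sample; the helper names `parse_attributes` and `city_bitcoin_pairs` are made up for illustration. It keys on "city" as the posted code does; swap in "state" if that is the field you want.

```python
import json

def parse_attributes(attrs):
    """Turn ["Key: value", ...] into {"Key": "value", ...} (values stay strings)."""
    parsed = {}
    for item in attrs or []:
        # Split on the first ": " only, so nested values such as
        # "BusinessParking: {'garage': False, ...}" keep their full text.
        key, sep, value = item.partition(": ")
        if sep:
            parsed[key] = value
    return parsed

def city_bitcoin_pairs(record):
    """Return [(city, accepts_bitcoin)], or [] when either field is missing."""
    if isinstance(record, str):
        # Assumption: a string element is one JSON document.
        record = json.loads(record)
    attrs = parse_attributes(record.get("attributes"))
    if "city" in record and "BusinessAcceptsBitcoin" in attrs:
        return [(record["city"], attrs["BusinessAcceptsBitcoin"] == "True")]
    return []

# Local check on a record containing only fields shown in the question.
sample = {
    "city": "Tempe",
    "state": "AZ",
    "attributes": [
        "BikeParking: True",
        "BusinessAcceptsBitcoin: False",
        "BusinessParking: {'garage': False, 'street': False, 'validated': False, 'lot': True, 'valet': False}",
    ],
}
pairs = city_bitcoin_pairs(sample)
print(pairs)

# With the RDD from the question (assumed to be named `dataset`):
# csr = dataset.flatMap(city_bitcoin_pairs).collect()
```

`flatMap` does the work of the `filter` plus `map` pair in one pass: records without the attribute return an empty list and simply drop out.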
Tags: python json apache-spark pyspark