【Posted】: 2017-01-24 08:21:22
【Question】:
I saved the JSON data from a URL to a file named urljson.json in my Spark folder, then ran the following code to create a DataFrame from it:
path="urljson.json/"
testdf1=spark.read.json(path)
testdf1.show()
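I got this output.
After running
testdf1.printSchema()
the following schema is shown:
root
 |-- _corrupt_record: string (nullable = true)
How can I fix this? Any guidance would be appreciated. I am using Spark 2.0.
My JSON data looks like this. It is very large, so I have posted only part of it: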
result:[{"BldgID":"1006AVE ","BldgName":"100-6th Avenue SW (Oddfellows) ","BldgCity":"Calgary ","BldgState":"AB ","BldgZip":"T2G 2C4 ","BldgAddress1":"100-6th Avenue Southwest ","BldgAddress2":"ZZZ None","BldgPhone":"4035439600 ","BldgLandlord":"1006AV","BldgLandlordName":"100-6 TH Avenue SW Inc. ","BldgManager":"AVANDE","BldgManagerName":"Alyssa Van de Vorst ","BldgManagerType":"Internal","BldgGLA":"34242","BldgEntityID":"1006AVE ","BldgInactive":"N","BldgPropType":"ZZZ None","BldgPropTypeDesc":"ZZZ None","BldgPropSubType":"ZZZ None","BldgPropSubTypeDesc":"ZZZ None","BldgRetailFlag":"N","BldgEntityType":"REIT ","BldgCityName":"Calgary ","BldgDistrictName":"Downtown ","BldgRegionName":"Western Canada ","BldgAccountantID":"KKAUN ","BldgAccountantName":"Kendra Kaun ","BldgAccountantMgrID":"LVALIANT ","BldgAccountantMgrName":"Lorretta Valiant ","BldgFASBStartDate":"2012-10-24","BldgFASBStartDateStr":"2012-10-24"},{"BldgID":"1007AVE ","BldgName":"100-7th Avenue Southwest-Art Central ","BldgCity":"Calgary ","BldgState":"AB ","BldgZip":"T2P 0W4 ","BldgAddress1":"100-7th Avenue Southwest ","BldgAddress2":"ZZZ None","BldgPhone":"4035439600 ","BldgLandlord":"1007AV","BldgLandlordName":"100-7th Avenue SW (Art Central) Inc. 
","BldgManager":"LPATER","BldgManagerName":"Lyndsey Paterson ","BldgManagerType":"Internal","BldgGLA":"27127","BldgEntityID":"1007AVE ","BldgInactive":"N","BldgPropType":"ZZZ None","BldgPropTypeDesc":"ZZZ None","BldgPropSubType":"ZZZ None","BldgPropSubTypeDesc":"ZZZ None","BldgRetailFlag":"N","BldgEntityType":"Property Under Dev't ","BldgCityName":"Calgary ","BldgDistrictName":"Downtown ","BldgRegionName":"Western Canada ","BldgAccountantID":"ABRITTON ","BldgAccountantName":"Angie Britton ","BldgAccountantMgrID":"ZZZ None","BldgAccountantMgrName":"ZZZ None","BldgFASBStartDate":"2011-09-01","BldgFASBStartDateStr":"2011-09-01"},{"BldgID":"100LOMB ","BldgName":"100 Lombard Street ","BldgCity":"Toronto ","BldgState":"ON ","BldgZip":"M5C 1M3 ","BldgAddress1":"100 Lombard Street ","BldgAddress2":"ZZZ None","BldgPhone":"4169779002 ","BldgLandlord":"100LOM","BldgLandlordName":"100 Lombard Street Inc. ","BldgManager":"TCHALM","BldgManagerName":"Tiffany Chalmers ","BldgManagerType":"Internal","BldgGLA":"43697.64","BldgEntityID":"100LOMB ","BldgInactive":"N","BldgPropType":"ZZZ None","BldgPropTypeDesc":"ZZZ None","BldgPropSubType":"ZZZ None","BldgPropSubTypeDesc":"ZZZ None","BldgRetailFlag":"N","BldgEntityType":"REIT ","BldgCityName":"Toronto ","BldgDistrictName":"Queen - Richmond ","BldgRegionName":"Central Canada ","BldgAccountantID":"MALLORDE ","BldgAccountantName":"May Ann Allorde ","BldgAccountantMgrID":"TTSANG ","BldgAccountantMgrName":"Tony Tsang ","BldgFASBStartDate":"2005-11-01","BldgFASBStartDateStr":"2005-11-01"},{"BldgID":"10190104","BldgName":"10190-104th Street NW-The Metals Buildi ","BldgCity":"Edmonton ","BldgState":"AB ","BldgZip":"T5J 1A7 ","BldgAddress1":"10190-104st Street SW ","BldgAddress2":"ZZZ None","BldgPhone":"7804234400 ","BldgLandlord":"10190 ","BldgLandlordName":"10190-104 Street Inc. 
","BldgManager":"NEWWES","BldgManagerName":"New West Enterprise Property ","BldgManagerType":"Third ","BldgGLA":"20447.75","BldgEntityID":"10190104","BldgInactive":"N","BldgPropType":"ZZZ None","BldgPropTypeDesc":"ZZZ None","BldgPropSubType":"ZZZ None","BldgPropSubTypeDesc":"ZZZ None","BldgRetailFlag":"N","BldgEntityType":"REIT ","BldgCityName":"Edmonton ","BldgDistrictName":"Edmonton ","BldgRegionName":"Western Canada ","BldgAccountantID":"RYANG ","BldgAccountantName":"Raymond Yang ","BldgAccountantMgrID":"LVALIANT ","BldgAccountantMgrName":"Lorretta Valiant ","BldgFASBStartDate":"2011-08-08","BldgFASBStartDateStr":"2011-08-08"}]
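In Spark 2.0, spark.read.json treats every line of the file as a separate JSON record, and lines it cannot parse end up in _corrupt_record. The sketch below is only a rough stand-in for that behaviour using Python's json module, not Spark's actual parser. The function name count_unparseable_lines and the miniature sample are made up for illustration; the sample imitates the data above, with a leading result: label and a line break inside a string value.

```python
import json


def count_unparseable_lines(text):
    """Return (bad, total) for the non-empty lines of text.

    A line counts as bad when it is not a complete JSON value on its own.
    """
    bad = 0
    total = 0
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        total += 1
        try:
            json.loads(line)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            bad += 1
    return bad, total


# Hypothetical miniature of the posted data: the "\n" below becomes a real
# line break inside a string value, and the text starts with "result:".
sample = 'result:[{"BldgID":"1006AVE ","BldgLandlordName":"100-6 TH Avenue SW Inc. \n","BldgManager":"AVANDE"}]'

if __name__ == "__main__":
    print(count_unparseable_lines(sample))
```

If every line is reported as bad, as it is for this sample, a schema containing only _corrupt_record is the expected result.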
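【Comments】:
-
It is hard to tell without knowing what your JSON file looks like.
-
The problem may be that each JSON document is not on a single line, i.e. your JSON contains line breaks.
-
Please post your JSON file so we can give more details; in most cases the problem is exactly what @RajatMishra describes!
-
I have posted the JSON data. It is a very large set, so I have posted only part of it.
Tags: json apache-spark pyspark spark-dataframe pyspark-sql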
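Following up on the comment about line breaks, one possible workaround is to rewrite the file as JSON Lines (one object per line) before handing it to Spark. This is a minimal sketch, not a confirmed fix for the asker's file. It assumes the file really has the shape shown in the question: a result: label followed by a JSON array, with line breaks that are artifacts of how the file was saved. The function name to_json_lines and the sample string are hypothetical.

```python
import json


def to_json_lines(raw_text, prefix="result:"):
    """Convert a 'result:[{...}, ...]' dump into JSON Lines text."""
    text = raw_text.strip()
    if text.startswith(prefix):
        text = text[len(prefix):]  # drop the non-JSON label before the array
    # Assumption: line breaks are saving artifacts, so removing them
    # (even inside string values) loses nothing that matters.
    text = text.replace("\r", "").replace("\n", "")
    records = json.loads(text)
    if isinstance(records, dict):
        records = [records]  # tolerate a single object instead of an array
    # json.dumps never emits raw newlines, so each record stays on one line.
    return "\n".join(json.dumps(rec) for rec in records)


# Hypothetical two-record miniature of the posted data.
sample = (
    'result:[{"BldgID":"1006AVE ","BldgLandlordName":"100-6 TH Avenue SW Inc. \n",'
    '"BldgManager":"AVANDE"},{"BldgID":"1007AVE ","BldgManager":"LPATER"}]'
)

if __name__ == "__main__":
    print(to_json_lines(sample))
```

The returned text could then be written to a new file and that file passed to spark.read.json as in the question. Newer Spark releases (2.2 and later) also offer a multiLine option for read.json, but the input would still have to be valid JSON, which the result: label and the line breaks inside strings prevent here.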