[Posted]: 2019-04-09 10:07:13
[Question]:
Part of my df schema:
-- result: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- error: string (nullable = true)
| | |-- hop: long (nullable = true)
| | |-- resuLt: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- from: string (nullable = true)
| | | | |-- rtt: double (nullable = true)
| | | | |-- size: long (nullable = true)
| | | | |-- ttl: long (nullable = true)
| | |-- result: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- Rtt: double (nullable = true)
| | | | |-- Ttl: long (nullable = true)
| | | | |-- dstoptsize: long (nullable = true)
| | | | |-- dup: boolean (nullable = true)
| | | | |-- edst: string (nullable = true)
| | | | |-- err: string (nullable = true)
| | | | |-- error: string (nullable = true)
| | | | |-- flags: string (nullable = true)
| | | | |-- from: string (nullable = true)
| | | | |-- hdropts: array (nullable = true)
| | | | | |-- element: struct (containsNull = true)
| | | | | | |-- mss: long (nullable = true)
| | | | |-- icmpext: struct (nullable = true)
| | | | | |-- obj: array (nullable = true)
| | | | | | |-- element: struct (containsNull = true)
| | | | | | | |-- class: long (nullable = true)
| | | | | | | |-- mpls: array (nullable = true)
| | | | | | | | |-- element: struct (containsNull = true)
| | | | | | | | | |-- exp: long (nullable = true)
| | | | | | | | | |-- label: long (nullable = true)
| | | | | | | | | |-- s: long (nullable = true)
| | | | | | | | | |-- ttl: long (nullable = true)
| | | | | | | |-- type: long (nullable = true)
| | | | | |-- rfc4884: long (nullable = true)
| | | | | |-- version: long (nullable = true)
| | | | |-- itos: long (nullable = true)
| | | | |-- ittl: long (nullable = true)
| | | | |-- late: long (nullable = true)
| | | | |-- mtu: long (nullable = true)
| | | | |-- rtt: double (nullable = true)
| | | | |-- sIze: long (nullable = true)
| | | | |-- size: long (nullable = true)
| | | | |-- tos: long (nullable = true)
| | | | |-- ttl: long (nullable = true)
| | | | |-- x: string (nullable = true)
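How do I query a nested column, for example result.result.dstopsize? I would like to be able to show everything from result, or even from result.result or result.resuLt (case sensitivity is enabled in my Spark config).
When I try: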
file_df.select("result.resuLt.dstopsize").show(10)
I get this error:
cannot resolve '`result`.`resuLt`['dstopsize']' due to data type mismatch: argument 2 requires integral type, however, ''dstopsize'' is of string type.;;
Edit: here is some sample data
+---------------+---+---------------+---------------+----------+-------------+----+--------+---+--------+----------+--------+------+-----+--------------------+----+-------------+----------+----+----------+
|_corrupt_record| af| dst_addr| dst_name| endtime| from| fw|group_id|lts| msm_id| msm_name|paris_id|prb_id|proto| result|size| src_addr| timestamp| ttr| type|
+---------------+---+---------------+---------------+----------+-------------+----+--------+---+--------+----------+--------+------+-----+--------------------+----+-------------+----------+----+----------+
| null| 4|213.133.109.134|213.133.109.134|1551658584|78.197.253.14|4940| null| 71| 5019|Traceroute| 3| 13230| UDP|[[, 1,, [[,,,,,,,...| 40|192.168.0.130|1551658577|null|traceroute|
| null| 4| 37.143.33.15| 37.143.33.15|1551658584|78.197.253.14|4940|15254159| 71|15254159|Traceroute| 12| 13230| ICMP|[[, 1,, [[,,,,,,,...| 48|192.168.0.130|1551658583|null|traceroute|
| null| 4| 139.162.27.28| 139.162.27.28|1551658612|78.197.253.14|4940| null| 20| 5027|Traceroute| 3| 13230| UDP|[[, 1,, [[,,,,,,,...| 40|192.168.0.130|1551658606|null|traceroute|
| null| 4| 45.33.72.12| 45.33.72.12|1551658610|78.197.253.14|4940| null| 18| 5029|Traceroute| 3| 13230| UDP|[[, 1,, [[,,,,,,,...| 40|192.168.0.130|1551658608|null|traceroute|
| null| 4|104.237.152.132|104.237.152.132|1551658615|78.197.253.14|4940| null| 23| 5028|Traceroute| 3| 13230| UDP|[[, 1,, [[,,,,,,,...| 40|192.168.0.130|1551658608|null|traceroute|
| null| 4| 94.126.208.18| 94.126.208.18|1551658516|37.14.215.183|4940| 9183324| 20| 9183324|Traceroute| 15| 11958| ICMP|[[, 1,, [[,,,,,,,...| 48| 192.168.22.2|1551658439|null|traceroute|
| null| 4|196.192.112.244|196.192.112.244|1551658554|37.14.215.183|4940| 9181461| 25| 9181461|Traceroute| 15| 11958| ICMP|[[, 1,, [[,,,,,,,...| 48| 192.168.22.2|1551658474|null|traceroute|
| null| 4| 46.234.34.8| 46.234.34.8|1551658539|37.14.215.183|4940| 9180758| 10| 9180758|Traceroute| 15| 11958| ICMP|[[, 1,, [[,,,,,,,...| 48| 192.168.22.2|1551658479|null|traceroute|
| null| 4| 185.2.64.76| 185.2.64.76|1551658560|37.14.215.183|4940| 9181290| 31| 9181290|Traceroute| 15| 11958| ICMP|[[, 1,, [[,,,,,,,...| 48| 192.168.22.2|1551658511|null|traceroute|
| null| 4| 208.80.155.69| 208.80.155.69|1551658597|37.14.215.183|4940| 9183716| 8| 9183716|Traceroute| 15| 11958| ICMP|[[, 1,, [[,,,,,,,...| 48| 192.168.22.2|1551658546|null|traceroute|
+---------------+---+---------------+---------------+----------+-------------+----+--------+---+--------+----------+--------+------+-----+--------------------+----+-------------+----------+----+----------+
[Comments]:
-
Please add some reproducible sample data. That would be very helpful.
-
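There seem to be some typos in the example, so we can only guess. For instance, the code you show queries dstopsize and resuLt, but according to the schema it should be result and dstoptsize.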
-
I only posted the schema of the result column; I can query the other columns without any problem.
-
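Building a minimal instance of the problem is really useful for us, and probably for you too. Very often the process pinpoints the root of the problem more precisely, and sometimes it even solves it. If you still need help at that point, give us a way to reproduce the problem (sample data + the minimal code that reproduces it). I insist on minimal because it helps us find an answer more easily, and it helps people with a similar problem in the future find the answer without having to ask.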
-
My minimal version: case class C(dstoptsize: Long) case class B(result: Array[C]) case class A(result: Array[B]) val df = List(A(Array(B(Array(C(10)))))).toDF df.select("result.result.dstoptsize").show org.apache.spark.sql.AnalysisException: cannot resolve '`result`.`result`['dstoptsize']' due to data type mismatch: argument 2 requires integral type, however, ''dstoptsize'' is of string type.;;
Tags: python apache-spark dataframe pyspark apache-spark-sql