【Posted】: 2018-09-05 13:36:47
【Question】:
Using this query:
sql("SELECT _location, count(1) FROM tablaTemporal group by _location order by 2 desc" )
I get this output:
+--------------------------------+--------+
|_location                       |count(1)|
+--------------------------------+--------+
|London, United Kingdom          |15      |
|United States                   |12      |
|Bangalore, India                |8       |
|Hyderabad, India                |7       |
|Paris, France                   |6       |
|San Francisco, CA, United States|6       |
|Mountain View, CA, United States|4       |
|Pune, India                     |4       |
|Bengaluru, Karnataka, India     |3       |
+--------------------------------+--------+
But the result I need is:
+--------------------------------+--------+
|_location                       |count(1)|
+--------------------------------+--------+
|United States                   |22      |
|India                           |22      |
|United Kingdom                  |15      |
|France                          |6       |
+--------------------------------+--------+
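So I need something along these lines, where SubstringOfLocationFromCharComma stands for a function that returns the part of _location after the last comma: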
sql("SELECT SubstringOfLocationFromCharComma(_location), count(1) FROM tablaTemporal group by _location order by 2 desc" )
How can I extract the last element from a comma-separated string?
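One possible approach, offered as a minimal sketch rather than a definitive answer: Spark SQL's built-in `substring_index(str, delim, count)` returns everything to the right of the last delimiter when `count` is -1 (or the whole string if the delimiter is absent), and `trim` removes the leading space left after the comma. The sample rows, the aliases `country` and `total`, and the local `SparkSession` setup below are assumptions for illustration; only the table name `tablaTemporal` and the column `_location` come from the question. Note that the grouping has to use the extracted value, not the raw `_location` column, otherwise rows are still counted per full location string.

```scala
import org.apache.spark.sql.SparkSession

object LastLocationElement {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell a `spark` session already exists.
    val spark = SparkSession.builder()
      .appName("last-element-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows shaped like the values shown in the question.
    Seq(
      "London, United Kingdom",
      "United States",
      "San Francisco, CA, United States",
      "Bangalore, India",
      "Paris, France"
    ).toDF("_location").createOrReplaceTempView("tablaTemporal")

    // substring_index(..., ',', -1) keeps the text after the last comma;
    // trim drops the space that follows the comma.
    // Grouping is done on the extracted value, not on _location itself.
    spark.sql(
      """SELECT trim(substring_index(_location, ',', -1)) AS country,
        |       count(1) AS total
        |FROM tablaTemporal
        |GROUP BY trim(substring_index(_location, ',', -1))
        |ORDER BY total DESC""".stripMargin
    ).show(false)

    spark.stop()
  }
}
```

Rows with equal counts (such as United States and India in the desired result) have no guaranteed relative order unless a second sort key is added to the ORDER BY clause.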
【Comments】:
Tags: scala apache-spark apache-spark-sql