【Posted】: 2020-03-30 11:08:02
【Question】:
I would like to compute the average distance of the IDs in each zone. I am working in pyspark and using geospark.
My table looks like this:
+--------------------+--------+----------+--------------------+--------------------+
| ID| zone| date| point| point1|
+--------------------+--------+----------+--------------------+--------------------+
|04607f5b-746e-455...|00295753|2020-03-18|POINT (-80.161590...|POINT (-80.161590...|
|05df916c-6269-485...|01383864|2020-03-17|POINT (-95.581115...|POINT (-95.581115...|
|1973aa17-863f-4de...|01383847|2020-03-17|POINT (-96.864837...|POINT (-96.864837...|
|1bba1026-dcb3-42f...|00465266|2020-03-17|POINT (-95.823860...|POINT (-95.823860...|
|2a16bc8c-a529-42e...|01266994|2020-03-18|POINT (-101.24329...|POINT (-101.24329...|
|352b142f-616e-46b...|01605066|2020-03-17|POINT (-105.73150...|POINT (-105.73150...|
|66952620-0cc2-4ba...|01383943|2020-03-17|POINT (-96.226104...|POINT (-96.226104...|
|7e901a60-9f16-4a9...|01383886|2020-03-19|POINT (-95.496803...|POINT (-95.496803...|
|80fdf1e3-92ca-4b1...|01383813|2020-03-16|POINT (-97.661605...|POINT (-97.661605...|
|81f3eb49-ef3f-48f...|00066975|2020-03-18|POINT (-93.562011...|POINT (-93.562011...|
+--------------------+--------+----------+--------------------+--------------------+
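I would like to compute the sum of the distances of the users in each zone, and the total number of distinct users in each zone for each day. I am using geospark, and I can run a simple query like this: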
queryDistances = """
SELECT ID, date,
ST_Distance(point, point1) as distance
FROM myTable
"""
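I would like to measure the distance between point and point1, then compute the average distance for each zone, each ID and each date, as well as the total number of distinct IDs in each zone for each day.
I would like to end up with a table like this: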
zone      date        avg(distance)  tot(users)
00295753  2020-03-18  5.5            74
01383864  2020-03-17  7.3            117
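For reference, the aggregation I have in mind could probably be written as a single GROUP BY over the same table. This is only a sketch and has not been verified: it reuses `myTable` and `ST_Distance` from the query above, groups by zone and date to match the desired table, and the aliases `avg_distance` and `tot_users` are names made up for illustration. The Spark call is left as a comment because it assumes a session with the GeoSpark SQL functions registered. Note that `ST_Distance` on longitude/latitude points gives a planar distance in coordinate units (degrees), not meters.

```python
# Sketch: average distance and distinct-ID count per zone and per day.
# The aliases avg_distance and tot_users are illustrative, not from the original table.
queryZoneDaily = """
SELECT zone, date,
       AVG(ST_Distance(point, point1)) AS avg_distance,
       COUNT(DISTINCT ID) AS tot_users
FROM myTable
GROUP BY zone, date
ORDER BY zone, date
"""

# Assumes an existing SparkSession named `spark` with the GeoSpark SQL
# functions registered, and `myTable` available as a temporary view:
# result = spark.sql(queryZoneDaily)
# result.show()
```

If the sum of the distances is wanted instead of the average, `AVG` can be swapped for `SUM` in the same query.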
【Discussion】:
Tags: python sql pyspark pyspark-sql