【Question title】: How to optimize PySpark code to compute the distance by user?
【Posted】: 2020-03-30 11:08:02
【Question description】:

I want to compute the average distance for each zone ID. I am working in PySpark and using GeoSpark.

My table looks like this:

+--------------------+--------+----------+--------------------+--------------------+
|                  ID|    zone|      date|               point|              point1|
+--------------------+--------+----------+--------------------+--------------------+
|04607f5b-746e-455...|00295753|2020-03-18|POINT (-80.161590...|POINT (-80.161590...|
|05df916c-6269-485...|01383864|2020-03-17|POINT (-95.581115...|POINT (-95.581115...|
|1973aa17-863f-4de...|01383847|2020-03-17|POINT (-96.864837...|POINT (-96.864837...|
|1bba1026-dcb3-42f...|00465266|2020-03-17|POINT (-95.823860...|POINT (-95.823860...|
|2a16bc8c-a529-42e...|01266994|2020-03-18|POINT (-101.24329...|POINT (-101.24329...|
|352b142f-616e-46b...|01605066|2020-03-17|POINT (-105.73150...|POINT (-105.73150...|
|66952620-0cc2-4ba...|01383943|2020-03-17|POINT (-96.226104...|POINT (-96.226104...|
|7e901a60-9f16-4a9...|01383886|2020-03-19|POINT (-95.496803...|POINT (-95.496803...|
|80fdf1e3-92ca-4b1...|01383813|2020-03-16|POINT (-97.661605...|POINT (-97.661605...|
|81f3eb49-ef3f-48f...|00066975|2020-03-18|POINT (-93.562011...|POINT (-93.562011...|
+--------------------+--------+----------+--------------------+--------------------+

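I want to compute the sum of the users' distances in each zone, and the total number of distinct users per zone per day. I am using GeoSpark, and I can already run a simple query like this: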

queryDistances = """
        SELECT ID, date,
        ST_Distance(point, point1) as distance
        FROM myTable
    """

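I want to measure the distance between point and point1, then compute the average distance per zone, per ID, per date, as well as the total number of distinct IDs per zone per day.

I would like a table like this: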

    zone        date        avg(distance)   tot(users)
  00295753    2020-03-18       5.5              74
  01383864    2020-03-17       7.3              117

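【Comments】:

    Tags: python sql pyspark pyspark-sql


    【Solution 1】:

    You need to work with "group by" here. To get one row per zone and date, group by those two columns and count distinct IDs. Write the query like this: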

    select zone, date, AVG(ST_Distance(point, point1)) as avg_distance, count(distinct ID) as total_users
    from myTable
    group by zone, date
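The grouping logic can be sanity-checked on a few rows without a Spark cluster. The sketch below is a stand-in, not GeoSpark: it runs an aggregation grouped by zone and date in an in-memory SQLite database and registers a hypothetical `planar_distance` helper under the name `ST_Distance`, which only parses `POINT (x y)` strings and returns the Euclidean distance. The function `zone_daily_stats` and the sample rows are made up for illustration. In Spark, the same SQL would rely on the `ST_Distance` that GeoSpark registers.

```python
import math
import sqlite3


def planar_distance(wkt_a, wkt_b):
    """Hypothetical stand-in for ST_Distance: Euclidean distance
    between two 'POINT (x y)' WKT strings."""
    def parse(wkt):
        x, y = wkt[wkt.index("(") + 1 : wkt.index(")")].split()
        return float(x), float(y)

    (x1, y1), (x2, y2) = parse(wkt_a), parse(wkt_b)
    return math.hypot(x2 - x1, y2 - y1)


# One output row per (zone, date): average distance and distinct user count.
QUERY = """
    SELECT zone, date,
           AVG(ST_Distance(point, point1)) AS avg_distance,
           COUNT(DISTINCT ID) AS total_users
    FROM myTable
    GROUP BY zone, date
    ORDER BY zone, date
"""


def zone_daily_stats(rows):
    """Load rows into an in-memory table and run the aggregation."""
    conn = sqlite3.connect(":memory:")
    # Assumption: SQLite has no ST_Distance, so the stand-in is registered here.
    conn.create_function("ST_Distance", 2, planar_distance)
    conn.execute(
        "CREATE TABLE myTable "
        "(ID TEXT, zone TEXT, date TEXT, point TEXT, point1 TEXT)"
    )
    conn.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


# Made-up sample: user u1 appears twice in the same zone on the same day.
SAMPLE_ROWS = [
    ("u1", "00295753", "2020-03-18", "POINT (0 0)", "POINT (3 4)"),
    ("u1", "00295753", "2020-03-18", "POINT (0 0)", "POINT (0 1)"),
    ("u2", "00295753", "2020-03-18", "POINT (1 1)", "POINT (1 4)"),
    ("u3", "01383864", "2020-03-17", "POINT (2 2)", "POINT (2 4)"),
]

if __name__ == "__main__":
    for row in zone_daily_stats(SAMPLE_ROWS):
        print(row)
```

In this sample, zone 00295753 has three rows but only two distinct IDs, so `COUNT(DISTINCT ID)` reports 2 where `count(*)` would report 3. That difference is why the distinct count matters for a "total users" column.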
    

    【Comments】:
