【Question title】: HAVERSINE distance in BigQuery?
【Posted】: 2023-03-17 22:17:01
【Question】:

I'm looking for a way to get a HAVERSINE() in BigQuery. For example, how can I find the weather stations closest to an arbitrary point?

【Question comments】:

Tags: google-bigquery geo


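【Solution 1】:

2019 update: BigQuery now has a native ST_DISTANCE() function, which is more accurate than Haversine.

For example, the following query computes both distances, in meters, side by side: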

#standardSQL
-- Degrees to radians; ACOS(-1) evaluates to pi.
CREATE TEMP FUNCTION RADIANS(x FLOAT64) AS (
  ACOS(-1) * x / 180
);
-- Central angle in radians to kilometers, at 111.045 km per degree.
CREATE TEMP FUNCTION RADIANS_TO_KM(x FLOAT64) AS (
  111.045 * 180 * x / ACOS(-1)
);
-- Great-circle distance in km, computed with the spherical law of cosines.
CREATE TEMP FUNCTION HAVERSINE(lat1 FLOAT64, long1 FLOAT64,
                               lat2 FLOAT64, long2 FLOAT64) AS (
  RADIANS_TO_KM(
    ACOS(COS(RADIANS(lat1)) * COS(RADIANS(lat2)) *
         COS(RADIANS(long1) - RADIANS(long2)) +
         SIN(RADIANS(lat1)) * SIN(RADIANS(lat2))))
);

SELECT
  lat,
  lon,
  name,
  HAVERSINE(40.73943, -73.99585, lat, lon) *1000 AS haversine_distance
  , ST_DISTANCE(
      ST_GEOGPOINT(-73.99585, 40.73943)
      , ST_GEOGPOINT(lon,lat)) bqgis_distance
FROM `bigquery-public-data.noaa_gsod.stations`
WHERE lat IS NOT NULL AND lon IS NOT NULL
ORDER BY bqgis_distance
LIMIT 4;


With standard SQL you can define a SQL function to encapsulate the logic. For example,

#standardSQL
CREATE TEMP FUNCTION RADIANS(x FLOAT64) AS (
  ACOS(-1) * x / 180
);
CREATE TEMP FUNCTION RADIANS_TO_KM(x FLOAT64) AS (
  111.045 * 180 * x / ACOS(-1)
);
CREATE TEMP FUNCTION HAVERSINE(lat1 FLOAT64, long1 FLOAT64,
                               lat2 FLOAT64, long2 FLOAT64) AS (
  RADIANS_TO_KM(
    ACOS(COS(RADIANS(lat1)) * COS(RADIANS(lat2)) *
         COS(RADIANS(long1) - RADIANS(long2)) +
         SIN(RADIANS(lat1)) * SIN(RADIANS(lat2))))
);

SELECT
  lat,
  lon,
  name,
  HAVERSINE(40.73943, -73.99585, lat, lon) AS distance_in_km
FROM `bigquery-public-data.noaa_gsod.stations`
WHERE lat IS NOT NULL AND lon IS NOT NULL
ORDER BY distance_in_km
LIMIT 4;

【Discussion】:

  • I had to add CASE WHEN lat1 = lat2 AND long1 = long2 THEN 0 to HAVERSINE to avoid errors when computing the distance between two exactly identical locations.
【Solution 2】:
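The error described in this comment presumably arises because rounding can push the ACOS argument slightly above 1, and BigQuery's ACOS raises an error outside [-1, 1]. As an alternative to the CASE WHEN guard, here is a minimal sketch that uses the half-angle (haversine) form of the formula with a clamped ASIN argument. The function name HAVERSINE_KM and the 6371 km mean Earth radius are assumptions chosen for illustration and are not part of the original answer, so results will differ slightly from the 111.045 km-per-degree constant used above.

```sql
#standardSQL
-- Sketch only: HAVERSINE_KM and the 6371 km radius are illustrative assumptions.
-- Half-angle haversine form; LEAST(1.0, ...) keeps the SQRT/ASIN argument in range.
CREATE TEMP FUNCTION HAVERSINE_KM(lat1 FLOAT64, long1 FLOAT64,
                                  lat2 FLOAT64, long2 FLOAT64) AS (
  2 * 6371 * ASIN(SQRT(LEAST(1.0,
    -- (lat2 - lat1) / 2 converted to radians is (lat2 - lat1) * pi / 360
    POW(SIN((lat2 - lat1) * ACOS(-1) / 360), 2) +
    COS(lat1 * ACOS(-1) / 180) * COS(lat2 * ACOS(-1) / 180) *
    POW(SIN((long2 - long1) * ACOS(-1) / 360), 2))))
);

SELECT
  lat,
  lon,
  name,
  HAVERSINE_KM(40.73943, -73.99585, lat, lon) AS distance_in_km
FROM `bigquery-public-data.noaa_gsod.stations`
WHERE lat IS NOT NULL AND lon IS NOT NULL
ORDER BY distance_in_km
LIMIT 4;
```

For two identical points every sine term is zero, so the function returns 0 without needing a special case.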

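2018 update: BigQuery now supports native geography functions.

ST_DISTANCE: returns the shortest distance, in meters, between two non-empty GEOGRAPHYs.

Distance from New York to Seattle: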

#standardSQL
WITH geopoints AS (
  SELECT ST_GEOGPOINT(lon,lat) p, name, state
  FROM `bigquery-public-data.noaa_gsod.stations`  
)

SELECT ST_DISTANCE(
  (SELECT p FROM geopoints WHERE name='PORT AUTH DOWNTN MANHATTAN WA'),
  (SELECT p FROM geopoints WHERE name='SEATTLE')
)

3866381.55
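The same function can also answer the original question about the closest weather stations. The following is a minimal sketch, assuming the reference point from the Haversine examples (latitude 40.73943, longitude -73.99585); the column alias distance_in_m and the NULL filter are illustrative additions rather than part of the original answer.

```sql
#standardSQL
-- Sketch: the four stations nearest to an arbitrary point, using ST_DISTANCE.
WITH geopoints AS (
  SELECT ST_GEOGPOINT(lon, lat) AS p, name, state
  FROM `bigquery-public-data.noaa_gsod.stations`
  WHERE lat IS NOT NULL AND lon IS NOT NULL
)
SELECT
  name,
  state,
  -- ST_GEOGPOINT takes longitude first, then latitude; the result is in meters.
  ST_DISTANCE(p, ST_GEOGPOINT(-73.99585, 40.73943)) AS distance_in_m
FROM geopoints
ORDER BY distance_in_m
LIMIT 4;
```

Note the argument order: ST_GEOGPOINT expects (longitude, latitude), the reverse of the HAVERSINE(lat, lon, ...) helpers above.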

Legacy SQL solution (standard SQL version pending):

SELECT lat, lon, name,
  (111.045 * DEGREES(ACOS(COS(RADIANS(40.73943)) * COS(RADIANS(lat)) * COS(RADIANS(-73.99585) - RADIANS(lon)) + SIN(RADIANS(40.73943)) * SIN(RADIANS(lat))))) AS distance
FROM [bigquery-public-data:noaa_gsod.stations]
HAVING distance>0
ORDER BY distance
LIMIT 4

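(Based on http://www.plumislandmedia.net/mysql/haversine-mysql-nearest-loc/)

【Discussion】:

  • With standard SQL, you can put the logic in a SQL UDF instead of writing it directly in the query body.
  • I know! I made a quick attempt, but then I missed DEGREES() and RADIANS(). I'm leaving the query on hold until I work out the equivalent conversions, including the missing PI(). But I'll be back :)
  • By the way, this gives the distance in kilometers.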