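[Posted]: 2020-03-03 09:21:40
[Question]:
I'm trying to work out a way to cluster multiple addresses by proximity. I have latitude and longitude, which is ideal here because some clusters will cross city/ZIP code boundaries. My starting point looks like this, except that the real table has up to 10,000 rows: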
library(tibble)  # provides tibble()

# Sample of the starting data: one row per address, with coordinates in decimal degrees
Hospital.Addresses <- tibble(
  Hospital_Name = c("Massachusetts General Hospital", "MGH - Blake Building",
                    "Shriners Hospitals for Children — Boston", "Yale-New Haven Medical Center",
                    "Memorial Sloan Kettering", "MSKCC Urgent Care Center",
                    "Memorial Sloan Kettering Blood Donation Room"),
  Address = c("55 Fruit St", "100 Blossom St", "51 Blossom St", "York St", "1275 York Ave",
              "425 E 67th St", "1250 1st Avenue Between 67th and 68th Streets"),
  City = c("Boston", "Boston", "Boston", "New Haven", "New York", "New York", "New York"),
  State = c("MA", "MA", "MA", "CT", "NY", "NY", "NY"),
  Zip = c("02114", "02114", "02114", "06504", "10065", "10065", "10065"),
  Latitude = c(42.363230, 42.364030, 42.363090, 41.304507, 40.764390, 40.764248, 40.764793),
  Longitude = c(-71.068680, -71.069430, -71.066630, -72.936781, -73.956810, -73.957127, -73.957818)
)
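I'd like to cluster groups of addresses that are within about 1 mile of each other, ideally without computing Haversine distances between 10,000 individual points. We could keep the math simple and roughly estimate 1 mile as 0.016 degrees of latitude or longitude.
The ideal output would confirm that the 3 hospitals in Boston are in group 1 (all within 1 mile of each other), that the New Haven hospital is on its own in group 2 (not within 1 mile of anything else), and that the 3 hospitals in New York are all in group 3 (all within 1 mile).
I'm looking for something more like a group_near() than a group_by().
Any suggestions are much appreciated!
[Comments]: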
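To make the request concrete, here is a minimal sketch of what such a grouping could look like in base R. The function name `group_near()` and its arguments are hypothetical (it is not an existing dplyr function), and the approach is an assumption rather than a settled answer: it projects the coordinates onto a flat grid in miles (about 69 miles per degree of latitude, with longitude scaled by the cosine of the mean latitude) and then applies single-linkage hierarchical clustering cut at 1 mile. Single linkage chains neighbors together, so two points in the same group may be more than 1 mile apart if intermediate points connect them.

```r
# Hypothetical helper, named after the wish in the question; not a dplyr function.
# Returns an integer group id per point. Points closer than `max_miles` to any
# member of a group join that group (single-linkage chaining).
group_near <- function(lat, lon, max_miles = 1) {
  miles_per_degree <- 69  # assumption: approximate length of one degree of latitude
  # Flat-earth approximation: a degree of longitude shrinks with cos(latitude).
  x <- lon * miles_per_degree * cos(mean(lat) * pi / 180)
  y <- lat * miles_per_degree
  tree <- hclust(dist(cbind(x, y)), method = "single")
  cutree(tree, h = max_miles)  # cut the dendrogram at the distance threshold
}

# Coordinates copied from the sample rows in the question
lat <- c(42.363230, 42.364030, 42.363090, 41.304507, 40.764390, 40.764248, 40.764793)
lon <- c(-71.068680, -71.069430, -71.066630, -72.936781, -73.956810, -73.957127, -73.957818)

groups <- group_near(lat, lon)
print(groups)
```

Note that `dist()` still computes every pairwise distance, only with cheap Euclidean arithmetic instead of Haversine. For 10,000 rows that is roughly 50 million values, which typically fits in memory but is not free. Using the mean latitude for the longitude scaling is only reasonable when the data covers a limited range of latitudes.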
- To clarify: would rounding the latitude and longitude not meet your needs?
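One possible reading of the rounding idea, sketched in base R under the question's own rough figure of 0.016 degrees per mile: snap each point to a grid cell of that size and treat each occupied cell as a group. The variable names are illustrative assumptions, not part of the original post.

```r
# Assumption: 0.016 degrees is used as a rough stand-in for 1 mile, as in the question.
cell_size <- 0.016

# Coordinates copied from the sample rows in the question
lat <- c(42.363230, 42.364030, 42.363090, 41.304507, 40.764390, 40.764248, 40.764793)
lon <- c(-71.068680, -71.069430, -71.066630, -72.936781, -73.956810, -73.957127, -73.957818)

# Label each point with the grid cell it falls into
cell <- paste(floor(lat / cell_size), floor(lon / cell_size), sep = "_")

# Number the cells in order of first appearance
grid_group <- match(cell, unique(cell))
print(grid_group)
```

This avoids pairwise distances entirely, but two points that sit on opposite sides of a cell edge land in different groups even when they are only a few yards apart, so it works as a quick first pass rather than a true "within 1 mile" grouping.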