我可以通过多边形绑定 st_distance 调用吗？答案

【问题标题】：Can I bound an st_distance call by a polygon?我可以通过多边形绑定 st_distance 调用吗？
【发布时间】：2021-07-30 19:29:59
【问题描述】：

我看到过有关此主题的类似帖子（例如，请参阅 here 和 here），但不是特定于 sf-tidyverse 生态系统的帖子。

我有一系列湖泊，每个湖泊中有一系列采样点，每个湖泊中都有一个“焦点”，代表船只下水的位置。我想计算从船下水到每个样本点的“划船者最短行驶距离”。但是，我想以某种方式使用湖泊多边形“限制”这些距离，这样就无法计算跨陆地的距离。我可以想象这是通过让“直线”沿着湖边的蛇只要需要就可以完成，然后它才能恢复为直线。我还可以想象“直线”被分解成围绕任何中间土地弯曲的线段。我对各种实现持开放态度！

我在其他地方（例如here）看到了将边界多边形转换为栅格的想法，将水设为一个值，将土地设为另一个值，更高的值，然后进行“最小成本距离”越过土地的成本高得令人望而却步。但是，我不知道如何在光栅/SF 生态系统中实际做到这一点。

这是我用来制作这张地图的代码：

library(dplyr)
library(sf)
library(ggplot2)
Moose.ssw = sswMN.sf %>% filter(lake == "Moose")
Moose.lake = MN_lakes4 %>% filter(str_detect(map_label, "Moose")) %>% filter(cty_name == "Beltrami") 
Moose.access = try3 %>% filter(LAKE_NAME == "Moose") %>%  filter(COUNTYNAME == "Beltrami")
Moose.box = st_bbox(Moose.ssw)

ggplot() +
  geom_sf(data = Moose.lake, color="lightblue") + 
  geom_sf(data = Moose.access, color = "red", size = 2) +
  geom_sf(data = Moose.ssw, mapping = aes(color= Nitellopsis_obtusa_n)) +
  coord_sf(xlim = c(Moose.box[1], Moose.box[3]), ylim = c(Moose.box[2], Moose.box[4]))

这里有一些代码可以很好地计算直线距离——也许它可以以某种方式包装？

Moose.pt.dists = st_distance(Moose.access, Moose.ssw, by_element = TRUE)

可以从我的 Github 页面中提取生成上述数据对象所需的文件（它们是通过 dput().Link to my Github 生成的文件。

我是一名可靠的 R 程序员，但我是地理空间工作的新手，所以如果我能指出一个富有成果的方向，我也许可以编写自己的程序来摆脱困境！

【问题讨论】：

标签： r tidyverse geospatial sf r-raster

【解决方案1】：

这是一个使用 sfnetworks 的解决方案，它非常适合 tidyverse。

下面的代码应该

定期对湖的边界框进行采样（创建均匀分布的点）
将它们与最近的邻居联系起来
移除跨越陆地的连接
按照剩下的线找到从船下水道到样本位置的最短路径。

结果并不准确，但非常接近。您可以通过增加采样点的数量来提高精度。下面使用1000分。


library(tidyverse)
library(sf)
library(sfnetworks)
library(nngeo)

# set seed to make the script reproducible,
#  since there is random sampling used
set.seed(813)

# Getting your data:
x <- dget("https://raw.githubusercontent.com/BajczA475/random-data/main/Moose.lake")
# Subset to get just one lake
moose_lake <- x[5,]
boat_ramp <- dget("https://raw.githubusercontent.com/BajczA475/random-data/main/Moose.access")
sample_locations <- dget("https://raw.githubusercontent.com/BajczA475/random-data/main/Moose.ssw")
sample_bbox <- dget("https://raw.githubusercontent.com/BajczA475/random-data/main/Moose.box")


# sample the bounding box with regular square points, then connect each point to the closest 9 points
#  8 should've worked, but left some diagonals out.
sq_grid_sample <- st_sample(st_as_sfc(st_bbox(moose_lake)), size = 1000, type = 'regular') %>% st_as_sf() %>%
  st_connect(.,.,k = 9)

# remove connections that are not within the lake polygon
sq_grid_cropped <- sq_grid_sample[st_within(sq_grid_sample, moose_lake, sparse = F),]

# make an sfnetwork of the cropped grid
lake_network <- sq_grid_cropped %>% as_sfnetwork()

# find the (approximate) distance from boat ramp to point 170 (far north)
pt170 <- st_network_paths(lake_network, 
                          from = boat_ramp,
                          to = sample_locations[170,]) %>%
  pull(edge_paths) %>%
  unlist()

lake_network %>% 
  activate(edges) %>%
  slice(pt170) %>%
  st_as_sf() %>%
  st_combine() %>%
  st_length()
#> 2186.394 [m]

看起来从船下水口到湖最北端的采样位置 170 大约有 2186m。


# Plotting all of the above, path from boat ramp to point 170:

ggplot() + 
  geom_sf(data = sq_grid_sample, alpha = .05) +
  geom_sf(data = sq_grid_cropped, color = 'dodgerblue') + 
  geom_sf(data = moose_lake, color = 'blue', fill = NA) +
  geom_sf(data = boat_ramp, color = 'springgreen', size = 4) + 
  geom_sf(data = sample_locations[170,], color = 'deeppink1', size = 4) +
  geom_sf(data = lake_network %>% 
            activate(edges) %>%
            slice(pt170) %>%
            st_as_sf(),
          color = 'turquoise',
          size = 2) +
  theme_void()

虽然上面只找到并绘制了从船下水道到一个采样点的距离，但网络可以找到所有距离。您可能需要使用*apply 或purrr，也许还需要一个小的自定义函数来查找发射到所有样本点的“一对多”距离。

sfnetworks 上的This page 将有助于编写一对多解决方案。

编辑：

要查找从船下水道到采样点的所有距离：

st_network_cost(lake_network,
                from = boat_ramp,
                to = sample_locations)

这对我来说比 for 循环或下面发布的 sp 解决方案要快得多。由于st_network_cost 将删除任何重复的距离，因此可能需要调整采样中的某些代码。 sample_locations（或@bajcz 答案中的Moose.ssw）也需要清理以删除重复点。 322 行中有 180 个独特点。

【讨论】：

+1 一个很好的答案，也非常优雅！请参阅下面我对我自己的问题的回答——我想我可能在你的代码中发现了一个错字？此外，如果您能弄清楚 *apply 解决方案是什么，可以对我放入 for 循环的位置进行矢量化，我可以将其添加进去，看看它是否会使代码更快！谢谢！
谢谢@Bajcz。我很快会再看一下它的拼写错误和 *apply/purrr 功能。我认为有几个简单的改进可以减少计算时间并提高距离精度。
@Bajcz 我错误地认为您想要到每个点的路线和距离。我添加了一种更快的方法来仅查找距离。在我的测试中，sp 方法大约需要 4 分钟，而（新）sfnetworks 方法大约需要 4 秒。如果您对提高准确性的其他小改动感兴趣，请告诉我。我不想对已接受的答案进行大范围的编辑。
非常感谢！我认为即使在我的特定情况下不需要，知道如何做这些路线也会对许多人有所帮助。我在帖子中提到，在我的上下文中，有时，船坞离湖只是一点，这会导致距离返回为 Inf。我在我的答案中发布了一个解决方案——在它们周围安装一个“假湖”缓冲区。您的 sfnetworks 方法中是否有类似的解决方法？
@Bajcz 有一些解决方法，最好的办法是使用st_network_blend 将停靠点添加到网络。一个简单的解决方案是将网络到码头的固定距离添加到所有计算的距离中。 st_network_cost 不需要扩展坞位于网络上或网络内。

【解决方案2】：

我计划今天回答我自己的问题，因为我刚刚设法使用与我的问题相关的gdistance 和sp 方法获取代码，但mrhellmann 击败了我！由于他们的答案也有效并且非常好和优雅，我想我只是在这里发布代码，利用这两种方法并比较感兴趣的人的结果（如果其中一个或另一个在你的上下文中不起作用） .它们的速度差不多，尽管在 mrhellmann 的 sfnetworks 版本中有一个 for() 循环，所以如果该位可以矢量化，它可能会更快，我相信这是可能的。

   #All the hooplah needed to get things started and talking the same language. 
    x = dget("https://raw.githubusercontent.com/BajczA475/random-data/main/Moose.lake")
    # Subset to get just one lake
    Moose.lake = x[5,]
    Moose.access = dget("https://raw.githubusercontent.com/BajczA475/random-data/main/Moose.access")
    Moose.ssw  = dget("https://raw.githubusercontent.com/BajczA475/random-data/main/Moose.ssw")
    Moose.box = dget("https://raw.githubusercontent.com/BajczA475/random-data/main/Moose.box")
    
    library(sf)
    library(sp)
    library(raster)
    library(gdistance)
    library(tidyverse)
    library(sfnetworks)
    library(nngeo)
    library(mosaic)

这是使用sp 和gdistance 工具的“我的”方式：

    ptm <- proc.time()
    
#Make all the objects needed into spatial class objects.
    test.pts = as(Moose.access, Class = "Spatial") 
    ssw.pts = as(Moose.ssw, Class = "Spatial")
    test.bounds = as(Moose.lake, Class = "Spatial")
    
#Turn the lake into a raster of resolution 1000 x 1000 (which is arbitrary) and then make all points not in the lake = 0 so that no distances can "afford" to cross over land.
    test.raster = raster(extent(test.bounds), nrow=1000, ncol=1000)
    lake.raster = rasterize(test.bounds, test.raster)                       
    lake.raster[is.na(lake.raster)] = 0  
    
    ##For some lakes, although not this one, the public access was just off the lake, which resulted in distances of Inf. This code puts a circular buffer around the dock locations to prevent this. It makes a new raster with 1s at the dock location(s), finds all indices of cells within the buffer distance of each dock (here, 300, which is overkill), and makes the corresponding cells in the lake raster also 1 so that they are considered navigable. This makes the distances slightly inaccurate because it may allow some points to be more easily reachable than they should be, but it is better than the alternative!
    access.raster = rasterize(test.pts, lake.raster, field = 1)
    index.spots = raster::extract(access.raster, test.pts, buffer=300, cellnumbers = T)
    index.spots2 = unlist(lapply(index.spots, "[", , 1))
    lake.raster[index.spots2] = 1
    
#Make a transition matrix so that the cost of moving from one cell to the next can be understood. TBH, I don't understand this part well beyond that.
    transition.mat1 = transition(lake.raster, transitionFunction=mean, directions=16) #This code does take a little while.
    transition.mat1 = geoCorrection(transition.mat1,type="c") 
    
#Calculates the actual cost-based distances.
    dists.test = costDistance(transition.mat1, test.pts, ssw.pts)
    proc.time() - ptm  #About 55 seconds on my laptop.

然后为了比较，这里是 mrhellmann 提供的 sfnetworks 方式。

ptm <- proc.time()

sq_grid_sample = st_sample(st_as_sfc(st_bbox(Moose.lake)), size = 1000, type = 'regular') %>% st_as_sf() %>%
  st_connect(.,.,k = 9)

sq_grid_cropped = sq_grid_sample[st_within(sq_grid_sample, Moose.lake, sparse = F)] #I needed to delete a comma in this line to get it to work--I don't think we needed row,column indexing here.

lake_network = sq_grid_cropped %>% as_sfnetwork()

##Using their code to generate all the distances between the dock and the sample points. This might be vectorizable. 
dists.test2 = rep(0, nrow(Moose.ssw))
for (i in 1:nrow(Moose.ssw)) {
dist.pt = st_network_paths(lake_network, 
                          from = Moose.access,
                          to = Moose.ssw[i,]) %>%
  pull(edge_paths) %>%
  unlist() #This produces a vertices we must go through I think?

dists.test2[i] = lake_network %>% 
  activate(edges) %>%
  slice(dist.pt) %>% 
  st_as_sf() %>%
  st_combine() %>%
  st_length()
}
proc.time() - ptm #About 58 seconds on my laptop.

我认为这里有两张图表表明两种方法都产生了相对相似的数字，没有很多偏差迹象，尽管看起来sfnetworks 方法的一些距离最终会更长一些。第一个是使用这两种方法的距离估计的重叠密度图，第二个是相互绘制的距离的“1 对 1”散点图，您可以看到拟合到 1 对 1 线会很不错的。

hist.df = data.frame(dists = c(dists.test, dists.test2), version = rep(c("sp", "sfnetworks"), each = 322))

gf_density(~dists, fill=~version, data=hist.df )
gf_point(dists.test2~dists.test)

【讨论】：