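Calculating density within a shapefile: answers

【Question title】: Calculating density within a shapefile
【Posted】: 2020-04-10 14:13:35
【Question】:

I am trying to calculate density within a shapefile, but I am fairly sure I am doing it wrong. The idea is to use density to determine which geographic areas have the highest sales.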

Here is a link to the file that I use (testdata.shp)

library(sf)

sample <- st_read("testdata.shp")

sample$area <- st_area(sample$geometry)

density_calc <-sample %>% st_buffer(0) %>% group_by(areas) %>% summarise(`Sales (density)` = sum(sales)/sum(area))

Here are the details of the shapefile:

Geometry set for 2106 features 
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: -120.0065 ymin: 35.00184 xmax: -114.0396 ymax: 42.00221
epsg (SRID):    4326
proj4string:    +proj=longlat +datum=WGS84 +no_defs

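I guess my problem is that I don't really know what is right and what is wrong here, so I can't tell whether I am doing this correctly.

Sorry if this is not the most well-defined question; I just don't remember much of my high-school geometry!

【Comments】:

There is nothing wrong with this code, except that you don't need the zero-width buffer and you don't need group_by/summarise: areas is unique in your data, so you end up with groups that all have size 1. sample$density = sample$sales/sample$area would work just as well.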
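The per-feature calculation described in the comment above can be sketched as follows. This is a minimal sketch, not the asker's actual data: it assumes the North Carolina shapefile that ships with sf as a stand-in for testdata.shp, with its `BIR74` column playing the role of `sales`. The division by 1e6 relies on `st_area()` returning square metres for longitude/latitude data.

```r
library(sf)

# Stand-in data (assumption): nc.shp bundled with sf replaces testdata.shp,
# and BIR74 replaces the `sales` column from the question.
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# st_area() returns a units vector in m^2 for lon/lat data;
# drop the units class and convert to km^2.
nc$area_km2 <- as.numeric(st_area(nc)) / 1e6

# One density value per feature: no buffer, no group_by/summarise needed.
nc$density <- nc$BIR74 / nc$area_km2

# Features with the highest density first.
top <- st_drop_geometry(nc)[order(-nc$density), c("NAME", "density")]
head(top)
```

Because every row is already a single area, a plain column division gives the same numbers as the grouped summary in the question.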

Tags: r gis sf sp

【Solution 1】:

The raster package makes the calculation very simple, much like working with a data.frame in R:

library(raster)
workDir <- "."  # assumption: directory that contains testdata.shp; adjust to your own path
list.files(workDir)
test_shp <- shapefile(file.path(workDir, 'testdata.shp'))
names(test_shp)
#[1] "distrct"       "sbdstrc"       "terrtry"      
#[4] "region"        "turf"          "sales"        
#[7] "leads"         "cnvrsns"       "areas" 

sum(is.na(test_shp$sales)) #note that 346 polygons have no sales data

#get the area as square kilometers (area() returns m^2 for lon/lat data; 1 km^2 = 1e6 m^2)
test_shp$km2 <- area(test_shp) / 1e6

#calc the sales density
test_shp$sales_density <- test_shp$sales / test_shp$km2

#calculate the 25th, 50th, and 75th percentile of all polygons
quartiles <- quantile(test_shp$sales_density, probs=c(0.25, 0.5, 0.75), na.rm=TRUE) 

#plot the result, coloring each polygon by the quartile its sales density falls in (e.g. >75th, 50-75th, etc.)
plot(test_shp,
     col=ifelse(is.na(test_shp$sales_density), 'gray',
         ifelse(test_shp$sales_density >= quartiles[3], 'dark green',
         ifelse(test_shp$sales_density >= quartiles[2], 'light green',
         ifelse(test_shp$sales_density >= quartiles[1], 'yellow', 'red')))),
     border='transparent')

#add the legend
legend('bottomleft', legend=c('Q4', 'Q3', 'Q2', 'Q1', 'No data'), pch=15, col=c('dark green', 'light green', 'yellow', 'red', 'gray'))

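【Discussion】:

Don't do this. The shapefile function in the raster package returns an sp object, and those have been deprecated in favour of sf objects from the sf package, as used in the question. I can't see anything wrong with the code in the question.
@Spacedman I get the following error and warnings when testing the code in question: dist is assumed to be in decimal degrees (arc_degrees). Error in group_by(., areas) : could not find function "group_by" In addition: Warning message: In st_buffer.sfc(st_geometry(x), dist, nQuadSegs, endCapStyle = endCapStyle, : st_buffer does not correctly buffer longitude/latitude data
@Spacedman I believe your comment only makes sense if you rely on the tidyverse: "... spatial data in R with the tidyverse workflow of packages developed by Hadley Wickham and others." That would put your comment in better context. Do you agree, or can you provide more details?
The warning is because the user is doing a zero-width buffer, which is not explained but is commonly used to fix broken topology. The error is because you have not done library(sf) or dplyr to get the group_by function. You don't need to "rely on" the tidyverse to use sf on its own.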
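To illustrate the topology repair that a zero-width buffer is usually standing in for, here is a minimal sketch on a made-up, self-intersecting "bowtie" polygon (hypothetical data, not from the question). It uses `st_make_valid()` rather than `st_buffer(x, 0)`, and assumes an sf build where `st_make_valid()` is available (a reasonably recent sf linked against GEOS 3.8 or later).

```r
library(sf)

# Hypothetical invalid geometry: a "bowtie" whose ring crosses itself.
bowtie <- st_polygon(list(rbind(
  c(0, 0), c(1, 1), c(1, 0), c(0, 1), c(0, 0)
)))

st_is_valid(bowtie)            # the ring self-intersects, so this is not valid

# Repair the geometry explicitly instead of relying on a zero-width buffer.
fixed <- st_make_valid(bowtie)

st_is_valid(fixed)
st_geometry_type(fixed)
```

Checking `st_is_valid()` first shows whether any repair is needed at all; if every feature is already valid, the buffer step can simply be dropped.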
The warning is aimed at users of the sp package. If you read the data in as an sf object, almost all of your code will work on the sf object.
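As a rough illustration of that last point, the quartile colouring from the answer can be sketched with sf alone. This is only a sketch under assumptions: it uses the nc.shp file bundled with sf in place of testdata.shp, with `BIR74` standing in for `sales`, and it assumes the quartile breaks are distinct (`cut()` fails on duplicated breaks).

```r
library(sf)

# Stand-in data (assumption): nc.shp bundled with sf, BIR74 in place of `sales`.
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Density per km^2; st_area() gives m^2 for lon/lat data.
nc$density <- nc$BIR74 / (as.numeric(st_area(nc)) / 1e6)

# Quartile breaks, then a four-level factor (Q1 = lowest density).
breaks <- quantile(nc$density, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
nc$quartile <- cut(nc$density, breaks = breaks,
                   labels = c("Q1", "Q2", "Q3", "Q4"),
                   include.lowest = TRUE)

# Plot the factor column with one colour per quartile.
plot(nc["quartile"],
     pal = c("red", "yellow", "lightgreen", "darkgreen"),
     border = NA)
```

Using `cut()` on the quantile breaks replaces the nested `ifelse()` calls, and plotting a factor column makes `plot()` draw the legend on its own.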