使用 R 聚合县级插值数据答案

【问题标题】：Aggregate interpolated data at county level with R使用 R 聚合县级插值数据
【发布时间】：2022-01-19 21:48:54
【问题描述】：

原始数据是州级的，我想下到县级。为此，我先将数据调整到县级，然后进行插值。

状态级别的原始数据如下所示：对象名称“df.sf”

Simple feature collection with 16 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 848097.3 ymin: 2281812 xmax: 1485685 ymax: 3136914
Projected CRS: NTF (Paris) / Lambert zone II + NGF Lallemand height
First 10 features:
   CC_1    cty_area psych_dis      dpr      BMI
1     8 36095215398  19.07783 12.00708 26.07741
2     9 70643514694  19.79458 12.37472 25.63492
3    11   900908400  20.43846 12.49231 25.13092
4    12 29954144448  20.20175 12.37719 26.61934
5     4   404969822  18.00000 11.33333 25.32944
6     2   747005892  18.23529 11.80392 24.56922
7     6 21203879353  19.52273 12.06364 25.83914
8    13 23736584551  21.46269 13.32836 26.73993
9     3 48207855936  20.56610 12.80000 25.68827
10    5 34345207798  19.95376 12.45279 26.12742
                         geometry
1  MULTIPOLYGON (((1077800 231...
2  MULTIPOLYGON (((1156417 230...
3  MULTIPOLYGON (((1338714 287...
4  MULTIPOLYGON (((1279750 284...
5  MULTIPOLYGON (((1029656 291...
6  MULTIPOLYGON (((1125077 297...
7  MULTIPOLYGON (((1058410 253...
8  MULTIPOLYGON (((1203756 304...
9  MULTIPOLYGON (((894800.9 29...
10 MULTIPOLYGON (((1004934 264...

县级调整后的原始数据如下：对象名称“bl.lk”

Simple feature collection with 401 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 848080 ymin: 2281823 xmax: 1485607 ymax: 3136923
Projected CRS: NTF (Paris) / Lambert zone II + NGF Lallemand height
First 10 features:
    AGS      BMI      dpr psych_dis                       geometry
1  1001 26.68627 11.61818  19.20000 MULTIPOLYGON (((1064024 311...
2  1002 26.68627 11.61818  19.20000 MULTIPOLYGON (((1111437 307...
3  1003 26.68662 11.62940  19.21484 MULTIPOLYGON (((1161867 303...
4  1004 26.68627 11.61818  19.20000 MULTIPOLYGON (((1102924 304...
5  1051 26.68627 11.61818  19.20000 MULTIPOLYGON (((1058696 302...
6  1053 26.68471 11.62609  19.20966 MULTIPOLYGON (((1165074 298...
7  1054 26.68627 11.61818  19.20000 MULTIPOLYGON (((1003984 308...
8  1055 26.68627 11.61818  19.20000 MULTIPOLYGON (((1149981 305...
9  1056 26.67913 11.61881  19.19674 MULTIPOLYGON (((1102964 298...
10 1057 26.68627 11.61818  19.20000 MULTIPOLYGON (((1130439 307...

这是两者的情节：

在此之后，我做了一个 10x10 公里的网格并将其调整到德国的边界：

编辑：我是如何制作网格的

### I made first an sf object with just the borders of Germany

ger <- map %>% 
  summarize()

### I made the 10 x 10 km² grid

grid <- sf::st_make_grid(ger, cellsize = c(10*1000,10*1000))[ger]

### turning back to an sf object 
grid<-sf::st_sf(grid) 

### adapt to the borders

grid<-sf::st_intersection(grid,ger)

对象名称“网格”

Simple feature collection with 3885 features and 0 fields
Geometry type: GEOMETRY
Dimension:     XY
Bounding box:  xmin: 848097.3 ymin: 2281812 xmax: 1485685 ymax: 3136914
Projected CRS: NTF (Paris) / Lambert zone II + NGF Lallemand height
First 10 features:
                             grid
1  POLYGON ((1186108 2291812, ...
2  POLYGON ((1198097 2291812, ...
3  POLYGON ((1198097 2291812, ...
4  POLYGON ((998097.3 2301812,...
5  MULTIPOLYGON (((1008097 230...
6  MULTIPOLYGON (((1014797 230...
7  POLYGON ((1028097 2301812, ...
8  POLYGON ((1028097 2301812, ...
9  MULTIPOLYGON (((1185004 229...
10 POLYGON ((1188097 2301812, ...

网格图：

在使用球形模型拟合我的变异函数模型后，我在新网格上插入了我的数据。因此我使用了“gstat”包。

这是我的变异函数图：

编辑第 2 部分：我如何制作“dpr.krg”

dpr.krg<-gstat::krige(dpr~1,blk.sp,
            newdata = grid, model=fv.dpr)

### the object "blk.sp" is the spatial object of bl.lk
blk.sp<-as_Spatial(bl.lk)

我使用普通克里金法插入数据：对象名称：“dpr.krg”

Simple feature collection with 3885 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 848097.3 ymin: 2281812 xmax: 1485685 ymax: 3136914
Projected CRS: NTF (Paris) / Lambert zone II + NGF Lallemand height
First 10 features:
           x       y var1.pred   var1.var                       geometry
1  1185538.0 2291408  12.38589 0.03540585 MULTIPOLYGON (((1186108 229...
2  1195629.5 2286807  12.40707 0.03757786 MULTIPOLYGON (((1198097 229...
3  1201101.8 2288814  12.41246 0.03702310 MULTIPOLYGON (((1198097 229...
4   996635.3 2300961  12.14110 0.03187281 MULTIPOLYGON (((998097.3 23...
5  1004070.0 2298689  12.13117 0.02802191 MULTIPOLYGON (((1008097 230...
6  1009703.8 2300029  12.11553 0.02637842 MULTIPOLYGON (((1014797 230...
7  1023116.7 2300258  12.10389 0.02533177 MULTIPOLYGON (((1028097 230...
8  1030446.8 2300867  12.09814 0.02590088 MULTIPOLYGON (((1028097 230...
9  1186016.8 2297043  12.38064 0.02771807 MULTIPOLYGON (((1185004 229...
10 1193204.4 2297477  12.39328 0.02687693 MULTIPOLYGON (((1188097 230...

我猜它看起来不错：

我的问题从这里开始！

我现在想使用“最近特征”或“交叉点”的方法将插值数据聚合回县级，我不太确定。

我尝试了两种不同的方法。

首先我尝试了这个尝试：

a1<-st_join(bl.lk,dpr.krg,left=T) %>% 
  aggregate(list(.$var1.pred), mean)

a1

Simple feature collection with 3878 features and 9 fields
Attribute-geometry relationship: 0 constant, 8 aggregate, 1 identity
Geometry type: GEOMETRY
Dimension:     XY
Bounding box:  xmin: 848080 ymin: 2281823 xmax: 1485607 ymax: 3136923
Projected CRS: NTF (Paris) / Lambert zone II + NGF Lallemand height
First 10 features:
    Group.1      AGS      BMI      dpr psych_dis        x       y var1.pred
1  11.48022 10042.50 26.21000 11.44444  17.92593 923097.3 2486812  11.48022
2  11.48889 10044.00 26.21000 11.44444  17.92593 917522.5 2479905  11.48889
3  11.49469 10042.50 26.21000 11.44444  17.92593 933097.3 2486812  11.49469
4  11.49718 10042.50 26.21000 11.44444  17.92593 923568.2 2477160  11.49718
5  11.50216 10044.00 26.21000 11.44444  17.92593 914425.8 2488008  11.50216
6  11.51038 10041.00 26.21000 11.44444  17.92593 933200.6 2479981  11.51038
7  11.52432 10043.00 26.21175 11.44758  17.93198 923097.3 2496812  11.52432
8  11.53093 10041.00 26.21000 11.44444  17.92593 927361.3 2471509  11.53093
9  11.53557 10041.00 26.21000 11.44444  17.92593 928485.0 2471459  11.53557
10 11.53784 10042.67 26.21042 11.44521  17.92740 933097.3 2496812  11.53784
      var1.var                       geometry
1  0.011184110 POLYGON ((929972.4 2474426,...
2  0.021920620 POLYGON ((933251.2 2507013,...
3  0.008358825 POLYGON ((929972.4 2474426,...
4  0.017955294 POLYGON ((929972.4 2474426,...
5  0.016334040 POLYGON ((933251.2 2507013,...
6  0.011367280 POLYGON ((928068.5 2477750,...
7  0.008089616 POLYGON ((935840.3 2508670,...
8  0.023942613 POLYGON ((928068.5 2477750,...
9  0.023667212 POLYGON ((928068.5 2477750,...
10 0.009007377 POLYGON ((929972.4 2474426,...

我的第二种方法是：

a2<-st_join(bl.lk,dpr.krg, left=T) %>%
  group_by(AGS) %>%
  summarise(mean(var1.pred))

names(a2)[names(a2) == "mean(var1.pred)"] <- "var1.pred"

a2
Simple feature collection with 401 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 848080 ymin: 2281823 xmax: 1485607 ymax: 3136923
Projected CRS: NTF (Paris) / Lambert zone II + NGF Lallemand height
# A tibble: 401 x 3
     AGS var1.pred                                                    geometry
   <int>     <dbl>                                          <MULTIPOLYGON [m]>
 1  1001      11.7 (((1064024 3116513, 1064892 3115261, 1065097 3113239, 1065~
 2  1002      11.6 (((1111437 3076630, 1112124 3077127, 1113655 3074885, 1114~
 3  1003      11.9 (((1161867 3031796, 1161406 3032579, 1163017 3032765, 1164~
 4  1004      11.6 (((1102924 3044050, 1103845 3044400, 1105259 3043551, 1106~
 5  1051      11.8 (((1058696 3028055, 1058403 3027541, 1059516 3026845, 1059~
 6  1053      12.1 (((1165074 2985679, 1166003 2982501, 1165265 2981093, 1165~
 7  1054      11.8 (((1003984 3083561, 1005616 3084030, 1006873 3081947, 1005~
 8  1055      11.9 (((1149981 3056897, 1149236 3056726, 1149037 3058245, 1150~
 9  1056      11.9 (((1102964 2986154, 1102281 2984642, 1101395 2983211, 1100~
10  1057      11.6 (((1130439 3075640, 1133078 3074357, 1136994 3073664, 1140~
# ... with 391 more rows

我只是对“var1.pred”感兴趣。

如果我现在绘制两者，我意识到，我会得到两个不同的图和值。所以，我不确定我是否使用了正确的方法。我也想过使用“st_intersect”或“st_nearest_feature”，但我不确定如何在脚本中实现这些命令。

a1和a2的图：

另外，我确定我是否被允许这样做。我没有找到任何关于使用插值数据进行回归的论文。

我使用了以下包：sf、gstat、tidyverse、automap

【问题讨论】：

出于好奇，您的指标代表什么？如果我站在你的立场上，我不会使用插值法，而是通过面积或人口权重（县和州都是地方行政单位，应该有良好的人口数据，无论如何面积都很容易计算）。
我认为你在Geographic Information Systems 上可能会有更好的运气，因为问题的关键似乎是你尝试编码背后的方法，而不是调试代码本身
@AriWhatElse - 好吧，如果州一级的数据代表计数，您可以使用 kreise 人口与土地人口的比率（这样的比率加起来为 100%）根据 kreise 水平对它们进行下采样。通过这样做，您将引入土地内每个人口均匀分布的假设；如果您的数据是绝对数字（某物的计数），则这是一个合理的假设。如果它是一个相对数字（某物与其他物的比率），那就更难了。
@JindraLacko - 这是一项代表人口的调查，但我需要阅读调查的方法报告，看看它是否在州一级也具有代表性。所以，我猜这不是一个计数。我已经阅读了小面积估计方法，它看起来很有希望，并且我在 R 中遇到了多边形的分解函数作为实用方法。如果我找到一个解决方案，我会把它作为答案放在这里:) 不过谢谢！
由于这是一个代码问题和一个方法问题，您应该将其拆分为 2：此处的代码问题和 GIS 板或Cross Validated 上的方法问题。先弄清楚方法，然后再回到 SO 完成代码。 Geocomputation with R 是一本免费的在线书籍，应该会有所帮助，this one 和 this one 也应该如此。显然我刚刚完成了空间分析课程！

标签： r aggregate sf gstat spatial-interpolation

【解决方案1】：

好的，我现在不会提供任何类型的答案（也许稍后！），但我想做一个可重现的示例，以便我（和其他人可以尝试）。

编辑现在有一个答案，但可以改进！

@AriWhatElse 如果它符合您的问题，请随时告诉我！

library(sf)
library(dplyr)

nc <- sf::st_transform(st_read(system.file("shape/nc.shp", package="sf")), 2264) 

plot(nc$geometry)

set.seed(42) 
gr <-  sf::st_sf(
    value = rpois(2500, 5), # this is why I needed to set the seed
    geom = st_make_grid(st_as_sfc(st_bbox(nc))
                        , n = c(50, 50))) # we can change here a bit if needed

gr_nc  <- gr[nc,] # just removing not needed pixel

plot(gr_nc["value"], add = TRUE)

然后我们采用两种不同的聚合方式：

a1 <-st_join(nc, gr_nc) %>% 
    aggregate(list(.$value), mean)

a2 <-st_join(nc, gr_nc ) %>%
    group_by(CNTY_) %>%
    summarise(value = mean(value))

plot(a1["value"])
plot(a2["value"])

edit1：编辑以提供情节

edit2：好的，我们找到了“陷阱”

a1 应该是这样的：

a1bis <- aggregate(gr_nc, by = nc, mean)

它会产生与a2相同的结果。

连接很棘手，我认为它会复制nc 中的每个几何图形，如果在您与county 分组并询问它的平均值后，它会与 gr_nc 相交多次，您将获得平均值与县相交。

正如评论中提到的，考虑到每个“像素”覆盖的表面百分比的加权平均值会更好。

【讨论】：