How is it possible that geopandas and sf give different summary statistics output for the same file?

【Question title】: How is it possible that geopandas and sf give different summary statistics output for the same file?
【Posted】: 2021-09-03 05:10:10
【Question description】:

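I am using this shapefile, which contains polygons with a variable z.

According to the sf package in R, the maximum of the z column is 43; according to geopandas in Python, the maximum of the z column is 7.

How is this possible?

In R: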

library(sf)
theshapefile <- read_sf("z_mystery.shp")
summary(theshapefile$z)

  Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-50.00  -34.00  -17.50  -16.91   -1.50   43.00

In Python:

import geopandas as gpd
theshapefile = gpd.read_file("z_mystery.shp")
print(theshapefile.z.describe())

count    78250.000000
mean       -21.110454
std         16.849647
min        -50.000000
25%        -35.500000
50%        -21.000000
75%         -6.500000
max         11.500000
Name: z, dtype: float64

【Question discussion】:

When trying the python3 route, I get pyproj.exceptions.CRSError: Invalid projection: epsg:28992: (Internal Proj Error: proj_create: no database context specified). Any idea how to resolve this? My versions appear to be up to date.
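
For the CRSError quoted in the comment above, one thing worth checking is whether pyproj can find its PROJ data directory and the proj.db file inside it. The thread does not confirm this as the cause, so treat the following as a diagnostic sketch only. The helper names proj_db_status and report_proj_data_dir are hypothetical; pyproj.datadir.get_data_dir is an existing pyproj function.

```python
from pathlib import Path


def proj_db_status(data_dir):
    """Report whether the given directory contains a proj.db file."""
    directory = Path(data_dir)
    return {
        "data_dir": str(directory),
        "has_proj_db": (directory / "proj.db").is_file(),
    }


def report_proj_data_dir():
    """Ask pyproj where it looks for PROJ data and check for proj.db there.

    pyproj is imported here so that proj_db_status stays usable without it.
    Assumption: a missing or mismatched proj.db is only one possible cause
    of the error; get_data_dir itself raises if no directory is found.
    """
    from pyproj.datadir import get_data_dir

    return proj_db_status(get_data_dir())
```

Calling report_proj_data_dir() in the failing environment shows which directory pyproj resolved. If has_proj_db is False, or the directory belongs to a different PROJ installation, that would be a place to start looking.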

Tags: python r shapefile geopandas sf

【Solution 1】:

I get exactly the same results from sf and geopandas.

In Python (I am using 3.9.6 with geopandas version 0.9.0):

import geopandas as gpd
x = gpd.read_file('z_mystery.shp')
print(x['z'].describe())

This outputs:

count    200687.000000
mean        -16.910993
std          20.111462
min         -50.000000
25%         -34.000000
50%         -17.500000
75%          -1.500000
max          43.000000
Name: z, dtype: float64
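
The two describe() outputs in this thread differ in count (78250 in the question, 200687 here), which suggests the two Python sessions did not read the same set of records. A minimal sketch for narrowing that down is shown below. It assumes the cause could be a differing or incomplete copy of the shapefile's companion files, which the thread does not confirm. The helper names sidecar_report and load_summary are hypothetical.

```python
from pathlib import Path


def sidecar_report(shp_path):
    """Map each core shapefile component to its size in bytes, or None if missing."""
    base = Path(shp_path)
    report = {}
    for ext in (".shp", ".shx", ".dbf"):
        part = base.with_suffix(ext)
        report[ext] = part.stat().st_size if part.is_file() else None
    return report


def load_summary(shp_path, column="z"):
    """Read the file with geopandas and return (row count, maximum of column)."""
    # Imported here so sidecar_report can be used without geopandas installed.
    import geopandas as gpd

    gdf = gpd.read_file(shp_path)
    return len(gdf), gdf[column].max()
```

Running sidecar_report("z_mystery.shp") and load_summary("z_mystery.shp") on both machines and comparing the results shows whether the file sizes and row counts agree. If they do not, the two sessions are not reading the same data.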

【Discussion】: