[Question title]: How to do performance optimization while serializing lots of GeoDjango geometry fields?
[Posted]: 2017-12-31 06:37:07
[Question]:

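I'm developing a GeoDjango app that uses the WorldBorder model provided in the tutorial. I also created my own Region model, which is tied to WorldBorder. So a WorldBorder/Country can have multiple Regions, each of which also has borders (a MultiPolygon field).

I built the API for it with DRF, but it is very slow: it takes 16 seconds to load all WorldBorders and Regions in GeoJSON format. The returned JSON is 10 MB, though. Is that reasonable?

I even switched the serializer to serpy, which is much faster than the DRF GIS serializer, but that only improved performance by 10%.

Profiling showed that most of the time is spent in the GIS functions that convert the database data type into lists of coordinates rather than WKT. If I use WKT, serialization is much faster (1.7 s versus 11.7 s; WKT is used only for the WorldBorder MultiPolygon, everything else is still GeoJSON).

I also tried compressing the MultiPolygon using ST_SimplifyVW with a low tolerance (0.005) to preserve accuracy, which brings the JSON size down to 1.7 MB and the total load time to 3.5 s. Of course, I can still look for the tolerance that best balances accuracy and speed.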
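To make the role of the tolerance more concrete, here is a minimal pure-Python sketch of the Visvalingam-Whyatt idea that ST_SimplifyVW is named after: repeatedly drop the interior point whose triangle with its two neighbours has the smallest area, until that area reaches the tolerance. It is a naive illustration on an open line, not PostGIS's implementation, and the names `triangle_area` and `simplify_vw` are made up for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def triangle_area(a: Point, b: Point, c: Point) -> float:
    """Area of the triangle a-b-c (half the absolute cross product)."""
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    return abs(cross) / 2.0


def simplify_vw(points: List[Point], tolerance: float) -> List[Point]:
    """Drop interior points while the smallest triangle area is below tolerance.

    Naive O(n^2) variant for illustration; the end points are always kept.
    """
    pts = list(points)
    while len(pts) > 2:
        # Area spanned by every interior point and its current neighbours.
        areas = [triangle_area(pts[i - 1], pts[i], pts[i + 1])
                 for i in range(1, len(pts) - 1)]
        smallest = min(areas)
        if smallest >= tolerance:
            break
        # Remove the least significant point, then recompute the areas.
        del pts[areas.index(smallest) + 1]
    return pts


# The nearly collinear point (1.0, 0.01) spans a tiny triangle and is dropped;
# the spike at (3.0, 2.0) spans a large one and survives.
line = [(0.0, 0.0), (1.0, 0.01), (2.0, 0.0), (3.0, 2.0), (4.0, 0.0)]
simplified = simplify_vw(line, tolerance=0.05)
```

Because the tolerance is compared against an area, raising it removes progressively more visible detail, which is the accuracy/size trade-off described above.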

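Below is the profiling data (the sudden increase in queries for the simplified MultiPolygon is due to my bad use of the Django QuerySet API with ST_SimplifyVW).

EDIT: I fixed the DB query so the query count stays the same at 75 queries and, as expected, that does not improve performance significantly.

EDIT: I continued to improve my DB queries and have now reduced them to just 8. As expected, that does not improve performance much either.

Below is the profile of the function calls. I highlighted the part that takes most of the time. This one uses the vanilla DRF GIS implementation.

Below is the profile when I use WKT for one of the MultiPolygon fields, without ST_SimplifyVW.

Here are the models, as requested by @Udi: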

class WorldBorderQueryset(models.query.QuerySet):
    def simplified(self, tolerance):
        sql = "SELECT ST_SimplifyVW(mpoly, %s) AS mpoly"
        return self.extra(
            select={'mpoly': sql},
            select_params=(tolerance,)
        )


class WorldBorderManager(models.Manager):
    def get_by_natural_key(self, name, iso2):
        return self.get(name=name, iso2=iso2)

    def get_queryset(self, *args, **kwargs):
        qs = WorldBorderQueryset(self.model, using=self._db)
        qs = qs.prefetch_related('regions',)
        return qs

    def simplified(self, level):
        return self.get_queryset().simplified(level)


class WorldBorder(TimeStampedModel):
    name = models.CharField(max_length=50)
    area = models.IntegerField(null=True, blank=True)
    pop2005 = models.IntegerField('Population 2005', default=0)
    fips = models.CharField('FIPS Code', max_length=2, null=True, blank=True)
    iso2 = models.CharField('2 Digit ISO', max_length=2, null=True, blank=True)
    iso3 = models.CharField('3 Digit ISO', max_length=3, null=True, blank=True)
    un = models.IntegerField('United Nations Code', null=True, blank=True)
    region = models.IntegerField('Region Code', null=True, blank=True)
    subregion = models.IntegerField('Sub-Region Code', null=True, blank=True)
    lon = models.FloatField(null=True, blank=True)
    lat = models.FloatField(null=True, blank=True)

    # generated from lon lat to be one field so that it can be easily
    # edited in admin
    center_coordinates = models.PointField(blank=True, null=True)

    mpoly = models.MultiPolygonField(help_text='Borders')

    objects = WorldBorderManager()

    def save(self, *args, **kwargs):
        if not self.center_coordinates:
            self.center_coordinates = Point(x=self.lon, y=self.lat)
        super().save(*args, **kwargs)

    def natural_key(self):
        return self.name, self.iso2

    def __str__(self):
        return self.name

    class Meta:
        verbose_name = 'Country'
        verbose_name_plural = 'Countries'
        ordering = ('name',)


class Region(TimeStampedModel):
    name = models.CharField(max_length=100, unique=True)
    country = models.ForeignKey(WorldBorder, related_name='regions')
    mpoly = models.MultiPolygonField(help_text='Areas')
    center_coordinates = models.PointField()

    moment_category = models.ForeignKey('moment.MomentCategory',
                                        blank=True, null=True)

    objects = RegionManager()
    no_joins = models.Manager()

    def natural_key(self):
        return (self.name,)

    def __str__(self):
        return self.name


# TODO might want to have separate table for ActiveCity for performance
# improvement since we have like 50k cities
class City(TimeStampedModel):
    country = models.ForeignKey(WorldBorder, on_delete=models.PROTECT,
                                related_name='cities')
    region = models.ForeignKey(Region, blank=True, null=True,
                               related_name='cities',
                               on_delete=models.SET_NULL)

    name = models.CharField(max_length=255)
    accent_city = models.CharField(max_length=255)
    population = models.IntegerField(blank=True, null=True)
    is_capital = models.BooleanField(default=False)

    center_coordinates = models.PointField()

    # is active marks that this city is a destination
    # only cities with is_active True will be put up to the frontend
    is_active = models.BooleanField(default=False)

    objects = DefaultSelectOrPrefetchManager(
        prefetch_related=(
            'yes_moment_beacons__activity__verb',
            'social_beacons',
            'video_beacons'
        ),
        select_related=('region', 'country')
    )
    no_joins = models.Manager()

    def natural_key(self):
        return (self.name,)

    def __str__(self):
        return self.name

    class Meta:
        verbose_name_plural = 'Cities'

class Beacon(TimeStampedModel):
    # if null defaults to city center coordinates
    coordinates = models.PointField(blank=True, null=True)
    is_fake = models.BooleanField(default=False)

    # can use city here, but the %(class)s gives no space between words
    # and it looks ugly

    def validate_activity(self):
        # activities in the region
        activities = self.city.region.moment_category.activities.all()
        if self.activity not in activities:
            raise ValidationError('Activity is not in the Region')

    def clean(self):
        self.validate_activity()

    def save(self, *args, **kwargs):
        # doing a full clean is needed here is to ensure code correctness
        # (not user),
        # because if someone use objects.create, clean() will never get called,
        # cons is validation will be done twice if the object is
        # created e.g. from admin
        self.full_clean()

        if not self.coordinates:
            self.coordinates = self.city.center_coordinates
        super().save(*args, **kwargs)

    class Meta:
        abstract = True


class YesMomentBeacon(Beacon):
    activity = models.ForeignKey('moment.Activity',
                                 on_delete=models.CASCADE,
                                 related_name='yes_moment_beacons')
    # ..........
    # other fields

    city = models.ForeignKey('world.City', related_name='yes_moment_beacons')

    objects = DefaultSelectOrPrefetchManager(
        select_related=('activity__verb',)
    )

    def __str__(self):
        return '{} - {}'.format(self.activity, self.coordinates)

# other beacon types.......

Here are my serializers, as requested by @Udi:

class RegionInWorldSerializer(GeoFeatureModelSerializer):
    yes_moment_beacons = serializers.SerializerMethodField()
    social_beacons = serializers.SerializerMethodField()
    video_beacons = serializers.SerializerMethodField()

    center_coordinates = GeometrySerializerMethodField()

    def get_center_coordinates(self, obj):
        return obj.center_coordinates

    def get_yes_moment_beacons(self, obj):
        count = 0

        # don't worry, it's already prefetched in the manager
        # (including the below methods) so len is used instead of count
        cities = obj.cities.all()

        for city in cities:
            beacons = city.yes_moment_beacons.all()
            count += len(beacons)
        return count

    def get_social_beacons(self, obj):
        count = 0

        cities = obj.cities.all()

        for city in cities:
            beacons = city.social_beacons.all()
            count += len(beacons)
        return count

    def get_video_beacons(self, obj):
        count = 0

        cities = obj.cities.all()

        for city in cities:
            beacons = city.video_beacons.all()
            count += len(beacons)
        return count

    class Meta:
        model = Region
        geo_field = 'center_coordinates'
        fields = ('name', 'yes_moment_beacons', 'video_beacons',
                  'social_beacons')


class WorldSerializer(GeoFeatureModelSerializer):
    center_coordinates = GeometrySerializerMethodField()

    regions = RegionInWorldSerializer(many=True, read_only=True)

    def get_center_coordinates(self, obj):
        return obj.center_coordinates

    class Meta:
        model = WorldBorder
        geo_field = 'mpoly'

        fields = ('name', 'iso2', 'center_coordinates', 'regions')

Here is the main query:

def get_queryset(self):
    tolerance = self.request.GET.get('tolerance', None)
    if tolerance is not None:
        tolerance = float(tolerance)
        return WorldBorder.objects.simplified(tolerance)
    else:
        return WorldBorder.objects.all()

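Here is a fragment of the API response (1 of 236 objects) using ST_SimplifyVW with a high tolerance. If I don't use it, Firefox hangs, because I think it can't handle 10 MB of JSON. The border data for this particular country is small compared to other countries. Thanks to ST_SimplifyVW, the JSON returned here is compressed from 10 MB to 750 KB. Even with only 750 KB of JSON, it takes 4.5 s on my local machine.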

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "coordinates": [
          [
            [
              [
                74.915741,
                37.237328
              ],
              [
                74.400543,
                37.138962
              ],
              [
                74.038315,
                36.814682
              ],
              [
                73.668304,
                36.909637
              ],
              [
                72.556641,
                36.821266
              ],
              [
                71.581131,
                36.346443
              ],
              [
                71.18779,
                36.039444
              ],
              [
                71.647766,
                35.419991
              ],
              [
                71.496094,
                34.959435
              ],
              [
                70.978592,
                34.504997
              ],
              [
                71.077209,
                34.052216
              ],
              [
                70.472214,
                33.944153
              ],
              [
                70.002777,
                34.052773
              ],
              [
                70.323318,
                33.327774
              ],
              [
                69.561096,
                33.08194
              ],
              [
                69.287491,
                32.526382
              ],
              [
                69.328247,
                31.940365
              ],
              [
                69.013885,
                31.648884
              ],
              [
                68.161102,
                31.830276
              ],
              [
                67.575546,
                31.53194
              ],
              [
                67.778046,
                31.332218
              ],
              [
                66.727768,
                31.214996
              ],
              [
                66.395538,
                30.94083
              ],
              [
                66.256653,
                29.85194
              ],
              [
                65.034149,
                29.541107
              ],
              [
                64.059143,
                29.41444
              ],
              [
                63.587212,
                29.503887
              ],
              [
                62.484436,
                29.406105
              ],
              [
                60.868599,
                29.863884
              ],
              [
                61.758331,
                30.790276
              ],
              [
                61.713608,
                31.383331
              ],
              [
                60.85305,
                31.494995
              ],
              [
                60.858887,
                32.217209
              ],
              [
                60.582497,
                33.066101
              ],
              [
                60.886383,
                33.557213
              ],
              [
                60.533882,
                33.635826
              ],
              [
                60.508331,
                34.140274
              ],
              [
                60.878876,
                34.319717
              ],
              [
                61.289162,
                35.626381
              ],
              [
                62.029716,
                35.448601
              ],
              [
                62.309158,
                35.141663
              ],
              [
                63.091934,
                35.432495
              ],
              [
                63.131378,
                35.865273
              ],
              [
                63.986107,
                36.038048
              ],
              [
                64.473877,
                36.255554
              ],
              [
                64.823044,
                37.138603
              ],
              [
                65.517487,
                37.247215
              ],
              [
                65.771927,
                37.537498
              ],
              [
                66.302765,
                37.323608
              ],
              [
                67.004166,
                37.38221
              ],
              [
                67.229431,
                37.191933
              ],
              [
                67.765823,
                37.215546
              ],
              [
                68.001389,
                36.936104
              ],
              [
                68.664154,
                37.274994
              ],
              [
                69.246643,
                37.094154
              ],
              [
                69.515823,
                37.580826
              ],
              [
                70.134995,
                37.529045
              ],
              [
                70.165543,
                37.871719
              ],
              [
                70.71138,
                38.409866
              ],
              [
                70.97998,
                38.470459
              ],
              [
                71.591934,
                37.902618
              ],
              [
                71.429428,
                37.075829
              ],
              [
                71.842758,
                36.692101
              ],
              [
                72.658508,
                37.021202
              ],
              [
                73.307205,
                37.462753
              ],
              [
                73.819717,
                37.228058
              ],
              [
                74.247208,
                37.409546
              ],
              [
                74.915741,
                37.237328
              ]
            ]
          ]
        ],
        "type": "MultiPolygon"
      },
      "properties": {
        "name": "Afghanistan",
        "iso2": "AF",
        "center_coordinates": {
          "coordinates": [
            65.216,
            33.677
          ],
          "type": "Point"
        },
        "regions": {
          "type": "FeatureCollection",
          "features": [
            {
              "type": "Feature",
              "geometry": {
                "coordinates": [
                  66.75292967820785,
                  34.52466146754814
                ],
                "type": "Point"
              },
              "properties": {
                "name": "Central Afghanistan",
                "yes_moment_beacons": 0,
                "video_beacons": 0,
                "social_beacons": 0
              }
            },
            {
              "type": "Feature",
              "geometry": {
                "coordinates": [
                  69.69726561529792,
                  35.96022296494905
                ],
                "type": "Point"
              },
              "properties": {
                "name": "Northern Highlands",
                "yes_moment_beacons": 0,
                "video_beacons": 0,
                "social_beacons": 0
              }
            },
            {
              "type": "Feature",
              "geometry": {
                "coordinates": [
                  63.89541422401191,
                  32.27442932956255
                ],
                "type": "Point"
              },
              "properties": {
                "name": "Southwestern Afghanistan",
                "yes_moment_beacons": 0,
                "video_beacons": 0,
                "social_beacons": 0
              }
            }
          ]
        }
      }
    },
    ........
}

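So the point here is: is GeoDjango not as fast as I expected, or are these the expected performance numbers? What can I do to improve performance while still outputting GeoJSON (i.e. not WKT)? Is fine-tuning the tolerance the only way? I will probably also split out a separate endpoint to fetch the regions, though.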

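[Comments]:

  • Well, serialization to a transport format is one of the well-known performance bottlenecks. How many model records are you serializing, and how deep?
  • @Jason Actually there aren't that many objects. It is somewhat deep but not too deep (like 5-6 levels), but each object has big MultiPolygon data. I only have 236 objects, but because of said MultiPolygons it produces 10 MB of JSON (uncompressed). In the end I arrived at a question that made me think: is Python really this slow? (Or is serialization in other languages just as slow?)
  • Interesting. You might want to bring this to the geodjango developers mailing list.
  • I think you should return only the numeric data, without the geographic polygons, and cache all region/country polygons in precomputed geojson (since they do not change), leaving the task of combining them to the client. I.e., create a /country/123 API call or static file with the geo data for all regions in that country, possibly simplified ahead of time.
  • Another issue: use aggregation instead of looping in, for example, get_yes_moment_beacons.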
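The aggregation suggestion in the last comment can be sketched at the SQL level. The example below uses the standard-library `sqlite3` module so that it runs on its own; the `region`, `city` and `beacon` tables are hypothetical stand-ins for the models in the question. In the Django ORM the rough equivalent would be an annotation such as `Region.objects.annotate(n=Count('cities__yes_moment_beacons'))`, though that exact form is an assumption about the asker's schema, and combining several such counts in one query would likely need `distinct=True`.

```python
import sqlite3

# In-memory database with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE region (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE city (id INTEGER PRIMARY KEY,
                       region_id INTEGER REFERENCES region(id));
    CREATE TABLE beacon (id INTEGER PRIMARY KEY,
                         city_id INTEGER REFERENCES city(id));
    INSERT INTO region VALUES (1, 'Central'), (2, 'Northern');
    INSERT INTO city VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO beacon VALUES (1, 1), (2, 1), (3, 2);
""")


def beacon_counts(connection: sqlite3.Connection) -> dict:
    """Count beacons per region in one grouped query instead of Python loops."""
    rows = connection.execute("""
        SELECT region.name, COUNT(beacon.id)
        FROM region
        LEFT JOIN city ON city.region_id = region.id
        LEFT JOIN beacon ON beacon.city_id = city.id
        GROUP BY region.id
        ORDER BY region.name
    """)
    # LEFT JOINs keep regions without beacons, with a count of zero.
    return dict(rows.fetchall())


counts = beacon_counts(conn)
```

The database returns one small integer per region, so no city or beacon objects have to be instantiated just to be counted.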

Tags: python django gis geojson geodjango


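[Solution 1]:

Since your geographic data does not change frequently, try caching all region/country polygons in precomputed geojson. I.e., create a /country/123.geojson API call or static file with the geo data for all regions in that country, probably simplified in advance.

Your other API calls should return only the numeric data, without the geographic polygons, leaving the combining task to the client.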
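A minimal sketch of the precomputation step, assuming the features for each country have already been serialized once (for example by the slow serializer, run offline). The function name `write_country_files`, the ISO2-keyed file layout and the sample feature are illustrative assumptions, not something prescribed by this answer.

```python
import json
from pathlib import Path
from typing import Dict, List


def write_country_files(countries: Dict[str, List[dict]],
                        out_dir: Path) -> List[Path]:
    """Write one FeatureCollection file per country and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for iso2, features in countries.items():
        path = out_dir / f"{iso2}.geojson"
        collection = {"type": "FeatureCollection", "features": features}
        # Compact separators keep the file small; this runs once, offline.
        path.write_text(json.dumps(collection, separators=(",", ":")),
                        encoding="utf-8")
        paths.append(path)
    return paths


# Hypothetical input: one already-serialized feature for Afghanistan.
example_feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [65.216, 33.677]},
    "properties": {"name": "Afghanistan", "iso2": "AF"},
}
```

The resulting files can be served as static assets, so the expensive geometry serialization happens only when the borders are regenerated rather than on every request.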

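[Comments]:

    [Solution 2]:

    In case anyone else runs into a similar problem while trying to optimize with PostGIS: one possible way to speed up serialization is to do it directly on the database side rather than relying on geodjango. This function takes a queryset and returns geojson. It is useful when you need to do a lot of geojson serialization on the fly.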

    import json
    from typing import List, Optional, Tuple

    from django.db.models import QuerySet


    def queryset_geojson_serializer(queryset: QuerySet,
                                    fields: List[str],
                                    spatial_field: str,
                                    bbox: Optional[Tuple[float, float, float, float]] = None,
                                    limit: int = 1000,
                                    decimal_places: int = 4,
                                    ) -> str:
        """
        Serialize a queryset into GeoJSON. This function cannot handle related fields.

        :param queryset: Queryset with a spatial field. All fields must be on the queryset's model, not on related models.
        :param fields: List of fields to include as properties.
        :param spatial_field: Name of the spatial field.
        :param bbox: Bounding box tuple (xmin, ymin, xmax, ymax).
        :param limit: Maximum number of returned features.
        :param decimal_places: Number of decimal places in the GeoJSON geometry.
        :return: GeoJSON string of the dataset.
        """

        # Check that all fields are on the model; get_field() raises a
        # clearer error than the database would when a field is missing.
        model = queryset.model
        for f in fields + [spatial_field]:
            model._meta.get_field(f)

        # limit is used in a slice and decimal_places is written into the
        # SQL text, so insist on real integers.
        if not isinstance(limit, int):
            raise ValueError(f'limit must be an integer, not {type(limit)}')
        if not isinstance(decimal_places, int):
            raise ValueError(f'decimal_places must be an integer, not {type(decimal_places)}')

        # Use the SRID of the spatial field, falling back to 4326.
        srid = model._meta.get_field(spatial_field).srid or 4326

        # Skip rows whose spatial field is NULL.
        queryset = queryset.filter(**{f"{spatial_field}__isnull": False})

        # Primary keys of the rows to serialize.
        pk = model._meta.pk.name
        qs_id_list = list(queryset.values_list(pk, flat=True)[0:limit])

        features = []
        if qs_id_list:
            # A raw queryset must include the primary key in its SELECT list.
            columns = fields if pk in fields else [pk] + fields
            # Field and table names were validated against the model above;
            # the values are passed as query parameters, not formatted in.
            query_raw = (
                f'SELECT {", ".join(columns)}, '
                f'ST_AsGeoJSON({spatial_field}, {decimal_places}) AS geojson '
                f'FROM {model._meta.db_table} '
                f'WHERE {pk} = ANY(%s)'
            )
            params = [qs_id_list]

            # Optionally keep only rows whose bounding box intersects bbox.
            if bbox:
                query_raw += f' AND {spatial_field} && ST_MakeEnvelope(%s, %s, %s, %s, %s)'
                params += [*bbox, srid]

            # Build one GeoJSON feature per row.
            for v in queryset.raw(query_raw, params):
                properties = {field: str(getattr(v, field)) for field in fields}
                features.append({'type': 'Feature',
                                 'properties': properties,
                                 'geometry': json.loads(v.geojson)})

        # Convert the Python dictionary into a JSON string.
        return json.dumps({
            "type": "FeatureCollection",
            "crs": {"type": "name", "properties": {"name": f"EPSG:{srid}"}},
            'features': features
        })
    

    A few notes:

    Requires PostGIS. The GeoJSON features are created with the ST_AsGeoJSON function.

    Does not work with related fields.
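    One possible refinement, offered only as a sketch: the function above parses every geometry with `json.loads` and immediately re-encodes it with `json.dumps`. Since ST_AsGeoJSON already returns valid JSON text, the geometry strings can be spliced into the output verbatim, so Python never walks the coordinate lists. The helper name `feature_collection_from_rows` and its `(properties, geometry_json)` input shape are assumptions made for this illustration.

```python
import json
from typing import Iterable, Tuple


def feature_collection_from_rows(rows: Iterable[Tuple[dict, str]]) -> str:
    """Build a FeatureCollection string from (properties, geometry_json) pairs.

    geometry_json must already be valid JSON text, e.g. the output of
    ST_AsGeoJSON; it is inserted as-is and never parsed here.
    """
    features = []
    for properties, geometry_json in rows:
        # Only the small properties dict goes through json.dumps.
        features.append(
            '{"type":"Feature","properties":%s,"geometry":%s}'
            % (json.dumps(properties), geometry_json)
        )
    return '{"type":"FeatureCollection","features":[%s]}' % ",".join(features)


# Hypothetical row, shaped like what a database query might return.
sample = feature_collection_from_rows([
    ({"name": "Afghanistan"}, '{"type":"Point","coordinates":[65.216,33.677]}'),
])
```

    The trade-off is that the output is only as valid as the geometry strings handed in, so this suits strings that come straight from the database rather than from user input.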

    [Comments]:

      [Solution 3]:

      Have you considered using the Topojson format? It greatly reduces file size. Topojson can then be converted back to geojson with Leaflet, OpenLayers...

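      [Comments]:

      • Hi, thanks for the reply. That means I would have to write my own TopoJSON serializer, since I don't think one is available in Django. But if you look closely at my question, the problem is not how to compress the data (I already do that with ST_SimplifyVW in PostGIS) but that Python is very slow at serializing it. So I think that even with TopoJSON there would still be a huge performance hit. Precomputing the serialized results and caching/saving them is therefore the way to go, whether the format is GeoJSON or TopoJSON.
      • Is it ST_SimplifyVW that slows it down? I have noticed that most of the PostGIS functions I use are slow. I have been using django-geojson both on a server and locally, with no noticeable lag.
      • No, it isn't. As I said in the question, most of the performance loss is in the serialization process (from the WKT in the DB to Python data types to the transport format, GeoJSON/TopoJSON etc.); see the CPython profiler dump. Maybe it isn't slow on your machine because there isn't much data. In my project the API returns 10 megabytes of JSON. Even Firefox hangs when previewing it.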