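【Question Title】: Efficient join using Django REST Framework serializers
【Posted】: 2017-06-20 20:21:26
【Question】:

I am using the following set of serializers to implement a join. It works well in my development setup, but performs very badly whenever there is any distance between the web server and the database server. I grew suspicious of the SQL being run and did some logging; it appears to issue a new query for every entry and combine the results, rather than doing the whole join at once and returning the joined rows as I want. Here are my serializers: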

class UserSerializer(serializers.ModelSerializer):
    class Meta:
        model = User
        exclude = ('password', 'last_login', 'is_superuser', 'is_staff', 'is_active', 'date_joined',
                   'groups', 'user_permissions')


class DepartmentSerializer(serializers.HyperlinkedModelSerializer):
    curator = UserSerializer()
    class Meta:
        model = Department
        fields = '__all__'


class CategorySerializer(serializers.HyperlinkedModelSerializer):
    class Meta:
        model = Category
        fields = '__all__'


class DetailedLinkedContentSerializer(serializers.HyperlinkedModelSerializer):
    category = CategorySerializer()
    department = DepartmentSerializer()
    type = serializers.SerializerMethodField()

    class Meta:
        fields = '__all__'
        model = LinkedContent

    def get_type(self, obj):
        return 'link'


class DetailedFileContentSerializer(serializers.HyperlinkedModelSerializer):
    category = CategorySerializer()
    department = DepartmentSerializer()
    link_url = serializers.SerializerMethodField()
    type = serializers.SerializerMethodField()

    class Meta:
        fields = '__all__'
        model = FileContent

    def get_link_url(self, obj):
        return obj.file.url

    def get_type(self, obj):
        return obj.file_type

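As you can see, I do the "join" by declaring fields on a serializer as serializers of other models, e.g. category = CategorySerializer(). That looks like what DRF recommends, unless I have misunderstood something. Here is a small sample of the hundreds of queries that run in my development environment: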

(0.001) SELECT "content_linkedcontent"."id", "content_linkedcontent"."link_text", "content_linkedcontent"."department_id", "content_linkedcontent"."category_id", "content_linkedcontent"."visibility_rank", "content_linkedcontent"."link_url" FROM "content_linkedcontent"; args=()
(0.001) SELECT "content_category"."id", "content_category"."name", "content_category"."description" FROM "content_category" WHERE "content_category"."id" = 3; args=(3,)
(0.001) SELECT "content_department"."id", "content_department"."name", "content_department"."description", "content_department"."curator_id", "content_department"."visibility_rank" FROM "content_department" WHERE "content_department"."id" = 24; args=(24,)
(0.000) SELECT "auth_user"."id", "auth_user"."password", "auth_user"."last_login", "auth_user"."is_superuser", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."date_joined" FROM "auth_user" WHERE "auth_user"."id" = 3; args=(3,)
(0.000) SELECT "content_category"."id", "content_category"."name", "content_category"."description" FROM "content_category" WHERE "content_category"."id" = 3; args=(3,)
(0.000) SELECT "content_department"."id", "content_department"."name", "content_department"."description", "content_department"."curator_id", "content_department"."visibility_rank" FROM "content_department" WHERE "content_department"."id" = 29; args=(29,)
(0.000) SELECT "auth_user"."id", "auth_user"."password", "auth_user"."last_login", "auth_user"."is_superuser", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."date_joined" FROM "auth_user" WHERE "auth_user"."id" = 6; args=(6,)
(0.000) SELECT "content_category"."id", "content_category"."name", "content_category"."description" FROM "content_category" WHERE "content_category"."id" = 4; args=(4,)
(0.000) SELECT "content_department"."id", "content_department"."name", "content_department"."description", "content_department"."curator_id", "content_department"."visibility_rank" FROM "content_department" WHERE "content_department"."id" = 25; args=(25,)
(0.000) SELECT "auth_user"."id", "auth_user"."password", "auth_user"."last_login", "auth_user"."is_superuser", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."date_joined" FROM "auth_user" WHERE "auth_user"."id" = 6; args=(6,)
(0.000) SELECT "content_category"."id", "content_category"."name", "content_category"."description" FROM "content_category" WHERE "content_category"."id" = 1; args=(1,)
(0.000) SELECT "content_department"."id", "content_department"."name", "content_department"."description", "content_department"."curator_id", "content_department"."visibility_rank" FROM "content_department" WHERE "content_department"."id" = 29; args=(29,)
(0.000) SELECT "auth_user"."id", "auth_user"."password", "auth_user"."last_login", "auth_user"."is_superuser", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."date_joined" FROM "auth_user" WHERE "auth_user"."id" = 6; args=(6,)
(0.000) SELECT "content_category"."id", "content_category"."name", "content_category"."description" FROM "content_category" WHERE "content_category"."id" = 1; args=(1,)
(0.000) SELECT "content_department"."id", "content_department"."name", "content_department"."description", "content_department"."curator_id", "content_department"."visibility_rank" FROM "content_department" WHERE "content_department"."id" = 25; args=(25,)
(0.000) SELECT "auth_user"."id", "auth_user"."password", "auth_user"."last_login", "auth_user"."is_superuser", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."date_joined" FROM "auth_user" WHERE "auth_user"."id" = 6; args=(6,)
(0.000) SELECT "content_category"."id", "content_category"."name", "content_category"."description" FROM "content_category" WHERE "content_category"."id" = 1; args=(1,)
(0.000) SELECT "content_department"."id", "content_department"."name", "content_department"."description", "content_department"."curator_id", "content_department"."visibility_rank" FROM "content_department" WHERE "content_department"."id" = 24; args=(24,)
(0.000) SELECT "auth_user"."id", "auth_user"."password", "auth_user"."last_login", "auth_user"."is_superuser", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."date_joined" FROM "auth_user" WHERE "auth_user"."id" = 3; args=(3,)
(0.000) SELECT "content_category"."id", "content_category"."name", "content_category"."description" FROM "content_category" WHERE "content_category"."id" = 3; args=(3,)
(0.000) SELECT "content_department"."id", "content_department"."name", "content_department"."description", "content_department"."curator_id", "content_department"."visibility_rank" FROM "content_department" WHERE "content_department"."id" = 28; args=(28,)
(0.000) SELECT "auth_user"."id", "auth_user"."password", "auth_user"."last_login", "auth_user"."is_superuser", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."date_joined" FROM "auth_user" WHERE "auth_user"."id" = 6; args=(6,)
(0.000) SELECT "content_category"."id", "content_category"."name", "content_category"."description" FROM "content_category" WHERE "content_category"."id" = 1; args=(1,)
(0.000) SELECT "content_department"."id", "content_department"."name", "content_department"."description", "content_department"."curator_id", "content_department"."visibility_rank" FROM "content_department" WHERE "content_department"."id" = 28; args=(28,)
(0.000) SELECT "auth_user"."id", "auth_user"."password", "auth_user"."last_login", "auth_user"."is_superuser", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."date_joined" FROM "auth_user" WHERE "auth_user"."id" = 6; args=(6,)
(0.000) SELECT "content_category"."id", "content_category"."name", "content_category"."description" FROM "content_category" WHERE "content_category"."id" = 4; args=(4,)
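
The log above has the shape usually called the N+1 query pattern: one query for the list, then one extra lookup per row for each relation. As a minimal sketch of the difference, using the standard-library sqlite3 module instead of the Django ORM (the two tables and the run helper are simplified, hypothetical stand-ins, not the real schema), the same data can be fetched with five queries or with one JOIN:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE content (id INTEGER PRIMARY KEY, link_text TEXT,
                          category_id INTEGER REFERENCES category(id));
    INSERT INTO category VALUES (1, 'docs'), (3, 'forms'), (4, 'tools');
    INSERT INTO content VALUES (1, 'a', 3), (2, 'b', 3), (3, 'c', 4), (4, 'd', 1);
""")

query_count = 0


def run(sql, params=()):
    """Hypothetical helper: execute a query and count it."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


# N+1 pattern: one query for the list, then one lookup per row.
rows = run("SELECT id, link_text, category_id FROM content ORDER BY id")
n_plus_one = [
    (link_text, run("SELECT name FROM category WHERE id = ?", (category_id,))[0][0])
    for _id, link_text, category_id in rows
]
n_plus_one_count = query_count

# Single JOIN: the same data in one round trip.
query_count = 0
joined = run("""
    SELECT content.link_text, category.name
    FROM content JOIN category ON category.id = content.category_id
    ORDER BY content.id
""")
join_count = query_count

print(n_plus_one_count, join_count)  # 5 1
```

Each of those queries is a separate round trip, which would explain why the cost only shows up once the database is on another host.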

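So how do I do a true join of the information I want using serializers in DRF?

Update:

Following the advice in this blog entry, I managed to cut the query time in half. Here are my updated serializers and the view that uses them: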

class DetailedLinkedContentSerializer(serializers.HyperlinkedModelSerializer):
    category = CategorySerializer()
    department_query = Department.objects.all()
    department_query = DepartmentSerializer.setup_eager_loading(department_query)
    department = DepartmentSerializer(department_query)
    # department = DepartmentSerializer()
    type = serializers.SerializerMethodField()

    class Meta:
        fields = '__all__'
        model = LinkedContent

    def get_type(self, obj):
        return 'link'

    @staticmethod
    def setup_eager_loading(queryset):
        """ Perform necessary eager loading of data. """
        queryset = queryset.select_related('category', 'department')
        return queryset


class DetailedFileContentSerializer(serializers.HyperlinkedModelSerializer):
    category = CategorySerializer()
    department_query = Department.objects.all()
    department_query = DepartmentSerializer.setup_eager_loading(department_query)
    department = DepartmentSerializer(department_query)
    # department = DepartmentSerializer()
    link_url = serializers.SerializerMethodField()
    type = serializers.SerializerMethodField()

    class Meta:
        fields = '__all__'
        model = FileContent

    def get_link_url(self, obj):
        return obj.file.url

    def get_type(self, obj):
        return obj.file_type

    @staticmethod
    def setup_eager_loading(queryset):
        """ Perform necessary eager loading of data. """
        queryset = queryset.select_related('category', 'department')
        return queryset

And the view that uses these serializers:

class DetailedContentView(views.APIView):
    permission_classes = [IsAuthenticated, ContentCuratorOrReadOnly, IsGroupMember, ]
    def get(self, request, *args, **kwargs):
        context = {"request": request}
        linked_content = LinkedContent.objects.all()
        file_content = FileContent.objects.all()
        # this line is newly added
        linked_content = DetailedLinkedContentSerializer.setup_eager_loading(linked_content)
        # this line too
        file_content = DetailedFileContentSerializer.setup_eager_loading(file_content)
        linked_content_serializer = DetailedLinkedContentSerializer(linked_content, many=True, context=context)
        file_content_serializer = DetailedFileContentSerializer(file_content, many=True, context=context)

        response = linked_content_serializer.data + file_content_serializer.data
        response = sorted(response, key=lambda x: (x['department']['visibility_rank'], x['visibility_rank']))

        return Response(response)

However, my attempt to prefetch the user serializer inside the Department serializer does not seem to work. Specifically, I updated my Department serializer to:

class DepartmentSerializer(serializers.HyperlinkedModelSerializer):
    curator = UserSerializer()
    class Meta:
        model = Department
        fields = '__all__'

    @staticmethod
    def setup_eager_loading(queryset):
        """ Perform necessary eager loading of data. """
        queryset = queryset.select_related('curator')
        return queryset

The following lines:

department_query = Department.objects.all()
department_query = DepartmentSerializer.setup_eager_loading(department_query)
department = DepartmentSerializer(department_query)

do not seem to prefetch my curator the way I intended.

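【Comments】:

  • Have you tried using select_related and prefetch_related on the queryset in your view? They should preload the data and avoid a new request for every entry and every relation.
  • I am working on that right now and will update my question shortly.

Tags: python django join serialization django-rest-framework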


【Solution 1】:

I updated my serializers using a mixin from the comments on this blog:

class EagerLoadingMixin:
    @classmethod
    def setup_eager_loading(cls, queryset):
        if hasattr(cls, "_SELECT_RELATED_FIELDS"):
            queryset = queryset.select_related(*cls._SELECT_RELATED_FIELDS)
        if hasattr(cls, "_PREFETCH_RELATED_FIELDS"):
            queryset = queryset.prefetch_related(*cls._PREFETCH_RELATED_FIELDS)
        return queryset  

I also added 'department__curator' to my list of eagerly loaded fields (_SELECT_RELATED_FIELDS). The serializer now looks like this:

class DetailedFileContentSerializer(EagerLoadingMixin, serializers.HyperlinkedModelSerializer):
    category = CategorySerializer()
    department = DepartmentSerializer()
    link_url = serializers.SerializerMethodField()
    type = serializers.SerializerMethodField()

    class Meta:
        fields = '__all__'
        model = FileContent

    def get_link_url(self, obj):
        return obj.file.url

    def get_type(self, obj):
        return obj.file_type

    _SELECT_RELATED_FIELDS = ['department', 'category', 'department__curator']

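The query now takes about 1/3 of the time and no longer consists of hundreds of SELECTs. It still takes a long time, but I believe I can fix that by switching to a different hosting solution for my database.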
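
To show what the mixin passes on to the queryset, here is a small sketch that runs without Django. RecordingQuerySet and FileContentSerializerSketch are hypothetical stand-ins invented for this illustration: the first records the calls it receives instead of building SQL, and the second carries only the field list from the serializer above.

```python
class EagerLoadingMixin:
    """Same mixin as above, repeated so the sketch is self-contained."""

    @classmethod
    def setup_eager_loading(cls, queryset):
        if hasattr(cls, "_SELECT_RELATED_FIELDS"):
            queryset = queryset.select_related(*cls._SELECT_RELATED_FIELDS)
        if hasattr(cls, "_PREFETCH_RELATED_FIELDS"):
            queryset = queryset.prefetch_related(*cls._PREFETCH_RELATED_FIELDS)
        return queryset


class RecordingQuerySet:
    """Hypothetical stand-in for a Django QuerySet: records calls, runs no SQL."""

    def __init__(self, calls=()):
        self.calls = tuple(calls)

    def select_related(self, *fields):
        return RecordingQuerySet(self.calls + (("select_related", fields),))

    def prefetch_related(self, *fields):
        return RecordingQuerySet(self.calls + (("prefetch_related", fields),))


class FileContentSerializerSketch(EagerLoadingMixin):
    # Stand-in for DetailedFileContentSerializer; only the field list matters here.
    _SELECT_RELATED_FIELDS = ["department", "category", "department__curator"]


queryset = FileContentSerializerSketch.setup_eager_loading(RecordingQuerySet())
print(queryset.calls)
```

In the real view the call has the same shape as in the question: DetailedFileContentSerializer.setup_eager_loading(FileContent.objects.all()), with the result passed to the serializer. Because no _PREFETCH_RELATED_FIELDS attribute is defined, only select_related is applied.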

【Discussion】:

【Solution 2】:

In your view:

    def get_queryset(self):
        return (
            super().get_queryset()
            .select_related(relation1, relation2, ...)
            .prefetch_related(relation3, relation4, ...)
        )
    

That's it.

Under the hood, DRF does roughly the following:

    nested_instance_for_serialization = getattr(instance, fk_field_name)

    serialize_nested(instance.one_to_many_relation_field_name.all())

For a prefetched relation, qs.all() is cached: it is evaluated once, and the result is then reused by DRF and by any other loop over it.
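
To make the caching point concrete, here is roughly what a prefetch amounts to, sketched with the standard-library sqlite3 module rather than the Django ORM. The department and content tables are simplified, hypothetical stand-ins. The idea is one query for the parent rows, one IN (...) query for all related rows, and a grouping step in Python that later loops read from.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE content (id INTEGER PRIMARY KEY, link_text TEXT,
                          department_id INTEGER REFERENCES department(id));
    INSERT INTO department VALUES (24, 'HR'), (25, 'IT');
    INSERT INTO content VALUES (1, 'a', 24), (2, 'b', 25), (3, 'c', 24);
""")

# Query 1: the parent rows.
departments = conn.execute("SELECT id, name FROM department ORDER BY id").fetchall()

# Query 2: every related row for those parents in a single IN (...) lookup.
ids = [dept_id for dept_id, _name in departments]
placeholders = ", ".join("?" for _ in ids)
children = conn.execute(
    "SELECT department_id, link_text FROM content "
    f"WHERE department_id IN ({placeholders}) ORDER BY id",
    ids,
).fetchall()

# The "cache": related rows grouped by parent id, built once in Python.
cache = defaultdict(list)
for dept_id, link_text in children:
    cache[dept_id].append(link_text)

# Later loops read from the cache and never touch the database again.
result = {name: cache[dept_id] for dept_id, name in departments}
print(result)  # {'HR': ['a', 'c'], 'IT': ['b']}
```

The total stays at two queries no matter how many departments there are, which is the property the view-level get_queryset above relies on.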

【Discussion】:

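【Solution 3】:

I am not sure whether my solution also works, but I tried to solve this with a custom manager, as someone else commented on the blog post. (This is more of a Django solution than a DRF one.) Note that it fetches the related objects even when you are not using a serializer (e.g. SomeModel.objects.all()), which may or may not be what you want. Hopefully this answer will get better after peer review.

Assume:

  • User - Listing: one-to-many
  • Listing - Item: many-to-many
  • Listing - Like: one-to-many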

      class PreFetchMixin:
          def get_queryset(self):
              queryset = super().get_queryset()
              if hasattr(self, '_SELECT_RELATED_FIELDS'):
                  queryset = queryset.select_related(
                      *self._SELECT_RELATED_FIELDS)
              if hasattr(self, '_PREFETCH_RELATED_FIELDS'):
                  queryset = queryset.prefetch_related(
                      *self._PREFETCH_RELATED_FIELDS)
              if hasattr(self, '_ANNOTATIONS'):
                  queryset = queryset.annotate(**self._ANNOTATIONS)
              return queryset
      
      # PreFetchMixin must come first considering MRO
      from django.db import models
      from django.db.models import Count
      class ListingManager(PreFetchMixin, models.Manager):
          _SELECT_RELATED_FIELDS = ('user',)
          _PREFETCH_RELATED_FIELDS = ('items',)
          _ANNOTATIONS = {'num_likes': Count('like')}
      
      

And add one line to Listing:

      class Listing(models.Model):
          ...
          objects = ListingManager()
          ...
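
The comment about MRO in the manager code above can be checked in plain Python. In this sketch, BaseManager and PreFetchSketch are hypothetical stand-ins for models.Manager and PreFetchMixin. The assumption, which matches my understanding of Django's manager, is that the base get_queryset does not call super(), so a mixin listed after it is never reached.

```python
class BaseManager:
    """Hypothetical stand-in for models.Manager; assumed not to call super()."""

    def get_queryset(self):
        return ["base"]


class PreFetchSketch:
    """Stand-in for PreFetchMixin: decorates whatever super() returns."""

    def get_queryset(self):
        return super().get_queryset() + ["prefetched"]


class MixinFirst(PreFetchSketch, BaseManager):
    pass


class MixinLast(BaseManager, PreFetchSketch):
    pass


mixin_first = MixinFirst().get_queryset()
mixin_last = MixinLast().get_queryset()
print(mixin_first)  # ['base', 'prefetched']
print(mixin_last)   # ['base']
```

With the mixin listed last, the base class's method is found first and the eager-loading step is skipped without any error, which is why the order in ListingManager matters.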
      

If you want to use DRF serializers, you need to make a few small changes:

      from rest_framework import serializers
      # assumes that ItemSerializer is defined
      class ListingSerializer(serializers.ModelSerializer):
          items = ItemSerializer(many=True, read_only=True)
          num_likes = serializers.IntegerField(read_only=True)
      
          class Meta:
              model = Listing
              fields = '__all__'
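
The num_likes field works because the manager annotates each row with a count. As a rough illustration of the kind of SQL such an annotation corresponds to (the exact statement Django generates will differ, and the listing and listing_like tables here are hypothetical), a LEFT OUTER JOIN with GROUP BY returns the count alongside each listing in a single query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE listing (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE listing_like (id INTEGER PRIMARY KEY,
                               listing_id INTEGER REFERENCES listing(id));
    INSERT INTO listing VALUES (1, 'books'), (2, 'films'), (3, 'games');
    INSERT INTO listing_like (listing_id) VALUES (1), (1), (2);
""")

# One query: each listing with its like count, including listings with no likes.
num_likes = conn.execute("""
    SELECT listing.title, COUNT(listing_like.id) AS num_likes
    FROM listing
    LEFT OUTER JOIN listing_like ON listing_like.listing_id = listing.id
    GROUP BY listing.id
    ORDER BY listing.id
""").fetchall()
print(num_likes)  # [('books', 2), ('films', 1), ('games', 0)]
```

Since the count arrives as an extra column on each row, the serializer can expose it as a read-only IntegerField without further queries.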
      
      

【Discussion】:
