【Question title】: How to handle a ManyToMany field with Scrapy
【Posted】: 2016-09-10 05:11:51
【Question】:

I want to use Scrapy together with Django.

My goal is to link the actors field to the name field, but I don't know how to handle a Django ManyToManyField. My database is MySQL (I am not using DjangoItem).

models.py

from django.db import models


class Movies(models.Model):
    content_ID = models.CharField(max_length=30)
    release_date = models.CharField(max_length=30)
    running_time = models.CharField(max_length=10)
    actors = models.CharField(max_length=300)
    series = models.CharField(max_length=30)
    director = models.CharField(max_length=30)
    label = models.CharField(max_length=30)
    image_urls = models.CharField(max_length=200, null=True)
    images = models.TextField(null=True)
    image_paths = models.TextField(null=True)

    def __str__(self):
        return self.content_ID

class Actors(models.Model):
    names = models.CharField(max_length=100, null=True)
    movielist = models.ManyToManyField(Movies)
    image_urls = models.CharField(max_length=200)
    images = models.TextField(null=True)
    image_paths = models.TextField(null=True)

    def __str__(self):
        return self.names or ""
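What the question needs is a two-step pattern per scraped actor name: fetch or create an `Actors` row, then attach it to the movie through the many-to-many field. In Django that would be roughly `actor, _ = Actors.objects.get_or_create(names=name)` followed by `actor.movielist.add(movie)`. The sketch below mimics those two steps with plain Python containers so it runs without a Django project. The helper names, the in-memory `actor_table` and `links` stand-ins, and the assumption that the scraped actors string is comma-separated are all hypothetical, for illustration only.

```python
# Plain-Python stand-in for the Django pattern:
#   actor, _ = Actors.objects.get_or_create(names=name)
#   actor.movielist.add(movie)
# actor_table and links are hypothetical in-memory substitutes for the
# Actors table and the auto-created many-to-many join table.


def split_actor_names(raw):
    """Split a scraped string such as 'A, B , A' into unique, trimmed names.

    Assumes the names are comma-separated (hypothetical format)."""
    names = []
    for part in raw.split(","):
        name = part.strip()
        if name and name not in names:
            names.append(name)
    return names


def get_or_create_actor(actor_table, name):
    """Return (actor_id, created), mimicking Manager.get_or_create()."""
    if name in actor_table:
        return actor_table[name], False
    actor_id = len(actor_table) + 1
    actor_table[name] = actor_id
    return actor_id, True


def link_movie_actors(actor_table, links, content_id, raw_actors):
    """Attach every scraped actor to the movie.

    links is a set, so re-adding an existing pair is a no-op, just as
    Django's related-manager add() does not duplicate an existing relation."""
    for name in split_actor_names(raw_actors):
        actor_id, _ = get_or_create_actor(actor_table, name)
        links.add((actor_id, content_id))
    return links


actor_table = {}
links = set()
link_movie_actors(actor_table, links, "abc001", "Alice, Bob")
link_movie_actors(actor_table, links, "abc002", "Bob, Carol, Bob")
print(sorted(actor_table.items()))  # [('Alice', 1), ('Bob', 2), ('Carol', 3)]
print(sorted(links))  # [(1, 'abc001'), (2, 'abc001'), (2, 'abc002'), (3, 'abc002')]
```

Because each actor is looked up before it is created, "Bob" appearing in two movies (or twice in one) still yields a single actor row with two links.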

【Discussion】:

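  • Link the actors field to the name field? Do you mean a foreign key?
  • I mean many-to-many! I've edited my models.py. Thanks.
  • Do you mean you want to access the data in the many-to-many field?
  • Where does Scrapy come into this? Could you describe the problem you are facing in more detail?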

Tags: python django scrapy manytomanyfield


【Answer 1】:

I use a pipeline to save Scrapy items (see https://github.com/DevProfi/scrapy-djangoitem):

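from django.core.exceptions import ObjectDoesNotExist
from django.forms.models import fields_for_model

# item_to_model() and get_or_create() are helper functions not shown in this
# answer (presumably from the repository linked above).


class ItemPersistencePipeline(object):
    def process_item(self, item, spider, partial=True):
        try:
            item_model = item_to_model(item)
        except TypeError:
            return item
        model, created = get_or_create(item_model, spider.unique_fields)

        # The model object was not created, so it already exists and must be updated
        if not created:
            try:
                update_model(destination=model, source=item_model, item=item,
                             fields=spider.unique_fields, partial=partial)
            except Exception as e:
                # Log and keep passing the item on: a pipeline must return the
                # item (or raise DropItem), not an exception object
                spider.logger.error("Could not update %r: %s", model, e)
                return item

        # The model object was just created: set its m2m relations if the item has any
        else:
            for f in sorted(item._model_fields_m2m):
                val = item.get(f)
                if val:
                    # set() expects an iterable of related objects
                    getattr(model, f).set(val if isinstance(val, (list, tuple)) else [val])
        # TODO: bulk-insert model fields
        return item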


def update_model(destination, source, item, fields, partial, commit=False):
    # partial: if True, only non-empty values that differ from the stored ones are written
    # commit: tracks whether the object changed, so it is not saved to the database needlessly
    opts = source._meta
    field_names_m2m = [f.name for f in opts.many_to_many]
    source_fields = fields_for_model(source, exclude=field_names_m2m)

    for key in source_fields:
        val_old = getattr(destination, key)
        try:
            val_new = getattr(source, key)
        except ObjectDoesNotExist:
            continue
        if partial:
            if val_new and val_new != val_old:
                setattr(destination, key, val_new)
                commit = True
        else:
            setattr(destination, key, val_new)
            commit = True

    if commit:
        destination.save()

    # Many-to-many fields: add only related objects that are not linked yet.
    # item.get(f) may hold a single related object or a list of them.
    for f in sorted(item._model_fields_m2m):
        val_new = item.get(f)
        if not val_new:
            continue
        if not isinstance(val_new, (list, tuple)):
            val_new = [val_new]
        val_old = list(getattr(destination, f).all())
        to_add = [v for v in val_new if v not in val_old]
        if to_add:
            getattr(destination, f).add(*to_add)

    return destination
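The core of update_model() is two rules. With `partial=True` a scalar field is overwritten only by a non-empty value that differs from the stored one, and `commit` records whether a save is needed. For many-to-many fields, only related objects that are not already linked are passed to `.add()`. The plain-Python sketch below restates both rules on dicts and lists so the behavior can be checked without a database; `apply_update` and `missing_related` are hypothetical helper names, and accepting either a single value or a list in `missing_related` is an assumption about what the item may hold.

```python
def apply_update(destination, source, partial=True):
    """Copy scalar fields from source into destination (both plain dicts).

    partial=True: write only non-empty values that differ from the stored one.
    partial=False: overwrite every field.
    Returns True when the object would need saving (the 'commit' flag)."""
    commit = False
    for key, val_new in source.items():
        val_old = destination.get(key)
        if partial:
            if val_new and val_new != val_old:
                destination[key] = val_new
                commit = True
        else:
            destination[key] = val_new
            commit = True
    return commit


def missing_related(existing, new):
    """Return the values in new (a single value or a list) not yet in existing,
    i.e. what would be passed to the related manager's add()."""
    if not new:
        return []
    if not isinstance(new, (list, tuple)):
        new = [new]
    result = []
    for value in new:
        if value not in existing and value not in result:
            result.append(value)
    return result


movie = {"release_date": "2016-09-01", "label": "X"}
changed = apply_update(movie, {"release_date": "", "label": "Y"})
print(changed, movie)  # True {'release_date': '2016-09-01', 'label': 'Y'}
print(missing_related(["Alice", "Bob"], ["Bob", "Carol"]))  # ['Carol']
```

With `partial=True` an empty scraped value never clears a stored one; `partial=False` is the option to use when a blank value should overwrite what is already in the database.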
