【Question title】: Update duplicate fields based on a key in a list of dictionaries
【Posted】: 2019-05-09 22:12:51
【Question】:

Given the list

l = [
    {
        "URL": "https://www.nytimes.com/",
        "ACTION": "FOLLOW",
        "RESULTS": "/PAGES/222437976487981212229928695878437391142.png",
        "PAGE-UUID": 2.224379764879812e+38,
        "PARENT-UUID": 2.224379764879812e+38
    },
    {
        "URL": "https://www.nytimes.com/es/",
        "ACTION": "NEW",
        "RESULTS": "/PAGES/138695820074592921946528124297895673746.png",
        "PAGE-UUID": 138695820074592921946528124297895673746,
        "PARENT-UUID": 2.224379764879812e+38
    },
    {
        "URL": "https://www.nytimes.com/",
        "ACTION": "NEW",
        "RESULTS": "/PAGES/166947399632926520629187111715759306140.png",
        "PAGE-UUID": 166947399632926520629187111715759306140,
        "PARENT-UUID": 2.436661515947743e+38
    },
    {
        "URL": "https://www.nytimes.com/subscriptions/Multiproduct/lp3L3W6.html?campaignId=6W74R",
        "ACTION": "NEW",
        "RESULTS": "/PAGES/299203350572384506529421004856026300297.png",
        "PAGE-UUID": 299203350572384506529421004856026300297,
        "PARENT-UUID": 2.436661515947743e+38
    }
]

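I want to find entries that share the same "URL" and, when a duplicate is found, set that duplicate's "ACTION" field to "DUPLICATE" instead of "NEW". The first entry with a given URL keeps its original "ACTION". In this example, the URL "https://www.nytimes.com/" appears twice. The expected output is: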

l = [
    {
        "URL": "https://www.nytimes.com/",
        "ACTION": "FOLLOW",
        "RESULTS": "/PAGES/222437976487981212229928695878437391142.png",
        "PAGE-UUID": 2.224379764879812e+38,
        "PARENT-UUID": 2.224379764879812e+38
    },
    {
        "URL": "https://www.nytimes.com/es/",
        "ACTION": "NEW",
        "RESULTS": "/PAGES/138695820074592921946528124297895673746.png",
        "PAGE-UUID": 138695820074592921946528124297895673746,
        "PARENT-UUID": 2.224379764879812e+38
    },
    {
        "URL": "https://www.nytimes.com/",
        "ACTION": "DUPLICATE",
        "RESULTS": "/PAGES/166947399632926520629187111715759306140.png",
        "PAGE-UUID": 166947399632926520629187111715759306140,
        "PARENT-UUID": 2.436661515947743e+38
    },
    {
        "URL": "https://www.nytimes.com/subscriptions/Multiproduct/lp3L3W6.html?campaignId=6W74R",
        "ACTION": "NEW",
        "RESULTS": "/PAGES/299203350572384506529421004856026300297.png",
        "PAGE-UUID": 299203350572384506529421004856026300297,
        "PARENT-UUID": 2.436661515947743e+38
    }
]

I tried using a set, but I don't really know where to go from here:

    seen = set()
    new_l = []
    for d in l:
        # Key built from every field of the entry, so only fully identical dicts match
        t = tuple(d.items())
        print("This is t", t)
        if t not in seen:
            seen.add(t)
            new_l.append(d)
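One way to see why this attempt never flags the repeated URL: `tuple(d.items())` builds the key from every field, and the two `https://www.nytimes.com/` entries differ in `ACTION`, `RESULTS`, `PAGE-UUID` and `PARENT-UUID`. The minimal sketch below uses trimmed copies of the first and third entries (only three fields kept, as an assumption for brevity) and compares a full-entry key with a URL-only key:

```python
# Trimmed copies of the first and third entries from the list above
first = {
    "URL": "https://www.nytimes.com/",
    "ACTION": "FOLLOW",
    "RESULTS": "/PAGES/222437976487981212229928695878437391142.png",
}
third = {
    "URL": "https://www.nytimes.com/",
    "ACTION": "NEW",
    "RESULTS": "/PAGES/166947399632926520629187111715759306140.png",
}

# Key built from every field, as in the attempt: the entries look distinct
full_key_equal = tuple(first.items()) == tuple(third.items())

# Key built from the URL alone: the entries collide, which is what should be detected
url_key_equal = first["URL"] == third["URL"]

print(full_key_equal, url_key_equal)  # False True
```

So the set needs to hold only the `URL` values, not whole entries.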

【Question discussion】:

    Tags: python dictionary hashmap duplicates


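    【Solution 1】:

    You're close! Keep a set of the URLs seen so far, and mark an entry only when its URL is already in that set: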

    urls = set()  # URLs seen so far
    for d in l:
        if d['URL'] in urls:  # A duplicate!
            d['ACTION'] = 'DUPLICATE'
        else:  # Not seen before
            urls.add(d['URL'])
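The loop above modifies the dictionaries in `l` in place. If the original list should stay untouched, a variant can copy each entry before marking it. The sketch below is an assumption, not part of the original answer: the helper name `mark_duplicates` and its `key`, `field` and `marker` parameters are hypothetical. As in the answer, the first occurrence of each key keeps its `ACTION` and later ones get `"DUPLICATE"`.

```python
def mark_duplicates(entries, key="URL", field="ACTION", marker="DUPLICATE"):
    """Hypothetical helper: return a new list in which every repeat of
    entry[key] has entry[field] set to marker. The first occurrence is
    left as-is, and the input list and its dicts are not modified."""
    seen = set()  # key values seen so far
    result = []
    for entry in entries:
        new_entry = dict(entry)  # shallow copy so the caller's dict stays unchanged
        if new_entry[key] in seen:
            new_entry[field] = marker
        else:
            seen.add(new_entry[key])
        result.append(new_entry)
    return result


# Usage with the URL/ACTION pairs from the question (other fields omitted)
l = [
    {"URL": "https://www.nytimes.com/", "ACTION": "FOLLOW"},
    {"URL": "https://www.nytimes.com/es/", "ACTION": "NEW"},
    {"URL": "https://www.nytimes.com/", "ACTION": "NEW"},
    {"URL": "https://www.nytimes.com/subscriptions/Multiproduct/lp3L3W6.html?campaignId=6W74R", "ACTION": "NEW"},
]
marked = mark_duplicates(l)
print([d["ACTION"] for d in marked])  # ['FOLLOW', 'NEW', 'DUPLICATE', 'NEW']
print(l[2]["ACTION"])                 # NEW (original list unchanged)
```

URLs are matched by exact string equality, so two URLs that differ only by, for example, a trailing slash are treated as different.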
    

    【Discussion】:
