【Posted】: 2019-05-09 22:12:51
【Problem description】:
Given the list
l = [
    {
        "URL": "https://www.nytimes.com/",
        "ACTION": "FOLLOW",
        "RESULTS": "/PAGES/222437976487981212229928695878437391142.png",
        "PAGE-UUID": 2.224379764879812e+38,
        "PARENT-UUID": 2.224379764879812e+38
    },
    {
        "URL": "https://www.nytimes.com/es/",
        "ACTION": "NEW",
        "RESULTS": "/PAGES/138695820074592921946528124297895673746.png",
        "PAGE-UUID": 138695820074592921946528124297895673746,
        "PARENT-UUID": 2.224379764879812e+38
    },
    {
        "URL": "https://www.nytimes.com/",
        "ACTION": "NEW",
        "RESULTS": "/PAGES/166947399632926520629187111715759306140.png",
        "PAGE-UUID": 166947399632926520629187111715759306140,
        "PARENT-UUID": 2.436661515947743e+38
    },
    {
        "URL": "https://www.nytimes.com/subscriptions/Multiproduct/lp3L3W6.html?campaignId=6W74R",
        "ACTION": "NEW",
        "RESULTS": "/PAGES/299203350572384506529421004856026300297.png",
        "PAGE-UUID": 299203350572384506529421004856026300297,
        "PARENT-UUID": 2.436661515947743e+38
    }
]
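I want to check for entries that share the same "URL" and, when a duplicate is found, set the duplicate's "ACTION" field to "DUPLICATE" instead of "NEW". In this case, the URL "https://www.nytimes.com/" appears more than once. The expected output is: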
l = [
    {
        "URL": "https://www.nytimes.com/",
        "ACTION": "FOLLOW",
        "RESULTS": "/PAGES/222437976487981212229928695878437391142.png",
        "PAGE-UUID": 2.224379764879812e+38,
        "PARENT-UUID": 2.224379764879812e+38
    },
    {
        "URL": "https://www.nytimes.com/es/",
        "ACTION": "NEW",
        "RESULTS": "/PAGES/138695820074592921946528124297895673746.png",
        "PAGE-UUID": 138695820074592921946528124297895673746,
        "PARENT-UUID": 2.224379764879812e+38
    },
    {
        "URL": "https://www.nytimes.com/",
        "ACTION": "DUPLICATE",
        "RESULTS": "/PAGES/166947399632926520629187111715759306140.png",
        "PAGE-UUID": 166947399632926520629187111715759306140,
        "PARENT-UUID": 2.436661515947743e+38
    },
    {
        "URL": "https://www.nytimes.com/subscriptions/Multiproduct/lp3L3W6.html?campaignId=6W74R",
        "ACTION": "NEW",
        "RESULTS": "/PAGES/299203350572384506529421004856026300297.png",
        "PAGE-UUID": 299203350572384506529421004856026300297,
        "PARENT-UUID": 2.436661515947743e+38
    }
]
I tried using a set, but I'm not sure where to go from here:
seen = set()
new_l = []
for d in l:
    # Hash the whole entry: only dicts identical in every field compare equal here
    t = tuple(d.items())
    print("This is t", t)
    if t not in seen:
        seen.add(t)
        new_l.append(d)
【Discussion】:
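One possible direction, offered as a minimal sketch rather than a definitive answer: the attempt above hashes every field of each entry, so two entries with the same "URL" but different "RESULTS" never match. Tracking only the "URL" values in the set and updating "ACTION" in place might get closer to the expected output. The helper name mark_duplicates and its parameters are hypothetical, and the sample list is shortened to the two fields that matter here.

```python
def mark_duplicates(entries, key="URL", field="ACTION", value="DUPLICATE"):
    """Mark every entry after the first one with the same `key` value, in place.

    `mark_duplicates` and its parameter names are illustrative, not from the question.
    """
    seen = set()
    for entry in entries:
        url = entry[key]
        if url in seen:
            # This URL was already encountered earlier in the list
            entry[field] = value
        else:
            seen.add(url)
    return entries


# Shortened sample data (assumption: only "URL" and "ACTION" matter for the check)
l = [
    {"URL": "https://www.nytimes.com/", "ACTION": "FOLLOW"},
    {"URL": "https://www.nytimes.com/es/", "ACTION": "NEW"},
    {"URL": "https://www.nytimes.com/", "ACTION": "NEW"},
]

mark_duplicates(l)
print([d["ACTION"] for d in l])  # ['FOLLOW', 'NEW', 'DUPLICATE']
```

The first occurrence of a URL keeps its original "ACTION" (here "FOLLOW"), and only later occurrences are overwritten, which matches the expected output in the question. URLs are compared as exact strings, so "https://www.nytimes.com/" and "https://www.nytimes.com" would count as different entries unless they are normalized first.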
Tags: python dictionary hashmap duplicates