【Question Title】: Deduplicating JSON objects based on 1st key:value pair
【Posted】: 2022-01-20 15:09:07
【Question Description】:

This is the output format. I need to deduplicate entries whose "CVE_data_meta" ID matches.

#pull references
for ref in item["cve"]["references"]["reference_data"]:
    references = ref["url"]
    cleanData.append({"CVE_data_meta": cve_data_meta_id,
                     "description": description,
                     "baseScore": baseScore,
                     "vectorSring": vectorString,
                     "cweID": cweValue,
                     "cweID URL": ("https://cwe.mitre.org/data/definitions/"
                                    + str(cweValue) + ".html"),
                     "references": references,
                     "publishedDate": pub_date,
                     "lastModifiedDate": last_mod_date
                     })

This is the iteration where I pull data from the cleaned API response and write it to a JSON file:

import json

# ==========================================================================================
# narrow response with additional 'keywords'
# ==========================================================================================
with open("2-cleanData.json", "r") as myResults:
    scope = json.load(myResults)
output_json = []
results = []
for k in keywords:  # keywords: user-editable list of search strings, defined earlier
    counter = 0
    items = [x for x in scope if k in x['description']]
    for item in items:
        # an entry matching several keywords is appended once per keyword
        output_json.append(item)
        counter += 1
    results.append(counter)
with open("3-Final CVEs.json", "w+") as outFile2:
    outFile2.write(json.dumps(output_json, indent=2))

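The keywords variable can be changed by the user, but I want anyone to be able to add keywords without getting duplicate entries in the output file.

Full code here.

Sample output: (3 CVE entries)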

{
  "CVE_data_meta": "CVE-2021-0924",
  "description": "In xhci_vendor_get_ops of xhci.c, there is a possible out of bounds read due to a missing bounds check. This could lead to local escalation of privilege with no additional execution privileges needed. User interaction is not needed for exploitation.Product: AndroidVersions: Android kernelAndroid ID: A-194461020References: Upstream kernel",
  "baseScore": 7.8,
  "vectorSring": "CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H",
  "cweID": "CWE-125",
  "cweID URL": "https://cwe.mitre.org/data/definitions/CWE-125.html",
  "references": "https://source.android.com/security/bulletin/2021-11-01",
  "publishedDate": "2021-12-15T19:15Z",
  "lastModifiedDate": "2021-12-17T18:12Z"
},
{
  "CVE_data_meta": "CVE-2021-0981",
  "description": "In enqueueNotificationInternal of NotificationManagerService.java, there is a possible way to run a foreground service without showing a notification due to improper input validation. This could lead to local escalation of privilege with no additional execution privileges needed. User interaction is not needed for exploitation.Product: AndroidVersions: Android-12Android ID: A-191981182",
  "baseScore": 7.8,
  "vectorSring": "CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H",
  "cweID": "CWE-269",
  "cweID URL": "https://cwe.mitre.org/data/definitions/CWE-269.html",
  "references": "https://source.android.com/security/bulletin/pixel/2021-12-01",
  "publishedDate": "2021-12-15T19:15Z",
  "lastModifiedDate": "2021-12-17T18:09Z"

...several entries later...

  "CVE_data_meta": "CVE-2021-0924",
  "description": "In xhci_vendor_get_ops of xhci.c, there is a possible out of bounds read due to a missing bounds check. This could lead to local escalation of privilege with no additional execution privileges needed. User interaction is not needed for exploitation.Product: AndroidVersions: Android kernelAndroid ID: A-194461020References: Upstream kernel",
  "baseScore": 7.8,
  "vectorSring": "CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H",
  "cweID": "CWE-125",
  "cweID URL": "https://cwe.mitre.org/data/definitions/CWE-125.html",
  "references": "https://source.android.com/security/bulletin/2021-11-01",
  "publishedDate": "2021-12-15T19:15Z",
  "lastModifiedDate": "2021-12-17T18:12Z"
},

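Now I just need to remove the duplicates...

【Comments】:

  • Do you only need to deduplicate based on the value of the 'CVE_data_meta' entry?
  • Yes! The values belonging to a duplicate CVE should also not be appended, or should be deduplicated right after appending. So the deduplication can happen during the iteration, or in another loop added to deduplicate outFile.
  • OK, that makes things relatively easy. See the answer I posted.

Tags: python json object duplicates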


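【Solution 1】:

You can easily deduplicate the results by using a set to track the 'CVE_data_meta' values already seen and skipping any entry whose value is already in it, as shown below. Set membership tests take constant time on average, so this stays fast even for long result lists.

Tested with limited test data: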

myResults = [
 {'CVE_data_meta': 'CVE-2021-0924',
  'description': 'In xhci_vendor_get_ops of xhci.c, there is a possible out of '
                 'bounds read due to a missing bounds check. This could lead '
                 'to local escalation of privilege with no additional '
                 'execution privileges needed. User interaction is not needed '
                 'for exploitation.Product: AndroidVersions: Android '
                 'kernelAndroid ID: A-194461020References: Upstream kernel',
  'baseScore': 7.8,
  'vectorSring': 'CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H',
  'cweID': 'CWE-125',
  'cweID URL': 'https://cwe.mitre.org/data/definitions/CWE-125.html',
  'references': 'https://source.android.com/security/bulletin/2021-11-01',
  'publishedDate': '2021-12-15T19:15Z',
  'lastModifiedDate': '2021-12-17T18:12Z'},
 {'CVE_data_meta': 'CVE-2021-0981',
  'description': 'In enqueueNotificationInternal of '
                 'NotificationManagerService.java, there is a possible way to '
                 'run a foreground service without showing a notification due '
                 'to improper input validation. This could lead to local '
                 'escalation of privilege with no additional execution '
                 'privileges needed. User interaction is not needed for '
                 'exploitation.Product: AndroidVersions: Android-12Android ID: '
                 'A-191981182',
  'baseScore': 7.8,
  'vectorSring': 'CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H',
  'cweID': 'CWE-269',
  'cweID URL': 'https://cwe.mitre.org/data/definitions/CWE-269.html',
  'references': 'https://source.android.com/security/bulletin/pixel/2021-12-01',
  'publishedDate': '2021-12-15T19:15Z',
  'lastModifiedDate': '2021-12-17T18:09Z'},
 {'CVE_data_meta': 'CVE-2021-0924',
  'description': 'In xhci_vendor_get_ops of xhci.c, there is a possible out of '
                 'bounds read due to a missing bounds check. This could lead '
                 'to local escalation of privilege with no additional '
                 'execution privileges needed. User interaction is not needed '
                 'for exploitation.Product: AndroidVersions: Android '
                 'kernelAndroid ID: A-194461020References: Upstream kernel',
  'baseScore': 7.8,
  'vectorSring': 'CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H',
  'cweID': 'CWE-125',
  'cweID URL': 'https://cwe.mitre.org/data/definitions/CWE-125.html',
  'references': 'https://source.android.com/security/bulletin/2021-11-01',
  'publishedDate': '2021-12-15T19:15Z',
  'lastModifiedDate': '2021-12-17T18:12Z'}
]

Code:

from pprint import pprint

# Deduplicate results
cleaned = []
seen = set()
for obj in myResults:
    key = obj['CVE_data_meta']
    if key not in seen:
        cleaned.append(obj)
        seen.add(key)

pprint(cleaned)
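
The same idea can also be run as a separate pass over the finished list, which matches the "another loop to deduplicate outFile" option from the question comments. The following is only a minimal sketch: the helper name dedupe_by_key and the trimmed two-field sample entries are assumptions made for illustration, not part of the original script.

```python
import json


def dedupe_by_key(records, key="CVE_data_meta"):
    """Return records, keeping only the first occurrence of each key value."""
    seen = set()
    unique = []
    for record in records:
        value = record[key]
        if value not in seen:
            seen.add(value)
            unique.append(record)
    return unique


# Illustrative entries, trimmed to two fields each
sample = [
    {"CVE_data_meta": "CVE-2021-0924", "baseScore": 7.8},
    {"CVE_data_meta": "CVE-2021-0981", "baseScore": 7.8},
    {"CVE_data_meta": "CVE-2021-0924", "baseScore": 7.8},
]

unique_entries = dedupe_by_key(sample)
print(json.dumps(unique_entries, indent=2))
```

Because the first occurrence wins, the original order of the entries is preserved. The result can be passed to json.dumps exactly as output_json is in the question.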

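【Comments】:

  • Yes! I'll push the update to the main script if you want to see the final product, but it works great.
  • Glad to hear it. I was going to suggest updating your question.

【Solution 2】: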

After looking at your code, I believe you can do this to avoid duplicate dictionaries:

results = []
output_json = []
cwe_ids = []
for k in keywords:
    counter = 0
    items = [x for x in scope if k in x['description']]
    for item in items:
        # only append entries whose ID has not been appended yet
        if item['cweID'] not in cwe_ids:
            output_json.append(item)
            cwe_ids.append(item['cweID'])
            counter += 1
    results.append(counter)

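【Comments】:

  • Thanks for the reply. Basically the script pulls dozens or hundreds of CVE entries, for example: 66 found in total: 54 privilege escalation vulnerabilities, 4 remote code execution vulnerabilities, 1 arbitrary code execution vulnerability, 7 denial of service vulnerabilities. But sometimes two "keyword" values appear in the same description, so as it iterates it appends the same entry twice, once per keyword. I can try your suggestion, but I'm not sure it will work. The output is long, so let me show you in a separate comment.
  • The code is too long to post in a comment, even just the relevant snippet. It is probably better to follow the link to the code in the question so you can see what it does, or run it to get an idea. If others complain that the question doesn't make sense, I'll reorganize it completely. @ultramundane
  • @worthingtontech I looked at the code and came up with a different solution from the one I originally had in mind. I think this is one way to keep them unique.
  • I think you're onto what I need. I also think you may be mixing up "cve" and "cwe" (they are different things, which is tricky). But I like your "if ... not in" idea. I need to apply it to, i.e.: "CVE_data_meta": "CVE-2021-0924". Then, if that ID has already been iterated over or already exists in outFile, don't append the repeated instance, including its associated values: "description", baseScore, vectorSring, cweID, cweID URL, references, publishedDate, lastModifiedDate.
  • @worthingtontech In that case, replacing item['cweID'] with item['CVE_data_meta'] should do the trick.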
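
The substitution suggested in the last comment could look roughly like the sketch below, which keys the check on 'CVE_data_meta' inside the keyword loop. It swaps the list of IDs for a set (an assumption made here for faster lookups, not something the answer specified). The function name filter_unique, the placeholder CVE IDs, and the sample descriptions are hypothetical.

```python
def filter_unique(scope, keywords):
    """Collect entries matching any keyword, appending each CVE ID only once."""
    output_json = []
    results = []        # number of new entries contributed by each keyword
    seen_ids = set()    # 'CVE_data_meta' values already appended
    for k in keywords:
        counter = 0
        for item in scope:
            cve_id = item["CVE_data_meta"]
            if k in item["description"] and cve_id not in seen_ids:
                seen_ids.add(cve_id)
                output_json.append(item)
                counter += 1
        results.append(counter)
    return output_json, results


# Hypothetical data: the first description matches two of the keywords
scope = [
    {"CVE_data_meta": "CVE-0000-0001",
     "description": "local escalation of privilege via out of bounds read"},
    {"CVE_data_meta": "CVE-0000-0002",
     "description": "remote denial of service"},
]
keywords = ["escalation of privilege", "out of bounds", "denial of service"]

output_json, results = filter_unique(scope, keywords)
print(len(output_json), results)
```

Note that with this change each counter reflects how many new entries a keyword contributed, not how many descriptions it matched, so the per-keyword totals can be lower than in the original loop.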