【Question Title】: Remove duplicates from a list of dynamic objects
【Posted】: 2022-01-08 21:01:17
【Question Description】:

Goal: remove duplicates within each deepest sub-list, and keep everything else.

The list contains multiple nested structures of the form dict -> dict -> list.

However, different sub-lists may contain exactly the same sentences as one another. Those cross-list duplicates need to be kept.

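set() seems ideal, but I want to apply it to the deepest sub-lists, not to the my_list object itself. The structure may also change and contain more deeply nested dicts and lists on different runs.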
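For reference, a small sketch of how `set()` behaves on the two kinds of object involved: it works on a flat list of strings (but loses order), and it raises `TypeError` on the top level of `my_list`, whose elements are dicts. The sample strings are hypothetical placeholders, and `dict.fromkeys` is shown only as one order-preserving alternative.

```python
# Hypothetical stand-in for one of the deepest sub-lists (a flat list of strings)
sentences = ["b", "a", "b", "a"]

# set() removes duplicates, but the resulting order is arbitrary
unordered = list(set(sentences))

# dict.fromkeys() also removes duplicates and keeps first-seen order
# (dicts preserve insertion order from Python 3.7 on)
ordered = list(dict.fromkeys(sentences))

# The top level of my_list holds dicts, which are unhashable,
# so calling set() on it raises TypeError
try:
    set([{"ECON": {}}])
    top_level_ok = True
except TypeError:
    top_level_ok = False

print(ordered)       # ['b', 'a']
print(top_level_ok)  # False
```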


Code:

I have tried many variations of this, but in practice my_list can have any structure.

Is what I want possible if the structure can vary?

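my_list = [...]  # ... (full data shown below)

for ele in my_list:
    if isinstance(ele, list):
        # Note: rebinding `ele` does not change my_list itself
        ele = list(set(ele))
    elif isinstance(ele, dict):
        pass  # unfinished: how to descend into dicts of unknown depth?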

my_list:

For example, 1st PDF -> ECON -> awards and 1st PDF -> ECON -> security contain the same duplicates.

[
    {
        "../data/gri/reports/GPIC_Sustainability_Report_2020__-_40_Years_of_Sustainable_Success.pdf": {
            "COMP": {
                "Behaviour": [
                    "we focus apply measures four elements safety culture systems processes skills knowledge individuals behaviours attitudes perception leadership"
                ]
            },
            "ECON": {
                "subsidies": [
                    "meanwhile main recent regulatory impact business significant phasing subsidies gas electricity prices expected continue next years well nationwide strategy allocates natural gas conservatively"
                ],
                "awards": [
                    "ensure robust security 100 readiness times participate international awards rospa bsc awards",
                    "ensure robust security 100 readiness times participate international awards rospa bsc awards"
                ],
                "security": [
                    "ensure robust security 100 readiness times participate international awards rospa bsc awards",
                    "ensure robust security 100 readiness times participate international awards rospa bsc awards"
                ]
            }
        }
    },
    {
        "../data/gri/reports/GRI_2018_Report.pdf": {
            "COMP": {
...

Desired list:

[
    {
        "../data/gri/reports/GPIC_Sustainability_Report_2020__-_40_Years_of_Sustainable_Success.pdf": {
            "COMP": {
                "Behaviour": [
                    "we focus apply measures four elements safety culture systems processes skills knowledge individuals behaviours attitudes perception leadership"
                ]
            },
            "ECON": {
                "subsidies": [
                    "meanwhile main recent regulatory impact business significant phasing subsidies gas electricity prices expected continue next years well nationwide strategy allocates natural gas conservatively"
                ],
                "awards": [
                    "ensure robust security 100 readiness times participate international awards rospa bsc awards"
                ],
                "security": [
                    "ensure robust security 100 readiness times participate international awards rospa bsc awards"
                ]
            }
        }
    },
    {
        "../data/gri/reports/GRI_2018_Report.pdf": {
            "COMP": {
...
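If the layout were fixed (a list of {pdf path: {category: {topic: [sentences]}}}), the desired list could be produced without recursion by assigning back through the dicts, which the attempted loop misses because rebinding `ele` never changes `my_list`. A minimal sketch under that fixed-depth assumption; the function name and the trimmed sample are hypothetical.

```python
def dedup_fixed_depth(my_list):
    """Dedupe every innermost sentence list in place, keeping first-seen order.

    Assumes exactly: list -> {pdf_path: {category: {topic: [str, ...]}}}.
    """
    for report in my_list:                        # {pdf_path: {...}}
        for categories in report.values():        # {"COMP": {...}, "ECON": {...}}
            for topics in categories.values():    # {"awards": [...], ...}
                for topic, sentences in topics.items():
                    # Assign through the dict so my_list itself changes;
                    # rebinding a loop variable would not
                    topics[topic] = list(dict.fromkeys(sentences))


# Hypothetical trimmed sample in the same shape as the question's data
data = [{"report.pdf": {"ECON": {"awards": ["s1", "s1"], "security": ["s1", "s1"]}}}]
dedup_fixed_depth(data)
print(data)  # [{'report.pdf': {'ECON': {'awards': ['s1'], 'security': ['s1']}}}]
```

This breaks as soon as the nesting depth changes, which is why a recursive approach suits the "any structure" requirement better.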

Let me know if I need to clarify anything else.

【Question Discussion】:

    Tags: python list for-loop


    【Solution 1】:

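    So, it sounds like the only duplicates you care about are within a list of strings. We can make a few assumptions:

    • The data is pure JSON (lists, dicts, strings, and other primitives)
    • If we fail to hash an object, then it can't be a duplicate
    • The order of the deduplicated list doesn't matter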
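The second assumption rests on which JSON values Python can hash: strings, numbers, booleans and None can go into a set, while lists and dicts raise `TypeError`. A quick check; the helper name `is_hashable` is hypothetical.

```python
def is_hashable(value):
    """Return True if hash(value) succeeds, i.e. value could go into a set."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


# JSON-like values: primitives hash fine, lists and dicts do not
checks = {
    "str": is_hashable("text"),
    "int": is_hashable(100),
    "float": is_hashable(1.5),
    "bool": is_hashable(True),
    "None": is_hashable(None),
    "list": is_hashable(["a"]),
    "dict": is_hashable({"k": "v"}),
}
print(checks)
```

One consequence of this assumption: two identical dicts in the same list are both kept, since they are never compared.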

    So let's use recursion.

    def dedup(obj):
        if isinstance(obj, list):
            try:
                # We try to dedupe as if everything is hashable,
                # but this will fail for a list of dicts (or of lists),
                # so fall back in that case.
                return list({dedup(x) for x in obj})
            except TypeError:
                return [dedup(x) for x in obj]
        elif isinstance(obj, dict):
            return {k: dedup(v) for k, v in obj.items()}
        else:
            # this is some kind of primitive (str/int/float/bool/None)
            return obj
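If the order of the sentences does matter (the answer assumes it doesn't), one possible variant swaps the set for `dict.fromkeys`, which keeps first-seen order. This is a swapped-in technique, not the answerer's code; it also recurses into each item only once instead of again on the fallback path. The function name and the trimmed sample are hypothetical.

```python
def dedup_ordered(obj):
    """Recursively dedupe lists like dedup(), but keep first-seen order."""
    if isinstance(obj, list):
        items = [dedup_ordered(x) for x in obj]
        try:
            # dict.fromkeys needs hashable keys; raises TypeError for dicts/lists
            return list(dict.fromkeys(items))
        except TypeError:
            return items
    if isinstance(obj, dict):
        return {k: dedup_ordered(v) for k, v in obj.items()}
    # primitive (str/int/float/bool/None)
    return obj


# Hypothetical trimmed sample in the same shape as the question's data
sample = [
    {"report.pdf": {"ECON": {
        "awards": ["s1", "s2", "s1"],
        "security": ["s1", "s1"],
    }}}
]
print(dedup_ordered(sample))
# [{'report.pdf': {'ECON': {'awards': ['s1', 's2'], 'security': ['s1']}}}]
```

Note that `awards` and `security` each keep their own copy of `s1`, matching the requirement that duplicates across different sub-lists survive.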
    

    【Discussion】:

    • These assumptions are correct. I'll try this and report back. Thanks
    • My edit has to be at least 6 characters. Thanks a lot :)