【Question Title】:Removing word-interchanged elements of a list in Python
【Posted】:2019-05-05 13:27:21
【Question】:

I have a list containing duplicate values whose words have been interchanged. For example:

dataList=["john is student", "student is john", "john student is", "john is student", "alica is student", "good weather", "weather good"]

I want to get rid of these reordered variants by replacing each one with its first-seen form, as shown:

Expected output:

dataList=["john is student","john is student", "john is student","john is student","alica is student", "good weather", "good weather"]

The code I tried is:

for i in dataList:
    first = (i.split()[0] + i.split()[1] + i.split()[2]) in studentList
    ........

I'm stuck on working out the logic. How can I get the desired result?

【Comments】:

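  • Your question is incomplete! Between "good weather" and "weather good", which one should be kept? The first one, I guess?
  • @GrijeshChauhan: As shown in the expected output, "good weather" is the one I want. Thanks for your solution, it helped. I made a small update to my question: I want to keep the duplicate values, not remove them. Sorry for the inconvenience.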

Tags: python regex python-3.x list data-extraction


【Solution 1】:

If the first occurrence is the form you want to keep in the final list, you can try the following:

dataList= ["john is student", 
           "student is john", 
           "john student is", 
           "alica is student", 
           "good weather", 
           "weather good",
          ]

data = {}
for words in dataList:
    data.setdefault(frozenset(words.split()), words)

dataList = list(data.values())
# dataList is now the de-duplicated list you need
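
One caveat worth noting about a frozenset key: a set drops repeated words, so "good good weather" and "good weather" would be treated as the same sentence. If that matters, a sorted tuple of the words keeps the multiplicity. A minimal sketch, where word_key and dedupe_first are hypothetical helper names and not part of the answer above:

```python
def word_key(sentence):
    """Order-insensitive key that still counts repeated words."""
    return tuple(sorted(sentence.split()))


def dedupe_first(sentences, key=word_key):
    """Keep the first sentence seen for each key, in first-seen order."""
    first_seen = {}
    for sentence in sentences:
        first_seen.setdefault(key(sentence), sentence)
    return list(first_seen.values())


sample = ["good weather", "weather good", "good good weather"]

# frozenset ignores the repeated "good", so everything collapses to one entry
print(dedupe_first(sample, key=lambda s: frozenset(s.split())))
# the sorted tuple keeps "good good weather" as a separate sentence
print(dedupe_first(sample))
```

Which key is appropriate depends on whether a repeated word should count as a different sentence in your data.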

Edit

Since my last answer, the question has been updated to ask that duplicate values be kept.

[Answer]

dataList= ["john is student", 
           "student is john", 
           "john student is",
           "alica is student",
           "good weather", 
           "weather good",
          ]

class WordFrequence:
    def __init__(self, word, frequence=1):
        self.word = word
        self.frequence = frequence

    def as_list(self):
        return [self.word] * self.frequence

    def __repr__(self):
        return "{}({}, {})".format(self.__class__.__name__, self.word, self.frequence)    

counter = {} 
for words in dataList:
    key = frozenset(words.split())
    if key in counter:
        counter[key].frequence += 1
    else:
        counter[key] = WordFrequence(words)

dataList = [] # this is what you need
for wf in counter.values():
    dataList.extend(wf.as_list())

For a long input dataList, you could improve my code by replacing WordFrequence with recordclass.
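
The same counting idea can also be sketched without a helper class, using a plain dict that maps each word set to a [first sentence, count] pair. This is only an illustrative alternative (it is not the recordclass approach mentioned above), and group_reordered is a hypothetical name. Like the code above, it groups equal sentences together, so the original ordering of the list is not preserved:

```python
def group_reordered(sentences):
    """Replace each reordered duplicate with its first-seen form, grouped by word set."""
    counts = {}  # frozenset of words -> [first sentence seen, occurrences]
    for sentence in sentences:
        entry = counts.setdefault(frozenset(sentence.split()), [sentence, 0])
        entry[1] += 1

    result = []
    for first, n in counts.values():
        result.extend([first] * n)
    return result


data_list = ["john is student", "good weather", "student is john", "weather good"]
print(group_reordered(data_list))
```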

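【Comments】:

  • Thank you very much for the code. It helped. Since I updated the requirement in the question, the new code would be: data = {} new = [] for words in dataList: k = data.setdefault(frozenset(words.split()), words) new.append(k) print(new) Could you update your answer? Thanks!
  • @Cathy I've added another answer that also preserves the order. Check the post below.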
【Solution 2】:

@Grijesh has already given a very clean solution; I'm just reworking his code:

dataList=["john is student", "student is john", "john student is", 
          "alica is student", "good weather", "weather good"]

final_data = {} 
for i in dataList:
    final_data[" ".join(sorted(set(i.split())))] = i

Output

>>>list(final_data.values())
   ['john student is', 'alica is student', 'weather good']

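Above, we extract the words from each sentence, build a set of the unique words, and sort it, so that every reordering of the same sentence produces the same key.

We then use that key in a dictionary. A dictionary can only hold unique keys, so it keeps a single entry per unique set of words (which we joined back into a string). Because plain assignment overwrites earlier entries, the value that remains is the last occurrence in each group, as the output shows.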
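
To make the key construction concrete, the sketch below prints the key produced for a sentence and contrasts plain assignment (the last occurrence wins) with setdefault (the first occurrence wins). The names sentence_key, last_wins and first_wins are made up for this illustration:

```python
def sentence_key(sentence):
    """Sorted, space-joined unique words: identical for any word order."""
    return " ".join(sorted(set(sentence.split())))


sentences = ["john is student", "student is john", "good weather", "weather good"]

last_wins = {}
first_wins = {}
for s in sentences:
    last_wins[sentence_key(s)] = s             # overwrites earlier entries
    first_wins.setdefault(sentence_key(s), s)  # keeps the earliest entry

print(sentence_key("student is john"))
print(list(last_wins.values()))
print(list(first_wins.values()))
```

If the first occurrence is the one to keep, as the question asks, setdefault is the variant to use.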

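    【Solution 3】:

    You can create a dictionary, seen, that maps the frozenset of a sentence's words to the first sentence in which that set of words appeared. For each element, look it up in seen with dict.setdefault(), which stores the sentence if the key is new and otherwise returns the previously stored one.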

    dataList= ["john is student", 
               "student is john", 
               "john student is",
               "alica is student",
               "good weather", 
               "weather good",
              ]
    
    seen = {}
    data = []
    for words in dataList:
        key = frozenset(words.split())
        words = seen.setdefault(key, words)
        data.append(words)
    

    Output:

    >>> data
    ['john is student',
     'john is student',
     'john is student',
     'alica is student',
     'good weather',
     'good weather']
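
Wrapped in a function and applied to the seven-element list from the question, the same idea gives the expected output shown there. A small sketch; replace_with_first is a hypothetical name:

```python
def replace_with_first(sentences):
    """Replace every reordering of a sentence with the first form seen, keeping order."""
    seen = {}
    return [seen.setdefault(frozenset(s.split()), s) for s in sentences]


question_list = ["john is student", "student is john", "john student is",
                 "john is student", "alica is student", "good weather",
                 "weather good"]
print(replace_with_first(question_list))
```

The result has the same length and ordering as the input, which is what the updated question asks for.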
    

      【Solution 4】:

      Assuming the first occurrence is the correct one.

      dataList= ["john is student", 
                 "student is john", 
                 "john student is", 
                 "alica is student", 
                 "good weather", 
                 "weather good",
                ]
      
      filterdData = {}
      for statement in dataList:
          # Sort the words (not the characters) so any word order maps to the same key
          filterdData.setdefault(' '.join(sorted(statement.split())), statement)
      
      dataList = list(filterdData.values())
      print(dataList)
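
When building a key with sorted(), it is worth checking whether characters or words are being sorted. Sorting the characters of the whole string makes any two anagram strings collide, even when their words differ, whereas sorting the split words does not. A quick comparison, with char_key and word_key as hypothetical helper names:

```python
def char_key(statement):
    """Key built from sorted characters (spaces included)."""
    return "".join(sorted(statement))


def word_key(statement):
    """Key built from sorted words."""
    return " ".join(sorted(statement.split()))


a, b = "evil dog", "live god"  # different words, same letters

print(char_key(a) == char_key(b))  # character keys collide
print(word_key(a) == word_key(b))  # word keys stay distinct
print(word_key("good weather") == word_key("weather good"))
```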
      

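      You could also run each candidate through a grammar-checking library, so that only the grammatically correct English form is accepted.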
