【Question title】: How to get list values and their counts in Python
【Posted】: 2018-01-28 02:39:57
【Question description】:

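I am trying to count each word in a list so that I can remove the words with a high count, but the output I get is wrong. Suppose my file contains the line "it was the best of times, it was the worst of times. it was the age of wisdom, it was the age of foolishness". My code prints (was, 4) and then, somewhere else, (was, 3), and so on. It prints a word every time that word occurs, but with a different count each time. I need each word to be counted only once.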

for file in files:  
    print(file)
    f=open(file, 'r')
    content = f.read() 
    wordlist = content.split()
    #print(wordlist)
    wordfreq = [wordlist.count(w) for w in wordlist] # a list comprehension
    print("List\n" + str(wordlist) + "\n")
    print("Frequencies\n" + str(wordfreq) + "\n")
    test = [i for i in wordfreq if i > 100]
    print("result\n"+str(list(zip(test,wordlist))))
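The mismatch comes from `zip(test, wordlist)`: `test` has already been filtered, so its positions no longer line up with `wordlist`, and `zip` simply pairs the surviving counts with the first words of the file. The comprehension also produces one count per *occurrence*, so even an aligned zip would repeat words. A minimal sketch of one fix that keeps the question's approach: pair every word with its own count first, let a dict collapse repeated words, and only then filter. `sample` and `threshold` are illustrative names, and the threshold of 3 (instead of 100) is an assumption chosen to suit the short example sentence.

```python
sample = ("it was the best of times it was the worst of times "
          ".it was the age of wisdom it was the age of foolishness")
threshold = 3  # illustrative; the question uses 100 on a real file

wordlist = sample.split()
wordfreq = [wordlist.count(w) for w in wordlist]  # one count per occurrence

# Pair each word with its own count BEFORE filtering so the two stay aligned;
# building a dict keeps just one entry per distinct word (first-seen order).
pairs = dict(zip(wordlist, wordfreq))
test = [(w, c) for w, c in pairs.items() if c > threshold]
print(test)  # [('was', 4), ('the', 4), ('of', 4)]
```

On a large file, `wordlist.count(w)` rescans the whole list for every word, so the `Counter` and dictionary answers scale much better.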

【Question discussion】:

    Tags: python list arraylist stop-words


    【Solution 1】:

    You can use Counter like this:

    >>> from collections import Counter
    >>>
    >>> s = "it was the best of times it was the worst of times .it was the age of wisdom it was the age of foolishness"
    >>>
    >>> d = Counter(s.split())
    >>> for k, v in d.items():
    ...     print('{} -> {}'.format(k, v))
    ...
    it -> 3
    was -> 4
    the -> 4
    best -> 1
    of -> 4
    times -> 2
    worst -> 1
    .it -> 1
    age -> 2
    wisdom -> 1
    foolishness -> 1
    >>>
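Since the goal in the question is to drop words whose count is too high, the same `Counter` can be used to filter the original word list. A minimal sketch, assuming `max_count` is the largest count you want to keep; the function name `remove_frequent_words` is hypothetical:

```python
from collections import Counter

def remove_frequent_words(words, max_count):
    """Return the words in their original order, minus any word seen more than max_count times."""
    counts = Counter(words)  # count every word once, in a single pass
    return [w for w in words if counts[w] <= max_count]

s = "it was the best of times it was the worst of times .it was the age of wisdom it was the age of foolishness"
print(remove_frequent_words(s.split(), 3))
```

Note that `.it` is counted separately from `it` because `split()` only splits on whitespace; stripping punctuation from each word first (for example with `w.strip('.')`) would merge them.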
    

    If you don't want to use collections.Counter, you can use a plain dictionary like this:

    >>> s = "it was the best of times it was the worst of times .it was the age of wisdom it was the age of foolishness"
    >>> d = {}
    >>> for word in s.split():
    ...     try:
    ...         d[word] += 1
    ...     except KeyError:
    ...         d[word] = 1
    ...
    >>> d
    {'it': 3, 'was': 4, 'the': 4, 'best': 1, 'of': 4, 'times': 2, 'worst': 1, '.it': 1, 'age': 2, 'wisdom': 1, 'foolishness': 1}
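The try/except pattern works; two common equivalents are `dict.get` with a default of 0, and `collections.defaultdict(int)`, which creates a missing key with value 0 on first access. A short sketch of both, checked against `Counter` on the same sentence:

```python
from collections import Counter, defaultdict

s = "it was the best of times it was the worst of times .it was the age of wisdom it was the age of foolishness"

# dict.get returns 0 for a word that has not been seen yet
d = {}
for word in s.split():
    d[word] = d.get(word, 0) + 1

# defaultdict(int) inserts 0 for a missing key the first time it is accessed
dd = defaultdict(int)
for word in s.split():
    dd[word] += 1

print(d == dict(dd) == dict(Counter(s.split())))  # True
```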
    

    【Discussion】:

      【Solution 2】:

      A solution without Counter:

      new = s.split(' ')
      m = list()
      for i in new:
          m.append((i, new.count(i)))
      for i in set(m):
          print(i)
      del m[:]  # empty the list so it can be reused
      

      Output:

      ('best', 1)  
      ('was', 4)   
      ('times', 2)  
      ('it', 3)  
      ('worst', 1)  
      ('.it', 1)  
      ('wisdom', 1)  
      ('foolishness', 1)  
      ('the', 4)     
      ('of', 4) 
      ('age', 2)
      
      Another test:
       s = 'was was it was hello it was'
      Output:
      ('hello', 1)  
      ('was', 4)  
      ('it', 2)  
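The `set(m)` step removes duplicate `(word, count)` tuples, but a set has no defined order, so the output order can change between runs. A minimal variant of the same no-`Counter` idea, assuming Python 3.7+ (where dicts keep insertion order): deduplicate the words with `dict.fromkeys` before counting, so there is no intermediate list `m` to empty. The function name `count_pairs` is hypothetical.

```python
def count_pairs(s):
    """Return (word, count) tuples, one per distinct word, in first-seen order."""
    words = s.split()
    # dict.fromkeys drops repeated words but keeps the order they first appear in
    return [(w, words.count(w)) for w in dict.fromkeys(words)]

for pair in count_pairs('was was it was hello it was'):
    print(pair)
```

This still calls `list.count` once per distinct word, so for large inputs a single counting pass with a dictionary is faster.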
      

      If your data is stored in a file, use:

      with open('your-file-name', 'r') as r:
          s = r.read()  # read the whole file, all lines at once
      
      new = s.split()  # split() with no argument also splits on newlines, so words at line ends are not glued together
      m = list()
      for i in new:
          m.append((i, new.count(i)))
      for i in set(m):
          print(i)
      del m[:]  # empty the list so it can be reused
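For a large input file, reading it line by line and updating a dictionary avoids holding the whole text in one string and avoids the repeated `list.count` scans. A minimal sketch, assuming plain whitespace-separated text; the function name `count_words` is hypothetical, and `io.StringIO` stands in for a real file:

```python
import io

def count_words(lines):
    """Count whitespace-separated words over any iterable of text lines (e.g. an open file)."""
    counts = {}
    for line in lines:
        # split() with no argument splits on any whitespace, including the newline,
        # so words at line boundaries are never glued together
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

# io.StringIO behaves like an open text file here
print(count_words(io.StringIO("it was the best of times\nit was the worst of times\n")))
```

With a real file this would be used as `with open('your-file-name') as f: counts = count_words(f)`.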
      

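      【Discussion】:

      • @user3778289 If you don't want to use the Counter module, you can simply use this code.
      • Thank you, this works well. But it still gives me the same word several times, like (was,4), (times,4), and again (was,4).
      • @user3778289 But in my output (was=4) appears only once; the set should remove duplicates. Please copy and paste my code and test it again.
      • It still gives me duplicate output. I have a large input file; this sentence was just an example.
      • Maybe it's because I used m.append(): every time you run your program, data is appended to m, so you get duplicates. You should empty the list and try again.

      【Solution 3】: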
      from collections import Counter
      
      for file in files:
          with open(file) as f:  # the file is closed when the block ends
              words = f.read().split()
          frequencies = Counter(words)  # word -> number of occurrences in this file
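This answer stops at building the `Counter`. To see which words are the most frequent candidates for removal, `Counter.most_common(n)` returns the `n` highest `(word, count)` pairs, with ties kept in first-seen order. A short sketch on the example sentence; the `threshold` value is an assumption:

```python
from collections import Counter

s = "it was the best of times it was the worst of times .it was the age of wisdom it was the age of foolishness"
frequencies = Counter(s.split())

print(frequencies.most_common(3))

# every word whose count exceeds an (assumed) threshold
threshold = 3
too_frequent = {w for w, c in frequencies.items() if c > threshold}
print(sorted(too_frequent))
```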
      

        【Discussion】:

        【Solution 4】:

        You can use Counter from collections:

        from collections import Counter
        import itertools
        
        for file in files:
            with open(file) as f:
                # flatten the per-line word lists into one list of words
                data = list(itertools.chain.from_iterable(line.split() for line in f))
        
            the_counts = Counter(data)
        
            print("wordlist: {}".format(data))
            print("frequencies: {}".format(dict(the_counts)))
            # (word, count) pairs for words that occur more than 100 times
            test = [(a, b) for a, b in the_counts.items() if b > 100]
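`itertools.chain.from_iterable` flattens the per-line word lists into a single stream of words. That stream is an iterator, so it is used up once `Counter` has consumed it; printing it afterwards shows a `chain` object rather than the words. A small sketch, with `io.StringIO` standing in for an open file:

```python
import io
import itertools
from collections import Counter

fake_file = io.StringIO("it was the best of times\nit was the worst of times\n")

data = itertools.chain.from_iterable(line.split() for line in fake_file)
the_counts = Counter(data)

print(the_counts['times'])  # 2
print(list(data))           # [] because the chain iterator is already exhausted
```

Wrapping the chain in `list(...)` before counting keeps the words available for printing afterwards.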
        

        【Discussion】:

        • This gives me an error: test = [(a, b) for a, b in dict(the_count).items() if b > 10] SyntaxError: invalid syntax
        • @user3778289 Try again and let me know what happens.
        • test = [(a, b) for a, b in the_count.items() if b > 10] ^ SyntaxError: invalid syntax. The error is still there.
        【Solution 5】:
        import pandas as pd
        # txt holds the text to analyse (e.g. the file contents)
        a = pd.Series(txt.split()).value_counts().rename_axis("word").reset_index(name="counts")
        a[a.counts < 100]  # keep only words that occur fewer than 100 times
        

        【Discussion】:
