Method 1: vocab = dict(Counter(text).most_common(MAX_VOCAB_SIZE - 1))

Example:

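from collections import Counter

colors = ['red', 'blue', 'red', 'green', 'blue', 'blue']

# Counter maps each distinct element to the number of times it occurs.
c = Counter(colors)

print(dict(c))  # {'red': 2, 'blue': 3, 'green': 1}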

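most_common(k): returns the k most frequent elements as (element, count) pairs, sorted from the highest count to the lowest.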
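
To make the top-k behaviour concrete, here is a minimal sketch of Method 1 on a toy input. The names `text` and `MAX_VOCAB_SIZE` come from the one-liner above; the sample tokens and the small size limit are assumptions chosen only for illustration.

```python
from collections import Counter

# Toy token list (assumed); in practice `text` is whatever sequence is being counted.
text = ['red', 'blue', 'red', 'green', 'blue', 'blue']
MAX_VOCAB_SIZE = 3  # assumed small value for illustration

# most_common(k) returns a list of (element, count) pairs, highest count first.
top = Counter(text).most_common(MAX_VOCAB_SIZE - 1)
vocab = dict(top)

print(top)    # [('blue', 3), ('red', 2)]
print(vocab)  # {'blue': 3, 'red': 2}
```

With a limit of 3, only two entries are kept because of the `- 1`; 'green' is dropped as the least frequent token.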

Method 2:

def generate_vocab_file(input_seg_file, output_vocab_file):
    with open(input_seg_file, 'r', encoding='UTF-8') as f:
        lines = f.readlines()
    word_dict = {}
    for line in lines:
        # Each line is expected to be "label<TAB>content".
        label, content = line.strip('\r\n').split('\t')
        for word in content.split():
            word_dict.setdefault(word, 0)
            word_dict[word] += 1
    # [(word, frequency), ..., ()]
    sorted_word_dict = sorted(
        word_dict.items(), key=lambda d: d[1], reverse=True)
    with open(output_vocab_file, 'w', encoding='UTF-8') as f:
        # <UNK> is written first, with a fixed placeholder count.
        f.write('<UNK>\t10000000\n')
        for item in sorted_word_dict:
            f.write('%s\t%d\n' % (item[0], item[1]))
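
The two methods can be combined. The sketch below is a hypothetical variant, `generate_vocab_file_counter`, that does the counting and sorting of Method 2 with `Counter` instead of a hand-written dict. The function name, the temporary file paths, and the two sample lines are all made up for illustration; the "label<TAB>content" line format and the `<UNK>` header line follow the function above.

```python
import os
import tempfile
from collections import Counter


def generate_vocab_file_counter(input_seg_file, output_vocab_file):
    """Hypothetical Counter-based variant of generate_vocab_file."""
    counts = Counter()
    with open(input_seg_file, 'r', encoding='UTF-8') as f:
        for line in f:
            # Each line is assumed to be "label<TAB>space-separated words".
            label, content = line.strip('\r\n').split('\t')
            counts.update(content.split())
    with open(output_vocab_file, 'w', encoding='UTF-8') as f:
        f.write('<UNK>\t10000000\n')
        # most_common() with no argument returns every word, most frequent first.
        for word, freq in counts.most_common():
            f.write('%s\t%d\n' % (word, freq))


# Usage on a tiny made-up input file.
tmp_dir = tempfile.mkdtemp()
seg_path = os.path.join(tmp_dir, 'seg.txt')
vocab_path = os.path.join(tmp_dir, 'vocab.txt')
with open(seg_path, 'w', encoding='UTF-8') as f:
    f.write('sports\tball game ball\n')
    f.write('tech\tchip ball\n')

generate_vocab_file_counter(seg_path, vocab_path)
with open(vocab_path, 'r', encoding='UTF-8') as f:
    vocab_lines = f.read().splitlines()
print(vocab_lines)
```

The first line of the output file is the `<UNK>` entry, followed by 'ball' with a count of 3 and then the two words that occur once.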

A similar implementation:

colors = ['red', 'blue', 'red', 'green', 'blue', 'blue']

result = {}
for color in colors:
    # First occurrence: start the count at 1; otherwise increment it.
    if result.get(color) is None:
        result[color] = 1
    else:
        result[color] += 1

print(result)  # {'red': 2, 'blue': 3, 'green': 1}
