【Question title】: Python3, dictionary from csv file to count frequency of words
【Posted】: 2018-08-24 23:25:48
【Question description】:

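I am trying to write a function that reads a CSV file of student volunteers who hold different degrees. The function should build a dictionary whose keys are the degrees and whose values are the number of times each degree appears.

The data is organized as follows: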

name    degree     email

ABC     PhD.       abd@gmail.com
CDE     Ph.D.      cde@gmail.com
FGH     MD,PHD     fgh@gmail.com

The goal is to get a dictionary like this:

#degree_count{'phd':3,'md':1}

def degree_frequency(csv_file):
    f = open('csv_file')
    csv_f = csv.reader(f)
    #Creating a list to store all the degrees from the csv file
    student_degree_list=[]
    #Creating an empty dictionary to count the frequency
    degree_count={}
    for row in csv_f:
        student_degree_list.append(row[1]) 
    #Replacing fullstops to account for variations in writing degrees ( eg JD vs J.D)
    [word.replace(".", "") for word in student_degree_list]
    [word.lower() for word in student_degree_list]
    for ele in student_degree_list:
        if ele in degree_count:
            degree_count[ele]=degree_count[ele]+1
        else:
            degree_count[ele]=0
    return degree_count

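【Comments】:

  • So what is wrong with the code you posted?
  • @learning_python Can you use pandas?
  • @Aran Frey: I am trying this on an interactive platform. It only says the test case failed, without pinpointing the problem, so I am not sure what is wrong with the code.
  • @Tanmay Jain: I was explicitly told not to use pandas.
  • @learning_python Oh, OK, I will edit the answer.

Tags: python python-3.x dictionary word-frequency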


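【Solution 1】:

I believe your problem is that the following two lines have no effect: each list comprehension builds a new list, and that list is discarded unless you assign it to a variable.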

[word.replace(".", "") for word in student_degree_list]
[word.lower() for word in student_degree_list]
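A minimal sketch of why those lines do nothing: str.replace and str.lower return new strings, and the comprehension returns a new list, so the original list stays as it was. The sample list is made up for illustration.

```python
# Hypothetical sample data, not taken from the asker's file.
student_degree_list = ["PhD.", "Ph.D.", "MD"]

# The comprehension builds a new list, which is discarded immediately.
[word.replace(".", "").lower() for word in student_degree_list]
unchanged = list(student_degree_list)

# Binding the result to a name keeps the cleaned values.
cleaned = [word.replace(".", "").lower() for word in student_degree_list]

print(unchanged)  # ['PhD.', 'Ph.D.', 'MD']
print(cleaned)    # ['phd', 'phd', 'md']
```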

Also, when a degree is seen for the first time, shouldn't its count be set to 1 rather than 0?

Working code:

#degree_count{'phd':3,'md':1}

import csv

def degree_frequency(csv_file):
    # Creating a list to store all the degrees from the csv file
    student_degree_list = []
    # Creating an empty dictionary to count the frequency
    degree_count = {}
    # Open the path passed in as csv_file, not the literal string 'csv_file'
    with open(csv_file, newline='') as f:
        csv_f = csv.reader(f)
        for row in csv_f:
            student_degree_list.append(row[1])
    # Removing full stops and lowercasing to account for variations in writing degrees (e.g. JD vs J.D.)
    student_degree_list = [word.replace('.', '').lower() for word in student_degree_list]
    for ele in student_degree_list:
        if ele in degree_count:
            degree_count[ele] += 1
        else:
            # First occurrence counts as 1, not 0
            degree_count[ele] = 1
    return degree_count
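
Note that the code above treats a cell such as MD,PHD as a single degree and also counts the header row, so on its own it would not yield the {'phd':3,'md':1} the question asks for. One possible way to handle that is sketched below, assuming that degrees inside a cell are separated by commas or whitespace and that the first row is a header. The name degree_frequency_split and the in-memory sample file are illustrative and not part of the original post.

```python
import csv
import io
import re


def degree_frequency_split(lines):
    """Count degrees from an iterable of CSV lines (hypothetical helper)."""
    reader = csv.reader(lines)
    next(reader, None)  # assumption: the first row is a header
    degree_count = {}
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed rows
        # Assumption: degrees in one cell are separated by commas or whitespace.
        for part in re.split(r"[,\s]+", row[1]):
            degree = part.replace(".", "").lower()
            if degree:
                degree_count[degree] = degree_count.get(degree, 0) + 1
    return degree_count


# Sample input in CSV form; the multi-degree cell is quoted so its comma survives.
sample = io.StringIO(
    "name,degree,email\n"
    "ABC,PhD.,abd@gmail.com\n"
    "CDE,Ph.D.,cde@gmail.com\n"
    'FGH,"MD,PHD",fgh@gmail.com\n'
)
result = degree_frequency_split(sample)
print(result)  # {'phd': 3, 'md': 1}
```

Passing an open file object instead of the StringIO works the same way, since csv.reader accepts any iterable of lines.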

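【Comments】:

  • Well, why not just use a single list comprehension? [word.replace(".", "").lower() for word in student_degree_list]
  • I'm confused. I reused his list comprehensions, but assigned the resulting new list to the student_degree_list variable so that the list actually changes.
  • What do you mean? I basically merged the two list comprehensions.
  • OK, I see what you are saying. I only used one list comprehension. See my answer...
【Solution 2】: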
import csv
from collections import Counter, defaultdict

columns = defaultdict(list) # each value in each column is appended to a list

with open('csv_file.csv') as f:
    reader = csv.DictReader(f) # read rows into a dictionary format
    for row in reader: # read a row as {column1: value1, column2: value2,...}
        for (k,v) in row.items(): # go over each column name and value 
            columns[k].append(v) # append the value into the appropriate list
                                 # based on column name k

Credit for the csv reader code
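
If only the degree column is needed, collecting every column is optional. A lighter variant, sketched here with an in-memory file standing in for csv_file.csv and assuming the header is exactly name,degree,email, could read that column directly:

```python
import csv
import io

# In-memory stand-in for csv_file.csv, using the sample input from this answer.
sample = io.StringIO(
    "name,degree,email\n"
    "ABC,PhD. ,abd@gmail.com\n"
    "CDE,Ph.D. ,cde@gmail.com\n"
    "FGH, MD PHD ,fgh@gmail.com\n"
)

# DictReader uses the header row as keys, so the column can be picked by name.
degree_list = [row["degree"] for row in csv.DictReader(sample)]
print(degree_list)  # ['PhD. ', 'Ph.D. ', ' MD PHD ']
```

The surrounding spaces are kept at this stage; the cleaning loop that follows strips them.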

degree_list = columns['degree']
degree_list_clean = []

for cad_degrees in degree_list:
    cad_degrees_lst = cad_degrees.split()
    for degree in cad_degrees_lst:
        degree_clean = degree.strip().replace('.','').lower()
        degree_list_clean.append(degree_clean)

Option 1

output_dict_counter_version = dict(Counter(degree_list_clean))
print(output_dict_counter_version)
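
For reference, a small sketch of what Counter does with an already cleaned list. The list literal is made up to mirror the sample data.

```python
from collections import Counter

# Hypothetical cleaned list, mirroring the sample input.
degree_list_clean = ["phd", "phd", "md", "phd"]

counts = Counter(degree_list_clean)
print(dict(counts))           # {'phd': 3, 'md': 1}
print(counts.most_common(1))  # [('phd', 3)]
print(counts["jd"])           # 0, a missing key counts as zero
```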

Option 2

degree_frequency_dict = {}

for deg in degree_list_clean:
    if deg in degree_frequency_dict:
        degree_frequency_dict[deg] += 1
    else:
        degree_frequency_dict[deg] = 1

print(degree_frequency_dict)    

Using pandas

import pandas as pd
from collections import Counter

data = pd.read_csv("csv_file.csv")
degree_list = data['degree'].tolist()


degree_list_clean = []

for cad_degrees in degree_list:
    cad_degrees_lst = cad_degrees.split()
    for degree in cad_degrees_lst:
        degree_clean = degree.strip().replace('.','').lower()
        degree_list_clean.append(degree_clean)

print(dict(Counter(degree_list_clean)))



'''
------------------ Input
name,degree,email
ABC,PhD. ,abd@gmail.com
CDE,Ph.D. ,cde@gmail.com
FGH, MD PHD ,fgh@gmail.com

-------------------- Output
{'phd': 3, 'md': 1}
'''

【Comments】:
