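【Posted】:2019-06-15 08:53:48
【Question】:
I am new to Python programming. I want to get the count of each word in this Wikipedia dataset (people_wiki.csv). I am able to count the words, and the result comes out as a dictionary, but I cannot split the dictionary's key-value pairs into separate columns. I have tried several approaches (from_dict, from_records, to_frame, pivot_table, etc.). Is this possible in Python? Any help would be appreciated.
Sample dataset: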
URI name text
<http://dbpedia.org/resource/George_Clooney> George Clooney 'george timothy clooney born may 6 1961 is an american actor writer producer director and activist he has received three golden globe awards for his work as an actor and two academy awards one for acting and the other for producingclooney made his...'
I tried:
clooney_word_count_table = pd.DataFrame.from_dict(clooney['word_count'], orient='index', columns=['word','count'])
I also tried:
clooney['word_count'].to_frame()
Here is my code:
import pandas as pd
from collections import Counter
people = pd.read_csv("people_wiki.csv")
clooney = people[people['name'] == 'George Clooney']
clooney['word_count'] = clooney['text'].apply(lambda x: Counter(x.split(' ')))
clooney_word_count_table = pd.DataFrame.from_dict(clooney['word_count'], orient='index', columns=['word','count'])
clooney_word_count_table
Output:
word_count
35817 {'george': 1, 'timothy': 1, 'clooney': 9, 'ii': ...
I would like clooney_word_count_table to be an output DataFrame with 2 columns:
word count
normalize 1
george 3
combat 1
producer 2
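One possible way to get that two-column shape, offered as a minimal sketch rather than the definitive answer: clooney['word_count'] is a one-row Series whose single cell holds the Counter, so the Counter has to be pulled out of that cell before it can become rows. The sketch below builds the Counter straight from the text cell and passes its (word, count) pairs to the DataFrame constructor. The tiny in-memory `people` frame and its sample text are made up here so the snippet runs without people_wiki.csv; with the real file, `pd.read_csv("people_wiki.csv")` would take its place.

```python
from collections import Counter

import pandas as pd

# Hypothetical stand-in for people_wiki.csv (one made-up row, not real data)
people = pd.DataFrame({
    "name": ["George Clooney"],
    "text": ["george timothy clooney born may 6 1961 george clooney"],
})

clooney = people[people["name"] == "George Clooney"]

# Pull the single text cell out of the Series, then count
# whitespace-separated words
word_counts = Counter(clooney["text"].iloc[0].split())

# most_common() returns a list of (word, count) tuples, so each
# tuple becomes one row and the two fields become the two columns
clooney_word_count_table = pd.DataFrame(
    word_counts.most_common(), columns=["word", "count"]
)
print(clooney_word_count_table)
```

Because most_common() orders the pairs by descending count, the most frequent words come first. If the Counter is already stored in the word_count column, `clooney['word_count'].iloc[0].most_common()` should yield the same list of pairs.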
【Comments】:
Tags: python dictionary dataframe word-count