【Posted】: 2019-09-17 16:52:14
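【Question】:
I want to count how many times particular topics occur in a very long list of words. Currently I have a dictionary of dictionaries in which the outer keys are the topics and the inner keys are the keywords for that topic.
I am trying to count keyword occurrences efficiently while keeping a running total of the occurrences of the corresponding topic.
Ultimately, I want to save the output for multiple texts. Below is an example of my current implementation. The problems are that it is very slow and that it does not store the keyword counts in the output DataFrame. Is there an alternative that addresses these problems?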
import pandas as pd

topics = {
    "mathematics": {
        "analysis": 0,
        "algebra": 0,
        "logic": 0
    },
    "philosophy": {
        "ethics": 0,
        "metaphysics": 0,
        "epistemology": 0
    }
}

texts = {
    "text_a": [
        "the", "major", "areas", "of", "study", "in", "mathematics", "are",
        "analysis", "algebra", "and", "logic", "in", "philosophy", "they",
        "are", "ethics", "metaphysics", "and", "epistemology"
    ],
    "text_b": [
        "logic", "is", "studied", "both", "in", "mathematics", "and",
        "philosophy"
    ]
}

topics_by_text = pd.DataFrame()
for title, text in texts.items():
    topic_count = {}
    for topic, sub_dict in topics.items():
        curr_topic_counter = 0
        for keyword, count in sub_dict.items():
            keyword_occurrences = text.count(keyword)
            topics[topic][keyword] = keyword_occurrences
            curr_topic_counter += keyword_occurrences
        topic_count[topic] = curr_topic_counter
    topics_by_text[title] = pd.Series(topic_count)
print(topics_by_text)
【Discussion】:
Tags: python performance loops dictionary counter
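One possible direction, offered as a minimal sketch rather than a definitive answer: `text.count(keyword)` rescans the whole word list once per keyword, so the cost grows with (number of keywords) × (text length). Counting every word once with `collections.Counter` and then looking keywords up in the result avoids the repeated scans. The helper name `count_topics` and the two returned dictionaries are assumptions made for illustration; the topic and keyword data are taken from the question.

```python
from collections import Counter


def count_topics(texts, topics):
    """Hypothetical helper: return (topic_totals, keyword_counts) per text title.

    `topics` maps each topic to an iterable of keywords, so both a list of
    keywords and the question's {keyword: 0} dictionaries are accepted.
    """
    topic_totals = {}
    keyword_counts = {}
    for title, words in texts.items():
        # Single pass over the word list; a missing word looks up as 0.
        word_counts = Counter(words)
        keyword_counts[title] = {
            (topic, keyword): word_counts[keyword]
            for topic, keywords in topics.items()
            for keyword in keywords
        }
        topic_totals[title] = {
            topic: sum(word_counts[keyword] for keyword in keywords)
            for topic, keywords in topics.items()
        }
    return topic_totals, keyword_counts


topics = {
    "mathematics": ["analysis", "algebra", "logic"],
    "philosophy": ["ethics", "metaphysics", "epistemology"],
}
texts = {
    "text_a": [
        "the", "major", "areas", "of", "study", "in", "mathematics", "are",
        "analysis", "algebra", "and", "logic", "in", "philosophy", "they",
        "are", "ethics", "metaphysics", "and", "epistemology",
    ],
    "text_b": [
        "logic", "is", "studied", "both", "in", "mathematics", "and",
        "philosophy",
    ],
}

topic_totals, keyword_counts = count_topics(texts, topics)
print(topic_totals)
```

Because both results are dictionaries of dictionaries keyed by text title, passing either one to `pd.DataFrame(...)` should give one column per text, which matches the layout of `topics_by_text` in the question. For `keyword_counts`, the `(topic, keyword)` tuple keys are expected to become a two-level row index, so the per-keyword counts are kept alongside the topic they belong to.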