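【Posted】:2021-05-28 14:43:55
【Question】:
I have a data frame containing a large number of reviews, a large list of nouns (1000), and another large list of verbs/adjectives (1000).
Example data frame and lists: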
import pandas as pd
data = {'reviews':['Very professional operation. Room is very clean and comfortable',
'Daniel is the most amazing host! His place is extremely clean, and he provides everything you could possibly want (comfy bed, guidebooks & maps, mini-fridge, towels, even toiletries). He is extremely friendly and helpful.',
'The room is very quiet, and well decorated, very clean.',
'He provides the room with towels, tea, coffee and a wardrobe.',
'Daniel is a great host. Always recomendable.',
'My friend and I were very satisfied with our stay in his apartment.']}
df = pd.DataFrame(data)
nouns = ['place','Amsterdam','apartment','location','host','stay','city','room','everything','time','house',
'area','home','’','center','restaurants','centre','Great','tram','très','minutes','walk','space','neighborhood',
'à','station','bed','experience','hosts','Thank','bien']
verbs_adj = ['was','is','great','nice','had','clean','were','recommend','stay','are','good','perfect','comfortable',
'have','easy','be','quiet','helpful','get','beautiful',"'s",'has','est','located','un','amazing','wonderful',]
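I want to create a dictionary that stores, for each review, every co-occurrence of a noun with a verb/adjective. For example, for
'Very professional operation. Room is very clean and comfortable'
{'room': {'is': 1, 'clean': 1, 'comfortable': 1}}
using the following code: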
from collections import Counter
from copy import deepcopy
from pprint import pprint

def count_co_occurences(reviews):
    # Iterate on each review and count
    occurences_per_review = {
        f"review_{i+1}": {
            noun: dict(Counter(review.lower().split(" ")))
            for noun in nouns
            if noun in review.lower()
        }
        for i, review in enumerate(reviews)
    }
    # Remove verb_adj not found in main list
    opr = deepcopy(occurences_per_review)
    for review, occurences in opr.items():
        for noun, counts in occurences.items():
            for verb_adj in counts.keys():
                if verb_adj not in verbs_adj:
                    del occurences_per_review[review][noun][verb_adj]
    return occurences_per_review

pprint(count_co_occurences(data["reviews"]))
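This works when the lists and the number of reviews are small, but my notebook crashes when I run the function with large lists and a large number of reviews. How can I modify the code to handle this?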
【Comments】:
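One possible direction, offered as a minimal sketch rather than a tested fix for the full data set: the code in the question stores a full word count of the whole review under every matching noun, deep-copies the entire structure, and then tests each word against a 1000-item list. The sketch below tokenizes each review once, keeps only verbs/adjectives while counting, uses sets for lookups, and yields one review at a time so the whole result never has to sit in memory. The function name `iter_co_occurrences` and the `\w+` tokenizer are assumptions, not part of the original code. Note that this changes the matching slightly: punctuation is stripped (so 'clean,' counts as 'clean'), nouns are matched as whole lowercase words instead of substrings, and entries such as "'s" or '’' will no longer match.

```python
import re
from collections import Counter

def iter_co_occurrences(reviews, nouns, verbs_adj):
    """Yield (review_key, {noun: {verb_adj: count}}) one review at a time."""
    # Sets give constant-time membership tests instead of list scans.
    noun_set = {n.lower() for n in nouns}
    verb_adj_set = {v.lower() for v in verbs_adj}
    for i, review in enumerate(reviews, start=1):
        # Assumption: simple word tokenizer; drops punctuation.
        tokens = re.findall(r"\w+", review.lower())
        # Count only the words we care about, so nothing is deleted later.
        counts = dict(Counter(t for t in tokens if t in verb_adj_set))
        # Nouns that appear as whole words in this review.
        found = noun_set.intersection(tokens)
        yield f"review_{i}", {noun: dict(counts) for noun in found}

# Small hypothetical sample, taken from the first review in the question.
sample_reviews = ['Very professional operation. Room is very clean and comfortable']
sample_nouns = ['room', 'host']
sample_verbs_adj = ['is', 'clean', 'comfortable', 'great']

result = dict(iter_co_occurrences(sample_reviews, sample_nouns, sample_verbs_adj))
print(result)
# {'review_1': {'room': {'is': 1, 'clean': 1, 'comfortable': 1}}}
```

With the real data, the generator could be consumed in a loop (for example, writing each item to a file) instead of being wrapped in `dict(...)`, which would keep memory use roughly constant per review.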