[Posted]: 2021-10-07 18:34:41
[Question]:
I have a DataFrame:
0 2021-03-19 20:59:49+06 ... I only need uxy to hit 20 eod to make up for a...
1 2021-03-19 20:59:51+06 ... Oh this isn’t good
2 2021-03-19 20:59:51+06 ... lads why is my account covered in more red ink...
3 2021-03-19 20:59:51+06 ... I'm tempted to drop my last 800 into some stup...
4 2021-03-19 20:59:52+06 ... The sell offs will continue until moral improves.
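I want to use a Counter to count every occurrence of each word, and I want to make sure I only count strings. So I would start with
Counter()
then, as words occur, get
Counter({'I': 1, 'only': 1, 'need': 1, ...})
and whenever it sees the same word again, that word's count should be added to its previous count.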
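The accumulation described above is what `Counter.update()` does: each call adds the new counts to the existing totals instead of replacing them. A minimal sketch, using two made-up rows (illustrative only, not taken from the dataset):

```python
from collections import Counter

word_bin = Counter()  # starts empty: Counter()

# Hypothetical sample rows standing in for the DataFrame's text column
rows = ["I only need uxy", "uxy need more"]

for row in rows:
    # update() adds each word's count to any total already stored
    word_bin.update(row.split())

print(word_bin)
```

After both rows, `uxy` and `need` have a count of 2 and every other word has a count of 1.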
Here is what I've tried:
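import enchant
import pandas as pd
import string
from collections import Counter
from nltk.corpus import stopwords
from stopwords import res  # local module from the original post; `res` is not used below

# on_bad_lines='skip' replaces error_bad_lines=False (deprecated in pandas 1.3, removed in 2.0)
discussion = pd.read_csv('discussion_thread_data.csv', on_bad_lines='skip', index_col=False, dtype='unicode')
discussion = discussion.drop_duplicates('text')
discussion = discussion[discussion['text'].notnull()]
print(discussion)

d = enchant.Dict("en_US")
stop = stopwords.words('english')  # loaded, but not used below
word_bin = Counter()

def clean_word(word):
    # Strip every punctuation character from the word
    return ''.join(c for c in word if c not in string.punctuation)

def word_extractor(text):
    # A list (not a set) so repeated words within one row are all counted
    words = [clean_word(word) for word in text.split()]
    # Keep non-empty words that are not in the en_US dictionary and are not 'A' or 'IM'.
    # The original `and not ['A', 'IM']` is always False (a non-empty list is truthy),
    # so every row produced an empty list and the Counter never grew.
    words = [word for word in words if word != '' and not d.check(word) and word not in ('A', 'IM')]
    # update() adds these counts to the running totals in place, so no `global` is needed
    word_bin.update(words)

for text in discussion['text']:
    word_extractor(text)

# Keep the Counter intact and store the 100 most common words separately
top_words = [word for word, cnt in word_bin.most_common(100)]
print('end')
print(top_words)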
But it keeps printing an empty Counter() for every row. Please help.
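A likely cause is the last condition in the filter: `not ['A', 'IM']` never looks at `word`. A non-empty list is truthy, so `not [...]` is always `False`, every row's word list comes out empty, and `word_bin` stays `Counter()`. The sketch below reproduces this with the standard library only; `KNOWN_WORDS` and `is_english` are hypothetical stand-ins for the `enchant` dictionary check, assumed here so the example runs without `pyenchant`.

```python
import string
from collections import Counter

# Hypothetical stand-in for enchant.Dict("en_US").check(); an assumption for this sketch
KNOWN_WORDS = {"oh", "this", "isnt", "good", "i", "only", "need", "to", "hit"}

def clean_word(word):
    # Remove every punctuation character
    return ''.join(c for c in word if c not in string.punctuation)

def is_english(word):
    return word.lower() in KNOWN_WORDS

rows = ["I only need uxy to hit 20 eod", "Oh this isn't good", "uxy uxy IM"]

# The original condition: the list literal is truthy, so this is always False
print(not ['A', 'IM'])  # False

word_bin = Counter()
for text in rows:
    words = [clean_word(w) for w in text.split()]
    # Corrected filter: test whether `w` itself is one of the excluded tokens
    words = [w for w in words if w != '' and not is_english(w) and w not in ('A', 'IM')]
    word_bin.update(words)

print(word_bin.most_common(100))
```

With the corrected filter, `uxy` is counted three times across the rows, while `IM` is excluded by the membership test rather than by a condition that is always false.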
[Discussion]: