【Posted】: 2020-10-20 23:53:21
【Question】:
I am having trouble processing the following data (from a pandas DataFrame):
Text
0 Selected moments from Fifa game t...
1 What I learned is that I am ...
3 Bill Gates kept telling us it was comi...
5 scenario created a month before the...
... ...
1899 Events for May 19 – October 7 - October CTOvision.com
1900 Office of Event Services and Campus Center Ope...
1901 How the CARES Act May Affect Gift Planning in ...
1902 City of Rohnert Park: Home
1903 iHeartMedia, Inc.
I need the number of unique words in each row (after removing punctuation). For example:
Unique
0 6
1 6
3 8
5 6
... ...
1899 8
1900 8
1901 9
1902 5
1903 2
This is what I tried:
df["Unique"]=df['Text'].str.lower()
df["Unique"]==Counter(word_tokenize('\n'.join( file["Unique"])))
But I do not get any counts, only a list of words (without how often each one appears in its row).
Can you tell me what is wrong?
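For reference, two things in the attempt stand out. `'\n'.join(...)` merges every row into one string, so a single `Counter` can only describe the whole column, not each row. The second line also uses `==` (a comparison) instead of `=` (an assignment), and it reads from `file` rather than `df`. Below is a minimal sketch of a per-row count. The helper name `count_unique_words` is hypothetical, and it assumes a "word" is any run of letters, digits, or underscores, which is a simpler rule than `word_tokenize` applies and may give different counts on some rows.

```python
import re


def count_unique_words(text):
    """Count distinct words in one string, ignoring case and punctuation.

    Hypothetical helper: a "word" here is any run of characters matched
    by \\w+ (letters, digits, underscore). This is an assumption, not the
    tokenization used by nltk's word_tokenize.
    """
    # Lowercase first so "Home" and "home" count as the same word.
    words = re.findall(r"\w+", text.lower())
    # A set keeps each word once; its size is the unique-word count.
    return len(set(words))


# Two rows that appear untruncated in the sample data above.
rows = ["City of Rohnert Park: Home", "iHeartMedia, Inc."]
counts = [count_unique_words(t) for t in rows]
print(counts)
```

On the data frame itself, the same function could be applied row by row with `df["Unique"] = df["Text"].apply(count_unique_words)`.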
【Comments】: