[Question title]: AttributeError: 'WordList' object has no attribute 'split'
[Posted]: 2021-04-05 21:00:26
[Question description]:

After tokenizing the "script" column, I am trying to apply lemmatization, but I get an AttributeError. I have tried several approaches.

Here is my "script" column:

df_toklem["script"][0:5]
---------------------------------------------------------------------------
type(df_toklem["script"])

Output:

id
1    [ext, street, day, ups, man, big, pot, belly, ...
2    [credits, still, life, tableaus, lawford, n, h...
3    [fade, ext, convent, day, whispering, nuns, pr...
4    [fade, int, c, hercules, turbo, prop, night, e...
5    [open, theme, jaws, plane, busts, clouds, like...
Name: script, dtype: object
---------------------------------------------------------------------------
pandas.core.series.Series

And here is the code I used to try to apply lemmatization:

from textblob import Word
nltk.download("wordnet")
df_toklem["script"].apply(lambda x: " ".join([Word(word).lemmatize() for word in x.split()]))

Error:

[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\PC\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-72-dbc80c619ec5> in <module>
      1 from textblob import Word
      2 nltk.download("wordnet")
----> 3 df_toklem["script"].apply(lambda x: " ".join([Word(word).lemmatize() for word in x.split()]))

~\Anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
   4198             else:
   4199                 values = self.astype(object)._values
-> 4200                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   4201 
   4202         if len(mapped) and isinstance(mapped[0], Series):

pandas\_libs\lib.pyx in pandas._libs.lib.map_infer()

<ipython-input-72-dbc80c619ec5> in <lambda>(x)
      1 from textblob import Word
      2 nltk.download("wordnet")
----> 3 df_toklem["script"].apply(lambda x: " ".join([Word(word).lemmatize() for word in x.split()]))

AttributeError: 'WordList' object has no attribute 'split'

I have tried different approaches but unfortunately could not find a working solution. Thank you for your time.

[Question discussion]:

    Tags: python nlp token lemmatization word-list


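    [Solution 1]:

    What you are trying to do does not work because you are calling a string method (split) on a list of words. Each cell in the column is already tokenized, so there is nothing left to split. I would try using nltk instead and create a new pandas column with the tokenized data: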

    import nltk
    # Series.apply passes one cell at a time; this assumes each cell is still a raw string.
    df_toklem['tokenized'] = df_toklem['script'].apply(nltk.word_tokenize)
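
As a rough illustration of the point above, the sketch below lemmatizes a cell that is already a list of tokens by iterating over it directly, and only falls back to `split()` when the cell is still a plain string. The names `lemmatize_tokens`, `toy_lemmatize` and `TOY_LEMMAS` are hypothetical and not part of the original post. The toy dictionary stands in for a real lemmatizer so the example runs without downloading WordNet.

```python
from typing import Callable, Dict, Iterable, List, Union

# Hypothetical stand-in for a real lemmatizer, used only so the sketch is self-contained.
TOY_LEMMAS: Dict[str, str] = {"nuns": "nun", "clouds": "cloud", "busts": "bust"}


def toy_lemmatize(word: str) -> str:
    """Return the toy lemma if one is known, otherwise the word unchanged."""
    return TOY_LEMMAS.get(word, word)


def lemmatize_tokens(
    tokens: Union[str, Iterable[str]],
    lemmatize_word: Callable[[str], str] = toy_lemmatize,
) -> List[str]:
    """Lemmatize one cell, whether it is a raw string or an already tokenized list."""
    # Only a raw string needs splitting; list-like cells are iterated as they are.
    if isinstance(tokens, str):
        tokens = tokens.split()
    return [lemmatize_word(tok) for tok in tokens]


rows = [
    ["fade", "ext", "convent", "day", "whispering", "nuns"],  # already tokenized
    "open theme jaws plane busts clouds",                     # still a raw string
]
lemmatized = [lemmatize_tokens(row) for row in rows]
print(lemmatized)
```

With TextBlob, passing `lambda w: Word(w).lemmatize()` as `lemmatize_word` and calling the function through `df_toklem["script"].apply(...)` should behave the same way, assuming every cell holds a string or a list-like of strings.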
    

    [Discussion]:

    • Hmm, I see. Thanks for your help :) @circuito
    • If you found my answer useful, please don't forget to accept/upvote it :)
    • By the way, you should remove ",axis=1" :) @circuito
    • Yes, sorry! :p