【Question Title】: Plot a word cloud without stopwords
【Posted】: 2020-11-01 04:03:47
【Question】:

I want to plot a word cloud from a column of my pandas DataFrame.

Here is my code:

all_words=''.join(  [tweet for tweet in tweet_table['tokens'] ] ) 

word_Cloud=WordCloud(width=500, height=300, random_state=21, max_font_size=119).generate(all_words)

plt.imshow(word_Cloud, interpolation='bilinear')

The column I want to plot, tweet_table['tokens'], looks like this:

0        [da, trumpanzee, follower, blm, balance, wp, g...
1        [counting, blacklivesmatter, received, trainin...
2        [okay, like, little, kids, pretty, smart, know...
3        [thank, oscopelabs, got, mounted, loud, amp, p...
4        [bpi, proud, supported, hoops, 4l, f, e, see, ...
                               ...                        
44713    [tomorrow, buy, charity, compilation, undergro...
44714    [needs, erected, state, capitol, think, darkfa...
44715    [clay, county, sheriffs, motto, screw, amp, in...
44716    [films, eleven, films, bravo, bad, ass, video,...
44717                       [everybody, give, listen, blm]

My code above gives me the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-227-4066d6d1a153> in <module>
      2 # REMOVE STOP WORDS
      3 
----> 4 all_words=''.join(  [tweet for tweet in tweet_table['tokens'] ] )


TypeError: sequence item 0: expected str instance, list found

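How can I fix this error? The tweet_table['tokens'] column is tokenized and already cleaned of stopwords.

Thank you very much.

PS: When I use similar code on the tweet_table['clean_text'] column, it works fine.

The tweet_table['clean_text'] column looks like this: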

0            You have a da trumpanzee follower in      ...
1          Over 279  and counting   If  BlackLivesMatte...
2        Okay but like little kids are pretty smart and...
3        Thank you oscopelabs  got it mounted loud  amp...
4        BPI is proud to have supported Hoops4L Y F E  ...
                               ...                        
44713    TOMORROW you can buy the   charity compilation...
44714        That needs to be erected at the State Capi...
44715      Clay County Sheriffs  Motto  To Screw  amp  ...
44716      Films Eleven Films bravo         Bad ass vid...
44717              everybody should give this a listen ...

【Comments】:

Tags: python-3.x pandas token word-cloud stop-words


【Solution 1】:

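I just fixed it:

import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Flatten the per-tweet token lists into one space-separated string.
# str(tweet_table['tokens']) would only give the truncated printed preview of the column.
allwords = ' '.join(word for tokens in tweet_table['tokens'] for word in tokens)

word_Cloud = WordCloud(width=500, height=300, random_state=21,
                     max_font_size=119).generate(allwords)

plt.imshow(word_Cloud, interpolation='bilinear')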
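
For context, `str.join` accepts only strings, and each element of the `tokens` column is a list, which is what triggers the `TypeError` in the question. A minimal sketch of the failure and of a join over the individual tokens follows. It uses a plain list of lists (`token_rows`, hypothetical sample data) standing in for tweet_table['tokens'], on the assumption that iterating the column yields one token list per row.

```python
# Hypothetical sample data standing in for tweet_table['tokens'];
# each row is a list of tokens, as in the question.
token_rows = [
    ["da", "trumpanzee", "follower", "blm"],
    ["everybody", "give", "listen", "blm"],
]

# Joining the rows directly fails, because the items are lists, not strings.
try:
    "".join([row for row in token_rows])
    join_error = None
except TypeError as exc:
    join_error = str(exc)

# Flatten the rows first, then join the tokens with spaces so that
# neighbouring words do not run together.
all_words = " ".join(token for row in token_rows for token in row)

print(join_error)
print(all_words)
```

With the real DataFrame, the same generator expression should work with tweet_table['tokens'] in place of `token_rows`, and the resulting string can then be passed to `WordCloud(...).generate(...)`.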

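tweet_table['tokens'] already contains no stopwords. If yours does, build a stopword list and pass it to WordCloud through the stopwords argument, as in the code below: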

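from wordcloud import WordCloud, STOPWORDS

# Extend the built-in stopword set with extra words to drop
stopwords_newlist = ["https", "co"] + list(STOPWORDS)

# Flatten the per-tweet token lists into one space-separated string
allwords = ' '.join(word for tokens in tweet_table['tokens'] for word in tokens)

word_Cloud = WordCloud(width=500, height=300, random_state=21, stopwords=stopwords_newlist,
                     max_font_size=119).generate(allwords)

plt.imshow(word_Cloud, interpolation='bilinear')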
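
Because the tokens are already split, another option is to drop unwanted words and count the rest before building the cloud. The sketch below is one way to do that with the standard library only; `token_rows` and `extra_stopwords` are hypothetical names holding made-up sample data. The resulting counts could then be given to `WordCloud.generate_from_frequencies`, which takes a word-to-frequency mapping and ignores the `stopwords` argument, so the filtering has to happen beforehand.

```python
from collections import Counter

# Hypothetical sample data standing in for tweet_table['tokens'].
token_rows = [
    ["https", "co", "blm", "balance"],
    ["everybody", "give", "listen", "blm"],
]

# Hypothetical extra words to drop; with the wordcloud package installed,
# this set could be combined with its STOPWORDS set.
extra_stopwords = {"https", "co"}

# Count every token that is not a stopword.
word_counts = Counter(
    token
    for row in token_rows
    for token in row
    if token.lower() not in extra_stopwords
)

print(word_counts.most_common(3))

# Assumed follow-up step (needs the wordcloud package):
# word_Cloud = WordCloud(width=500, height=300).generate_from_frequencies(word_counts)
```

Counting the tokens directly also avoids having the library split a joined string into words a second time.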

【Comments】:
