【Question title】: How to generate a word cloud of Bangla text in Python?
【Posted】: 2021-02-14 03:49:17
【Question】:

I tried the code below:

!pip install python-bidi
from wordcloud import WordCloud
from matplotlib import pyplot as plt
from bidi.algorithm import get_display

text="""মুস্তাফিজ"""

bidi_text = get_display(text)
print(bidi_text)
# https://github.com/amueller/word_cloud/issues/367
# https://stackoverflow.com/questions/54063438/create-wordcloud-in-python-for-foreign-language-hebrew
# https://www.omicronlab.com/bangla-fonts.html
rgx = r"[\u0980-\u09FF]+"
wordcloud = WordCloud(font_path='/content/Siyamrupali.ttf').generate(bidi_text)

#wordcloud = WordCloud(font_path='/content/FreeSansBold.ttf').generate(bidi_text)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

Then I got this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-87-56d899c0de07> in <module>()
     12 # https://www.omicronlab.com/bangla-fonts.html
     13 rgx = r"[\u0980-\u09FF]+"
---> 14 wordcloud = WordCloud(font_path='/content/Siyamrupali.ttf').generate(bidi_text)
     15 
     16 #wordcloud = WordCloud(font_path='/content/FreeSansBold.ttf').generate(bidi_text)

2 frames
/usr/local/lib/python3.6/dist-packages/wordcloud/wordcloud.py in generate_from_frequencies(self, frequencies, max_font_size)
    381         if len(frequencies) <= 0:
    382             raise ValueError("We need at least 1 word to plot a word cloud, "
--> 383                              "got %d." % len(frequencies))
    384         frequencies = frequencies[:self.max_words]
    385 

ValueError: We need at least 1 word to plot a word cloud, got 0.

This line does not pick up any Bangla words: wordcloud = WordCloud(font_path='/content/Siyamrupali.ttf').generate(bidi_text)

I tried almost every Bangla font listed here: https://www.omicronlab.com/bangla-fonts.html

None of them worked.

【Comments】:

    Tags: python nlp data-visualization word-cloud bangla-font


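    【Solution 1】:

    You defined a regexp but never passed it to WordCloud. When WordCloud tokenizes the text with its default pattern, nothing matches the Bangla word, so it ends up with an empty word list, which is why the error reports 0 words. Passing the rgx variable when you create the WordCloud object solves the problem.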

    wordcloud = WordCloud(font_path='/content/Siyamrupali.ttf',regexp=rgx).generate(bidi_text)
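
To see why the tokenizer finds nothing, here is a minimal sketch that uses only the standard `re` module. It assumes that wordcloud's default token pattern is `r"\w[\w']+"`; that is an assumption about the installed version, so check the library source if in doubt. The names `default_pattern` and `bangla_pattern` are illustrative and not part of any library.

```python
import re

text = "মুস্তাফিজ"

# Assumed default token pattern of wordcloud (verify against your installed version).
default_pattern = r"\w[\w']+"
# The Bengali Unicode block, same pattern as the question's rgx variable.
bangla_pattern = r"[\u0980-\u09FF]+"

# \w does not match Bangla vowel signs or the virama (they are combining marks),
# so each consonant is isolated and no token of two or more characters is found.
default_tokens = re.findall(default_pattern, text)
bangla_tokens = re.findall(bangla_pattern, text)

print(default_tokens)
print(bangla_tokens)
```

With the Bengali-block pattern the whole word survives as one token, which is what `regexp=rgx` achieves inside WordCloud.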
    

    Here is the complete code snippet.

    !pip install python-bidi
    from wordcloud import WordCloud
    from matplotlib import pyplot as plt
    from bidi.algorithm import get_display
    
    text="""মুস্তাফিজ"""
    
    bidi_text = get_display(text)
    print(bidi_text)
    # https://github.com/amueller/word_cloud/issues/367
    # https://stackoverflow.com/questions/54063438/create-wordcloud-in-python-for-foreign-language-hebrew
    # https://www.omicronlab.com/bangla-fonts.html
    rgx = r"[\u0980-\u09FF]+"
    wordcloud = WordCloud(font_path='/content/Siyamrupali.ttf',
                          regexp=rgx).generate(bidi_text)
    
    #wordcloud = WordCloud(font_path='/content/FreeSansBold.ttf').generate(bidi_text)
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis("off")
    plt.show()
    

    【Comments】:

      【Solution 2】:

      I generated a Bangla word cloud with the following code. You can give it a try:

      from wordcloud import WordCloud

      # Method taken from the author's class (hence `self`); builds and saves a Bangla word cloud.
      def generate_Word_cloud(self, author_post, vocabularyWordnumber, img_file, stop_word_root_path):
          # Load the stop words, one per line, from stopwords-bn.txt.
          stop_word_file = stop_word_root_path + '/stopwords-bn.txt'
          print(stop_word_file)
          with open(stop_word_file, "r", encoding="utf8") as f:
              stop_word = set(f.read().split("\n"))
          print(stop_word)

          # Join all posts into a single text before generating the cloud.
          final_text = " ".join(author_post)
          print(final_text)
          wordcloud = WordCloud(stopwords=stop_word,
                                font_path='/usr/share/fonts/truetype/freefont/kalpurush.ttf',
                                width=600, height=500, max_font_size=300,
                                max_words=vocabularyWordnumber, min_word_length=4,
                                background_color="black").generate(final_text)
          wordcloud.to_file(img_file)
      

      【Comments】:

        【Solution 3】:

        I followed this comment and was finally able to solve the problem on Ubuntu.

        Step 1

        !sudo apt-get install libfreetype6-dev libharfbuzz-dev libfribidi-dev gtk-doc-tools
        

        Step 2

        !wget -O raqm-0.7.0.tar.gz https://raw.githubusercontent.com/python-pillow/pillow-depends/master/raqm-0.7.0.tar.gz
        

        The raqm-0.7.0.tar.gz file should now be in your current working directory.

        Step 3

        !tar -xzvf raqm-0.7.0.tar.gz
        

        Step 4

        %cd raqm-0.7.0
        

        Step 5

        !./configure --prefix=/usr && make -j4 && sudo make -j4 install
        

        Step 6

        Now you only need to reinstall the Pillow library. Activate the correct environment, then run the following commands:

        python3 -m pip install --upgrade pip
        python3 -m pip install --upgrade Pillow
        

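        That's it! You now have a working Pillow library that can render Bangla and other Indic scripts properly in images.

        Also, as @Farzana Eva suggested in her answer, you need to pass the rgx variable to the WordCloud object.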
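
One way to check that the rebuilt Pillow actually picked up Raqm is to query Pillow's `features` module. The sketch below is not part of the original answer; the helper name `raqm_available` is hypothetical, and it assumes a Pillow version that recognises the `"raqm"` feature name.

```python
def raqm_available():
    """Return True if the installed Pillow reports Raqm (complex text layout) support."""
    try:
        from PIL import features
    except ImportError:
        # Pillow is not installed in the active environment.
        return False
    try:
        return bool(features.check("raqm"))
    except ValueError:
        # Older Pillow versions may not know the "raqm" feature name.
        return False

has_raqm = raqm_available()
print("Raqm support:", has_raqm)
```

If this prints `False` after the steps above, the likely causes are that Pillow was installed into a different environment or that the Raqm library was not found at runtime.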

        【Comments】:
