【Question title】: TypeError: 'JavaPackage' object is not callable on Google Colab [duplicate]
【Posted】: 2021-06-22 17:10:47
【Question description】:

I am learning Apache Spark, and I ran the following code on Google Colab.

#installed based upon https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/jupyter/quick_start_google_colab.ipynb#scrollTo=lNu3meQKEXdu

import os

# Install java
!apt-get install -y openjdk-8-jdk-headless -qq > /dev/null
!wget -q "https://downloads.apache.org/spark/spark-3.1.1/spark-3.1.1-bin-hadoop2.7.tgz" > /dev/null
!tar -xvf spark-3.1.1-bin-hadoop2.7.tgz > /dev/null
!pip install -q findspark

os.environ["SPARK_HOME"] = "/content/spark-3.1.1-bin-hadoop2.7"
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["PATH"] = os.environ["JAVA_HOME"] + "/bin:" + os.environ["PATH"]
! java -version

# Install spark-nlp and pyspark
! pip install spark-nlp==3.0.0 pyspark==3.1.1


import sparknlp
spark = sparknlp.start()

from sparknlp.base import DocumentAssembler
documentAssembler = DocumentAssembler().setInputCol(text_col).setOutputCol('document')

I get the following error. How can I fix it?

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-48-535b177b526b> in <module>()
      4 
      5 from sparknlp.base import DocumentAssembler
----> 6 documentAssembler = DocumentAssembler().setInputCol(text_col).setOutputCol('document')

4 frames
/usr/local/lib/python3.7/dist-packages/pyspark/ml/wrapper.py in _new_java_obj(java_class, *args)
     64             java_obj = getattr(java_obj, name)
     65         java_args = [_py2java(sc, arg) for arg in args]
---> 66         return java_obj(*java_args)
     67 
     68     @staticmethod

TypeError: 'JavaPackage' object is not callable

【Comments】:

  • Try this: documentAssembler = DocumentAssembler.setInputCol(text_col).setOutputCol('document')
  • I tried that, but got a different error: ----> 6 documentAssembler = DocumentAssembler.setInputCol(text_col).setOutputCol('document') TypeError: setInputCol() missing 1 required positional argument: 'value'
  • See my answer, which has more explanation.
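
The second error quoted in the comments is ordinary Python behaviour rather than anything specific to Spark: calling an instance method on the class itself (without parentheses after the class name) binds the string to self and leaves value unfilled. A minimal sketch that reproduces it, assuming a hypothetical stand-in class (FakeAssembler is not part of Spark NLP and only imitates the setter-chaining style):

```python
class FakeAssembler:
    """Hypothetical stand-in that imitates DocumentAssembler's chained setters."""

    def __init__(self):
        self.params = {}

    def setInputCol(self, value):
        self.params["inputCol"] = value
        return self  # returning self is what makes chaining possible

    def setOutputCol(self, value):
        self.params["outputCol"] = value
        return self


def call_setter_on_class():
    """Call the setter on the class itself and return the resulting error text."""
    try:
        # No parentheses after the class name: "text" is bound to self,
        # so the value parameter is left without an argument.
        FakeAssembler.setInputCol("text")
    except TypeError as exc:
        return str(exc)
    return None


error_message = call_setter_on_class()

# Instantiating first gives the setters an object to work on.
assembler = FakeAssembler().setInputCol("text").setOutputCol("document")

print(error_message)
print(assembler.params)
```

So dropping the parentheses only swaps one TypeError for another; it does not address the original 'JavaPackage' error.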

Tags: java python apache-spark google-colaboratory


【Solution 1】:

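As I mentioned in my previous comment:

Replace text_col with the name of the text column in your Spark DataFrame, passed as a quoted string; the output column is referred to by its name, document. You can also add .setCleanupMode("clean_mode"). For more details, see this link: https://spark.apache.org/docs/latest/ml-features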

# DocumentAssembler must be instantiated with () before the setters are chained
documentAssembler = DocumentAssembler() \
                   .setInputCol("text_col") \
                   .setOutputCol("document")
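
A further point, offered as a likely cause rather than a confirmed diagnosis: 'JavaPackage' object is not callable is commonly reported when the Python wrapper cannot find the matching Spark NLP class in the JVM, for example because the Spark NLP JAR was never loaded into the running session (a SparkSession created earlier in the notebook may be reused). The sketch below is a small, hypothetical helper (find_spark_nlp_packages is not a Spark or Spark NLP function) that inspects a spark.jars.packages value for Spark NLP Maven coordinates. It assumes the coordinates start with com.johnsnowlabs.nlp:spark-nlp, and the second coordinate in the sample string is made up.

```python
def find_spark_nlp_packages(jars_packages):
    """Return the Spark NLP Maven coordinates listed in a spark.jars.packages value.

    jars_packages is the raw comma-separated string from the Spark configuration.
    """
    coordinates = [c.strip() for c in jars_packages.split(",") if c.strip()]
    # Assumption: Spark NLP artifacts are published under this group/artifact prefix.
    return [c for c in coordinates if c.startswith("com.johnsnowlabs.nlp:spark-nlp")]


# Illustrative input; in a live notebook the string could be read with
#   spark.sparkContext.getConf().get("spark.jars.packages", "")
sample = "com.johnsnowlabs.nlp:spark-nlp_2.12:3.0.0, org.example:other-lib:1.0"
found = find_spark_nlp_packages(sample)
missing = find_spark_nlp_packages("")

print(found)
print(missing)
```

If the list comes back empty for the live session, restarting the Colab runtime and calling sparknlp.start() before any other SparkSession is created is worth trying.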
                   

【Comments】:
