【Posted】: 2018-04-29 16:49:32
【Question】:
I'm trying to read an Avro file on an HDInsight Spark/Jupyter cluster, but I get:
u'Failed to find data source: com.databricks.spark.avro. Please find an Avro package at http://spark.apache.org/third-party-projects.html;'
Traceback (most recent call last):
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/readwriter.py", line 159, in load
    return self._df(self._jreader.load(path))
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
AnalysisException: u'Failed to find data source: com.databricks.spark.avro. Please find an Avro package at http://spark.apache.org/third-party-projects.html;'
The call that raises it:
df = spark.read.format("com.databricks.spark.avro").load("wasb://containername@aaa...aaa.blob.core.windows.net/...")
How can I fix this? It looks like I need to install the package, but how do I do that on HDInsight?
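A minimal sketch, assuming the notebook uses HDInsight's default sparkmagic/Livy kernel: with that kernel, extra Spark packages are usually not installed on the cluster by hand but requested when the session starts, by putting a spark.jars.packages entry in a %%configure cell (the -f flag drops the running session so the new setting takes effect). The helper below only builds the text of that cell. The names configure_cell and AVRO_PACKAGE are hypothetical, and the package version is an assumption: it has to match the cluster's Spark and Scala versions (spark-avro 4.0.0 targets Spark 2.2+, 3.2.0 targets Spark 2.0/2.1).

```python
import json

# Hypothetical constant. Assumption: Spark 2.x built against Scala 2.11
# (as in HDP's spark2-client). Use spark-avro 4.0.0 for Spark 2.2+,
# 3.2.0 for Spark 2.0/2.1; check the cluster's Spark version first.
AVRO_PACKAGE = "com.databricks:spark-avro_2.11:4.0.0"


def configure_cell(packages, force=True):
    """Return the text of a sparkmagic %%configure cell asking Livy to
    start the Spark session with the given Maven packages on the classpath."""
    # spark.jars.packages takes a comma-separated list of Maven coordinates.
    body = {"conf": {"spark.jars.packages": ",".join(packages)}}
    # -f drops any running session so the new configuration is applied.
    header = "%%configure -f" if force else "%%configure"
    return header + "\n" + json.dumps(body)


if __name__ == "__main__":
    # Paste the printed text into the first notebook cell and run it.
    print(configure_cell([AVRO_PACKAGE]))
```

Once the session has restarted with that configuration, the original spark.read.format("com.databricks.spark.avro").load(...) call should be able to resolve the data source, provided the cluster can reach a Maven repository to download the package.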
【Discussion】:
Tags: azure jupyter azure-hdinsight