【Question title】: I can't write an ORC file with Spark
【Posted】: 2020-01-20 19:32:06
【Question】:

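I am trying to write a DataFrame to ORC, without success. I am using Spark 1.6 with Java. I am running on my local machine, and I tried installing some dependencies, but that did not help.

My POM looks like this: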

<properties>
        <spark.version>1.6.0</spark.version>
        <scala.short.version>2.10</scala.short.version>
        <slf4j.version>1.7.25</slf4j.version>
        <maven.compiler.source>1.7</maven.compiler.source>
        <maven.compiler.target>1.7</maven.compiler.target>
    </properties>


    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.scalatest/scalatest_${scala.short.version} -->


        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>${slf4j.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.11</artifactId>
            <version>1.6.0</version>
        </dependency>

        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-clients</artifactId>
            <version>2.3.0</version>
        </dependency>

        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka_2.10</artifactId>
            <version>0.9.0.0</version>
        </dependency>

        <dependency>
            <groupId>commons-logging</groupId>
            <artifactId>commons-logging</artifactId>
            <version>1.1.1</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka_2.10</artifactId>
            <version>1.6.0</version>
        </dependency>


        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka-0-10_2.10</artifactId>
            <version>2.0.0</version>
        </dependency>


        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>1.6.0</version>
        </dependency>

        <dependency>
            <groupId>com.databricks</groupId>
            <artifactId>spark-avro_2.10</artifactId>
            <version>3.2.0</version>
        </dependency>


        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.11.8</version>
            <!--<scope>provided</scope>-->

        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>1.6.0</version>
        </dependency>


        <dependency>
            <groupId>com.typesafe</groupId>
            <artifactId>config</artifactId>
            <version>RELEASE</version>
        </dependency>


        <dependency>
            <groupId>commons-codec</groupId>
            <artifactId>commons-codec</artifactId>
            <version>1.11</version>
            <!--<scope>provided</scope>-->
        </dependency>

        <!-- https://mvnrepository.com/artifact/com.typesafe.play/play-json -->
        <dependency>
            <groupId>com.typesafe.play</groupId>
            <artifactId>play-json_2.11</artifactId>
            <version>2.7.0-M1</version>
        </dependency>




        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-aws</artifactId>
            <version>2.7.3</version>
        </dependency>

        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-xml</artifactId>
            <version>2.11.0-M4</version>
        </dependency>

        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-parser-combinators</artifactId>
            <version>2.11.0-M4</version>
        </dependency>



    </dependencies>

I have a Spark job that should write an ORC file, but it returns this error:

Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: orc. Please find packages at http://spark-packages.org
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:219)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:139)
    at Confiaveis.main(Confiaveis.java:96)
Caused by: java.lang.ClassNotFoundException: orc.DefaultSource
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
    at scala.util.Try$.apply(Try.scala:192)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
    at scala.util.Try.orElse(Try.scala:84)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:62)
    ... 4 more

I write it with this command:

df.write().mode("append").format("orc").save("path");

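Does anyone know how I can solve this? From what I know of Spark, this means it cannot find a library, but I could not find anything that says which library that is.

【Comments】:

    Tags: java apache-spark orc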


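    【Solution 1】:

    Try adding the spark-hive dependency, replacing the placeholders with your Scala and Spark versions: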

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_*your_version*</artifactId>
        <version>*your_version*</version>
        <scope>provided</scope>
    </dependency>
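
For the POM in the question, the placeholders might be filled in as sketched below. This is an assumption rather than something the answer states: the `_2.11` suffix and version `1.6.0` simply mirror the `spark-sql_2.11` / `1.6.0` entry already in that POM. The scope is also left at Maven's default here, because with `provided` the jar may be missing from the runtime classpath when the job is launched locally (for example from an IDE); `provided` is the usual choice when a cluster supplies the Spark jars.

```xml
<!-- Assumption: suffix and version mirror spark-sql_2.11 / 1.6.0 from the question's POM -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>1.6.0</version>
    <!-- Default (compile) scope, assumed for a local run;
         switch to <scope>provided</scope> when the cluster supplies Spark -->
</dependency>
```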
    

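    【Comments】:

    • @GabrielSampaio Apologies, ORC is a Hive format, so you need the Hive module in your dependencies. See the edit.
    • It worked! But now I have another problem: "The ORC data source can only be used with HiveContext."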
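
Both errors in this thread come down to whether the Hive module is visible at runtime, so a small classpath check can help narrow things down before touching the job itself. The sketch below is only a diagnostic, not part of the original answer. The class name `OrcClasspathCheck` is hypothetical, and the two Spark class names are assumed to be where Spark 1.6's spark-hive module keeps its Hive context and ORC provider.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OrcClasspathCheck {

    // Assumption: these are the classes spark-hive 1.6 provides for Hive/ORC support.
    public static final String[] REQUIRED = {
        "org.apache.spark.sql.hive.HiveContext",
        "org.apache.spark.sql.hive.orc.DefaultSource"
    };

    // Returns true if the named class can be located by this class's class loader.
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, OrcClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Checks each name and keeps the results in the order given.
    public static Map<String, Boolean> check(String... classNames) {
        Map<String, Boolean> result = new LinkedHashMap<String, Boolean>();
        for (String name : classNames) {
            result.put(name, isOnClasspath(name));
        }
        return result;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Boolean> e : check(REQUIRED).entrySet()) {
            System.out.println(e.getKey() + " -> " + (e.getValue() ? "found" : "MISSING"));
        }
    }
}
```

Run it with the same classpath as the Spark job. If either class is reported missing, the spark-hive dependency is probably not reaching the runtime classpath (the `provided` scope is one possible cause). If both are found and the HiveContext message still appears, the DataFrame was most likely created from a plain `SQLContext`; in Spark 1.6 it would need to come from a `HiveContext` instead, for example one built with `new HiveContext(jsc.sc())` from an existing `JavaSparkContext` (here called `jsc`, a hypothetical variable name).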