【Question title】: Apache Spark - Unable to read data from MS Access tables into Spark dataset
【Posted】: 2020-05-10 01:06:29
【Question】:

When I try to read .accdb data into my Spark dataset, I get the following exception:

Exception in thread "main" java.lang.NoClassDefFoundError: Could not initialize class net.ucanaccess.jdbc.UcanaccessDriver
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:398)
at java.sql/java.sql.DriverManager.isDriverAllowed(DriverManager.java:555)
at java.sql/java.sql.DriverManager.isDriverAllowed(DriverManager.java:547)
at java.sql/java.sql.DriverManager.getDriver(DriverManager.java:280)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.$anonfun$driverClass$2(JDBCOptions.scala:105)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:105)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:35)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:32)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
at business.extract.DataExtractorImpl.loadFromAccessTable(DataExtractorImpl.java:62)
at application.Orchestrator.initializeJob(Orchestrator.java:52)
at application.ETLEngine.main(ETLEngine.java:15)

Here is my code:

// DataExtractorImpl.java
    public Dataset<Row> loadFromAccessTable(String url, String tableName) throws IOException, CustomValidationException {
        // Note: the url and tableName parameters are currently unused; the values are hard-coded.
        return ETLContext.getETLContext().getSession()
                .read()
                .format("jdbc")
                .option("URL", "jdbc:ucanaccess://C:/Users/KE926ES/Documents/db/Creditcard_default.accdb")
                .option("dbtable", "CC_SOURCE_1")
                .load();
    }

I have the following jars:

  • ucanaccess-5.0.0.jar
  • jackcess-3.0.1.jar
  • commons-lang3-3.10.jar
  • commons-logging-1.2.jar

I also tried adding the following to the list of options:

.option("driver", "net.ucanaccess.jdbc.UcanaccessDriver")
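For reference, here is a minimal sketch of how the reader options, including the explicit driver class, could be assembled from the method's parameters instead of hard-coded values. The class name AccessJdbcOptions and its build method are hypothetical helpers, not part of the original project; the URL prefix, table name, and driver class are the ones shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the original project.
final class AccessJdbcOptions {

    // Driver class name as it appears in the stack trace above.
    static final String DRIVER = "net.ucanaccess.jdbc.UcanaccessDriver";

    /** Builds the option map for Spark's "jdbc" source from a file path and a table name. */
    static Map<String, String> build(String accdbPath, String tableName) {
        // The URL in the question uses forward slashes, so normalize Windows paths to match.
        String normalized = accdbPath.replace('\\', '/');
        Map<String, String> options = new LinkedHashMap<>();
        options.put("url", "jdbc:ucanaccess://" + normalized);
        options.put("dbtable", tableName);
        options.put("driver", DRIVER);
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> options =
                build("C:\\Users\\KE926ES\\Documents\\db\\Creditcard_default.accdb", "CC_SOURCE_1");
        options.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

The resulting map could then be handed to the reader in one call, for example session.read().format("jdbc").options(AccessJdbcOptions.build(path, tableName)).load(). Setting the driver this way does not help if the driver class itself cannot be initialized, which is what the stack trace reports.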

【Comments】:

    Tags: apache-spark apache-spark-dataset


    【Solution 1】:

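    The libraries are probably not available on the classpath or inside the fat jar. A "Could not initialize class" error usually means the class was found but its static initializer failed, which often points to a missing dependency. UCanAccess also depends on HSQLDB, which does not appear in the list of jars in the question, so that may be worth checking as well.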

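    Try passing the required jars to your application with spark-submit, as shown below (the ":" separator is for Linux and macOS; on Windows, use ";" instead).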

    spark-submit --conf spark.driver.extraClassPath=ucanaccess-5.0.0.jar:jackcess-3.0.1.jar:commons-lang3-3.10.jar:commons-logging-1.2.jar --conf spark.executor.extraClassPath=ucanaccess-5.0.0.jar:jackcess-3.0.1.jar:commons-lang3-3.10.jar:commons-logging-1.2.jar
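Because the classpath separator differs between platforms, the two --conf values can be generated rather than typed by hand. The following is a small sketch under that assumption; the class ExtraClassPath and its join method are hypothetical names introduced for illustration, and the jar list is the one from the question.

```java
import java.io.File;
import java.util.List;

// Hypothetical helper for illustration only.
final class ExtraClassPath {

    /** Joins jar paths with the given separator (":" on Linux/macOS, ";" on Windows). */
    static String join(List<String> jars, String separator) {
        return String.join(separator, jars);
    }

    public static void main(String[] args) {
        List<String> jars = List.of(
                "ucanaccess-5.0.0.jar",
                "jackcess-3.0.1.jar",
                "commons-lang3-3.10.jar",
                "commons-logging-1.2.jar");
        // File.pathSeparator is the separator of the platform this program runs on.
        String classPath = join(jars, File.pathSeparator);
        System.out.println("--conf spark.driver.extraClassPath=" + classPath);
        System.out.println("--conf spark.executor.extraClassPath=" + classPath);
    }
}
```

The separator has to match the platform where the driver and executor JVMs actually start, which may differ from the machine where the command is assembled.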
    
    

    Also, if you are running the application from an IDE, check that these jars have been added to your classpath correctly.
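One way to verify the classpath from inside the running JVM is to try locating one representative class per jar. This is only a sketch: the class ClasspathCheck is a hypothetical helper, and the representative class names (other than the driver from the stack trace) are assumptions about what each library ships. The HSQLDB entry is included because UCanAccess depends on it, even though it is not in the question's jar list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper for illustration only.
final class ClasspathCheck {

    /** Returns the subset of class names that cannot be located on the current classpath. */
    static List<String> missing(List<String> classNames) {
        List<String> notFound = new ArrayList<>();
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize=false: only locate the class, do not run its static initializer.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                notFound.add(name);
            }
        }
        return notFound;
    }

    public static void main(String[] args) {
        // One representative class per jar (assumed names, see the note above).
        List<String> required = List.of(
                "net.ucanaccess.jdbc.UcanaccessDriver",
                "com.healthmarketscience.jackcess.Database",
                "org.hsqldb.jdbc.JDBCDriver",
                "org.apache.commons.lang3.StringUtils",
                "org.apache.commons.logging.LogFactory");
        List<String> notFound = missing(required);
        if (notFound.isEmpty()) {
            System.out.println("All required classes are on the classpath.");
        } else {
            notFound.forEach(name -> System.out.println("Missing: " + name));
        }
    }
}
```

Running this with the same classpath as the Spark application (for example, from the same IDE run configuration) shows which library, if any, the JVM cannot see.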

    【Comments】:
