【Question title】: Confluent Kafka S3 sink connector throws `java.lang.NoClassDefFoundError: com/google/common/base/Preconditions` when using Parquet format
【Posted】: 2021-10-12 08:04:25
【Question description】:

When using the Confluent S3 sink connector, the following error occurs:

[2021-08-08 02:25:15,588] ERROR WorkerSinkTask{id=s3-test-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted. Error: com/google/common/base/Preconditions (org.apache.kafka.connect.runtime.WorkerSinkTask:607)
java.lang.NoClassDefFoundError: com/google/common/base/Preconditions
        at org.apache.hadoop.conf.Configuration$DeprecationDelta.<init>(Configuration.java:379)
        at org.apache.hadoop.conf.Configuration$DeprecationDelta.<init>(Configuration.java:392)
        at org.apache.hadoop.conf.Configuration.<clinit>(Configuration.java:474)
        at org.apache.parquet.hadoop.ParquetWriter$Builder.<init>(ParquetWriter.java:345)
        at org.apache.parquet.avro.AvroParquetWriter$Builder.<init>(AvroParquetWriter.java:162)
        at org.apache.parquet.avro.AvroParquetWriter$Builder.<init>(AvroParquetWriter.java:153)
        at org.apache.parquet.avro.AvroParquetWriter.builder(AvroParquetWriter.java:43)
        at io.confluent.connect.s3.format.parquet.ParquetRecordWriterProvider$1.write(ParquetRecordWriterProvider.java:79)
        at io.confluent.connect.s3.format.KeyValueHeaderRecordWriterProvider$1.write(KeyValueHeaderRecordWriterProvider.java:105)
        at io.confluent.connect.s3.TopicPartitionWriter.writeRecord(TopicPartitionWriter.java:532)
        at io.confluent.connect.s3.TopicPartitionWriter.checkRotationOrAppend(TopicPartitionWriter.java:302)
        at io.confluent.connect.s3.TopicPartitionWriter.executeState(TopicPartitionWriter.java:245)
        at io.confluent.connect.s3.TopicPartitionWriter.write(TopicPartitionWriter.java:196)
        at io.confluent.connect.s3.S3SinkTask.put(S3SinkTask.java:234)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:581)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:329)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:232)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)
        at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:182)
        at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:231)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)

This happens on 5.5, 10.0.0, and 10.0.1.

It only happens with Parquet; Avro works fine.

The logs show that the partitioner and the source data format are working correctly:

[2021-08-08 02:25:15,564] INFO Opening record writer for: xxxxx/xxxxx.xxxxx.users/year=2021/month=08/day=07/xxxxx.xxxxx.tablename+0+0000000000.snappy.parquet (io.confluent.connect.s3.format.parquet.ParquetRecordWriterProvider:74)

The connector was downloaded manually from the Confluent website.

【Comments】:

    Tags: apache-kafka apache-kafka-connect s3-kafka-connector


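    【Solution 1】:

    It turns out that hadoop-common needs Google's guava utility library, which is somehow missing from the distribution.

    You need to find the corresponding guava.jar (the guava version that matches your hadoop-common version) on the hadoop-common Maven repo page, then manually download guava.jar into the connector's lib/ folder.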
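
One way to confirm the diagnosis, both before and after copying the jar, is to scan the connector's lib/ folder for a jar that actually contains the missing class. The following is only a rough sketch: the function name `jars_providing` and the default directory path are assumptions made for illustration and are not part of the connector or its tooling.

```python
import zipfile
from pathlib import Path

# Class file that the stack trace reports as missing.
MISSING_CLASS = "com/google/common/base/Preconditions.class"


def jars_providing(lib_dir, class_entry=MISSING_CLASS):
    """Return the names of jars directly under lib_dir that contain class_entry."""
    hits = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if class_entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that are not valid archives (e.g. a partial download).
            continue
    return hits


if __name__ == "__main__":
    import sys

    # Hypothetical default path; pass your connector's lib/ directory instead.
    lib = sys.argv[1] if len(sys.argv) > 1 else "confluentinc-kafka-connect-s3/lib"
    found = jars_providing(lib)
    print(found if found else f"No jar in {lib} provides {MISSING_CLASS}")
```

If the list is empty, no jar in that folder supplies `Preconditions`, which would be consistent with the `NoClassDefFoundError` above. After adding guava.jar, the script should list it.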

    There appears to be an entry that explicitly excludes guava from the hadoop-common dependency, which causes the problem:

            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-common</artifactId>
                <version>${hadoop.version}</version>
                <exclusions>
                    <exclusion>
                        <groupId>org.apache.avro</groupId>
                        <artifactId>avro</artifactId>
                    </exclusion>
                    <exclusion>
                        <groupId>com.google.guava</groupId>
                        <artifactId>guava</artifactId>
                    </exclusion>
                    <exclusion>
    

    This really should have been caught in testing.
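
To check whether a given POM excludes guava from hadoop-common, a small script can list the exclusions declared on that dependency. This is a minimal sketch, not the project's own tooling: `SAMPLE_POM` is a hand-completed illustration that contains only the two exclusions visible in the fragment above (the truncated third one is left out), and the helper names are made up for this example.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace prefix such as '{http://maven.apache.org/POM/4.0.0}'."""
    return tag.rsplit("}", 1)[-1]


def child_text(elem, name):
    """Text of the first direct child whose local tag is `name`, or None."""
    for child in elem:
        if local(child.tag) == name:
            return (child.text or "").strip()
    return None


def exclusions_for(pom_xml, artifact_id):
    """List (groupId, artifactId) pairs excluded from the named dependency."""
    root = ET.fromstring(pom_xml)
    result = []
    for dep in root.iter():
        if local(dep.tag) != "dependency" or child_text(dep, "artifactId") != artifact_id:
            continue
        for exc in dep.iter():
            if local(exc.tag) == "exclusion":
                result.append((child_text(exc, "groupId"), child_text(exc, "artifactId")))
    return result


# Illustrative POM only; not the connector's actual build file.
SAMPLE_POM = """
<project>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>${hadoop.version}</version>
      <exclusions>
        <exclusion>
          <groupId>org.apache.avro</groupId>
          <artifactId>avro</artifactId>
        </exclusion>
        <exclusion>
          <groupId>com.google.guava</groupId>
          <artifactId>guava</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
  </dependencies>
</project>
"""

if __name__ == "__main__":
    excluded = exclusions_for(SAMPLE_POM, "hadoop-common")
    print(("com.google.guava", "guava") in excluded)
```

When guava shows up in the result, some other dependency or the packaging step has to supply it at runtime; otherwise the class is absent from the connector's lib/ folder.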

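    【Discussion】:

    • While I agree it should have been caught, AFAIK there are only unit tests, not thorough smoke/integration tests, and the unit tests may explicitly include hadoop-common in the test scope.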