【Question title】: Spark Java: java.util.ConcurrentModificationException while broadcasting object of type GenericObjectPool
【Posted】: 2015-08-26 15:19:31
【Question】:

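I am working on a Spark Streaming project in Java. I am trying to send messages from Spark to Apache Kafka using the kafka-producer Java API. Since creating a KafkaProducer instance for every element would be very expensive, I am trying to use a pool of producers built on the Apache Commons Pool framework. As the snippet below shows, I create a GenericObjectPool instance and broadcast it: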

GenericObjectPool<KafkaProducer<String, String>> producerPool = new GenericObjectPool<KafkaProducer<String, String>>(
                new KafkaProducerFactory<String, String>(prop));
// The broadcast call below is the line that throws the exception.
final Broadcast<GenericObjectPool<KafkaProducer<String, String>>> pool = ssc.sparkContext().broadcast(producerPool);

The code of the KafkaProducerFactory class is pasted below:

import java.io.Serializable;
import java.util.Map;

import org.apache.commons.pool2.BasePooledObjectFactory;
import org.apache.commons.pool2.PooledObject;
import org.apache.commons.pool2.impl.DefaultPooledObject;
import org.apache.kafka.clients.producer.KafkaProducer;

public class KafkaProducerFactory<K,V> extends BasePooledObjectFactory<KafkaProducer<K, V>> 
implements Serializable{
    private Map<String,Object> configs;
    public KafkaProducerFactory(Map<String, Object> configs) {
        this.configs=configs;
    }

    @Override
    public KafkaProducer<K, V> create() {
        return new KafkaProducer<K, V>(this.configs);
    }

    @Override
    public PooledObject<KafkaProducer<K,V>> wrap(KafkaProducer<K,V> producer) {
        return new DefaultPooledObject<KafkaProducer<K,V>>(producer);
    }

    @Override
    public void destroyObject(PooledObject<KafkaProducer<K,V>> obj) {
        obj.getObject().close();
    }
}

But the broadcast line above gives me the exception pasted below:

com.esotericsoftware.kryo.KryoException: java.util.ConcurrentModificationException

The full stack trace is pasted below:

Exception in thread "main" com.esotericsoftware.kryo.KryoException: java.util.ConcurrentModificationException
Serialization trace:
classes (sun.misc.Launcher$AppClassLoader)
classloader (java.security.ProtectionDomain)
context (java.security.AccessControlContext)
acc (org.apache.spark.util.MutableURLClassLoader)
referent (java.lang.ref.WeakReference)
factoryClassLoader (org.apache.commons.pool2.impl.GenericObjectPool)
    at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:585)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
    at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501)
    at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
    at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:318)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293)
    at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501)
    at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
    at com.esotericsoftware.kryo.Kryo.writeObjectOrNull(Kryo.java:549)
    at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:570)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
    at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501)
    at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
    at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501)
    at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
    at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
    at org.apache.spark.serializer.KryoSerializationStream.writeObject(KryoSerializer.scala:148)
    at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:203)
    at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:102)
    at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:85)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
    at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
    at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1291)
    at org.apache.spark.api.java.JavaSparkContext.broadcast(JavaSparkContext.scala:648)
    at com.veda.txt.spark.Engine.start(Engine.java:63)
    at com.veda.txt.spark.Engine.main(Engine.java:126)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:622)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.util.ConcurrentModificationException
    at java.util.Vector$Itr.checkForComodification(Vector.java:1127)
    at java.util.Vector$Itr.next(Vector.java:1104)
    at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:74)
    at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:18)
    at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501)
    at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
    ... 39 more
15/08/26 20:38:14 INFO SparkContext: Invoking stop() from shutdown hook

Please let me know what is going wrong.

Thanks

【Comments】:

    Tags: connection-pooling apache-kafka spark-streaming kryo


    【Solution 1】:

    KafkaProducer is not serializable and cannot be broadcast.
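
The serialization trace in the question seems consistent with this: Kryo walks field by field from the pool into `factoryClassLoader` and ends up iterating the class loader's internal `classes` Vector, which presumably changes while classes are still being loaded during serialization. The sketch below reproduces only that failure mode with a plain `Vector`; the class name `VectorComodificationDemo` and the use of strings in place of loaded classes are assumptions made for illustration, not what Spark or Kryo actually run.

```java
import java.util.ConcurrentModificationException;
import java.util.Vector;

final class VectorComodificationDemo {
    // Returns true if modifying the vector while iterating over it
    // raised ConcurrentModificationException.
    static boolean failsWhenModifiedDuringIteration() {
        Vector<String> classes = new Vector<>();
        classes.add("A");
        classes.add("B");
        try {
            for (String name : classes) {
                // Stands in for a class being loaded while a serializer
                // is still iterating over the same collection.
                classes.add(name + "$Inner");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }
}
```

Note that no second thread is needed: a structural change from the iterating thread itself is enough to trip the iterator's check.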

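    In general, for this kind of problem you can use foreachPartition and create the expensive resource once per partition rather than once per element. If that still does not meet your performance needs, you can use a singleton (assuming the object is thread-safe, which a Kafka producer should be).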
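
The singleton variant could be sketched roughly as follows. This is a minimal illustration under stated assumptions: `Producer`, `LazyProducerHolder` and `PartitionWriter` are hypothetical names, and `Producer` is a stand-in interface so that the snippet compiles without Spark or Kafka on the classpath. In a real job the factory would build an `org.apache.kafka.clients.producer.KafkaProducer` from a configuration map, and only that configuration would be shipped to the executors.

```java
import java.util.Iterator;
import java.util.function.Supplier;

// Hypothetical stand-in for KafkaProducer so the sketch compiles without Kafka.
interface Producer {
    void send(String message);
}

// Holds one lazily created producer per JVM (that is, per Spark executor).
// The producer is created where it is used and is never serialized.
final class LazyProducerHolder {
    private static volatile Producer instance;

    private LazyProducerHolder() { }

    // Double-checked locking: the factory runs at most once per JVM.
    static Producer getOrCreate(Supplier<Producer> factory) {
        Producer local = instance;
        if (local == null) {
            synchronized (LazyProducerHolder.class) {
                local = instance;
                if (local == null) {
                    local = factory.get();
                    instance = local;
                }
            }
        }
        return local;
    }
}

final class PartitionWriter {
    // Roughly the body that would be passed to foreachPartition: one lookup
    // per partition, then the same producer is reused for every element.
    static void writePartition(Iterator<String> records, Supplier<Producer> factory) {
        Producer producer = LazyProducerHolder.getOrCreate(factory);
        while (records.hasNext()) {
            producer.send(records.next());
        }
    }
}
```

In a streaming job, `writePartition` would be called from inside `foreachPartition` on each RDD of the stream. Whatever the closure captures (here the factory) has to be serializable itself, which a plain `Supplier` lambda is not by default.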

    A blog post on this topic was recently shared on the Spark user mailing list:

    http://allegro.tech/spark-kafka-integration.html
