【Question title】: ALS training using PySpark throws a StackOverflowError
【Posted】: 2015-11-21 14:07:31
【Question】:

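When I try to train a machine learning model with ALS from Spark's MLlib (1.4) on Windows, PySpark always terminates with a StackOverflowError. I tried adding checkpointing as described in https://stackoverflow.com/a/31484461/36130, but it does not seem to help (a new directory is created on every run, but it is always empty).

Here is the training code and the stack trace: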

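import itertools

from pyspark.mllib.recommendation import ALS

# `training`, `validation`, `test`, `numValidation`, `numTest` and
# `computeRmse` are defined elsewhere in the script (not shown here).

# Hyperparameter grid to search.
ranks = [8, 12]
lambdas = [0.1, 10.0]
numIters = [10, 20]
bestModel = None
bestValidationRmse = float("inf")
bestRank = 0
bestLambda = -1.0
bestNumIter = -1

for rank, lmbda, numIter in itertools.product(ranks, lambdas, numIters):
    # Attempt to enable checkpointing, following the linked answer.
    ALS.checkpointInterval = 2
    model = ALS.train(training, rank, numIter, lmbda)
    validationRmse = computeRmse(model, validation, numValidation)

    # Keep the model with the lowest validation RMSE seen so far.
    if validationRmse < bestValidationRmse:
        bestModel = model
        bestValidationRmse = validationRmse
        bestRank = rank
        bestLambda = lmbda
        bestNumIter = numIter

# Evaluate the best model on the held-out test set.
testRmse = computeRmse(bestModel, test, numTest)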

Stack trace:

15/08/27 02:02:58 ERROR Executor: Exception in task 3.0 in stage 56.0 (TID 127)
java.lang.StackOverflowError
    at java.io.ObjectInputStream$BlockDataInputStream.readInt(Unknown Source)
    at java.io.ObjectInputStream.readHandle(Unknown Source)
    at java.io.ObjectInputStream.readClassDesc(Unknown Source)
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
    at java.io.ObjectInputStream.readSerialData(Unknown Source)
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
    at java.io.ObjectInputStream.readSerialData(Unknown Source)
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
    at java.io.ObjectInputStream.readSerialData(Unknown Source)
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.readObject(Unknown Source)
    at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
    at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at java.io.ObjectStreamClass.invokeReadObject(Unknown Source)
    at java.io.ObjectInputStream.readSerialData(Unknown Source)

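【Question comments】:

  • What is the size of your data, and how much heap have you given Spark?
  • The input file is 24 MB (about 100k records); spark.executor.memory is 4G and the JVM memory is set to 2G.
  • I'm guessing you are running in local mode?
  • I have since set spark.driver.memory to 4G, since everything happens inside the driver.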

Tags: apache-spark machine-learning pyspark apache-spark-mllib


【Solution 1】:

Try setting the checkpoint directory:

sc.setCheckpointDir("/check_point_dir")
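
A minimal sketch of how this suggestion might be wired into a script, assuming `sc` is an existing `SparkContext` (or any object exposing `setCheckpointDir`). The helper name `configure_checkpointing` and the `als_checkpoints` subdirectory are hypothetical and only serve the illustration; whether this resolves the error in the question is not confirmed by the source.

```python
import os
import tempfile


def configure_checkpointing(sc, base_dir=None):
    """Create a checkpoint directory and register it on the given context.

    `sc` is expected to expose setCheckpointDir(path), as
    pyspark.SparkContext does. Returns the registered directory.
    """
    if base_dir is None:
        # Assumption: the system temp directory is writable by Spark.
        base_dir = tempfile.gettempdir()
    checkpoint_dir = os.path.join(base_dir, "als_checkpoints")
    # Make sure the directory exists before handing it to Spark.
    os.makedirs(checkpoint_dir, exist_ok=True)
    sc.setCheckpointDir(checkpoint_dir)
    return checkpoint_dir
```

With a real `SparkContext`, the helper would be called once, before the loop that invokes `ALS.train`, so that the directory is registered before any training job starts.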

【Comments】:
