【Question title】: Mahout - Exception: Java heap space
【Posted】: 2014-04-09 14:14:59
【Question】:

I am trying to convert some text files into Mahout sequence files with the following command:

mahout seqdirectory -i Lastfm-ArtistTags2007 -o seqdirectory

But all I get is an OutOfMemoryError, as shown below:

Running on hadoop, using /usr/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /opt/mahout/mahout-examples-0.9-job.jar
14/04/07 16:44:34 INFO common.AbstractJob: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter], --input=[Lastfm-ArtistTags2007], --keyPrefix=[], --method=[mapreduce], --output=[seqdirectoryjps], --startPhase=[0], --tempDir=[temp]}
14/04/07 16:44:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/04/07 16:44:35 INFO input.FileInputFormat: Total input paths to process : 4
14/04/07 16:44:35 WARN snappy.LoadSnappy: Snappy native library not loaded
14/04/07 16:44:35 INFO mapred.JobClient: Running job: job_local407267609_0001
14/04/07 16:44:35 INFO mapred.LocalJobRunner: Waiting for map tasks
14/04/07 16:44:35 INFO mapred.LocalJobRunner: Starting task: attempt_local407267609_0001_m_000000_0
14/04/07 16:44:35 INFO util.ProcessTree: setsid exited with exit code 0
14/04/07 16:44:35 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@6ad3ad65
14/04/07 16:44:35 INFO mapred.MapTask: Processing split: Paths:/home/giuliano/cook/lastfm/Lastfm-ArtistTags2007/README.txt:0+2472,/home/giuliano/cook/lastfm/Lastfm-ArtistTags2007/ArtistTags.dat:0+71652722,/home/giuliano/cook/lastfm/Lastfm-ArtistTags2007/tags.txt:0+1739746,/home/giuliano/cook/lastfm/Lastfm-ArtistTags2007/artists.txt:0+327051
14/04/07 16:44:35 INFO compress.CodecPool: Got brand-new compressor
14/04/07 16:44:35 INFO mapred.LocalJobRunner: Map task executor complete.
14/04/07 16:44:35 WARN mapred.LocalJobRunner: job_local407267609_0001
java.lang.Exception: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:119)
    at org.apache.mahout.text.WholeFileRecordReader.nextKeyValue(WholeFileRecordReader.java:118)
    at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.nextKeyValue(CombineFileRecordReader.java:69)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:531)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
14/04/07 16:44:36 INFO mapred.JobClient:  map 0% reduce 0%
14/04/07 16:44:36 INFO mapred.JobClient: Job complete: job_local407267609_0001
14/04/07 16:44:36 INFO mapred.JobClient: Counters: 0
14/04/07 16:44:36 INFO driver.MahoutDriver: Program took 1749 ms (Minutes: 0.02915)

I am using Mahout 0.9, Hadoop 1.2.1, and OpenJDK Java 7u25.

Setting MAHOUT_HEAPSIZE to 4096 did not help. The text files can be found here: http://static.echonest.com/Lastfm-ArtistTags2007.tar.gz

【Question comments】:

    Tags: hadoop mahout


    【Answer 1】:

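    The job is currently executed by the local job runner, so all of the work happens on the node where you launched it. To distribute the execution, specify the JobTracker address by setting the mapred.job.tracker property in mapred-site.xml.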

    Running in distributed mode may resolve your out-of-memory problem.

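    If you look at the environment variable HADOOP_CONF_DIR in your output, its value is empty. Set it with the following command: export HADOOP_CONF_DIR=/etc/hadoop/conf. Then make sure that the mapred.job.tracker property in /etc/hadoop/conf/mapred-site.xml points to your JobTracker.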
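As a rough way to check which mode a configuration directory would give you, the sketch below reads mapred.job.tracker out of mapred-site.xml. The helper names get_hadoop_property and describe_job_mode are hypothetical (they are not part of Hadoop or Mahout), and the parsing assumes a plain <name>/<value> pair with no XML comments or includes. In Hadoop 1.x a missing or "local" value means the local job runner is used.

```shell
#!/bin/sh
# Hypothetical helper: print the <value> of one property from a Hadoop XML config.
# Assumes a simple <name>...</name><value>...</value> layout.
get_hadoop_property() {
    conf_file=$1
    prop_name=$2
    # Flatten the file to one line so <name> and <value> can be matched together.
    tr -d '\n' < "$conf_file" |
        sed -n "s|.*<name>[[:space:]]*${prop_name}[[:space:]]*</name>[[:space:]]*<value>\([^<]*\)</value>.*|\1|p"
}

# Hypothetical helper: report whether jobs would run locally or on a JobTracker.
describe_job_mode() {
    site="$1/mapred-site.xml"
    tracker=""
    if [ -f "$site" ]; then
        tracker=$(get_hadoop_property "$site" mapred.job.tracker)
    fi
    case "$tracker" in
        ''|local) echo "local" ;;                  # unset or "local": local job runner
        *)        echo "distributed: $tracker" ;;  # host:port of the JobTracker
    esac
}

# Assumed default path taken from the answer; adjust to your installation.
HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop/conf}
export HADOOP_CONF_DIR
describe_job_mode "$HADOOP_CONF_DIR"
```

If this prints "local", the mahout seqdirectory run would still go through LocalJobRunner, as in the log in the question.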

    【Comments】:

    • What exactly do I need to change? I am not a Hadoop expert.