Hadoop Java Error: Exception in thread "main" java.lang.ClassNotFoundException: Titanic

【Question title】: Hadoop Java Error: Exception in thread "main" java.lang.ClassNotFoundException: Titanic
【Posted】: 2016-09-22 19:09:48
【Question】:

I am trying to run a simple MapReduce program that computes the average age of males and females. When I try to execute it, it throws a ClassNotFoundException for the class Titanic. I found many questions with similar answers and modified my program based on them, but I still get the same error. It would be very helpful if someone could debug it.

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class Titanic{
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable>{
    private Text category = new Text();
    public void map(LongWritable key, Text text, Context context) throws IOException, InterruptedException{
        // Split the CSV record; this code treats field 4 as the sex and field 5 as the age.
        String line = text.toString();
        String str[] = line.split(",");
        // Emit (sex, age). The key is used as-is, so no string comparison is needed.
        category.set(str[4]);
        IntWritable value = new IntWritable(Integer.parseInt(str[5]));
        context.write(category,value);
    }

}

public static class Reduce extends Reducer<Text, IntWritable, Text, FloatWritable>{

    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException{
        float average = 0;
        int count =0;
        for(IntWritable val : values){
            average = average+val.get();
            count = count + 1;
        }
        average = average/count;
        context.write(key, new FloatWritable(average));

    }   

}

public static void main(String[] args) throws Exception{
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "titanic");
    job.setJarByClass(Titanic.class);
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);       
}

}

Below are the commands I ran.

Creating the jar file:

jar cf example/titanic/titanic.jar example/titanic/Titanic*.class

Running the jar file:

bin/hadoop jar example/titanic/titanic.jar Titanic /user/akhil/titanic/input/TitanicData.txt /user/akhil/titanic/output/

【Comments】:

Which class was not found?

Tags: java hadoop mapreduce

【Solution 1】:

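The jar is built incorrectly. If your classes are in the default package, they must not sit under the example/titanic/ directory inside the jar; they belong at the jar's root.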
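The reason is that a class loader finds a class by turning the dots of its name into slashes and appending .class. Read in reverse, each jar entry path corresponds to exactly one class name. A minimal sketch of that mapping follows; JarEntryToClassName and classNameFor are hypothetical names used only for illustration, not part of Hadoop or the JDK.

```java
public class JarEntryToClassName {
    // Returns the only class name under which a jar entry can be looked up,
    // e.g. "a/b/C.class" -> "a.b.C".
    static String classNameFor(String entryPath) {
        String suffix = ".class";
        if (!entryPath.endsWith(suffix)) {
            throw new IllegalArgumentException("not a class file: " + entryPath);
        }
        String withoutSuffix = entryPath.substring(0, entryPath.length() - suffix.length());
        return withoutSuffix.replace('/', '.');
    }

    public static void main(String[] args) {
        // Entry produced by the jar command in the question.
        System.out.println(classNameFor("example/titanic/Titanic.class")); // example.titanic.Titanic
        // Entry needed for the class name "Titanic" passed to bin/hadoop jar.
        System.out.println(classNameFor("Titanic.class"));                 // Titanic
    }
}
```

Note that the path alone is not enough: the package declared inside the class file has to match it as well, so a class compiled without a package statement cannot simply be run as example.titanic.Titanic.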

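【Comments】:

My Titanic.java file is under the /example/titanic folder. When I compiled it, it created three class files under /example/titanic. So I created the jar file in that same folder and ran it. Since I have no package declaration, the classes are in the default package.
Well, to fix this you have two options. Either keep the classes in the default package, but run cd example/titanic && jar cf titanic.jar *.class && cd - and then run the same hadoop command; or add the statement package example.titanic; as the first line of your Java class, recompile and repackage, and then run bin/hadoop jar example/titanic/titanic.jar example.titanic.Titanic /user/akhil/titanic/input/TitanicData.txt /user/akhil/titanic/output/. See docs.oracle.com/javase/tutorial/java/package/managingfiles.html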
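Before invoking Hadoop again, it can help to confirm that the repackaged jar really holds the entry matching the class name given on the command line. The following is a rough sketch of such a check using java.util.jar.JarFile; JarClassCheck and its methods are hypothetical helpers written for this thread, not an existing tool.

```java
import java.io.IOException;
import java.util.jar.JarFile;

public class JarClassCheck {
    // Jar entry path that a class loader looks up for a fully qualified class name.
    static String entryPathFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Returns true if the jar at jarPath contains the entry for className.
    static boolean containsClass(String jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entryPathFor(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JarClassCheck <jar> <class-name>");
            System.exit(2);
        }
        boolean found = containsClass(args[0], args[1]);
        System.out.println(args[1] + (found ? " found in " : " NOT found in ") + args[0]);
    }
}
```

For example, java JarClassCheck example/titanic/titanic.jar Titanic should report the class as found once the jar has been rebuilt from inside example/titanic, and as not found for the jar built in the question.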

【Solution 2】:

Remove the *:

jar cf example/titanic/titanic.jar example/titanic/Titanic.class

【Comments】:

Still the same error. There are other classes as well, Titanic$Map.class and Titanic$Reduce.class, so the * includes all of them. I guess those files need to be in the jar.
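That guess matches how Java names nested classes: the compiler writes each one to its own file named Outer$Inner.class, and that is the name under which it is loaded at run time, so the mapper and reducer files have to travel in the jar with the outer class. A small illustration, using a hypothetical class NestedClassNames in the default package as a stand-in for Titanic:

```java
public class NestedClassNames {
    // Stand-ins for the Map and Reduce classes nested inside Titanic.
    static class Map {}
    static class Reduce {}

    public static void main(String[] args) {
        // Each nested class has its own binary name and its own .class file.
        System.out.println(NestedClassNames.class.getName()); // NestedClassNames
        System.out.println(Map.class.getName());              // NestedClassNames$Map
        System.out.println(Reduce.class.getName());           // NestedClassNames$Reduce
    }
}
```

Compiling this file produces NestedClassNames.class, NestedClassNames$Map.class and NestedClassNames$Reduce.class, the same pattern as the three files seen for Titanic.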