Hadoop Knowledge Points - 爱码网

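1. NullWritable in Hadoop

NullWritable is a special Writable implementation whose serialization methods are empty: it reads nothing from the data stream and writes nothing to it, and serves only as a placeholder. In MapReduce, for example, if you do not need the key or the value, you can declare it as NullWritable. NullWritable is an immutable singleton type.

For example, to emit only a key from map, with NullWritable as the value, I write: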

@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    String line = value.toString();
    WebLogBean webLogBean = WebLogParser.parser(line);
    WebLogParser.filtStaticResource(webLogBean, pages); // filter out static resources such as js/images/css
    k.set(webLogBean.toString()); // k is presumably a Text field declared on the mapper
    context.write(k, NullWritable.get()); // the value is only a placeholder
}

You cannot create an instance with new NullWritable(), because the constructor is private; the only way to obtain the instance is NullWritable.get().
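
The placeholder behavior described above can be illustrated without a Hadoop dependency. The following is a minimal stand-alone sketch of the same pattern, not Hadoop's source: `NullPlaceholder` and `serializedSize` are hypothetical names, and the `write`/`readFields` methods only mirror the shape of the Writable interface.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in that mimics the NullWritable pattern; not Hadoop code.
final class NullPlaceholder {
    private static final NullPlaceholder INSTANCE = new NullPlaceholder();

    // Private constructor: callers cannot write `new NullPlaceholder()`.
    private NullPlaceholder() {}

    // The only way to obtain the shared instance.
    static NullPlaceholder get() {
        return INSTANCE;
    }

    // Serialization is a no-op: nothing is written to the stream.
    void write(DataOutput out) throws IOException {}

    // Deserialization is a no-op: nothing is consumed from the stream.
    void readFields(DataInput in) throws IOException {}

    // Demo helper: number of bytes produced by serializing the placeholder.
    static int serializedSize() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            get().write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        System.out.println(get() == get());   // same instance every time
        System.out.println(serializedSize()); // no bytes on the wire
    }
}
```

Because the placeholder serializes to nothing, using it as a key or value adds no payload to the map output, which is the point of declaring an unused side as NullWritable.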

Source: https://www.cnblogs.com/Skyar/p/5815486.html

2. The various IDs on YARN

jobId

Description: comes from MapReduce; the unique identifier of a job.
Format: job_${clusterStartTime}_${jobid}
Example: job_1498552288473_2742

applicationId

Description: the unique identifier of a job (application) within YARN.
Format: application_${clusterStartTime}_${applicationId}
Example: application_1498552288473_2742
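
Judging by the two examples above, a jobId and its applicationId carry the same timestamp and sequence number and differ only in the prefix. A minimal sketch of converting between them with plain string handling follows; `YarnIdPrefix` and its methods are hypothetical names introduced for illustration, not part of Hadoop.

```java
// Hypothetical helper: swaps the prefix between a MapReduce job ID and a YARN application ID.
final class YarnIdPrefix {
    private static final String JOB = "job_";
    private static final String APPLICATION = "application_";

    private YarnIdPrefix() {}

    // job_1498552288473_2742 becomes application_1498552288473_2742
    static String jobToApplication(String jobId) {
        if (!jobId.startsWith(JOB)) {
            throw new IllegalArgumentException("Not a job id: " + jobId);
        }
        return APPLICATION + jobId.substring(JOB.length());
    }

    // application_1498552288473_2742 becomes job_1498552288473_2742
    static String applicationToJob(String applicationId) {
        if (!applicationId.startsWith(APPLICATION)) {
            throw new IllegalArgumentException("Not an application id: " + applicationId);
        }
        return JOB + applicationId.substring(APPLICATION.length());
    }

    public static void main(String[] args) {
        System.out.println(jobToApplication("job_1498552288473_2742"));
        System.out.println(applicationToJob("application_1498552288473_2742"));
    }
}
```

This is handy when a job ID found in MapReduce logs has to be looked up under its application ID in YARN.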

taskId

Description: the unique identifier of a task within a job.
Format: task_${clusterStartTime}_${applicationId}_[m|r]_${taskId}
Examples: task_1498552288473_2742_m_000000, task_1498552288473_2742_r_000000

attemptId

Description: the ID of a single execution attempt of a task.
Format: attempt_${clusterStartTime}_${applicationId}_[m|r]_${taskId}_${attemptId}
Example: attempt_1498552288473_2742_m_000000_0
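
The attempt format above embeds the task ID, the task type, and the attempt number, so all three can be recovered from the string. Below is a minimal sketch using a regular expression; `AttemptIdParser` and its methods are hypothetical names, and the pattern is derived only from the format shown here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for attempt IDs of the form attempt_<ts>_<app>_[m|r]_<task>_<attempt>.
final class AttemptIdParser {
    private static final Pattern ATTEMPT =
            Pattern.compile("attempt_(\\d+)_(\\d+)_([mr])_(\\d+)_(\\d+)");

    private AttemptIdParser() {}

    // Matches the whole string against the pattern or fails loudly.
    private static Matcher match(String attemptId) {
        Matcher m = ATTEMPT.matcher(attemptId);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an attempt id: " + attemptId);
        }
        return m;
    }

    // Rebuilds the task ID that this attempt belongs to.
    static String taskIdOf(String attemptId) {
        Matcher m = match(attemptId);
        return "task_" + m.group(1) + "_" + m.group(2) + "_" + m.group(3) + "_" + m.group(4);
    }

    // True for a map task attempt ("m"), false for a reduce task attempt ("r").
    static boolean isMapAttempt(String attemptId) {
        return match(attemptId).group(3).equals("m");
    }

    // The trailing attempt counter as a number.
    static int attemptNumber(String attemptId) {
        return Integer.parseInt(match(attemptId).group(5));
    }

    public static void main(String[] args) {
        String id = "attempt_1498552288473_2742_m_000000_0";
        System.out.println(taskIdOf(id));
        System.out.println(isMapAttempt(id));
        System.out.println(attemptNumber(id));
    }
}
```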

appAttemptId

Description: the ID of a single execution attempt of the ApplicationMaster.
Format: appattempt_${clusterStartTime}_${applicationId}_${appAttemptId}
Example: appattempt_1498552288473_2742_000001

containerId

Description: the ID of a container. The e${epoch} segment is optional, as the two examples show.
Format: container_[e${epoch}_]${clusterStartTime}_${applicationId}_${appAttemptId}_${containerId}
Examples: container_e20_1498552288473_2742_01_000032, container_1498552288473_2742_01_000032
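
A container ID therefore identifies its application and its ApplicationMaster attempt. The sketch below parses both forms of the ID (with and without the epoch segment). `ContainerIdParser` is a hypothetical name; treating a missing epoch as 0 and zero-padding the appattempt number to six digits are assumptions inferred from the examples in this section.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for container IDs; the epoch segment "e<N>_" is optional.
final class ContainerIdParser {
    private static final Pattern CONTAINER =
            Pattern.compile("container_(?:e(\\d+)_)?(\\d+)_(\\d+)_(\\d+)_(\\d+)");

    private ContainerIdParser() {}

    // Matches the whole string against the pattern or fails loudly.
    private static Matcher match(String containerId) {
        Matcher m = CONTAINER.matcher(containerId);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a container id: " + containerId);
        }
        return m;
    }

    // Assumption: an ID without the epoch segment is treated as epoch 0.
    static int epochOf(String containerId) {
        String epoch = match(containerId).group(1);
        return epoch == null ? 0 : Integer.parseInt(epoch);
    }

    // The application that owns the container.
    static String applicationIdOf(String containerId) {
        Matcher m = match(containerId);
        return "application_" + m.group(2) + "_" + m.group(3);
    }

    // Assumption: the attempt number is padded to six digits, as in the appattempt example.
    static String appAttemptIdOf(String containerId) {
        Matcher m = match(containerId);
        int attempt = Integer.parseInt(m.group(4));
        return "appattempt_" + m.group(2) + "_" + m.group(3) + "_" + String.format("%06d", attempt);
    }

    public static void main(String[] args) {
        String id = "container_e20_1498552288473_2742_01_000032";
        System.out.println(epochOf(id));
        System.out.println(applicationIdOf(id));
        System.out.println(appAttemptIdOf(id));
    }
}
```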

Reference: Yarn之日志分析 (YARN log analysis)