1  hadoop conf.addResource  

http://stackoverflow.com/questions/16017538/how-does-configuration-addresource-method-work-in-hadoop

How does the Configuration.addResource() method work in Hadoop?

Does Configuration.addResource() load the resource file the way Java's ClassLoader does, or does it just wrap the ClassLoader class? I ask because a String such as "../resource.xml" cannot be passed to addResource() to load a resource file from outside the classpath — the same restriction ClassLoader has.
Thanks!
asked Apr 15 '13 at 14:18
foolyoghurt
"How does it work" is a different question from "why is my usage not working for me?" Which do you really want to know? – Matt Ball Apr 15 '13 at 14:19
1 Answer

Browsing the Javadocs and source code for Configuration, String arguments are assumed to be classpath resources (line 1162), not paths on the local file system. To reference a file on the local file system, you should pass a URL instead, as follows:

conf.addResource(new File("../resource.xml").toURI().toURL());
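The path-to-URL conversion in the answer is plain JDK code, so it can be sketched on its own; the class and method names below are just for illustration:

```java
import java.io.File;
import java.net.URL;

public class ResourceUrlDemo {
    // Convert a file-system path (relative or absolute) into the file: URL
    // form that Configuration.addResource(URL) accepts. A plain String
    // passed to addResource() would instead be looked up on the classpath.
    static URL toFileUrl(String path) throws Exception {
        return new File(path).toURI().toURL();
    }

    public static void main(String[] args) throws Exception {
        // File.toURI() resolves a relative path against the working
        // directory, so the resulting URL points at the local file system.
        URL url = toFileUrl("../resource.xml");
        System.out.println(url);   // a file: URL for the given path
    }
}
```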

answered Apr 17 '1

 

2  Reading parameters in Hadoop MapReduce

First, the list below summarizes several ways to share a global variable or a global file in Hadoop:

1     Use Configuration's set method; only suitable when the shared data is small
2     Put the shared file on HDFS and read it on every access; relatively inefficient
3     Put the shared file in the DistributedCache; after loading it once in setup(), it can be used many times. The drawback is that it is read-only — modifications are not supported.
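Approach 1 above can be sketched as follows. This is a minimal illustration, assuming hadoop-common and the mapreduce client API are on the classpath; the property name "my.threshold" is made up for this example and is not a real Hadoop property:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class ThresholdDemo {

    public static class ThresholdMapper
            extends Mapper<LongWritable, Text, Text, Text> {
        private double threshold;

        @Override
        protected void setup(Context context) {
            // Read the value back from the task-side Configuration.
            threshold = context.getConfiguration().getDouble("my.threshold", 0.0);
        }
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Set the value BEFORE creating the Job: Job.getInstance(conf)
        // takes a copy, so later conf.set() calls are not seen by the job.
        conf.set("my.threshold", "0.75");
        Job job = Job.getInstance(conf);
        job.setMapperClass(ThresholdMapper.class);
        // ... input/output paths, job submission, etc.
    }
}
```

Because the value travels inside the job Configuration, it is serialized to every task — which is exactly why this approach only suits small payloads.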

The third approach is described below.

Collected common Hadoop questions

Alternative to deprecated DistributedCache class in Hadoop 2.2.0
As of Hadoop 2.2.0, if you use the org.apache.hadoop.filecache.DistributedCache class to load files you want to add to your job as a distributed cache, your compiler will warn you that this class is deprecated.

In earlier versions of Hadoop, we used the DistributedCache class in the following fashion to make files available locally to all mappers and reducers:
// In the main driver class using the new mapreduce API
Configuration conf = getConf();
...
DistributedCache.addCacheFile(new Path(filename).toUri(), conf);
...
Job job = new Job(conf);
...

// In the mapper class, mostly in the setup method
Path[] myCacheFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());

But now, with Hadoop 2.2.0, the functionality for adding files to the distributed cache has moved to the org.apache.hadoop.mapreduce.Job class. You may also notice that the Job constructor we used is deprecated as well; we should instead use the factory method getInstance(Configuration conf). The alternative solution looks as follows:

// In the main driver class using the new mapreduce API
Configuration conf = getConf();
...
Job job = Job.getInstance(conf);
...
job.addCacheFile(new URI(filename));

// In the mapper class, mostly in the setup method
URI[] localPaths = context.getCacheFiles();
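Once the file is in the cache, the mapper still has to open it. A fuller setup() sketch under stated assumptions: the driver called job.addCacheFile(new URI(filename)), and Hadoop symlinks the cached file into the task's working directory under its base name (or under the fragment after '#', if the URI had one):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch: consuming a cached file in a mapper with the Hadoop 2.2.0+ API.
public class CacheReadingMapper
        extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void setup(Context context)
            throws IOException, InterruptedException {
        URI[] cacheFiles = context.getCacheFiles();
        if (cacheFiles == null || cacheFiles.length == 0) {
            return;                       // nothing was cached
        }
        // Derive the local symlink name from the cached URI.
        String localName = new Path(cacheFiles[0].getPath()).getName();
        try (BufferedReader reader =
                new BufferedReader(new FileReader(localName))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // ... parse each line into a field that map() can use
            }
        }
    }
}
```

Matching the list at the top of this section: the file is read once here in setup(), then reused across all map() calls, but it cannot be modified by the task.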
