1 hadoop conf.addResource
http://stackoverflow.com/questions/16017538/how-does-configuration-addresource-method-work-in-hadoop
How does the Configuration.addResource() method work in Hadoop?

Q (foolyoghurt, Apr 15 '13): Does the Configuration.addResource() method load resource files like Java's ClassLoader, or does it just wrap the ClassLoader class? I find that a String like "../resource.xml" cannot be passed to addResource() to load a resource file from outside the classpath, which is exactly how ClassLoader behaves.

Comment (Matt Ball): "How does it work" is a different question from "why is my usage not working for me?" Which do you really want to know?

A (accepted, 2 votes): Browsing the Javadocs and source code for Configuration, Strings are assumed to be classpath resources (line 1162), rather than paths relative to the file system. To reference files on the local file system, use a URL as follows:

conf.addResource(new File("../resource.xml").toURI().toURL());
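The File-to-URL conversion from the answer can be checked without Hadoop, since it is plain Java; the resulting file: URL is the form addResource() needs in order to read a file outside the classpath. The conf.addResource call itself is left as a comment because it assumes Hadoop on the classpath:

```java
import java.io.File;
import java.net.URL;

public class AddResourceUrlDemo {
    public static void main(String[] args) throws Exception {
        // Passed as a plain String, "../resource.xml" would be looked up
        // on the classpath by Configuration.addResource(String), so the
        // relative file-system path would never be found.
        // Converting File -> URI -> URL yields a file: URL that
        // addResource(URL) reads directly from the local file system:
        URL url = new File("../resource.xml").toURI().toURL();
        System.out.println(url.getProtocol() + " " + url.getPath());
        // With Hadoop on the classpath (not assumed here):
        // Configuration conf = new Configuration();
        // conf.addResource(url);
    }
}
```

Note that File.toURI() resolves a relative path against the current working directory, so the URL always points at an absolute location on the local file system.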
2 hadoop MapReduce reading parameters
Let us first look at the ways to share a global variable or a global file in Hadoop:
1 Use Configuration's set method; only suitable when the shared data is small
2 Put the shared file on HDFS and read it on every access; relatively inefficient
3 Put the shared file in the DistributedCache; after loading it once in setup() it can be used repeatedly. The drawback is that it is read-only and cannot be modified
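Method 1 above can be sketched as follows. This is a sketch assuming Hadoop 2.x on the classpath; the property key "my.shared.param" and the class names are hypothetical, chosen only for illustration:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class SharedParamDriver {

    public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
        private String param;

        @Override
        protected void setup(Context context) {
            // Read the value back on the task side; every mapper and
            // reducer receives a copy of the job configuration.
            param = context.getConfiguration().get("my.shared.param", "default");
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Set the value once in the driver; it is serialized into the
        // job configuration and shipped to every task.
        conf.set("my.shared.param", "some-small-value");
        Job job = Job.getInstance(conf);
        job.setMapperClass(MyMapper.class);
        // ... set input/output formats and paths, then job.waitForCompletion(true)
    }
}
```

Because the whole configuration is shipped with every task, this approach is only appropriate for small values such as thresholds or flags, as noted in method 1 above.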
Below is an introduction to the third approach.
Alternative to the deprecated DistributedCache class in Hadoop 2.2.0

As of Hadoop 2.2.0, if you use the org.apache.hadoop.filecache.DistributedCache class to load files you want to add to your job as distributed cache, the compiler will warn you that this class is deprecated. In earlier versions of Hadoop, we used the DistributedCache class in the following fashion to make files available locally to all mappers and reducers:

// In the main driver class using the new mapreduce API
Configuration conf = getConf();
...
DistributedCache.addCacheFile(new Path(filename).toUri(), conf);
...
Job job = new Job(conf);
...

// In the mapper class, mostly in the setup method
Path[] myCacheFiles = DistributedCache.getLocalCacheFiles(job);

But now, with Hadoop 2.2.0, the functionality for adding files to the distributed cache has been moved to the org.apache.hadoop.mapreduce.Job class. You may also notice that the Job constructor we used above has been deprecated; instead we should use the new factory method getInstance(Configuration conf). The alternative solution looks as follows:

// In the main driver class using the new mapreduce API
Configuration conf = getConf();
...
Job job = Job.getInstance(conf);
...
job.addCacheFile(new URI(filename));

// In the mapper class, mostly in the setup method
URI[] localPaths = context.getCacheFiles();