【Question title】: Declare mrjob mapper without ignoring key
【Posted】: 2015-11-16 22:38:54
【Question】:

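I want to declare a mapper function with mrjob. Because my mapper needs to refer to some constants for its calculations, I decided to pass those constants in through the mapper's key (is there another way?). I read the mrjob tutorial on this site, but every example there ignores the key. For example: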

from mrjob.job import MRJob

class MRWordFrequencyCount(MRJob):

    # The key is ignored here (named "_"); only the line is used
    def mapper(self, _, line):
        yield "chars", len(line)
        yield "words", len(line.split())
        yield "lines", 1

    def reducer(self, key, values):
        yield key, sum(values)

Basically, I want something like:

def mapper(self, (constant1,constant2,constant3,constant4,constant5), line):
    # my calculation goes here

Please suggest how I can do this. Thanks.

【Comments】:

    Tags: python hadoop mapreduce mrjob


    【Solution 1】:

    You can set the constants in __init__:

    from mrjob.job import MRJob
    
    class MRWordFrequencyCount(MRJob):
    
        def mapper(self, key, line):
            yield "chars", len(line)
            yield "words", len(line.split())
            yield "lines", 1
            yield "Constant", self.constant
    
        def reducer(self, key, values):
            yield key, sum(values)
    
        def __init__(self, *args, **kwargs):
            super(MRWordFrequencyCount, self).__init__(*args, **kwargs)
            self.constant = 10
    
    
    if __name__ == '__main__':
        MRWordFrequencyCount.run()
    

    Output:

    "Constant"  10
    "chars" 12
    "lines" 1
    "words" 2
    

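    Alternatively, you can use RawProtocol as the input protocol. It splits each input line at the first tab, so the mapper receives the part before the tab as the key and the rest as the line: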

    from mrjob.job import MRJob
    from mrjob.protocol import RawProtocol
    
    
    class MRWordFrequencyCount(MRJob):
        # Read each input line as "key<TAB>value" instead of discarding the key
        INPUT_PROTOCOL = RawProtocol
    
        def mapper(self, key, line):
            yield "constant", key
            yield "chars", len(line)
            yield "words", len(line.split())
            yield "lines", 1
    
        def reducer(self, key, values):
            if str(key) != "constant":
                yield key, sum(values)
            else:
                yield "constant", list(values)
    
    
    if __name__ == '__main__':
        MRWordFrequencyCount.run()
    

    If the input is (key and line separated by a tab):

    constant1,constant2,constant3   The quick brown fox jumps over the lazy dog
    

    Output:

    "chars" 43
    "constant"  ["constant1,constant2,constant3"]
    "lines" 1
    "words" 9
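
With the RawProtocol approach the key arrives in the mapper as one string, so the individual constants still have to be pulled out of it. A minimal sketch of that step in plain Python, assuming the constants are numeric and comma-separated; the helper names `split_key_and_line`, `parse_constants`, and `weighted_word_count` are hypothetical and not part of mrjob:

```python
def split_key_and_line(raw_line):
    """Split one input line at the first tab into (key, line).

    This mirrors what a tab-separated key/value reader hands to the mapper.
    """
    key, _, line = raw_line.partition("\t")
    return key, line


def parse_constants(key):
    """Turn a comma-separated key such as "1.5,2,3" into a list of floats."""
    return [float(part) for part in key.split(",")]


def weighted_word_count(key, line):
    """Hypothetical calculation: the word count scaled by the first constant."""
    constants = parse_constants(key)
    return constants[0] * len(line.split())
```

Inside the mapper, something like `parse_constants(key)` could be called on the key before the calculation. If the constants are the same for every line, setting them once in `__init__` as in the first approach avoids repeating them in the input.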
    

    【Comments】:
