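【Posted】: 2017-12-12 12:36:00
【Question】:
I'm trying out the MapReduce pairs pattern in Python. I need to check whether a word is in a text file, then find the word next to it and yield the two words as a pair. I keep running into either: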
neighbors = words[words.index(w) + 1]
ValueError: substring not found
or
ValueError: ("the") is not in list
File cwork_trials.py
from mrjob.job import MRJob

class MRCountest(MRJob):
    # Word count
    def mapper(self, _, document):
        # Assume document is a list of words.
        #words = []
        words = document.strip()
        w = "the"
        neighbors = words.index(w)
        for word in words:
            #searchword = "the"
            #wor.append(str(word))
            #neighbors = words[words.index(w) + 1]
            yield(w,1)

    def reducer(self, w, values):
        yield(w,sum(values))

if __name__ == '__main__':
    MRCountest.run()
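Both errors quoted above come from calling `.index()` on input that does not contain the search word. mrjob hands the mapper one line of the input file at a time as a string, so any line without "the" triggers the exception. A minimal sketch of the two cases in plain Python follows; the sample line is a made-up example, not data from the question.

```python
line = "a cat sat on a mat"  # hypothetical input line that lacks the word "the"

# str.index looks for a substring and raises ValueError when it is absent.
try:
    line.index("the")
    str_raised = False
except ValueError:
    str_raised = True

# list.index looks for an equal element and raises ValueError in the same way.
words = line.split()
try:
    words.index("the")
    list_raised = False
except ValueError:
    list_raised = True

# Testing membership first sidesteps the exception on lines without a match.
position = words.index("the") if "the" in words else None
```

Note that `.index()` only ever reports the first match, so even with the membership check it would miss a second "the" on the same line.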
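Edit: I'm trying to use the pairs pattern to search a document for every instance of a specific word and, each time, find the word next to it, then yield a pair for each instance, i.e. each instance of "the" together with the word next to it: [the], [book], [the], [cat], and so on.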
from mrjob.job import MRJob

class MRCountest(MRJob):
    # Word count
    def mapper(self, _, document):
        # Assume document is a list of words.
        #words = []
        words = document.split(" ")
        want = "the"
        for w, want in enumerate(words, 1):
            if (w+1) < len(words):
                neighbors = words[w + 1]
                pair = (want, neighbors)
                for u in neighbors:
                    if want is "the":
                        #pair = (want, neighbors)
                        yield(pair),1
        #neighbors = words.index(w)
        #for word in words:
            #searchword = "the"
            #wor.append(str(word))
            #neighbors = words[words.index(w) + 1]
            #yield(w,1)
    #def reducer(self, w, values):
        #yield(w,sum(values))

if __name__ == '__main__':
    MRCountest.run()
As it stands, I get every word pair yielded, with the same pair repeated multiple times.
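Three details of the edited mapper plausibly account for the wrong and repeated pairs: the counter from `enumerate(words, 1)` is already one ahead of the word's position, the inner `for u in neighbors` walks over the letters of a string, and the loop variable `want` replaces the search word. The sketch below reproduces each effect in isolation on a made-up sentence. Separately, `is` compares object identity rather than text, so `==` is the usual way to compare strings.

```python
words = "the book is on the table".split()

# enumerate(words, 1) counts from 1, so i is already the index of the next
# word; words[i + 1] therefore reaches one word too far.
shifted = [(word, words[i + 1])
           for i, word in enumerate(words, 1) if i + 1 < len(words)]

# Looping over a string visits its characters, so a yield placed inside
# "for u in neighbors" fires once per letter of the neighbor word.
repeats = [("the", "book") for _ in "book"]

# Reusing "want" as the loop variable overwrites the search word.
want = "the"
for _, want in enumerate(words, 1):
    pass
```

Here `shifted` starts with `("the", "is")` rather than `("the", "book")`, `repeats` holds four copies of the same pair, and `want` ends up as the last word of the sentence.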
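【Comments】:
-
Please add an example of your input data and the desired output.
-
There is no specific input required. The job should search a document for a particular word, such as "the" in the code. The expected result is one pair per instance, made up of the search word (i.e. "the") and the word immediately after it (i.e. bird, book, house, etc.).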
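The behavior described in that comment could be sketched as follows, in plain Python so that it runs without a Hadoop or mrjob setup. The function names `pairs_after` and `count_pairs` and the sample lines are hypothetical, chosen for illustration. In an mrjob job, the body of `pairs_after` would roughly correspond to the mapper and the summing step to a reducer that yields `sum(values)` per pair.

```python
from collections import Counter

def pairs_after(line, target="the"):
    """Yield ((target, next_word), 1) for each occurrence of target in line."""
    words = line.split()
    # zip pairs every word with its immediate successor; the last word has none.
    for word, following in zip(words, words[1:]):
        if word == target:
            yield (word, following), 1

def count_pairs(lines, target="the"):
    """Sum the emitted 1s per pair, as a reducer would."""
    totals = Counter()
    for line in lines:
        for pair, count in pairs_after(line, target):
            totals[pair] += count
    return totals

# Hypothetical sample input, one string per line of the document.
sample = ["the book is on the table", "a cat saw the book"]
totals = count_pairs(sample)
```

This sketch is case-sensitive and works one line at a time, so "The" is not matched and a "the" at the very end of a line is not paired with the first word of the next line. Whether that matters depends on the input, which the question does not specify.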