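[Posted]: 2018-05-17 11:25:30
[Question]:
I've just started learning Python. I'm using an API to build TF-IDF models, but I've hit a lambda-function error that I can't resolve. Here is part of the class that generates the TF-IDFs: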
from operator import add  # add is used by reduceByKey below
# ModelBuilder, Model and ngrams come from the sift library (imports not shown)

class tfidf(ModelBuilder, Model):
    def __init__(self, max_ngram=1, normalize=True):
        self.max_ngram = max_ngram
        self.normalize = normalize

    def build(self, mentions, idfs):
        m = (mentions
             .map(lambda (target, (span, text)): (target, text))  # error is triggered here
             .mapValues(lambda v: ngrams(v, self.max_ngram))
             .flatMap(lambda (target, tokens): (((target, t), 1) for t in tokens))
             .reduceByKey(add)
             .map(lambda ((target, token), count): (token, (target, count)))
             .leftOuterJoin(idfs)
             # ... (rest of the method not shown)
             )
Here is sample output from mentions (this is the input that triggers the error in the tfidf class):
Out[24]:
[{'_id': u'en.wikipedia.org/wiki/William_Cowper',
  'source': 'en.wikipedia.org/wiki/Beagle',
  'span': (165, 179),
  'text': u'References to the dog appear before the 19th century in works by such writers as William Shakespeare, John Webster, John Dryden, Thomas Tickell, Henry Fielding, and William Cowper, as well as in Alexander Pope\'s translation of Homer\'s "Iliad".'},
 {'_id': u"en.wikipedia.org/wiki/K-Run's_Park_Me_In_First",
  'source': 'en.wikipedia.org/wiki/Beagle',
  'span': (32, 62),
  'text': u" On 12 February 2008, a Beagle, K-Run's Park Me In First (Uno), won the Best In Show category at the Westminster Kennel Club show for the first time in the competition's history."},
The error message is:
.map(lambda (target, (span, text)): (target, text))\
ValueError: too many values to unpack
I also tried .map(lambda (src, target, span, text): (target, text)), since I only need target and text from mentions, but it raises the same error.
A simple, runnable example:
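A minimal plain-Python sketch of what goes wrong, assuming each element of mentions is a dict like the ones in the sample output (the record below is copied from the small example in this question). A lambda written as lambda (target, (span, text)): ... unpacks its single argument into two parts, and unpacking a dict iterates over its keys, of which there are four. The sketch uses ordinary assignment unpacking, which behaves the same way and also runs on Python 3, where tuple parameters in lambdas were removed (PEP 3113).

```python
# One record shaped like the ones in the question's sample output.
record = {'_id': '27262',
          'source': 'Apple',
          'span': (4, 20),
          'text': ' Apples are yummy.'}

# lambda (target, (span, text)): ... (Python 2 syntax) first unpacks its
# argument into exactly two parts. Unpacking a dict iterates over its keys,
# and this dict has four keys, so the unpacking fails.
error_message = None
try:
    target, rest = record
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Even four names that "line up" only receive the key strings, not the values.
# (sorted() is used because key order is arbitrary on Python 2.7.)
first, second, third, fourth = record
print(sorted([first, second, third, fourth]))
```

The first print shows a "too many values to unpack" message, the same ValueError as above; the second shows that four names receive the keys '_id', 'source', 'span' and 'text' rather than their values.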
import math
import numpy
Data = [{'_id': '333981',
         'source': 'Apple',
         'span': (100, 119),
         'text': ' It is native to the northern Pacific.'},
        {'_id': '27262',
         'source': 'Apple',
         'span': (4, 20),
         'text': ' Apples are yummy.'}]
m = map(lambda (ID, (span, text)) : (ID, text) , Data)
print(list(m))
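One possible fix for the small example, sketched under the assumption that the '_id' field plays the role of target (in the sample output it holds the linked page, while 'source' holds the page the mention appears on). Each lambda takes the whole dict as its single argument and reads fields by key instead of unpacking, which works on both Python 2.7 and Python 3.

```python
Data = [{'_id': '333981',
         'source': 'Apple',
         'span': (100, 119),
         'text': ' It is native to the northern Pacific.'},
        {'_id': '27262',
         'source': 'Apple',
         'span': (4, 20),
         'text': ' Apples are yummy.'}]

# Take one argument (the whole dict) and pick fields by key.
# Assumption: '_id' holds the link target.
pairs = list(map(lambda d: (d['_id'], d['text']), Data))
print(pairs)

# Alternatively, reshape each dict into (target, (span, text)), the shape
# that the first lambda in build() unpacks.
reshaped = list(map(lambda d: (d['_id'], (d['span'], d['text'])), Data))
print(reshaped[1])
```

With Spark, the same one-argument lambdas should work inside mentions.map(...), assuming mentions is an RDD of such dicts. The second form keeps the (target, (span, text)) shape, so it could be applied to mentions before calling build() instead of editing the class.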
I'm using Python 2.7. Any help or guidance would be greatly appreciated.
Many thanks,
[Comments]:
-
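@JHBonarius I think this is different.
Can .iteritems() be used inside a lambda? If so, could you give me a hint? Thanks -
I judged too quickly. But I can't reproduce your error. Could you post a minimal reproducible example?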
-
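Not familiar with Spark, but it looks like the
mentions items are dicts or dict-like objects with four items. How exactly do you expect those to unpack into (target, (span, text))? -
Could you tell us which API you're using? An internet search turns up several.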
-
@PaulaThomas Sure, here is the link: github.com/wikilinks/sift/blob/master/sift/models/text.py
Tags: python python-2.7 apache-spark lambda pyspark