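Too many values to unpack in lambda function

【Question title】: Too many values to unpack in lambda function
【Posted】: 2018-05-17 11:25:30
【Question】:

I have just started learning Python. I am using an API to build a TF-IDF model, but I ran into a lambda function error that I cannot resolve. This is part of the class that generates the TF-IDF: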

class tfidf(ModelBuilder, Model):

    def __init__(self, max_ngram=1, normalize = True):
        self.max_ngram = max_ngram
        self.normalize = normalize

    def build(self, mentions, idfs):
        # The first .map() below is the line that triggers the error.
        m = mentions\
            .map(lambda (target, (span, text)): (target, text))\
            .mapValues(lambda v: ngrams(v, self.max_ngram))\
            .flatMap(lambda (target, tokens): (((target, t), 1) for t in tokens))\
            .reduceByKey(add)\
            .map(lambda ((target, token), count): (token, (target, count)))\
            .leftOuterJoin(idfs)\

Here is sample output from mentions (this is the input that causes the error in the tfidf class):

Out[24]:                                                                        
[{'_id': u'en.wikipedia.org/wiki/William_Cowper',
  'source': 'en.wikipedia.org/wiki/Beagle',
  'span': (165, 179),
  'text': u'References to the dog appear before the 19th century in works by such writers as William Shakespeare, John Webster, John Dryden, Thomas Tickell, Henry Fielding, and William Cowper, as well as in Alexander Pope\'s translation of Homer\'s "Iliad".'},
 {'_id': u"en.wikipedia.org/wiki/K-Run's_Park_Me_In_First",
  'source': 'en.wikipedia.org/wiki/Beagle',
  'span': (32, 62),
  'text': u" On 12 February 2008, a Beagle, K-Run's Park Me In First (Uno), won the Best In Show category at the Westminster Kennel Club show for the first time in the competition's history."},

The error message is:

 .map(lambda (target, (span, text)): (target, text))\
ValueError: too many values to unpack

I also tried .map(lambda (src, target, span, text): (target, text))\ because I only need the target and the text from mentions, but it raises the same error.

A simple, runnable example:

import math
import numpy


Data = [{'_id': '333981',
         'source': 'Apple',
         'span': (100, 119),
         'text': ' It is native to the northern Pacific.'},
        {'_id': '27262',
         'source': 'Apple',
         'span': (4, 20),
         'text': ' Apples are yummy.'}]



m = map(lambda (ID, (span, text)) : (ID, text) , Data)

print(list(m))

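I am using Python 2.7. Any help or guidance would be greatly appreciated.

Many thanks,

【Comments】:

@JHBonarius I think this is different. Can .iteritems() be used inside a lambda? If so, could you give me a hint? Thanks
I was too quick to judge. But I cannot reproduce your error. Could you post a minimal reproducible example?
I'm not familiar with Spark, but it looks like the items in mentions are dictionaries, or dict-like objects, with four entries. How exactly do you expect those to unpack into (target, (span, text))?
Could you tell us which API you are using? An internet search turned up several.
@PaulaThomas Sure, here is the link: github.com/wikilinks/sift/blob/master/sift/models/text.py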
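
The comment about four-entry dictionaries points at the likely cause. Below is a minimal sketch, assuming each mention is a plain dict like the ones in the sample output: iterating a dict yields its keys, so unpacking one record into two names sees four values. It uses an explicit assignment instead of a tuple parameter so that it also runs on Python 3, where tuple parameters were removed. The record is copied from the question's small example, and unpack_pair is a hypothetical helper name.

```python
record = {'_id': '27262', 'source': 'Apple',
          'span': (4, 20), 'text': ' Apples are yummy.'}

# Iterating a dict yields its keys, so there are four values here, not two.
keys = list(record)


def unpack_pair(item):
    # Same shape as `lambda (target, (span, text)): (target, text)`.
    target, (span, text) = item
    return target, text


try:
    unpack_pair(record)
    error = None
except ValueError as exc:
    error = str(exc)  # "too many values to unpack"; exact wording varies by version

# The shape the lambda was written for: (target, (span, text)).
ok = unpack_pair(('27262', ((4, 20), ' Apples are yummy.')))

print(len(keys), error)
print(ok)
```

The unpacking pattern only works when each element really is a nested 2-tuple, which the dictionaries in mentions are not.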

Tags: python python-2.7 apache-spark lambda pyspark

【Solution 1】:

If you want to create a new list of dictionaries containing only the fields source and text, you can use

m = map(lambda item: {field: item.get(field) for field in ['source', 'text']}, Data)

If you want a list of tuples holding the values stored under the keys source and text:

m = map(lambda item: (item.get('source'), item.get('text')), Data)
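
As a quick check, here is a sketch that runs both expressions against the Data list from the question. The list(...) wrapper is only there so the result prints the same way on Python 3, where map is lazy. The last variant keys on '_id' instead of 'source'; that '_id' is the field the question calls "target" is an assumption based on the sample output, not something the answer states.

```python
Data = [
    {'_id': '333981', 'source': 'Apple', 'span': (100, 119),
     'text': ' It is native to the northern Pacific.'},
    {'_id': '27262', 'source': 'Apple', 'span': (4, 20),
     'text': ' Apples are yummy.'},
]

# Variant 1: keep only the selected fields, still as dictionaries.
dicts = list(map(
    lambda item: {field: item.get(field) for field in ['source', 'text']},
    Data))

# Variant 2: (source, text) tuples.
pairs = list(map(lambda item: (item.get('source'), item.get('text')), Data))

# Assumption: '_id' plays the role of "target", so key on it instead.
by_target = [(item['_id'], item['text']) for item in Data]

print(dicts)
print(pairs)
print(by_target)
```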

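【Discussion】:

I tried that, and this is what comes up now: /spark-2.3.0-bin-hadoop2.6/python/pyspark/rdd.py", line 1979, in <lambda> map_values_fn = lambda kv: (kv[0], f(kv[1])) KeyError: 0. Thanks for your help; the solution may need to be different.
@user3446905: That is about the next line, .mapValues(lambda v: ngrams(v, self.max_ngram)). The code you showed seems to be incompatible with your dictionary layout.
@user3446905 Perhaps you just need a tuple of the values under those keys? E.g. .map(lambda item: (item.get('source'), item.get('text')))
Does the new error mean that the line .map(lambda (target, (span, text)): (target, text)) should stay the same? Meaning that span and text should come in as a tuple?
Doesn't item.get('source') retrieve the actual data rather than the key?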
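
The KeyError: 0 in the traceback above is consistent with the dictionary variant having been passed on to .mapValues. The quoted map_values_fn indexes each element as kv[0] and kv[1], which works for a 2-tuple but is a key lookup on a dict. Here is a plain-Python sketch that needs no Spark: map_values_fn is copied from the traceback, while str.split is a hypothetical stand-in for ngrams and the sample values are taken from the question's small example.

```python
f = str.split  # hypothetical stand-in for ngrams(v, self.max_ngram)
map_values_fn = lambda kv: (kv[0], f(kv[1]))  # as quoted in the traceback

as_dict = {'source': 'Apple', 'text': ' Apples are yummy.'}
as_pair = ('27262', ' Apples are yummy.')

try:
    map_values_fn(as_dict)  # kv[0] looks up the key 0 in the dict
    dict_error = None
except KeyError as exc:
    dict_error = exc.args[0]  # the key that was not found

pair_result = map_values_fn(as_pair)  # kv[0] is the key, kv[1] the value

print(dict_error)
print(pair_result)
```

So each element reaching .mapValues needs to be a (key, value) tuple, which is what the tuple variant of the answer produces. As for the last question, item.get('source') returns the value stored under the key 'source', not the key itself.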