Accessing groups in Pandas lambda function: answer

【Question Title】: Accessing groups in Pandas lambda function
【Posted】: 2017-10-06 22:22:50
【Question】:

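I have a Pandas DataFrame with a MultiIndex on its columns. Level 0 is 'Strain' and level 1 is 'JGI library'. Each 'Strain' has several 'JGI library' columns associated with it. I want to use a lambda function to apply a t-test that compares two different strains. For troubleshooting, I have been using .iloc[0] to take a single row of my DataFrame.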
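A minimal sketch of the layout described above, assuming hypothetical toy data: the gene IDs, the row-index name "Gene" and the library names LIB_A to LIB_F are made up (the real data is in the spreadsheet linked below). It shows that a row taken with .iloc[0] carries the column MultiIndex as its own index:

```python
import pandas as pd

# Hypothetical toy data: gene IDs, the row-index name "Gene" and the
# library names LIB_A..LIB_F are placeholders, not the real countsDf4 contents.
columns = pd.MultiIndex.from_tuples(
    [("LL1004", "LIB_A"), ("LL1004", "LIB_B"), ("LL1004", "LIB_C"),
     ("LL345", "LIB_D"), ("LL345", "LIB_E"), ("LL345", "LIB_F")],
    names=["Strain", "JGI library"],
)
countsDf4 = pd.DataFrame(
    [[10, 12, 11, 30, 29, 31],
     [5, 6, 5, 5, 6, 5],
     [100, 90, 95, 80, 85, 82]],
    index=pd.Index(["gene1", "gene2", "gene3"], name="Gene"),
    columns=columns,
)

# A single row is a Series whose index is the column MultiIndex,
# so it still has the "Strain" and "JGI library" levels.
row = countsDf4.iloc[0]
print(list(row.index.names))  # ['Strain', 'JGI library']
```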

from scipy.stats import ttest_ind
row = pvalDf.iloc[0]
parent = 'LL1004'
child = 'LL345'
ttest_ind(row.groupby(level='Strain').get_group(parent), row.groupby(level='Strain').get_group(child))[1]
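To reproduce this single-row check without the original spreadsheet, here is a sketch on the same kind of hypothetical toy data. It assumes ttest_ind is scipy.stats.ttest_ind (the question does not show its imports); index [1] of its result is the p-value:

```python
import pandas as pd
from scipy.stats import ttest_ind

# Hypothetical toy data (placeholder gene and library names).
columns = pd.MultiIndex.from_tuples(
    [("LL1004", "LIB_A"), ("LL1004", "LIB_B"), ("LL1004", "LIB_C"),
     ("LL345", "LIB_D"), ("LL345", "LIB_E"), ("LL345", "LIB_F")],
    names=["Strain", "JGI library"],
)
countsDf4 = pd.DataFrame(
    [[10, 12, 11, 30, 29, 31], [5, 6, 5, 5, 6, 5], [100, 90, 95, 80, 85, 82]],
    index=pd.Index(["gene1", "gene2", "gene3"], name="Gene"),
    columns=columns,
)

parent, child = "LL1004", "LL345"
row = countsDf4.iloc[0]

# Group the row's values by the "Strain" level of its index, then compare two strains.
by_strain = row.groupby(level="Strain")
p = ttest_ind(by_strain.get_group(parent), by_strain.get_group(child))[1]
print(p)
```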

This works as expected. Now I try to apply it to my whole DataFrame:

parent = 'LL1004'
child = 'LL345'
pvalDf = countsDf4.apply(lambda row: ttest_ind(row.groupby(level='Strain').get_group(parent), row.groupby(level='Strain').get_group(child))[1])

Now I get an error message: "ValueError: ('level name Strain is not the name of the index', 'occured at index (LL1004, BCHAC)')"
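The error can be reproduced on hypothetical toy data with the same logic as the lambda above, written here as a named function for readability (strain_pvalue is an illustrative name, not from the original). Without an axis argument, apply() passes each column, whose index has no "Strain" level:

```python
import pandas as pd
from scipy.stats import ttest_ind

# Hypothetical toy data (placeholder gene and library names).
columns = pd.MultiIndex.from_tuples(
    [("LL1004", "LIB_A"), ("LL1004", "LIB_B"), ("LL1004", "LIB_C"),
     ("LL345", "LIB_D"), ("LL345", "LIB_E"), ("LL345", "LIB_F")],
    names=["Strain", "JGI library"],
)
countsDf4 = pd.DataFrame(
    [[10, 12, 11, 30, 29, 31], [5, 6, 5, 5, 6, 5], [100, 90, 95, 80, 85, 82]],
    index=pd.Index(["gene1", "gene2", "gene3"], name="Gene"),
    columns=columns,
)
parent, child = "LL1004", "LL345"

def strain_pvalue(row):
    # Same body as the lambda in the question.
    by_strain = row.groupby(level="Strain")
    return ttest_ind(by_strain.get_group(parent), by_strain.get_group(child))[1]

error_message = None
try:
    # No axis argument: apply() defaults to axis=0 and passes each *column*.
    countsDf4.apply(strain_pvalue)
except ValueError as err:
    error_message = str(err)
print(error_message)
```

The exact wording of the message may differ slightly between pandas versions.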

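'LL1004' is a 'Strain', but Pandas does not seem to recognize it. It looks like the MultiIndex may not be passed to the lambda function correctly? Is there a better way to troubleshoot a lambda function than using .iloc[0]?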

I have put a Jupyter notebook and an Excel file containing the countsDf4 DataFrame on GitHub: https://github.com/danolson1/pandas_ttest

Thanks, Dan

【Question Discussion】:

Tags: python pandas lambda apply multi-index

【Solution 1】:

How about this, which is simpler:

pvalDf = countsDf4.apply(lambda row: ttest_ind(row[parent], row[child])[1], axis=1)

I have tested it on your notebook and it works.
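A sketch of the fix on hypothetical toy data (placeholder gene and library names; ttest_ind assumed to be scipy.stats.ttest_ind). With axis=1 each row is passed in, row[parent] selects that strain's libraries, and [1] keeps only the p-value:

```python
import pandas as pd
from scipy.stats import ttest_ind

# Hypothetical toy data (placeholder gene and library names).
columns = pd.MultiIndex.from_tuples(
    [("LL1004", "LIB_A"), ("LL1004", "LIB_B"), ("LL1004", "LIB_C"),
     ("LL345", "LIB_D"), ("LL345", "LIB_E"), ("LL345", "LIB_F")],
    names=["Strain", "JGI library"],
)
countsDf4 = pd.DataFrame(
    [[10, 12, 11, 30, 29, 31], [5, 6, 5, 5, 6, 5], [100, 90, 95, 80, 85, 82]],
    index=pd.Index(["gene1", "gene2", "gene3"], name="Gene"),
    columns=columns,
)
parent, child = "LL1004", "LL345"

# axis=1: the function receives one row at a time (indexed by Strain / JGI library).
pvalDf = countsDf4.apply(lambda row: ttest_ind(row[parent], row[child])[1], axis=1)
print(pvalDf)
```

Despite its name, pvalDf comes back as a Series of p-values indexed like the rows of countsDf4 (here, by gene).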

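Your problem is that DataFrame.apply() applies the function to each column by default (axis=0), not to each row. With axis=0, every Series handed to the lambda is a column, indexed by the DataFrame's row index, which has no 'Strain' level; that is why groupby(level='Strain') fails, and why the error names a column label, (LL1004, BCHAC). Passing axis=1 overrides the default behavior and applies the function row by row.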
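One way to see directly what apply() hands to the function, which also helps with troubleshooting beyond .iloc[0], is to record the name and index level names of the first Series it passes in. A sketch with a hypothetical helper first_seen() on made-up toy data:

```python
import pandas as pd

# Hypothetical toy data (placeholder gene and library names).
columns = pd.MultiIndex.from_tuples(
    [("LL1004", "LIB_A"), ("LL1004", "LIB_B"), ("LL1004", "LIB_C"),
     ("LL345", "LIB_D"), ("LL345", "LIB_E"), ("LL345", "LIB_F")],
    names=["Strain", "JGI library"],
)
countsDf4 = pd.DataFrame(
    [[10, 12, 11, 30, 29, 31], [5, 6, 5, 5, 6, 5], [100, 90, 95, 80, 85, 82]],
    index=pd.Index(["gene1", "gene2", "gene3"], name="Gene"),
    columns=columns,
)

def first_seen(df, axis):
    """Hypothetical helper: (name, index level names) of the first Series apply() passes."""
    seen = []

    def record(s):
        # Copy the values immediately; pandas may reuse the Series object between calls.
        seen.append((s.name, list(s.index.names)))
        return 0  # dummy scalar so apply() can build its result

    df.apply(record, axis=axis)
    return seen[0]

print(first_seen(countsDf4, axis=0))  # (('LL1004', 'LIB_A'), ['Gene'])
print(first_seen(countsDf4, axis=1))  # ('gene1', ['Strain', 'JGI library'])
```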

Also, there is no reason to use row.groupby(level='Strain').get_group(x) when you can simply index the column group with row[x]. :)
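A quick check of that equivalence on hypothetical toy data. One difference: get_group() keeps both index levels, while row[x] drops the 'Strain' level. The values, which are all ttest_ind() looks at, are the same:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (placeholder gene and library names).
columns = pd.MultiIndex.from_tuples(
    [("LL1004", "LIB_A"), ("LL1004", "LIB_B"), ("LL1004", "LIB_C"),
     ("LL345", "LIB_D"), ("LL345", "LIB_E"), ("LL345", "LIB_F")],
    names=["Strain", "JGI library"],
)
countsDf4 = pd.DataFrame(
    [[10, 12, 11, 30, 29, 31], [5, 6, 5, 5, 6, 5], [100, 90, 95, 80, 85, 82]],
    index=pd.Index(["gene1", "gene2", "gene3"], name="Gene"),
    columns=columns,
)
parent = "LL1004"
row = countsDf4.iloc[0]

via_groupby = row.groupby(level="Strain").get_group(parent)  # index: (Strain, JGI library)
via_label = row[parent]                                      # index: JGI library only

print(np.array_equal(via_groupby.to_numpy(), via_label.to_numpy()))  # True
```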

【Discussion】:

axis=1 solved the problem. Thanks for your other comments as well.