Row-based dynamic column subset mean in Pandas: answers

【Question title】: Row-based dynamic column subset mean in Pandas
【Posted】: 2021-08-25 14:48:54
【Question description】:

I have a problem that can be visualized as follows:

	Our	Cat	The	Home	They	Able
Alice	10	15	NaN	30	20	25
Bob	12	NaN	14	29	NaN	30
John	NaN	9	NaN	NaN	NaN	20
Tyler	11	12	13	24	25	26

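In general, every column holds numeric data for each person (the index), but there are gaps. I would like to know how to fill each missing value with the same person's mean over the columns whose names have the same length as the column with the missing value. In other words, how can fillna() and mean() be combined with some custom logic that takes the column names into account? The desired result is: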

	Our	Cat	The	Home	They	Able
Alice	10	15	12.5	30	20	25
Bob	12	13	14	29	29.5	30
John	9	9	9	20	20	20
Tyler	11	12	13	24	25	26

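The filled-in numbers (shown in bold in the original post) are the same person's mean over the columns with the same column-name length.

Unfortunately, in my real-world scenario there are hundreds of columns, so I cannot manually list which columns belong together.

Thanks in advance for any help.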
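For anyone who wants to reproduce the question, the sample table can be rebuilt roughly as follows. This is only a sketch: the names `df` and `columns_by_length` are mine, and it assumes the blanks in the table are NaN. It also shows that the grouping (column-name length) can be derived from the column names themselves, so nothing has to be listed by hand.

```python
import numpy as np
import pandas as pd

# Sample data from the question; blanks are represented as NaN (assumption).
df = pd.DataFrame(
    {
        "Our": [10, 12, np.nan, 11],
        "Cat": [15, np.nan, 9, 12],
        "The": [np.nan, 14, np.nan, 13],
        "Home": [30, 29, np.nan, 24],
        "They": [20, np.nan, np.nan, 25],
        "Able": [25, 30, 20, 26],
    },
    index=["Alice", "Bob", "John", "Tyler"],
)

# Bucket the column names by their length instead of listing them manually.
columns_by_length = {}
for name in df.columns:
    columns_by_length.setdefault(len(name), []).append(name)

print(columns_by_length)
# {3: ['Our', 'Cat', 'The'], 4: ['Home', 'They', 'Able']}
```

The same bucketing scales to hundreds of columns, as long as the column labels are strings.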

【Question comments】:

Tags: python pandas list dataframe numpy

【Solution 1】:

You can try:

# Note: groupby(..., axis=1) is deprecated since pandas 2.1; this line targets older versions.
df = df.groupby(df.columns.map(len), axis=1).apply(lambda x: x.T.fillna(x.mean(1)).T)

Output:

        Our   Cat   The  Home  They  Able
Alice  10.0  15.0  12.5  30.0  20.0  25.0
Bob    12.0  13.0  14.0  29.0  29.5  30.0
John    9.0   9.0   9.0  20.0  20.0  20.0
Tyler  11.0  12.0  13.0  24.0  25.0  26.0
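Solution 1 relies on `groupby` with `axis=1`, which newer pandas versions deprecate. One possible way to express the same idea without that argument is to transpose first, group the (now row-wise) column names by length, and broadcast the group means back with `transform`. This is a sketch under my own assumptions, not the answerer's code: the names `name_lengths`, `group_means` and `filled` are illustrative, and the sample frame is rebuilt so the block runs on its own.

```python
import numpy as np
import pandas as pd

# Sample data from the question, stored as floats; blanks are NaN (assumption).
df = pd.DataFrame(
    {
        "Our": [10, 12, np.nan, 11],
        "Cat": [15, np.nan, 9, 12],
        "The": [np.nan, 14, np.nan, 13],
        "Home": [30, 29, np.nan, 24],
        "They": [20, np.nan, np.nan, 25],
        "Able": [25, 30, 20, 26],
    },
    index=["Alice", "Bob", "John", "Tyler"],
    dtype=float,
)

# One group label per column: the length of its name.
name_lengths = df.columns.str.len().to_numpy()

# Transpose so the original columns become rows, group those rows by name
# length, and broadcast each group's per-person mean back to every member.
group_means = df.T.groupby(name_lengths).transform("mean").T

# Fill only the missing cells; existing values are left untouched.
filled = df.fillna(group_means)
print(filled)
```

Because `fillna` aligns on both index and columns, only the NaN cells pick up values from `group_means`.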

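【Comments】:

I'm not the downvoter, but I think you misread the question. The OP wants to fill null values with a grouped mean based on the character length of the column names. For example, Cat Our Dog have length 3, so the grouped mean to impute is 9 (for those columns). Likewise, the other columns have length 4, so the grouped mean should be based on those to fill the nulls in those columns.
@sophcles Oh, yes! You're right! I misread the question. Thanks :) I will try to fix the answer.
This seems to be much faster than my answer on large data; nicely done! +1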

【Solution 2】:

This seems to work:

import pandas as pd

# create a df to hold the per-person means for each column name length
meandf = pd.DataFrame(index=df.index, columns=df.columns, dtype=float)

# find the unique column name lengths
lengths = set(len(i) for i in df.columns)

# iterate over the lengths and take the mean for that chunk of the df
for l in lengths:
    subsetcols = df.columns[[len(col) == l for col in df.columns]]
    personmeans = df.loc[:, subsetcols].mean(axis=1)
    meandf.loc[personmeans.index, subsetcols] = personmeans

# write to the original df
df[df.isna()] = meandf

Result:

>>> df
        Our   Cat   The  Home  They  Able
Alice  10.0  15.0  12.5  30.0  20.0    25
Bob    12.0  13.0  14.0  29.0  29.5    30
John    9.0   9.0   9.0  20.0  20.0    20
Tyler  11.0  12.0  13.0  24.0  25.0    26

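I use meandf as an intermediate structure to hold the means (I couldn't work out the indexing without it). Each cell holds that person's mean for that column's name length: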

>>> meandf
        Our   Cat   The  Home  They  Able
Alice  12.5  12.5  12.5  25.0  25.0  25.0
Bob    13.0  13.0  13.0  29.5  29.5  29.5
John    9.0   9.0   9.0  20.0  20.0  20.0
Tyler  12.0  12.0  12.0  25.0  25.0  25.0
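The intermediate-frame idea can also be wrapped in a function that returns a new frame instead of writing into `df`. The sketch below is my own packaging of the approach, not the answerer's code: the function name `fill_by_name_length` and the small `sample` frame are hypothetical, and it assumes all column labels are strings. It also illustrates one edge case: if a person has no values at all for a given name length, the group mean is NaN and the cell stays empty.

```python
import numpy as np
import pandas as pd


def fill_by_name_length(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with NaNs replaced by the row mean over columns of equal name length."""
    # Intermediate frame holding, per cell, the row mean for that column's name length.
    meandf = pd.DataFrame(index=df.index, columns=df.columns, dtype=float)
    for length in sorted({len(col) for col in df.columns}):
        cols = [col for col in df.columns if len(col) == length]
        row_means = df[cols].mean(axis=1)  # NaNs are skipped by default
        for col in cols:
            meandf[col] = row_means
    # fillna aligns on index and columns and only touches missing cells.
    return df.fillna(meandf)


# Hypothetical mini example with one row that has no length-3 data at all.
sample = pd.DataFrame(
    {
        "Our": [10.0, np.nan],
        "Cat": [np.nan, np.nan],
        "Home": [30.0, np.nan],
        "They": [20.0, 20.0],
    },
    index=["Alice", "John"],
)
result = fill_by_name_length(sample)
print(result)
```

Here Alice's `Cat` becomes 10.0 and John's `Home` becomes 20.0, while John's `Our` and `Cat` remain NaN because there is nothing to average. The input frame `sample` is not modified.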

【Comments】: