Suppose your DataFrame looks like this:
import pandas as pd
import numpy as np
data = {"foobar": ["foo", "bar", "baz"],
        "year": [[np.nan, np.nan, np.nan],
                 [12, 10, 8],
                 [np.nan, np.nan, np.nan]]}
df = pd.DataFrame(data)
  foobar             year
0    foo  [nan, nan, nan]
1    bar      [12, 10, 8]
2    baz  [nan, nan, nan]
...you can use apply to build a new column containing the means:
df["means"] = df.year.apply(np.mean)
result_list = df.means.values # array([nan, 10., nan])
  foobar             year  means
0    foo  [nan, nan, nan]    NaN
1    bar      [12, 10, 8]   10.0
2    baz  [nan, nan, nan]    NaN
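One thing worth knowing about the apply approach: np.mean returns NaN as soon as a list contains any NaN, not only when every entry is missing. A minimal sketch, assuming a hypothetical extra row "qux" with a partially missing list (it is not part of the data above):

```python
import numpy as np
import pandas as pd

# "qux" is a hypothetical row, added only to illustrate NaN propagation
df_demo = pd.DataFrame({
    "foobar": ["bar", "qux"],
    "year": [[12, 10, 8], [12, np.nan, 8]],
})

# np.mean is applied to each list; a single NaN makes the whole mean NaN
df_demo["means"] = df_demo.year.apply(np.mean)
means = df_demo.means.tolist()
```

If lists like that can occur in your data, check whether a NaN result is what you want for them.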
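However, depending on what else you want to do with the data, it may be better to explode the sequences into individual cells to get a more pandas-like structure: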
df = df.explode(column="year")
df["year"] = df.year.astype(float) # tell Pandas it's numerical data
  foobar  year
0    foo   NaN
0    foo   NaN
0    foo   NaN
1    bar  12.0
...
Now just use the default operations to get the values grouped by foobar (or whatever your column is named):
mean_df = df.groupby("foobar").mean()
        year
foobar
bar     10.0
baz      NaN
foo      NaN
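For reference, the explode-and-group route can be sketched end to end as follows. This is a self-contained version under the same sample data. The names df_demo, long_df and mean_by_group are illustrative, and selecting the "year" column before calling mean() is my own choice to keep the result a plain Series:

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({
    "foobar": ["foo", "bar", "baz"],
    "year": [[np.nan] * 3, [12, 10, 8], [np.nan] * 3],
})

# One row per list element; the original index is repeated for each element
long_df = df_demo.explode(column="year")

# explode leaves an object column, so cast it to float explicitly
long_df["year"] = long_df["year"].astype(float)

# Mean per group; groups that contain only NaN come out as NaN
mean_by_group = long_df.groupby("foobar")["year"].mean()
```

Note that groupby(...).mean() skips NaN values by default, so a group with only some missing entries would get the mean of the remaining values, whereas np.mean on the raw list would return NaN.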