Create new column as a group by in pandas dataframe [closed]: Answers

【Question title】: Create new column as a group by in pandas dataframe [closed]
【Posted】: 2020-02-11 23:00:54
【Question description】:

Hi all, I need your help getting the expected output in a pandas DataFrame. I have a file containing data like this:

Time/Location    Value
Location1   
Today             3
Next day          0
Weekend          -6
Next week         1
Location2   
Today             2
Next day         -1
Weekend           3
Next week         2
Location3   
Today             1
Next day          3
Weekend           1
Next week        -1
Location4   
Today             3
Next day          2
Weekend           5
Next week         4
Location5   
Today             4
Next day          2
Weekend           3
Next week         1
Location6   
Today            -1
Next day          3
Weekend           3
Next week         2

The expected output is shown below, with a new column created for "Location".

Location    Time       Value
Location1   Today       3
Location1   Next day    0
Location1   Weekend    -6
Location1   Next week   1
Location2   Today       2
Location2   Next day   -1
Location2   Weekend     3
Location2   Next week   2
Location3   Today       1
Location3   Next day    3
Location3   Weekend     1
Location3   Next week  -1
Location4   Today       3
Location4   Next day    2
Location4   Weekend     5
Location4   Next week   4
Location5   Today       4
Location5   Next day    2
Location5   Weekend     3
Location5   Next week   1
Location6   Today      -1
Location6   Next day    3
Location6   Weekend     3
Location6   Next week   2

Any help or suggestions would be appreciated. Please!

Thanks!

【Comments】:

Post data, not pictures. Use this as a guide: stackoverflow.com/questions/20109391/…
What exactly is the question? Stack Overflow is not a free code-writing service. See: How to Ask, tour, help center.

Tags: python python-3.x pandas numpy dataframe

【Solution 1】:

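If the location header rows have missing values in Value, use DataFrame.insert to add a new first column: mask the entries of Time/Location where Value is not missing, forward fill the remaining location names with ffill, then remove the header rows with DataFrame.dropna and rename the column: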

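# Keep only the location names (rows where Value is missing) and forward fill them
df.insert(0, 'Location', df['Time/Location'].mask(df['Value'].notna()).ffill())
# Drop the location header rows and rename the original column
df = df.dropna(subset=['Value']).rename(columns={'Time/Location': 'Time'})
print(df)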
     Location       Time  Value
1   Location1      Today    3.0
2   Location1   Next day    0.0
3   Location1    Weekend   -6.0
4   Location1  Next week    1.0
6   Location2      Today    2.0
7   Location2   Next day   -1.0
8   Location2    Weekend    3.0
9   Location2  Next week    2.0
11  Location3      Today    1.0
12  Location3   Next day    3.0
13  Location3    Weekend    1.0
14  Location3  Next week   -1.0
16  Location4      Today    3.0
17  Location4   Next day    2.0
18  Location4    Weekend    5.0
19  Location4  Next week    4.0
21  Location5      Today    4.0
22  Location5   Next day    2.0
23  Location5    Weekend    3.0
24  Location5  Next week    1.0
26  Location6      Today   -1.0
27  Location6   Next day    3.0
28  Location6    Weekend    3.0
29  Location6  Next week    2.0
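
The approach above can be tried end to end with a small self-contained sketch. The DataFrame here is a hypothetical reconstruction of the first two locations from the question, assuming the location header rows were read with a missing Value (NaN); how the original file was loaded is not shown in the question. Because those NaN rows force Value to float (hence the 3.0 in the output above), the sketch also casts Value back to int after the header rows are dropped and resets the index, which is an optional extra rather than part of the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: first two locations from the question.
# Assumption: location header rows carry no Value, so they are NaN.
df = pd.DataFrame({
    "Time/Location": ["Location1", "Today", "Next day", "Weekend", "Next week",
                      "Location2", "Today", "Next day", "Weekend", "Next week"],
    "Value": [np.nan, 3, 0, -6, 1, np.nan, 2, -1, 3, 2],
})

# Blank out every row that has a Value, leaving only the location names,
# then forward fill so each data row inherits the location above it.
location = df["Time/Location"].mask(df["Value"].notna()).ffill()
df.insert(0, "Location", location)

result = (
    df.dropna(subset=["Value"])                 # drop the location header rows
      .rename(columns={"Time/Location": "Time"})
      .astype({"Value": int})                   # optional: no NaN left, so int is safe
      .reset_index(drop=True)                   # optional: renumber rows from 0
)
print(result)
```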

Another idea is to test the values in the first column with Series.isin and filter with boolean indexing:

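# Labels that identify a data row rather than a location header row
L = ['Today', 'Next day', 'Weekend', 'Next week']
m = df['Time/Location'].isin(L)
# Blank out the time labels, leaving the location names, and forward fill them
df.insert(0, 'Location', df['Time/Location'].mask(m).ffill())
# Keep only the data rows and rename the original column
df = df[m].rename(columns={'Time/Location': 'Time'})
print(df)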
     Location       Time  Value
1   Location1      Today    3.0
2   Location1   Next day    0.0
3   Location1    Weekend   -6.0
4   Location1  Next week    1.0
6   Location2      Today    2.0
7   Location2   Next day   -1.0
8   Location2    Weekend    3.0
9   Location2  Next week    2.0
11  Location3      Today    1.0
12  Location3   Next day    3.0
13  Location3    Weekend    1.0
14  Location3  Next week   -1.0
16  Location4      Today    3.0
17  Location4   Next day    2.0
18  Location4    Weekend    5.0
19  Location4  Next week    4.0
21  Location5      Today    4.0
22  Location5   Next day    2.0
23  Location5    Weekend    3.0
24  Location5  Next week    1.0
26  Location6      Today   -1.0
27  Location6   Next day    3.0
28  Location6    Weekend    3.0
29  Location6  Next week    2.0
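
A minimal runnable sketch of this second variant follows, again on a hypothetical two-location sample built by hand. Here the mask is derived only from the Time/Location column, so it does not depend on whether Value is missing in the header rows; it does assume the full set of time labels is known in advance. The names `time_labels` and `is_time` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: first two locations from the question.
df = pd.DataFrame({
    "Time/Location": ["Location1", "Today", "Next day", "Weekend", "Next week",
                      "Location2", "Today", "Next day", "Weekend", "Next week"],
    "Value": [np.nan, 3, 0, -6, 1, np.nan, 2, -1, 3, 2],
})

# Assumption: these four labels are the only values that mark a data row.
time_labels = ["Today", "Next day", "Weekend", "Next week"]
is_time = df["Time/Location"].isin(time_labels)

# Rows that are NOT time labels are location headers; forward fill them.
df.insert(0, "Location", df["Time/Location"].mask(is_time).ffill())

# Boolean indexing keeps only the data rows.
result = (
    df[is_time]
      .rename(columns={"Time/Location": "Time"})
      .reset_index(drop=True)
)
print(result)
```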

【Discussion】: