[Question title]: Pandas Remove Elements From DatetimeIndex per Dates in Other DF Column
[Posted]: 2016-10-05 19:53:42
[Question]:

Given the following DataFrame:

import pandas as pd
df=pd.DataFrame({'A':['a','b','c'],
        'first_date':['2015-08-31 00:00:00','2015-08-24 00:00:00','2015-08-25 00:00:00']})
df.first_date=pd.to_datetime(df.first_date) #(dtype='<M8[ns]')
df['last_date']=pd.to_datetime('5/6/2016') #(dtype='datetime64[ns]')
def fnl(x):
    l = pd.date_range(x.loc['first_date'], x.loc['last_date'], freq='B')
    return pd.Series([l])

df['range'] = df.apply(fnl, axis=1)
df

    A   first_date  last_date   range
0   a   2015-08-31  2016-05-06  DatetimeIndex(['2015-08-31', '2015-09-01', '20...
1   b   2015-08-24  2016-05-06  DatetimeIndex(['2015-08-24', '2015-08-25', '20...
2   c   2015-08-25  2016-05-06  DatetimeIndex(['2015-08-25', '2015-08-26', '20...

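I would like to remove dates from each range according to the Exclusions column of the DataFrame below, for every date that falls inside its corresponding range (that is, if an exclusion date for a given value of A lies outside that value's range in df, it obviously cannot be excluded).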

exc=pd.DataFrame({'A':['a','a','b','b','c','c'],
                'Exclusions':['2014-12-30 00:00:00','2015-08-31 00:00:00',\
                              '2015-08-25 00:00:00','2015-10-20 00:00:00',\
                             '2015-08-26 00:00:00','2016-10-05 00:00:00']
                 })
exc

    A   Exclusions
0   a   2014-12-30 00:00:00
1   a   2015-08-31 00:00:00
2   b   2015-08-25 00:00:00
3   b   2015-10-20 00:00:00
4   c   2015-08-26 00:00:00
5   c   2016-10-05 00:00:00

Desired result:

    A   first_date  last_date   range
0   a   2015-08-31  2016-05-06  DatetimeIndex(['2015-09-01', '2015-09-02', '20...
1   b   2015-08-24  2016-05-06  DatetimeIndex(['2015-08-24', '2015-08-26', '20...
2   c   2015-08-25  2016-05-06  DatetimeIndex(['2015-08-25', '2015-08-27', '20...

Thanks in advance!

[Comments]:

    Tags: python-3.x pandas indexing dataframe date-range


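    [Solution 1]:

    I think you can first expand each range into separate columns and attach them to df with concat, then reshape to long format with melt. After that, merge with exc and use boolean indexing to keep only the rows where df._merge == 'left_only':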

    import pandas as pd
    df=pd.DataFrame({'A':['a','b','c'],
            'first_date':['2015-08-31 00:00:00','2015-08-24 00:00:00','2015-08-25 00:00:00']})
    df.first_date=pd.to_datetime(df.first_date) #(dtype='<M8[ns]')
    df['last_date']=pd.to_datetime('5/6/2016') #(dtype='datetime64[ns]')
    def fnl(x):
        l = pd.date_range(x.loc['first_date'], x.loc['last_date'], freq='B')
        return pd.Series(l)
    
    df1 = df.apply(fnl, axis=1)
    print (df1)
             0          1          2          3          4          5    \
    0 2015-08-31 2015-09-01 2015-09-02 2015-09-03 2015-09-04 2015-09-07   
    1 2015-08-24 2015-08-25 2015-08-26 2015-08-27 2015-08-28 2015-08-31   
    2 2015-08-25 2015-08-26 2015-08-27 2015-08-28 2015-08-31 2015-09-01   
    
             6          7          8          9      ...            175  \
    0 2015-09-08 2015-09-09 2015-09-10 2015-09-11    ...     2016-05-02   
    1 2015-09-01 2015-09-02 2015-09-03 2015-09-04    ...     2016-04-25   
    2 2015-09-02 2015-09-03 2015-09-04 2015-09-07    ...     2016-04-26   
    
             176        177        178        179        180        181  \
    0 2016-05-03 2016-05-04 2016-05-05 2016-05-06        NaT        NaT   
    1 2016-04-26 2016-04-27 2016-04-28 2016-04-29 2016-05-02 2016-05-03   
    2 2016-04-27 2016-04-28 2016-04-29 2016-05-02 2016-05-03 2016-05-04   
    
             182        183        184  
    0        NaT        NaT        NaT  
    1 2016-05-04 2016-05-05 2016-05-06  
    2 2016-05-05 2016-05-06        NaT  
    
    [3 rows x 185 columns]
    
    df = pd.concat([df,df1], axis=1)
    df = pd.melt(df, id_vars=['A','first_date','last_date'], value_name='range')
    df = df.dropna(subset=['range'])
    print (df)
         A first_date  last_date variable      range
    0    a 2015-08-31 2016-05-06        0 2015-08-31
    1    b 2015-08-24 2016-05-06        0 2015-08-24
    2    c 2015-08-25 2016-05-06        0 2015-08-25
    3    a 2015-08-31 2016-05-06        1 2015-09-01
    4    b 2015-08-24 2016-05-06        1 2015-08-25
    5    c 2015-08-25 2016-05-06        1 2015-08-26
    6    a 2015-08-31 2016-05-06        2 2015-09-02
    7    b 2015-08-24 2016-05-06        2 2015-08-26
    8    c 2015-08-25 2016-05-06        2 2015-08-27
    9    a 2015-08-31 2016-05-06        3 2015-09-03
    10   b 2015-08-24 2016-05-06        3 2015-08-27
    11   c 2015-08-25 2016-05-06        3 2015-08-28
    12   a 2015-08-31 2016-05-06        4 2015-09-04
    13   b 2015-08-24 2016-05-06        4 2015-08-28
    14   c 2015-08-25 2016-05-06        4 2015-08-31
    15   a 2015-08-31 2016-05-06        5 2015-09-07
    16   b 2015-08-24 2016-05-06        5 2015-08-31
    ...
    ...
    
    exc=pd.DataFrame({'A':['a','a','b','b','c','c'],
                    'Exclusions':['2014-12-30 00:00:00','2015-08-31 00:00:00',\
                                  '2015-08-25 00:00:00','2015-10-20 00:00:00',\
                                 '2015-08-26 00:00:00','2016-10-05 00:00:00']
                     })
    #print (exc)
    
    exc['Exclusions'] = pd.to_datetime(exc['Exclusions'])
    
    df = (pd.merge(df, exc, left_on=['A', 'range'],
                    right_on=['A','Exclusions'], 
                    indicator=True, 
                    how='left'))
    
    
    df = df[df._merge == 'left_only'] 
    df = df.drop(['Exclusions','_merge'], axis=1)               
    print (df)                
         A first_date  last_date variable      range
    1    b 2015-08-24 2016-05-06        0 2015-08-24
    2    c 2015-08-25 2016-05-06        0 2015-08-25
    3    a 2015-08-31 2016-05-06        1 2015-09-01
    6    a 2015-08-31 2016-05-06        2 2015-09-02
    7    b 2015-08-24 2016-05-06        2 2015-08-26
    8    c 2015-08-25 2016-05-06        2 2015-08-27
    9    a 2015-08-31 2016-05-06        3 2015-09-03
    10   b 2015-08-24 2016-05-06        3 2015-08-27
    11   c 2015-08-25 2016-05-06        3 2015-08-28
    12   a 2015-08-31 2016-05-06        4 2015-09-04
    13   b 2015-08-24 2016-05-06        4 2015-08-28
    ...
    ...
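
As a complement to the merge-based approach above, here is a minimal sketch that keeps one DatetimeIndex per row, as in the desired result shown in the question. It is not the answerer's method: it swaps the melt/merge steps for `DatetimeIndex.difference`. The names `exclusions_by_key` and `business_days_without_exclusions` are hypothetical helpers introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['a', 'b', 'c'],
    'first_date': pd.to_datetime(['2015-08-31', '2015-08-24', '2015-08-25']),
})
df['last_date'] = pd.to_datetime('2016-05-06')

exc = pd.DataFrame({
    'A': ['a', 'a', 'b', 'b', 'c', 'c'],
    'Exclusions': pd.to_datetime(['2014-12-30', '2015-08-31', '2015-08-25',
                                  '2015-10-20', '2015-08-26', '2016-10-05']),
})

# Hypothetical helper mapping: one DatetimeIndex of exclusion dates per value of A.
exclusions_by_key = {
    key: pd.DatetimeIndex(dates)
    for key, dates in exc.groupby('A')['Exclusions']
}

def business_days_without_exclusions(row):
    """Build the business-day range for one row and drop its exclusions."""
    days = pd.date_range(row['first_date'], row['last_date'], freq='B')
    excluded = exclusions_by_key.get(row['A'], pd.DatetimeIndex([]))
    # difference() ignores exclusion dates that are not in the range at all.
    return days.difference(excluded)

# One DatetimeIndex per row, stored in an object-dtype column.
df['range'] = [business_days_without_exclusions(row) for _, row in df.iterrows()]
print(df)
```

Exclusion dates that fall outside a row's range (such as 2014-12-30 for 'a') simply have no effect, which matches the requirement stated in the question. Note that the result of `difference` no longer carries the 'B' frequency.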
    

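    [Comments]:

    • For some reason this works on the example data I posted, but on my real data it fails at df = df[df._merge == 'left_only'] with AttributeError: 'DataFrame' object has no attribute '_merge'
    • What does print df.head() return just above the failing line?
    • I had typed a lowercase 'l' instead of the digit 1 in a variable name, and the font in my notebook makes the two hard to tell apart.
    • Phew, errors like that are really hard to spot. Great that you found it. Have a nice day.
    • What a relief. Thanks for your effort! Have a nice day.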
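
For readers who hit the same AttributeError, the following is a small sketch of a defensive variant of the filtering step. It does not reproduce the commenter's notebook (which is not shown); the frames `left` and `right` are made-up sample data. The idea is to check that the indicator column exists and to use bracket access, so that a mixed-up variable fails with a message listing the actual columns.

```python
import pandas as pd

# Made-up sample data, standing in for the melted ranges and the exclusions.
left = pd.DataFrame({
    'A': ['a', 'a', 'b'],
    'range': pd.to_datetime(['2015-08-31', '2015-09-01', '2015-08-24']),
})
right = pd.DataFrame({
    'A': ['a'],
    'Exclusions': pd.to_datetime(['2015-08-31']),
})

# indicator=True adds a '_merge' column describing where each row came from.
merged = pd.merge(left, right, left_on=['A', 'range'],
                  right_on=['A', 'Exclusions'], how='left', indicator=True)

# Fail early, with the real column names, if we are looking at the wrong frame.
if '_merge' not in merged.columns:
    raise KeyError("'_merge' not found; columns are: %s" % list(merged.columns))

# Keep only rows that had no match in the exclusions (an anti-join).
kept = merged.loc[merged['_merge'] == 'left_only', ['A', 'range']]
print(kept)
```

Printing the frame's columns just before the filter, as suggested in the comments, serves the same purpose.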