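【Question title】: I am getting NaN when I subtract two pandas dataframe columns
【Posted】: 2019-03-28 11:29:49
【Question】:

I have a dataframe with several columns, and I want the time difference between two columns that contain times. I first converted both columns to datetime objects with pd.to_datetime, but when I subtract the two columns and assign the result to a new column, I end up with NaN values.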

ops_data_clean_1.loc['Package committed-time'] = pd.to_datetime(ops_data_clean_1['Package committed-time'])
ops_data_clean_1.loc['Flight launched-time'] = pd.to_datetime(ops_data_clean_1['Flight launched-time'])
ops_data_clean_1['time_to_launch'] = ops_data_clean_1.loc['Flight launched-time'] - ops_data_clean_1.loc['Package committed-time']
ops_data_clean_1.head()

【Comments】:

  • Can you post the code you are using?
  • Please provide sample data and the expected output.
  • @Noki Just did...

Tags: pandas datetime dataframe


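【Solution 1】:

I think your problem is that you use loc while accessing only a single column of the dataframe. With a single label, loc looks up a row, not a column. Simply removing loc from your code eliminates the problem.

See the toy example below,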

import pandas as pd

ops_data_clean_1 = pd.DataFrame()

ops_data_clean_1['Package committed-time'] = ['2018-01-01 00:00:30', '2018-01-01 00:49:00', '2018-03-01 00:00:45']
ops_data_clean_1['Flight launched-time'] = ['2018-01-01 01:00:30', '2018-01-01 02:49:00', '2018-03-01 00:54:45']

ops_data_clean_1['Package committed-time'] = pd.to_datetime(ops_data_clean_1['Package committed-time'])
ops_data_clean_1['Flight launched-time'] = pd.to_datetime(ops_data_clean_1['Flight launched-time'])

ops_data_clean_1['time_to_launch'] = ops_data_clean_1['Flight launched-time'] - ops_data_clean_1['Package committed-time']

ops_data_clean_1.head()

# Output

   Package committed-time  Flight launched-time  time_to_launch
0     2018-01-01 00:00:30   2018-01-01 01:00:30        01:00:00
1     2018-01-01 00:49:00   2018-01-01 02:49:00        02:00:00
2     2018-03-01 00:00:45   2018-03-01 00:54:45        00:54:00
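
A quick way to confirm that the subtraction worked is to check the dtype of the new column and count missing values. The snippet below is a minimal sketch that rebuilds the same toy data under the shorter, hypothetical name `df`; the conversion to minutes is only an illustrative sanity check and was not part of the original answer.

```python
import pandas as pd

# Same toy data as above, under the hypothetical name `df`.
df = pd.DataFrame({
    "Package committed-time": ["2018-01-01 00:00:30", "2018-01-01 00:49:00", "2018-03-01 00:00:45"],
    "Flight launched-time": ["2018-01-01 01:00:30", "2018-01-01 02:49:00", "2018-03-01 00:54:45"],
})

# Convert both columns with plain column access (no .loc).
for col in ["Package committed-time", "Flight launched-time"]:
    df[col] = pd.to_datetime(df[col])

df["time_to_launch"] = df["Flight launched-time"] - df["Package committed-time"]

# Sanity checks: the column should be a timedelta column with no missing values.
nan_count = int(df["time_to_launch"].isna().sum())
is_timedelta = pd.api.types.is_timedelta64_dtype(df["time_to_launch"])

# Express the differences in minutes for easier reading.
minutes = (df["time_to_launch"].dt.total_seconds() / 60).tolist()

print(nan_count, is_timedelta, minutes)
```

If `nan_count` is greater than zero on real data, the usual suspects are unparseable timestamps or a mismatch between the row labels of the two operands.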

If you want to use loc, you must use : to select all rows of the dataframe, e.g. ops_data_clean_1.loc[:, 'Flight launched-time']

The code then becomes,

ops_data_clean_1 = pd.DataFrame()

ops_data_clean_1['Package committed-time'] = ['2018-01-01 00:00:30', '2018-01-01 00:49:00', '2018-03-01 00:00:45']
ops_data_clean_1['Flight launched-time'] = ['2018-01-01 01:00:30', '2018-01-01 02:49:00', '2018-03-01 00:54:45']

ops_data_clean_1.loc[:, 'Package committed-time'] = pd.to_datetime(ops_data_clean_1['Package committed-time'])
ops_data_clean_1.loc[:, 'Flight launched-time'] = pd.to_datetime(ops_data_clean_1['Flight launched-time'])

ops_data_clean_1['time_to_launch'] = ops_data_clean_1.loc[:, 'Flight launched-time'] - ops_data_clean_1.loc[:, 'Package committed-time']

ops_data_clean_1.head()

# Output

   Package committed-time  Flight launched-time  time_to_launch
0     2018-01-01 00:00:30   2018-01-01 01:00:30        01:00:00
1     2018-01-01 00:49:00   2018-01-01 02:49:00        02:00:00
2     2018-03-01 00:00:45   2018-03-01 00:54:45        00:54:00
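
The two access styles can be compared directly. The following is a small sketch on a two-row frame with the hypothetical name `df`; it shows that `df[col]` and `df.loc[:, col]` return the same column, while `df.loc[0]` with a single label returns a row.

```python
import pandas as pd

# Hypothetical two-row frame, already converted to datetimes.
df = pd.DataFrame({
    "Package committed-time": pd.to_datetime(["2018-01-01 00:00:30", "2018-01-01 00:49:00"]),
    "Flight launched-time": pd.to_datetime(["2018-01-01 01:00:30", "2018-01-01 02:49:00"]),
})

# Both expressions select the same column.
by_brackets = df["Flight launched-time"]
by_loc = df.loc[:, "Flight launched-time"]
same_column = by_brackets.equals(by_loc)

# A single label passed to .loc selects a ROW; the result is a Series
# whose index holds the column names.
first_row = df.loc[0]

print(same_column)
print(first_row.index.tolist())
```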

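【Comments】:

  • Glad to hear it helped
【Solution 2】:

I think your problem lies in the .loc indexer you are using.

.loc['Package committed-time'] basically says: select the ROW whose label is 'Package committed-time', and no row has that label.

But you want to select the column with that name. Use plain ops_data_clean_1['Package committed-time'] to access the column, or ops_data_clean_1.loc[:, 'Package committed-time'].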
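
To make the row-versus-column distinction concrete, here is a minimal sketch on a hypothetical frame `df` (the column names `a` and `b` and all values are made up for illustration). It shows that reading `.loc['a']` searches the row index and fails, and that assigning a Series indexed by column names to a new column aligns on labels and leaves only NaN. That alignment is one plausible route to the NaN values in the question, not a confirmed trace of the original data.

```python
import pandas as pd

# Hypothetical toy frame; names and values are assumptions for illustration.
df = pd.DataFrame({"a": [10.0, 20.0, 30.0], "b": [1.0, 2.0, 3.0]})

# .loc with a single label searches the ROW index (0, 1, 2 here),
# so a column name is not found and pandas raises KeyError.
try:
    df.loc["a"]
    row_lookup_failed = False
except KeyError:
    row_lookup_failed = True

# A Series indexed by column names, like the result of subtracting two rows.
row_like = pd.Series([1.0, 2.0], index=["a", "b"])

# Column assignment aligns on index labels: the frame has rows 0, 1, 2 and
# the Series has labels "a" and "b", so nothing matches and every entry is NaN.
df["diff"] = row_like

print(row_lookup_failed)
print(df["diff"].isna().all())
```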

For more on .loc, see the pandas documentation for DataFrame.loc.

【Comments】:
