【Question title】: Get the first and last value of a dataframe column with respect to another column
【Posted】: 2021-06-18 20:04:47
【Question】:

I am a beginner in Python. For each mac_address, I want to get the first and the last value of the date column. For example:

I have already sorted my dataframe by mac_address and date with the following line:

df = df.sort_values(by=['mac_address', 'date'], ascending=(True, True)) 

And the data is:

        router      mac_address         date
589455  15001391    00:00:34:1a:03:e8   2021-01-01 22:09:34
590067  17091211    00:00:34:1a:03:e8   2021-01-01 22:10:54
590136  17091236    00:00:34:1a:03:e8   2021-01-01 22:11:04
.....
.....
.....
635434  15001391    00:00:78:01:0d:11   2021-01-02 00:14:54
636479  17091211    00:00:78:01:0d:11   2021-01-02 00:16:17
949873  17091172    00:00:af:82:56:93   2021-01-02 11:26:39
950699  17091251    00:00:af:82:56:93   2021-01-02 11:27:59
950700  17091253    00:00:af:82:56:93   2021-01-02 11:28:59
950702  17091257    00:00:af:82:56:93   2021-01-02 11:29:59
950703  17091258    00:00:af:82:56:93   2021-01-02 11:30:59
619384  17091174    00:01:09:d2:09:e0   2021-01-01 23:34:32
365351  17091211    00:01:d2:7c:4e:32   2021-01-01 14:27:58
109858  17091236    00:02:75:86:4e:34   2021-01-01 05:50:47
110281  17091211    00:02:75:86:4e:34   2021-01-01 05:50:54

Note: the date column has the format "2021-01-01 05:50:54", and the number of times each MAC address appears varies.
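
To experiment with the data above, a small reproducible frame can be built as follows. This is only a sketch: it copies a subset of the rows shown, deliberately shuffled, and it assumes the date column should be parsed with pd.to_datetime (the question does not say whether the column holds strings or timestamps).

```python
import pandas as pd

# A subset of the rows shown above, deliberately left unsorted.
df = pd.DataFrame(
    {
        "router": [17091236, 15001391, 17091211, 17091174, 17091211, 15001391],
        "mac_address": [
            "00:00:34:1a:03:e8",
            "00:00:34:1a:03:e8",
            "00:00:34:1a:03:e8",
            "00:01:09:d2:09:e0",
            "00:00:78:01:0d:11",
            "00:00:78:01:0d:11",
        ],
        "date": [
            "2021-01-01 22:11:04",
            "2021-01-01 22:09:34",
            "2021-01-01 22:10:54",
            "2021-01-01 23:34:32",
            "2021-01-02 00:16:17",
            "2021-01-02 00:14:54",
        ],
    },
    index=[590136, 589455, 590067, 619384, 636479, 635434],
)

# Assumption: parse the strings so that the sort is chronological.
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d %H:%M:%S")

# Same sort as in the question: by MAC address, then by date.
df = df.sort_values(by=["mac_address", "date"], ascending=[True, True])
print(df)
```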

I would like two outputs like these:

First output:

    589455  15001391    00:00:34:1a:03:e8   2021-01-01 22:09:34
    590136  17091236    00:00:34:1a:03:e8   2021-01-01 22:11:04
    635434  15001391    00:00:78:01:0d:11   2021-01-02 00:14:54
    636479  17091211    00:00:78:01:0d:11   2021-01-02 00:16:17
    .....
    .....
    949873  17091172    00:00:af:82:56:93   2021-01-02 11:26:39
    950703  17091258    00:00:af:82:56:93   2021-01-02 11:30:59
    619384  17091174    00:01:09:d2:09:e0   2021-01-01 23:34:32
    365351  17091211    00:01:d2:7c:4e:32   2021-01-01 14:27:58

Second output: only the MAC addresses that have both a first and a last value, ignoring any mac_address that appears only once:

    589455  15001391    00:00:34:1a:03:e8   22:09:34
    590136  17091236    00:00:34:1a:03:e8   22:11:04
    635434  15001391    00:00:78:01:0d:11   00:14:54
    636479  17091211    00:00:78:01:0d:11   00:16:17
    .....
    .....
    949873  17091172    00:00:af:82:56:93   11:26:39
    950703  17091258    00:00:af:82:56:93   11:30:59

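I don't know whether I am overcomplicating this or whether the task is easier than it seems to me, but I have had no success in the last 48 hours. Can you help me? Thank you very much.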

【Comments】:

    Tags: python dataframe format multiple-columns unique-values


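    【Solution 1】:

    Since your data is already sorted by MAC address and date, you don't need groupby. A row is the first (or last) of its group exactly when its mac_address differs from that of the previous (or next) row: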

    df1 = df.loc[(df['mac_address'].ne(df['mac_address'].shift())) | 
                 (df['mac_address'].ne(df['mac_address'].shift(-1)))]
    

    First output:

    >>> df1
              router        mac_address                 date
    589455  15001391  00:00:34:1a:03:e8  2021-01-01 22:09:34
    590136  17091236  00:00:34:1a:03:e8  2021-01-01 22:11:04
    635434  15001391  00:00:78:01:0d:11  2021-01-02 00:14:54
    636479  17091211  00:00:78:01:0d:11  2021-01-02 00:16:17
    949873  17091172  00:00:af:82:56:93  2021-01-02 11:26:39
    950703  17091258  00:00:af:82:56:93  2021-01-02 11:30:59
    619384  17091174  00:01:09:d2:09:e0  2021-01-01 23:34:32
    365351  17091211  00:01:d2:7c:4e:32  2021-01-01 14:27:58
    109858  17091236  00:02:75:86:4e:34  2021-01-01 05:50:47
    110281  17091211  00:02:75:86:4e:34  2021-01-01 05:50:54
    

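    Second output (a MAC address seen only once appears just once in df1, so keeping only the duplicated values drops it):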

    >>> df1.loc[df1.duplicated('mac_address', keep=False)]
              router        mac_address                 date
    589455  15001391  00:00:34:1a:03:e8  2021-01-01 22:09:34
    590136  17091236  00:00:34:1a:03:e8  2021-01-01 22:11:04
    635434  15001391  00:00:78:01:0d:11  2021-01-02 00:14:54
    636479  17091211  00:00:78:01:0d:11  2021-01-02 00:16:17
    949873  17091172  00:00:af:82:56:93  2021-01-02 11:26:39
    950703  17091258  00:00:af:82:56:93  2021-01-02 11:30:59
    109858  17091236  00:02:75:86:4e:34  2021-01-01 05:50:47
    110281  17091211  00:02:75:86:4e:34  2021-01-01 05:50:54
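
The approach above can be tried end to end with the following sketch. It rebuilds a few of the question's rows (already sorted), names the two halves of the mask for readability, and then reduces the date column to the time of day, since the second output in the question shows only times. That last step is an assumption about what the asker wants; the names first_output and second_output are illustrative only.

```python
import pandas as pd

# A few rows from the question, already sorted by mac_address and date.
df = pd.DataFrame(
    {
        "router": [15001391, 17091211, 17091236, 15001391, 17091211, 17091174],
        "mac_address": [
            "00:00:34:1a:03:e8",
            "00:00:34:1a:03:e8",
            "00:00:34:1a:03:e8",
            "00:00:78:01:0d:11",
            "00:00:78:01:0d:11",
            "00:01:09:d2:09:e0",
        ],
        "date": pd.to_datetime(
            [
                "2021-01-01 22:09:34",
                "2021-01-01 22:10:54",
                "2021-01-01 22:11:04",
                "2021-01-02 00:14:54",
                "2021-01-02 00:16:17",
                "2021-01-01 23:34:32",
            ]
        ),
    },
    index=[589455, 590067, 590136, 635434, 636479, 619384],
)

mac = df["mac_address"]
is_first = mac.ne(mac.shift())    # MAC differs from the previous row
is_last = mac.ne(mac.shift(-1))   # MAC differs from the next row

# First output: first and last row of every MAC address.
first_output = df.loc[is_first | is_last]

# Second output: drop MAC addresses that occur only once.
second_output = first_output.loc[
    first_output.duplicated("mac_address", keep=False)
].copy()

# Assumption: show only the time of day, as in the question's second output.
second_output["date"] = second_output["date"].dt.strftime("%H:%M:%S")

print(first_output)
print(second_output)
```

Note that the shift comparison relies on all rows of one MAC address being contiguous, which the sort_values call in the question guarantees.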
    

    【Comments】:

      【Solution 2】:

      For the first output, you can .groupby on mac_address and keep the "first" and "last" values of each group:

      x = (
          df.groupby("mac_address")
          .agg(["first", "last"])
          .stack()
          .reset_index()
          .drop(columns="level_1")
      )
      
      print(x.drop_duplicates(keep="first"))
      

      Prints:

                mac_address    router                date
      0   00:00:34:1a:03:e8  15001391 2021-01-01 22:09:34
      1   00:00:34:1a:03:e8  17091236 2021-01-01 22:11:04
      2   00:00:78:01:0d:11  15001391 2021-01-02 00:14:54
      3   00:00:78:01:0d:11  17091211 2021-01-02 00:16:17
      4   00:00:af:82:56:93  17091172 2021-01-02 11:26:39
      5   00:00:af:82:56:93  17091258 2021-01-02 11:30:59
      6   00:01:09:d2:09:e0  17091174 2021-01-01 23:34:32
      8   00:01:d2:7c:4e:32  17091211 2021-01-01 14:27:58
      10  00:02:75:86:4e:34  17091236 2021-01-01 05:50:47
      11  00:02:75:86:4e:34  17091211 2021-01-01 05:50:54
      

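      For the second output, simply drop all duplicated rows. A MAC address seen only once yields identical "first" and "last" rows, so keep=False removes both: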

      print(x.drop_duplicates(keep=False))
      

      Prints:

                mac_address    router                date
      0   00:00:34:1a:03:e8  15001391 2021-01-01 22:09:34
      1   00:00:34:1a:03:e8  17091236 2021-01-01 22:11:04
      2   00:00:78:01:0d:11  15001391 2021-01-02 00:14:54
      3   00:00:78:01:0d:11  17091211 2021-01-02 00:16:17
      4   00:00:af:82:56:93  17091172 2021-01-02 11:26:39
      5   00:00:af:82:56:93  17091258 2021-01-02 11:30:59
      10  00:02:75:86:4e:34  17091236 2021-01-01 05:50:47
      11  00:02:75:86:4e:34  17091211 2021-01-01 05:50:54
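
As a possible variant of the groupby idea (not part of the answer above), the rows can be selected by their position inside each group, which keeps the original index and column order. This sketch assumes the rows are ordered by date within each MAC address; the variable names are illustrative only.

```python
import pandas as pd

# A few rows from the question, already sorted by mac_address and date.
df = pd.DataFrame(
    {
        "router": [15001391, 17091211, 17091236, 15001391, 17091211, 17091174],
        "mac_address": [
            "00:00:34:1a:03:e8",
            "00:00:34:1a:03:e8",
            "00:00:34:1a:03:e8",
            "00:00:78:01:0d:11",
            "00:00:78:01:0d:11",
            "00:01:09:d2:09:e0",
        ],
        "date": pd.to_datetime(
            [
                "2021-01-01 22:09:34",
                "2021-01-01 22:10:54",
                "2021-01-01 22:11:04",
                "2021-01-02 00:14:54",
                "2021-01-02 00:16:17",
                "2021-01-01 23:34:32",
            ]
        ),
    },
    index=[589455, 590067, 590136, 635434, 636479, 619384],
)

g = df.groupby("mac_address")

from_start = g.cumcount()               # 0 on the first row of each group
from_end = g.cumcount(ascending=False)  # 0 on the last row of each group
is_edge = (from_start == 0) | (from_end == 0)

# Number of rows per MAC address, broadcast back to every row.
group_size = g["router"].transform("size")

first_output = df[is_edge]
second_output = df[is_edge & (group_size > 1)]

print(first_output)
print(second_output)
```

Because both masks are computed on df itself, no duplicate rows are produced for single-row groups, so no drop_duplicates step is needed.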
      

      【Comments】:
