【Question title】: python sort a list of strings based on substrings using pandas
【Posted】: 2021-10-13 13:28:51
【Question description】:

I have an Excel sheet with 4 columns: Filename, SNR, Dynamic Range, Level.

Filename SNR Dynamic Range Level
1___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS8_HPOF.xlsx 5 11 8
19___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS32_HPOF.xlsx 15 31 23
10___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS16_HPOF.xlsx 10 21 24
28___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS48_HPOF.xlsx 20 41 23
37___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS8_HP4.xlsx 25 51 12

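I need to reorder the rows by the first column, Xls Filename, so that the number after FS near the end of each name (bold in the original post) is in ascending order, i.e.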

Filename SNR Dynamic Range Level
1___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS8_HPOF.xlsx 5 11 8
37___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS8_HP4.xlsx 25 51 12
10___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS16_HPOF.xlsx 10 21 24
19___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS32_HPOF.xlsx 15 31 23
28___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS48_HPOF.xlsx 20 41 23

I don't want to change the actual Excel file. I would like to use pandas because I will be doing some other operations later.

I tried

df.sort_values(by='Xls Filename', key=lambda col: col.str.contains('_FS'),ascending=True)

but it didn't work.

Thank you in advance!

【Comments】:

    Tags: python excel pandas numpy


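    【Solution 1】:

    Extract the number with a regex, use argsort to find the sort order, then reorder the rows with it:

    # extract the number to sort by into a Series (raw string so the backslashes reach the regex engine unchanged)
    fs = df.Filename.str.extract(r'FS(\d+)_\w+\.xlsx$', expand=False)
    
    # `argsort` returns positions, so reorder with `iloc`; a stable sort keeps tied rows in their original order
    df.iloc[fs.astype(int).to_numpy().argsort(kind='stable')]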
    
    #                                                                       Filename  ...  Level
    #0    1___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS8_HPOF.xlsx  ...      8
    #4    37___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS8_HP4.xlsx  ...     12
    #2  10___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS16_HPOF.xlsx  ...     24
    #1  19___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS32_HPOF.xlsx  ...     23
    #3  28___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS48_HPOF.xlsx  ...     23
    

    The regex FS(\d+)_\w+\.xlsx$ captures the digits that immediately follow FS and are themselves followed by _\w+\.xlsx at the end of the string.
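The steps above can be tried end to end with a minimal sketch. It rebuilds the five rows from the question in memory rather than reading the Excel file, so the column names and the `prefix` helper variable are assumptions made for illustration only.

```python
import pandas as pd

# Sample rows copied from the question. The column names are assumed to
# match the table header shown there; `prefix` is only a helper to keep
# the literals short.
prefix = "___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_"
df = pd.DataFrame({
    "Filename": [
        "1" + prefix + "FS8_HPOF.xlsx",
        "19" + prefix + "FS32_HPOF.xlsx",
        "10" + prefix + "FS16_HPOF.xlsx",
        "28" + prefix + "FS48_HPOF.xlsx",
        "37" + prefix + "FS8_HP4.xlsx",
    ],
    "SNR": [5, 15, 10, 20, 25],
    "Dynamic Range": [11, 31, 21, 41, 51],
    "Level": [8, 23, 24, 23, 12],
})

# Capture the digits after "FS" as strings, one value per row.
fs = df["Filename"].str.extract(r"FS(\d+)_\w+\.xlsx$", expand=False)

# Convert to integers and compute the positions that would sort them.
# kind="stable" keeps the two FS8 rows in their original relative order.
order = fs.astype(int).to_numpy().argsort(kind="stable")

# Reorder by position; the original file and `df` itself are untouched.
sorted_df = df.iloc[order]
print(sorted_df.index.tolist())  # [0, 4, 2, 1, 3]
```

The result is a new DataFrame, so later pandas operations can continue from `sorted_df` without writing anything back to Excel.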


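    If some filenames might not match the pattern, convert to float instead of int, because the extracted values may contain NaNs:

    df.iloc[fs.astype(float).to_numpy().argsort(kind='stable')]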
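A small sketch of the non-matching case, using hypothetical filenames that are not from the question. It also shows a `sort_values` variant with a `key` function, which is an alternative to the argsort approach rather than part of the original answer. The key in the question's attempt, `col.str.contains('_FS')`, returns a boolean per row (True for every name shown there), so it carries no ordering information. A key that returns the extracted number does.

```python
import pandas as pd

# Hypothetical filenames; "notes.xlsx" deliberately does not match the pattern.
df = pd.DataFrame({
    "Filename": [
        "a_FS32_HPOF.xlsx",
        "notes.xlsx",
        "b_FS8_HP4.xlsx",
        "c_FS16_HPOF.xlsx",
    ],
    "Level": [23, 0, 12, 24],
})

pattern = r"FS(\d+)_\w+\.xlsx$"
fs = df["Filename"].str.extract(pattern, expand=False)

# Option 1: float conversion leaves the missing match as NaN, and
# NumPy's argsort places NaN values at the end.
order = fs.astype(float).to_numpy().argsort(kind="stable")
by_argsort = df.iloc[order]

# Option 2: sort_values with a key that maps each filename to its number.
# na_position="last" keeps the non-matching row at the bottom.
by_key = df.sort_values(
    by="Filename",
    key=lambda col: col.str.extract(pattern, expand=False).astype(float),
    na_position="last",
    kind="stable",
)

print(by_argsort.index.tolist())  # [2, 3, 0, 1]
print(by_key.index.tolist())      # [2, 3, 0, 1]
```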
    

    【Discussion】:

    • Awesome. Glad it helped!