【Question title】: Calculate by how much a row has shifted horizontally in pandas dataframe
【Posted】: 2022-12-01 19:52:43
【Question】:

I have a dataframe where the rows have been shifted horizontally by an unknown amount. Each row has shifted by a different amount, as shown below:

Heading 1  Heading 2  Unnamed: 1  Unnamed: 2
      NaN         34          24         NaN
       22         42         NaN         NaN
      NaN        NaN          13          77
      NaN        NaN         NaN          18

In the above dataframe, there are only 2 original columns (Heading 1 and Heading 2), but due to the row shift (in rows 1 and 3), extra columns have been created with the default names Unnamed: 1 and Unnamed: 2.

Now, for each row, I want to calculate:

1.) The spill over. Spill over is basically the number of non-NaN values in the extra columns (the Unnamed columns). For example, in row 1 there is one non-NaN value in the extra columns (Unnamed: 1), hence the spill over is 1. In row 2 there are no non-NaN values in the extra columns, so the spill over is 0. In row 3 there are 2 non-NaN values in the extra columns (Unnamed: 1 and Unnamed: 2), hence the spill over is 2, and in row 4 there is 1 non-NaN value in the extra columns, so the spill over is 1.

2.) The number of NaN values in the original columns (Heading 1 and Heading 2). For example, in row 1 the number of NaN values in the original columns is 1, in row 2 it is 0, in row 3 it is 2, and in row 4 it is 2.

So basically, for each row, I have to calculate the number of NaN values in the original columns (Heading 1 and Heading 2) and the number of non-NaN values in the extra columns (Unnamed: 1 and Unnamed: 2).

I can get the number of extra columns (Unnamed: 1 and so on) present in a dataframe with:

len(df.filter(regex=("Unnamed:.*")).columns.to_list())

Thank you!

【Comments】:

    Tags: python pandas dataframe data-cleaning data-preprocessing


    【Solution 1】:

    You can use isna and cummin to identify the leading NAs, then sum to count them and clip to limit the shift to the original number of columns:

    df.isna().cummin(axis=1).sum(axis=1).clip(upper=2)
    

    Output:

    0    1
    1    0
    2    2
    3    2
    dtype: int64
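
To try the one-liner end to end, here is a minimal sketch that rebuilds the example frame from the question and applies the same chain. The names `n_original`, `leading_na` and `shift` are illustrative only and do not come from the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Heading 1": [np.nan, 22, np.nan, np.nan],
    "Heading 2": [34, 42, np.nan, np.nan],
    "Unnamed: 1": [24, np.nan, 13, np.nan],
    "Unnamed: 2": [np.nan, np.nan, 77, 18],
})

# Assumption: the frame originally had exactly two columns.
n_original = 2

# cummin along each row keeps True only while every cell so far is NaN,
# so the row-wise sum counts the leading NaNs.
leading_na = df.isna().cummin(axis=1).sum(axis=1)

# Cap the count at the number of original columns.
shift = leading_na.clip(upper=n_original)
print(shift.tolist())
```

Note that this counts only leading NaNs, so a NaN that appears after the first real value in a row is not included.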
    

    Intermediates:

    df.isna()
    
       Heading 1  Heading 2  Unnamed: 1  Unnamed: 2
    0       True      False       False        True
    1      False      False        True        True
    2       True       True       False       False
    3       True       True        True       False
    
    df.isna().cummin(axis=1)
    
       Heading 1  Heading 2  Unnamed: 1  Unnamed: 2
    0       True      False       False       False
    1      False      False       False       False
    2       True       True       False       False
    3       True       True        True       False
    
    df.isna().cummin(axis=1).sum(axis=1)
    
    0    1
    1    0
    2    2
    3    3
    dtype: int64
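
For this data, the clipped result above coincides with the second quantity in the question (NaN values in the original columns), not with the spill over. If both quantities are wanted exactly as the question defines them, one possible sketch is to split the frame with the same `filter` call the question uses and count per row. The variable names here are illustrative, and the split assumes every extra column name starts with `Unnamed:`.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Heading 1": [np.nan, 22, np.nan, np.nan],
    "Heading 2": [34, 42, np.nan, np.nan],
    "Unnamed: 1": [24, np.nan, 13, np.nan],
    "Unnamed: 2": [np.nan, np.nan, 77, 18],
})

# Extra columns are those pandas named "Unnamed: ..."; the rest are original.
extra = df.filter(regex=r"^Unnamed:")
original = df.drop(columns=extra.columns)

# 1.) Spill over: non-NaN values in the extra columns, per row.
spill_over = extra.notna().sum(axis=1)

# 2.) NaN values in the original columns, per row.
nan_in_original = original.isna().sum(axis=1)

print(spill_over.tolist())       # [1, 0, 2, 1]
print(nan_in_original.tolist())  # [1, 0, 2, 2]
```

These values match the per-row figures worked out by hand in the question.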
    

    【Comments】:
