【Posted】: 2022-12-01 19:52:43
【Question】:
I have a dataframe where the rows have been shifted horizontally by an unknown amount. Each row has shifted by a different amount, as shown below:
| Heading 1 | Heading 2 | Unnamed: 1 | Unnamed: 2 |
|---|---|---|---|
| NaN | 34 | 24 | NaN |
| 22 | 42 | NaN | NaN |
| NaN | NaN | 13 | 77 |
| NaN | NaN | NaN | 18 |
In the above dataframe, there are only 2 original columns (Heading 1 and Heading 2), but due to the row shift (in rows 1 and 3), extra columns have been created with the default names Unnamed: 1 and Unnamed: 2.
Now, for each row, I want to calculate:
1.) The spill over. Spill over is the number of non-NaN values in the extra columns (the Unnamed columns). For example, in row 1 there is one non-NaN value in the extra columns (Unnamed: 1), hence the spill over is 1. In row 2 there are no non-NaN values in the extra columns, so the spill over is 0. In row 3 there are 2 non-NaN values in the extra columns (Unnamed: 1 and Unnamed: 2), hence the spill over is 2, and in row 4 there is 1 non-NaN value in the extra columns, so the spill over is 1.
2.) The number of NaN values in the original columns (Heading 1 and Heading 2). For example, in row 1 the number of NaN values in the original columns is 1, in row 2 it is 0, in row 3 it is 2, and in row 4 it is 2.
So, for each row, I have to calculate the number of NaN values in the original columns (Heading 1 and Heading 2) and the number of non-NaN values in the extra columns (Unnamed: 1 and Unnamed: 2).
I can get the number of extra columns (Unnamed: 1 and so on) present in a dataframe with:
len(df.filter(regex="Unnamed:.*").columns)
Thank you!
【Discussion】:
Tags: python pandas dataframe data-cleaning data-preprocessing
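One possible way to get both per-row counts described in the question is to split the columns by name and count with `notna()` / `isna()` along `axis=1`. This is only a minimal sketch: it rebuilds the example table by hand, assumes the extra columns can be recognized by the `Unnamed:` prefix, and the output column names `spill_over` and `nan_in_original` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Example data rebuilt from the table in the question.
df = pd.DataFrame(
    {
        "Heading 1": [np.nan, 22, np.nan, np.nan],
        "Heading 2": [34, 42, np.nan, np.nan],
        "Unnamed: 1": [24, np.nan, 13, np.nan],
        "Unnamed: 2": [np.nan, np.nan, 77, 18],
    }
)

# Assumption: every extra column starts with "Unnamed:".
is_extra = df.columns.str.startswith("Unnamed:")
extra_cols = df.columns[is_extra]
original_cols = df.columns[~is_extra]

# Hypothetical output column names, chosen for this sketch only.
result = pd.DataFrame(
    {
        # Non-NaN values in the extra columns, per row.
        "spill_over": df[extra_cols].notna().sum(axis=1),
        # NaN values in the original columns, per row.
        "nan_in_original": df[original_cols].isna().sum(axis=1),
    }
)

print(result)
```

For the four example rows this should give a spill over of 1, 0, 2, 1 and 1, 0, 2, 2 NaN values in the original columns, matching the numbers worked out in the question.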