Get the number of leading and trailing NaN values in a pandas DataFrame: answers

【Question title】: Get the amount of leading and trailing NaN values in pandas dataframe
【Posted】: 2022-12-03 01:55:32
【Question description】:

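I have a DataFrame whose rows contain NaN values. df contains the original columns, Heading 1 and Heading 2, plus extra columns called Unnamed: 1 and Unnamed: 2, as shown: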

Heading 1	Heading 2	Unnamed: 1	Unnamed: 2
NaN	34	24	NaN
NaN	NaN	44	NaN
5	NaN	NaN	NaN
5	7	NaN	NaN
NaN	NaN	13	77
NaN	NaN	NaN	18

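I want to iterate over each row and find the number of NaN values in the original columns (Heading 1 and Heading 2) and the number of non-NaN values in the extra columns (Unnamed: 1 and Unnamed: 2). These counts should be computed for every row and returned in a dictionary whose keys are the row indices. The value for each key is a list: the first element is the number of NaN values in the original columns (Heading 1 and Heading 2), and the second element is the number of non-NaN values in the extra columns (Unnamed: 1 and Unnamed: 2).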

So the result for the DataFrame above would be:

{0 : [1, 1], 
1 : [2, 1], 
2 : [1, 0], 
3 : [0, 0], 
4 : [2, 2], 
5 : [2, 1]}

Thanks!

【Question discussion】:

Tags: python pandas dataframe data-cleaning data-preprocessing

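【Solution 1】:

To iterate over each row of the DataFrame and count the NaN values in the original columns and the non-NaN values in the extra columns, you can do the following: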

import numpy as np
import pandas as pd

# Define the dataframe
df = pd.DataFrame(
    {
        "Heading 1": [np.nan, np.nan, 5, 5, np.nan, np.nan],
        "Heading 2": [34, np.nan, np.nan, 7, np.nan, np.nan],
        "Unnamed: 1": [24, 44, np.nan, np.nan, 13, np.nan],
        "Unnamed: 2": [np.nan, np.nan, np.nan, np.nan, 77, 18]
    }
)

# Define the original columns and the extra columns
original_cols = ["Heading 1", "Heading 2"]
extra_cols = ["Unnamed: 1", "Unnamed: 2"]

# Create a dictionary to store the counts
counts = {}

# Iterate through each row in the DataFrame
for index, row in df.iterrows():
    # Count the number of NaN values in the original columns
    # (row[col] is a scalar with no .isna() method, so use pd.isna())
    original_nan_count = sum(pd.isna(row[col]) for col in original_cols)
    
    # Count the number of non-NaN values in the extra columns
    extra_non_nan_count = sum(pd.notna(row[col]) for col in extra_cols)
    
    # Add the counts to the dictionary
    counts[index] = [original_nan_count, extra_non_nan_count]

# Print the dictionary of counts
print(counts)

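This iterates over each row of the DataFrame, counts the NaN values in the original columns and the non-NaN values in the extra columns, and stores the counts in a dictionary whose keys are the row indices and whose values are lists containing the two counts. The resulting dictionary looks like this: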

{0: [1, 1],
 1: [2, 1],
 2: [1, 0],
 3: [0, 0],
 4: [2, 2],
 5: [2, 1]}
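The same counts can also be computed without iterrows(), which tends to be slow on large frames. The sketch below is a vectorized variant, not the answerer's code: isna() and notna() return Boolean DataFrames, and sum(axis=1) counts the True values in each row. It reuses the question's sample data and the original_cols / extra_cols lists from the answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    {
        "Heading 1": [np.nan, np.nan, 5, 5, np.nan, np.nan],
        "Heading 2": [34, np.nan, np.nan, 7, np.nan, np.nan],
        "Unnamed: 1": [24, 44, np.nan, np.nan, 13, np.nan],
        "Unnamed: 2": [np.nan, np.nan, np.nan, np.nan, 77, 18],
    }
)

original_cols = ["Heading 1", "Heading 2"]
extra_cols = ["Unnamed: 1", "Unnamed: 2"]

# Per-row counts: NaN in the original columns, non-NaN in the extra columns
nan_original = df[original_cols].isna().sum(axis=1)
non_nan_extra = df[extra_cols].notna().sum(axis=1)

# .tolist() turns the NumPy integers into plain Python ints
counts = {
    idx: [a, b]
    for idx, a, b in zip(df.index, nan_original.tolist(), non_nan_extra.tolist())
}
print(counts)
# {0: [1, 1], 1: [2, 1], 2: [1, 0], 3: [0, 0], 4: [2, 2], 5: [2, 1]}
```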

【Discussion】:

.isna() throws an error for me. I wrote original_nan_count = np.sum(np.isnan(row[['Heading 1', 'Heading 2']])) instead, and that worked.
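The error comes from indexing a row with a single label: row["Heading 1"] returns a scalar, and scalars have no .isna() method. A minimal illustration, using only the first row of the question's original columns, of two ways around it: call pd.isna() on the scalar, or index with a list of labels so the result is a Series.

```python
import numpy as np
import pandas as pd

# First two rows of the question's original columns
df = pd.DataFrame(
    {
        "Heading 1": [np.nan, np.nan],
        "Heading 2": [34.0, np.nan],
    }
)
row = df.iloc[0]  # one row as a Series

# A single label returns a scalar, which has no .isna() method
scalar = row["Heading 1"]
print(hasattr(scalar, "isna"))  # False

cols = ["Heading 1", "Heading 2"]

# Option 1: pd.isna() accepts scalars
n1 = sum(pd.isna(row[col]) for col in cols)

# Option 2: a list of labels returns a Series, which does have .isna()
n2 = int(row[cols].isna().sum())

print(n1, n2)  # 1 1
```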

【Solution 2】:

As an alternative:

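# df is the DataFrame from the question
# Count NaN values in the original columns, row by row
df['Count'] = df[['Heading 1', 'Heading 2']].apply(lambda x: sum(x.isnull()), axis=1)
# Count non-NaN values in the extra columns, row by row
df['Count2'] = df[['Unnamed: 1', 'Unnamed: 2']].apply(lambda x: sum(x.notnull()), axis=1)
# Pair the two counts into one list per row
df['total'] = df[['Count', 'Count2']].values.tolist()

output = dict(zip(df.index, df.total))
print(output)
# {0: [1, 1], 1: [2, 1], 2: [1, 0], 3: [0, 0], 4: [2, 2], 5: [2, 1]}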
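This answer adds the helper columns Count, Count2 and total to df and hard-codes the column names. A hedged variant that avoids both, assuming (as in the question) that the extra columns are exactly those whose names start with "Unnamed", the prefix pandas.read_csv gives to columns with an empty header cell. If your real data has other column names, build the two column lists explicitly instead.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    {
        "Heading 1": [np.nan, np.nan, 5, 5, np.nan, np.nan],
        "Heading 2": [34, np.nan, np.nan, 7, np.nan, np.nan],
        "Unnamed: 1": [24, 44, np.nan, np.nan, 13, np.nan],
        "Unnamed: 2": [np.nan, np.nan, np.nan, np.nan, 77, 18],
    }
)

# Assumption: extra columns are the ones whose names start with "Unnamed"
is_extra = df.columns.str.startswith("Unnamed")
original = df.loc[:, ~is_extra]
extra = df.loc[:, is_extra]

# Build the result without adding helper columns to df
pairs = zip(original.isna().sum(axis=1).tolist(), extra.notna().sum(axis=1).tolist())
output = {idx: [a, b] for idx, (a, b) in zip(df.index, pairs)}
print(output)
# {0: [1, 1], 1: [2, 1], 2: [1, 0], 3: [0, 0], 4: [2, 2], 5: [2, 1]}
```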
