Equivalent of Excel IFS/Search functions in Python Pandas with 1 value and multiple conditions: answers

【Question title】: Equivalent of Excel IFS/Search functions in Python Pandas with 1 value and multiple conditions
【Posted】: 2022-01-25 16:38:58
【Question】:

Here is a sample of the Excel worksheet I am working with:

Product_Type	Part_Number	Description
Rod	R-SS-015
Rod	R-SS-030
Rod	R-SS-045
Rod	R-SS-060
Rod	R-SS-075
Rod	R-SS-090
Nut	N-150	Stainless Steel 1" Nut
Nut	N-151	Stainless Steel 2" Nut
Nut	N-152	Stainless Steel 3" Nut
Washer	W-101	Stainless Steel 1" Washer
Washer	W-102	Stainless Steel 2" Washer
Washer	W-103	Stainless Steel 3" Washer

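What I want to do is get the size of each "Rod" from Part_Number and the size of each "Nut"/"Washer" from Description. In Excel, I would use a combination of the IFS, SEARCH, and LEFT/MID/RIGHT functions.

I can find the sizes for each product type separately, but I can't tie the lookup to Product_Type.

For example:

"If Product_Type equals Rod, search Part_Number and output to Size"

or

"If Product_Type equals Nut or Washer, search Description and output to Size"

For rods, I use the following code: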

conditions = [
df.loc[df['Part_Number'].str.contains('-015'), 'Size'] == '1"',
df.loc[df['Part_Number'].str.contains('-030'), 'Size'] == '2"',
df.loc[df['Part_Number'].str.contains('-045'), 'Size'] == '3"',
df.loc[df['Part_Number'].str.contains('-060'), 'Size'] == '4"',
df.loc[df['Part_Number'].str.contains('-075'), 'Size'] == '5"',
df.loc[df['Part_Number'].str.contains('-090'), 'Size'] == '6"',
]
df['Size'] = np.select(df[Product_Type] == "Rod", conditions, '')

For nuts/washers, I use the following code:

df_2 = df['Descritpion'].str.extract(r'(?:\s([\d+]?\.?[\d+])")', expand=False)
df['Size'] = np.where(df['Product_Type'] == "Nut" or df['Product_Type'] == "Washer", df_2, '')

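Both pieces of code give me the following error:

ValueError: list of cases must be same length as list of conditions

Is this because there is only one value (Rod) and multiple conditions (the sizes)?

How can I combine str.extract and str.contains?

【Question comments】:

Tags: python excel pandas numpy

【Solution 1】: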

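Here are three approaches that produce the same result.

Method #1: np.select()

This makes a small change to your np.select() example. np.select() takes the list of conditions first and the list of replacements second, so the key is to make sure that the two lists have the same length.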

import numpy as np

# Each condition is paired with the replacement at the same position below.
conditions = [
    (df['Product_Type'] == 'Rod') & (df['Part_Number'].str.contains('-015')),
    (df['Product_Type'] == 'Rod') & (df['Part_Number'].str.contains('-030')),
    (df['Product_Type'] == 'Rod') & (df['Part_Number'].str.contains('-045')),
    (df['Product_Type'] == 'Rod') & (df['Part_Number'].str.contains('-060')),
    (df['Product_Type'] == 'Rod') & (df['Part_Number'].str.contains('-075')),
    (df['Product_Type'] == 'Rod') & (df['Part_Number'].str.contains('-090')),
    df['Product_Type'] == 'Nut',
    df['Product_Type'] == 'Washer'
]

replacements = [
    '1',
    '2',
    '3',
    '4',
    '5',
    '6',
    df['Description'].str.extract(r'\s(\d+)"', expand=False),
    df['Description'].str.extract(r'\s(\d+)"', expand=False),
]

df['Size'] = np.select(conditions, replacements, default=None).astype(float)
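To see the length rule in isolation, here is a minimal sketch on a three-row frame. The data and the names is_rod, choices, sizes, and error_message are illustrative assumptions, not part of the question's workbook. np.select() walks the conditions in order, takes the choice paired with the first condition that is True, and falls back to default when none is.

```python
import numpy as np
import pandas as pd

# Illustrative three-row frame; not the full worksheet from the question.
df = pd.DataFrame({
    'Product_Type': ['Rod', 'Rod', 'Nut'],
    'Part_Number': ['R-SS-015', 'R-SS-030', 'N-150'],
})

is_rod = df['Product_Type'] == 'Rod'
conditions = [
    is_rod & df['Part_Number'].str.endswith('-015'),
    is_rod & df['Part_Number'].str.endswith('-030'),
]
choices = [1.0, 2.0]  # one choice per condition, in the same order

# Rows that match no condition receive the default (NaN here).
sizes = np.select(conditions, choices, default=np.nan)

# Lists of different lengths reproduce the error from the question.
try:
    np.select(conditions, [1.0], default=np.nan)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

The third row is a nut, so neither rod condition matches and it ends up as NaN until a nut/washer condition is added, as in the full example above.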

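Method #2: apply()

As a shorter and more flexible alternative, .apply(..., axis=1) applies a function to each row.

Here I use an if statement to determine which kind of product the row is, then a dictionary to convert the end of the part number into the size you want.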

import re

def get_size(row):
    if row['Product_Type'] == 'Rod':
        rod_size = re.search(r'-(\d+)$', row['Part_Number']).group(1)
        rod_convert = {
            '015': 1,
            '030': 2,
            '045': 3,
            '060': 4,
            '075': 5,
            '090': 6,
        }
        return rod_convert[rod_size]
    elif row['Product_Type'] == 'Nut' or row['Product_Type'] == 'Washer':
        return re.search(r'(\d+)"', row['Description']).group(1)
    else:
        return None

df['Size'] = df.apply(get_size, axis=1).astype(float)

However, this flexibility comes at the cost of speed, because the function is called once per row in Python.
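For comparison, the same logic can be written without a per-row Python function. The following is a sketch rather than part of the original answer: it computes both candidate sizes with vectorized string methods and chooses between them with Series.where(). The four-row sample frame and the names rod_sizes, rod_size, and desc_size are assumptions made for illustration.

```python
import pandas as pd

# Illustrative sample; rods have no description, as in the question.
df = pd.DataFrame({
    'Product_Type': ['Rod', 'Rod', 'Nut', 'Washer'],
    'Part_Number': ['R-SS-015', 'R-SS-030', 'N-150', 'W-101'],
    'Description': [None, None,
                    'Stainless Steel 1" Nut',
                    'Stainless Steel 2" Washer'],
})

rod_sizes = {'015': 1, '030': 2, '045': 3, '060': 4, '075': 5, '090': 6}

# Candidate size for rods: trailing digits of the part number, looked up
# in the mapping. Suffixes missing from the mapping become NaN.
rod_size = df['Part_Number'].str.extract(r'-(\d+)$', expand=False).map(rod_sizes)

# Candidate size for nuts and washers: the number before the inch mark.
desc_size = pd.to_numeric(df['Description'].str.extract(r'(\d+)"', expand=False))

# Keep rod_size where the row is a rod, otherwise use desc_size.
is_rod = df['Product_Type'] == 'Rod'
df['Size'] = rod_size.where(is_rod, desc_size)
```

Both candidates are computed for every row, which is wasteful on paper but usually still faster than a row-wise apply() on large frames.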

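Method #3: groupby()

This is similar to the previous method, but whereas that solution runs a function on every row of the dataframe, this one splits the dataframe into three parts and runs the function once on each part.

After the split, I check which group this is (using df.name) and use a regular expression to extract the size. For rods, I use map() to replace the part-number suffix with the size you want.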

def get_size(df):
    if df.name == 'Rod':
        size = df['Part_Number'].str.extract(r'-(\d+)$', expand=False)
        df['Size'] = size.map({
            '015': 1,
            '030': 2,
            '045': 3,
            '060': 4,
            '075': 5,
            '090': 6,
        })
        return df
    elif df.name == 'Nut' or df.name == 'Washer':
        df['Size'] = df['Description'].str.extract(r'(\d+)"', expand=False).astype(float)
        return df

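# group_keys=False keeps the original row index; on pandas 2.0+ the default
# would otherwise prepend Product_Type as an extra index level.
df = df.groupby('Product_Type', group_keys=False).apply(get_size)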

Result

All three produce the same result, namely:

   Product_Type Part_Number                Description  Size
0           Rod    R-SS-015                        NaN   1.0
1           Rod    R-SS-030                        NaN   2.0
2           Rod    R-SS-045                        NaN   3.0
3           Rod    R-SS-060                        NaN   4.0
4           Rod    R-SS-075                        NaN   5.0
5           Rod    R-SS-090                        NaN   6.0
6           Nut       N-150     Stainless Steel 1" Nut   1.0
7           Nut       N-151     Stainless Steel 2" Nut   2.0
8           Nut       N-152     Stainless Steel 3" Nut   3.0
9        Washer       W-101  Stainless Steel 1" Washer   1.0
10       Washer       W-102  Stainless Steel 2" Washer   2.0
11       Washer       W-103  Stainless Steel 3" Washer   3.0

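【Discussion】:

This works, but I made a mistake in my original post. The part numbers for "Rods" don't correspond to the size (i.e. -001 is not 1"), so a regular expression won't work for this. Is there a way to use a for loop or a while loop? I've edited the original post to explain what I mean.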
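One possible way to handle part numbers that do not encode the size is to keep the suffix-to-size pairs in a dictionary and build the np.select() lists with a for loop. This is a sketch, not something posted in the thread: the sample frame is illustrative and the pairs in rod_sizes are hypothetical placeholders for the real ones.

```python
import numpy as np
import pandas as pd

# Illustrative sample frame.
df = pd.DataFrame({
    'Product_Type': ['Rod', 'Rod', 'Nut'],
    'Part_Number': ['R-SS-015', 'R-SS-030', 'N-150'],
    'Description': [None, None, 'Stainless Steel 1" Nut'],
})

# Hypothetical suffix -> size pairs; replace with the real ones.
rod_sizes = {'-015': 1.0, '-030': 2.0}

is_rod = df['Product_Type'] == 'Rod'
conditions = []
choices = []

# One condition/choice pair per mapping entry keeps the lists the same length.
for suffix, size in rod_sizes.items():
    conditions.append(is_rod & df['Part_Number'].str.endswith(suffix))
    choices.append(size)

# Nuts and washers read the size from the description instead.
conditions.append(df['Product_Type'].isin(['Nut', 'Washer']))
choices.append(pd.to_numeric(df['Description'].str.extract(r'(\d+)"', expand=False)))

df['Size'] = np.select(conditions, choices, default=np.nan)
```

Because each loop iteration appends to both lists, adding a new rod only means adding one entry to the dictionary.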