How to create a new column based on conditions on another column in pandas: answers

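【Question title】: How to create a new column based on conditions on other column in pandas
【Posted】: 2021-12-24 21:54:22
【Question description】:

If column B contains the word "US", I want to take the first five digits from column C; otherwise I want all of the digits from column C. The desired result is shown in the "OUTPUT" column.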

A          B          C         OUTPUT
hell       US         12234455  12234
mell       UK         12345666  12345666
shall      US         21248075  21248
pel      SPAIN        90056784  90056784
wel        SP         35455689  35455689
shel       US         12095678  12095

I am using the code below, but it does not work. Please help. Dataset name = sf1

if sf1.B.all=="US":
   sf1['OUTPUT'] = sf1['C'].astype(str).str[:5]
else:
   sf1['OUTPUT'] = sf1['C'].astype(str)

【Comments on the question】:

Tags: python pandas string indexing concatenation

【Solution 1】:

You can use numpy where:

import numpy as np

sf1['OUTPUT'] = np.where(sf1['B'].eq('US'), sf1['C'].astype(str).str[:5], sf1['C'])

Or pandas where:

sf1['OUTPUT'] = sf1['C'].astype(str).str[:5].where(sf1['B'].eq('US'), sf1['C'])

Or pandas loc:

sf1.loc[sf1['B'].eq('US'), 'OUTPUT'] = sf1['C'].astype(str).str[:5]
sf1.loc[sf1['B'].ne('US'), 'OUTPUT'] = sf1['C']

Output:

        A       B          C      OUTPUT
0    hell      US   12234455       12234
1    mell      UK   12345666    12345666
2   shall      US   21248075       21248
3     pel   SPAIN   90056784    90056784
4     wel      SP   35455689    35455689
5    shel      US   12095678       12095
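The three variants above can be checked side by side with a small self-contained sketch. It rebuilds the sample data from the question, assuming column C holds integers. The names `is_us`, `c_str`, `out_np`, `out_where` and `out_loc` are illustrative and not part of the original answer. Unlike the one-liners above, the sketch converts C to strings once so that both branches yield strings rather than a mix of strings and integers.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; C is assumed to be an integer column.
sf1 = pd.DataFrame({
    "A": ["hell", "mell", "shall", "pel", "wel", "shel"],
    "B": ["US", "UK", "US", "SPAIN", "SP", "US"],
    "C": [12234455, 12345666, 21248075, 90056784, 35455689, 12095678],
})

is_us = sf1["B"].eq("US")      # boolean Series, one value per row
c_str = sf1["C"].astype(str)   # convert once so both branches are strings

# Variant 1: numpy.where(condition, value_if_true, value_if_false)
out_np = pd.Series(np.where(is_us, c_str.str[:5], c_str), index=sf1.index)

# Variant 2: Series.where keeps values where the condition is True
# and takes the replacement everywhere else.
out_where = c_str.str[:5].where(is_us, c_str)

# Variant 3: start from the full string, then overwrite the US rows via .loc.
out_loc = sf1.copy()
out_loc["OUTPUT"] = c_str
out_loc.loc[is_us, "OUTPUT"] = c_str.str[:5]

print(out_loc)
```

All three results hold the same values, so the choice is mostly a matter of readability.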

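【Comments】:

【Solution 2】:

For this we will use numpy's built-in where() function. It takes three arguments, in order: the condition to test, the value assigned to the new column where the condition is true, and the value assigned where the condition is false.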

df['OUTPUT'] = np.where(df['B'] == 'US', df['C'].astype(str).str[:5], df['C'])
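To make the argument order concrete, here is a minimal sketch on plain NumPy arrays rather than a DataFrame. The arrays `country`, `code` and `first_five` are made up for illustration, and the codes are stored as strings here to keep the example small.

```python
import numpy as np

# Illustrative arrays, not taken from the original answer.
country = np.array(["US", "UK", "US"])
code = np.array(["12234455", "12345666", "21248075"])
first_five = np.array([c[:5] for c in code])

# np.where(condition, value_if_true, value_if_false) picks element by element:
# where country == "US" it takes first_five, otherwise it takes code.
result = np.where(country == "US", first_five, code)
print(result)
```

The selection happens per element, which a plain Python `if` cannot do, since it evaluates a single condition for the whole column. In the question's code, `sf1.B.all` without parentheses is a method object, so comparing it with "US" is always false and the `else` branch runs for every row.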

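【Comments】:

While this code snippet may solve the problem, including an explanation really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion.
@YevhenKuzmovych Thanks for the feedback.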

【Solution 3】:

Let's try mask

df['OUTPUT'] = df.C.mask(df.B.eq('US'),lambda x : x.astype(str).str[:5])
df
Out[30]: 
       A      B         C    OUTPUT
0   hell     US  12234455     12234
1   mell     UK  12345666  12345666
2  shall     US  21248075     21248
3    pel  SPAIN  90056784  90056784
4    wel     SP  35455689  35455689
5   shel     US  12095678     12095
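For readers unfamiliar with `mask`, the sketch below shows that it is the mirror image of `where`: `mask` replaces values where the condition is True, while `where` replaces them where it is False. The small frame and the names `is_us`, `masked` and `kept` are illustrative, and C is assumed to be stored as strings here so that no dtype conversion is involved.

```python
import pandas as pd

# Illustrative data; C is assumed to already hold strings.
df = pd.DataFrame({
    "B": ["US", "UK", "US"],
    "C": ["12234455", "12345666", "21248075"],
})

is_us = df["B"].eq("US")

# mask: replace the rows where the condition is True.
masked = df["C"].mask(is_us, df["C"].str[:5])

# where: keep the rows where the condition is True, so the condition is inverted.
kept = df["C"].where(~is_us, df["C"].str[:5])

print(masked.equals(kept))
```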

【Comments】:

【Solution 4】:

You can also use the apply function as follows:

df['Output'] = df.apply(lambda row: str(row.C)[:5] if row.B == 'US' else row.C, axis = 1)

This produces:

       A      B         C    Output
0   hell     US  12234455     12234
1   mell     UK  12345666  12345666
2  shall     US  21248075     21248
3    pel  SPAIN  90056784  90056784
4    wel     SP  35455689  35455689
5   shel     US  12095678     12095
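The same row-wise logic can be written with a named function, which may be easier to read and test than a lambda. This is a sketch assuming column C holds integers, so each value is converted with `str()` before slicing. The helper name `shorten_us` is hypothetical.

```python
import pandas as pd

# Sample data copied from the question; C is assumed to be an integer column.
df = pd.DataFrame({
    "A": ["hell", "mell", "shall", "pel", "wel", "shel"],
    "B": ["US", "UK", "US", "SPAIN", "SP", "US"],
    "C": [12234455, 12345666, 21248075, 90056784, 35455689, 12095678],
})

def shorten_us(row):
    """Return the first five digits of C for US rows, otherwise all of C."""
    digits = str(row["C"])  # integers cannot be sliced, so convert first
    return digits[:5] if row["B"] == "US" else digits

# axis=1 passes one row at a time to the function.
df["Output"] = df.apply(shorten_us, axis=1)
print(df)
```

Because `apply` with `axis=1` calls the function once per row in Python, it is generally slower on large frames than the vectorized `where` and `mask` approaches.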

【Comments】: