Select specific rows from a DataFrame and perform calculations for a new column (answers)

[Question title]: select specific rows from dataframe and perform calculations for new column
[Posted]: 2020-07-20 12:50:10
[Question]:

I have a DataFrame that looks like this:

       Task[ms]                              Funktion  ...     min     max
  0        1              CALL_TK_CDDio_PFC_BEGIN_1MS  ...   0.640000   3.360000
  1        1                       vAdcD_MainFunction  ...  21.280001  25.920000
  2        1                          vPressE_Main1ms  ...  17.120001  81.279999
  3        1  vPositionSensorPwm_MainFunction_Fast_In  ...   9.920000  13.760000
  4        1                           CDDIO_1MS_1_IN  ...   2.240000   5.280000

I need to select rows based on the values in this column. df['Messvariable'] has 146 rows; this is the Messvariable column of the DataFrame:

0      timeslices[0].profilerDataProcess[0]_C0[us]
1      timeslices[0].profilerDataProcess[1]_C0[us]
2      timeslices[0].profilerDataProcess[2]_C0[us]
3      timeslices[0].profilerDataProcess[3]_C0[us]
4      timeslices[0].profilerDataProcess[4]_C0[us]
...
141    timeslices[9].profilerDataProcess[0]_C0[us]
142    timeslices[9].profilerDataProcess[1]_C0[us]
143    timeslices[9].profilerDataProcess[2]_C0[us]
144    timeslices[9].profilerDataProcess[3]_C0[us]
145    timeslices[9].profilerDataTask_C0[us]

I want to select specific rows by this column and perform an operation like this:

 while  df['Messvariable'].str.contains("timeslices[1]"):
   df['CPU_LOAD']=df['max']/(10000*2)

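The same applies to all the remaining timeslices, each with a different calculation. It doesn't work.

str.contains returns an empty DataFrame.

Is there another way to do this?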
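Why the selection comes out empty can be checked in isolation. In a regular expression, `[1]` is a character class that matches the single character `1`, so the pattern `timeslices[1]` actually looks for the text `timeslices1`. A minimal sketch, using two made-up strings in the same format as the Messvariable column:

```python
import re

import pandas as pd

# Made-up sample values in the same format as the Messvariable column.
s = pd.Series([
    "timeslices[1].profilerDataProcess[0]_C0[us]",
    "timeslices[2].profilerDataProcess[0]_C0[us]",
])

# Default regex=True: "[1]" is a character class, so this searches for "timeslices1".
regex_mask = s.str.contains("timeslices[1]")
# regex=False: the brackets are matched literally.
literal_mask = s.str.contains("timeslices[1]", regex=False)
# re.escape keeps it a regex but escapes the brackets.
escaped_mask = s.str.contains(re.escape("timeslices[1]"))

print(regex_mask.tolist())    # [False, False]
print(literal_mask.tolist())  # [True, False]
print(escaped_mask.tolist())  # [True, False]
```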

[Discussion]:

Show the Messvariable column.
@DanilaGanchar Yes, added. It has 146 rows, with timeslices ranging from 0 to 9.

Tags: python pandas dataframe multiple-columns

[Answer 1]:

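The main problem is the default argument regex=True: pat is interpreted as a regular expression, and in a regex "[1]" is a character class matching the single character "1". So "timeslices[1]" actually searches for "timeslices1", which never occurs in the column, hence the empty result. Just set regex=False, or use startswith() or find():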

import pandas as pd

df = pd.DataFrame.from_dict({
    'Messvariable': ('timeslices[1]', 'timeslices[1]', 'empty', 'empty'),
    'max': (1, 2, 3, 4),
})

# regex=False matches the text literally, so "[1]" is not treated as a character class
mask = df['Messvariable'].str.contains('timeslices[1]', regex=False)
# or
# mask = df['Messvariable'].str.find('timeslices[1]') != -1
# or
# mask = df['Messvariable'].str.startswith('timeslices[1]')
df['CPU_LOAD'] = 0.0  # float default, so the masked assignment below keeps one dtype
df.loc[mask, 'CPU_LOAD'] = df.loc[mask, 'max'] / (10000 * 2)
print(df.head())

#    Messvariable  max  CPU_LOAD
# 0  timeslices[1]    1   0.00005
# 1  timeslices[1]    2   0.00010
# 2          empty    3   0.00000
# 3          empty    4   0.00000
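Note that the mask does not remove any rows from df: it is a boolean Series with one entry per row, and df.loc[mask, ...] reads or writes only the rows where it is True. A minimal sketch with made-up sample data (four rows instead of 146) showing the difference between the mask, the selected subset, and the full DataFrame:

```python
import pandas as pd

# Made-up sample data in the Messvariable format; "max" values are illustrative only.
df = pd.DataFrame({
    "Messvariable": [
        "timeslices[0].profilerDataProcess[0]_C0[us]",
        "timeslices[1].profilerDataProcess[0]_C0[us]",
        "timeslices[1].profilerDataTask_C0[us]",
        "timeslices[2].profilerDataProcess[0]_C0[us]",
    ],
    "max": [1.0, 2.0, 3.0, 4.0],
})

mask = df["Messvariable"].str.startswith("timeslices[1]")

# One boolean per row of df; two of them are True.
print(len(mask), int(mask.sum()))  # 4 2

# Only the rows where the mask is True.
subset = df.loc[mask]
print(len(subset))  # 2

# Assigning through .loc changes only the masked rows; df still has all its rows.
df["CPU_LOAD"] = 0.0
df.loc[mask, "CPU_LOAD"] = df.loc[mask, "max"] / (10000 * 2)
print(len(df))  # 4
```

So printing df after the assignment still shows every row; use df.loc[mask] (or df[mask]) to look at just the selected timeslice.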

Update: for a different calculation per timeslice, it's better to use apply with a custom function:

df['CPU_LOAD'] = 0.0

def set_cpu_load(x):
    # x is one row (a Series); choose the calculation from its timeslice prefix
    if x['Messvariable'].startswith('timeslices[1]'):
        x['CPU_LOAD'] = x['max'] / (10000 * 2)
    elif x['Messvariable'].startswith('timeslices[2]'):
        pass  # other calculation
    # elif ...
    return x

df = df.apply(set_cpu_load, axis=1)
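A vectorized alternative to a row-wise apply, sketched under assumptions: the timeslice index is extracted once with str.extract, and each row's divisor is looked up from a dictionary with map. Only the timeslices[1] divisor (10000 * 2) comes from the question; the DIVISORS entries for 0 and 9 are hypothetical placeholders, and the sketch assumes every Messvariable value starts with timeslices[N] (otherwise astype(int) fails on the NaN from a non-match).

```python
import pandas as pd

# Made-up sample data in the Messvariable format.
df = pd.DataFrame({
    "Messvariable": [
        "timeslices[0].profilerDataProcess[0]_C0[us]",
        "timeslices[1].profilerDataProcess[0]_C0[us]",
        "timeslices[1].profilerDataTask_C0[us]",
        "timeslices[9].profilerDataTask_C0[us]",
    ],
    "max": [1.0, 2.0, 3.0, 4.0],
})

# Extract N from "timeslices[N]..."; the brackets are escaped because this IS a regex.
df["timeslice"] = (
    df["Messvariable"].str.extract(r"^timeslices\[(\d+)\]", expand=False).astype(int)
)

# HYPOTHETICAL divisors: only timeslice 1 (10000 * 2) is from the question.
# Replace these with the real per-timeslice calculations.
DIVISORS = {0: 10000 * 1, 1: 10000 * 2, 9: 10000 * 10}

# map() looks up each row's divisor; a timeslice missing from the dict yields NaN.
df["CPU_LOAD"] = df["max"] / df["timeslice"].map(DIVISORS)
print(df[["timeslice", "CPU_LOAD"]])
```

If a calculation is more than a single divisor, a dict of functions keyed by timeslice, applied per group with groupby("timeslice"), follows the same idea.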

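[Discussion]:

Thanks for your answer. But even when specifying timeslices[1], this selects all 146 rows.
@shankarram My solution only updates timeslices[1]. Do you need to update everything except timeslices[1]?
I need to update every timeslice from 0 to 9, and each timeslice has a different calculation. So, for example, when I run .str.contains('timeslices[1]', regex=False), it should return only the rows corresponding to timeslices[1]. But in this case I still get 146 rows.
I tried the same approach, but I hadn't used apply before. Thanks.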