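Insert value based on row index number in a pandas dataframe: answers

【Question title】: Insert value based on row index number in a pandas dataframe
【Posted】: 2018-03-07 22:23:11
【Question description】:

I need to insert values into a column based on the row index of a pandas dataframe.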

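import numpy as np
import pandas as pd

# 11 rows of random integers in four columns, plus a placeholder 'ticker' column
df = pd.DataFrame(np.random.randint(0, 100, size=(11, 4)), columns=list('ABCD'))
df['ticker'] = 'na'
df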

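In the sample dataframe above, the ticker column must hold "$" for the first 25% of the records, "$$" for the next 25%, and so on.

I tried taking the length of the dataframe, computing the 25%, 50%, and 75% row counts, and then visiting one row at a time and assigning a value to 'ticker' based on the row index.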

total_row_count=len(df)
row_25 = int(total_row_count * .25)
row_50 = int(total_row_count * .5)
row_75=int(total_row_count*.75)

if ((row.index >=0) and (row.index<=row_25)):
    return"$"
elif ((row.index > row_25) and (row.index<=row_50)):
    return"$$"
elif ((row.index > row_50) and (row.index<=row_75)):
    return"$$$"
elif (row.index > row_75):
    return"$$$$"

But I can't get the row index. If there is a different way to assign these values, please let me know.
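
The intent described in the question can be sketched as a plain function over row positions. This is a minimal illustration, not one of the posted answers: the helper name `label_for_position` is hypothetical, and the boundaries simply mirror the `<=` comparisons in the attempt above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(11, 4)), columns=list("ABCD"))

total_row_count = len(df)
row_25 = int(total_row_count * 0.25)
row_50 = int(total_row_count * 0.5)
row_75 = int(total_row_count * 0.75)


def label_for_position(pos):
    """Hypothetical helper: map a 0-based row position to a ticker label."""
    if pos <= row_25:
        return "$"
    elif pos <= row_50:
        return "$$"
    elif pos <= row_75:
        return "$$$"
    return "$$$$"


# Iterate over positions instead of rows, so no per-row index lookup is needed.
df["ticker"] = [label_for_position(pos) for pos in range(total_row_count)]
```

This works for small frames, but it runs a Python-level loop; the vectorized answers in this thread avoid that.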

【Question discussion】:

Tags: python pandas dataframe

【Solution 1】:

I like using np.select for this kind of task because I find the syntax intuitive and readable:

# Set up your conditions:
conds = [(df.index >= 0) & (df.index <= row_25),
         (df.index > row_25) & (df.index<=row_50),
         (df.index > row_50) & (df.index<=row_75),
         (df.index > row_75)]

# Set up your target values (in the same order as your conditions)
choices = ['$', '$$', '$$$', '$$$$']

# Assign df['ticker']; a string default keeps the default the same type as the choices
df['ticker'] = np.select(conds, choices, default='na')

This returns:

>>> df
     A   B   C   D ticker
0   92  97  25  79      $
1   76   4  26  94      $
2   49  65  19  91      $
3   76   3  83  45     $$
4   83  16   0  16     $$
5    1  56  97  44     $$
6   78  17  18  86    $$$
7   55  56  83  91    $$$
8   76  16  52  33    $$$
9   55  35  80  95   $$$$
10  90  29  41  87   $$$$

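【Discussion】:

"$$$$" is not populated in the last 2 records. Any idea why it isn't populated?
Try: df['ticker'] = np.select(conds, choices, default = 'test'). If the last 2 records are filled with test, it means none of the conditions provided were met for those rows. Otherwise, I'm not sure...
Your solution works. I'm not sure why it wouldn't show up in my df. When I saved it as a csv, I was able to see "$$$$". Thanks sacul
I'm trying this because I think it would solve my problem, but I get an error saying 'row_6' is not defined (this would be row_25 in the example). Would you happen to know how to fix this?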
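
Both follow-up comments can be explored with a self-contained sketch. It assumes the thresholds from the question and, as a deliberate change from the answer, builds the conditions from row positions (`np.arange`) so that it also works when the index is not the default 0..n-1. The "not defined" error in the last comment is consistent with the thresholds never having been created, which is why they are defined first here; the label "unmatched" is an arbitrary placeholder.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(11, 4)), columns=list("ABCD"))

# The thresholds must exist before the conditions are built;
# referencing an undefined name raises a NameError.
n = len(df)
row_25, row_50, row_75 = int(n * 0.25), int(n * 0.5), int(n * 0.75)

# Row positions (assumption: position, not index label, is what matters).
pos = np.arange(n)
conds = [pos <= row_25,
         (pos > row_25) & (pos <= row_50),
         (pos > row_50) & (pos <= row_75),
         pos > row_75]
choices = ["$", "$$", "$$$", "$$$$"]

# Any row that satisfies none of the conditions receives the default label.
df["ticker"] = np.select(conds, choices, default="unmatched")
unmatched_rows = int((df["ticker"] == "unmatched").sum())
print(unmatched_rows)
```

If `unmatched_rows` is zero, every row satisfied one of the conditions, and a missing "$$$$" is more likely a display issue than a logic error.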

【Solution 2】:

I think pd.cut can solve this:

df['ticker'] = pd.cut(np.arange(len(df)) / len(df),
                      [-np.inf, 0.25, 0.5, 0.75, 1],
                      labels=["$", "$$", "$$$", "$$$$"],
                      right=True)
df
Out[35]: 
     A   B   C   D ticker
0   63  51  19  33      $
1   12  80  57   1      $
2   53  27  62  26      $
3   97  43  31  80     $$
4   91  22  92  11     $$
5   39  70  82  26     $$
6   32  62  17  75    $$$
7    5  59  79  72    $$$
8   75   4  47   4    $$$
9   43   5  45  66   $$$$
10  29   9  74  94   $$$$

【Discussion】:

I'm not sure what I'm missing, but when I run the code it returns "$" for every row in the ticker column.
@sow It works fine on my side. Would you mind pasting the code you are using here?
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0, 100, size=(11, 4)), columns=list('ABCD'))
df['ticker'] = pd.cut(np.arange(len(df)) / len(df), [-np.inf, 0.25, 0.5, 0.75, 1], labels=["$", "$$", '$$$', '$$$$'], right=True)
df
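
One possible explanation for the all-"$" result, which the thread does not confirm, is integer division: under Python 2 the `/` operator on integers floors the result, so every ratio becomes 0 and lands in the first bin. The sketch below reproduces that effect with `//` and shows true division for comparison; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

n = 11
bins = [-np.inf, 0.25, 0.5, 0.75, 1]
labels = ["$", "$$", "$$$", "$$$$"]

# Floor division: every position maps to 0, so all rows fall in the first bin.
floored = pd.cut(np.arange(n) // n, bins, labels=labels, right=True)

# True division: positions spread over [0, 1) and fill all four bins.
fractional = pd.cut(np.arange(n) / float(n), bins, labels=labels, right=True)

print(list(floored))
print(list(fractional))
```

Dividing by `float(len(df))` forces true division regardless of the Python version.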

【Solution 3】:

You can set up a few np.where statements to handle this. Try the following:

import numpy as np
...
# Range conditions are combined with &, since chained comparisons fail on an index
df['ticker'] = np.where(df.index < row_25, "$", df['ticker'])
df['ticker'] = np.where((row_25 <= df.index) & (df.index < row_50), "$$", df['ticker'])
df['ticker'] = np.where((row_50 <= df.index) & (df.index < row_75), "$$$", df['ticker'])
df['ticker'] = np.where(row_75 <= df.index, "$$$$", df['ticker'])
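
Range conditions on a pandas index cannot be written as Python chained comparisons of the form `a <= idx < b`: the chain expands to `(a <= idx) and (idx < b)`, and `and` needs a single truth value, which a boolean array does not have. A minimal sketch of the difference, with hypothetical bounds `low` and `high`:

```python
import numpy as np
import pandas as pd

idx = pd.RangeIndex(11)
low, high = 2, 5  # hypothetical bounds, for illustration only

# The chained form raises ValueError because a boolean array has no single truth value.
try:
    mask = low <= idx < high
    chained_ok = True
except ValueError:
    chained_ok = False

# Element-wise form: combine the two comparisons with & and parentheses.
mask = (low <= idx) & (idx < high)
labels = np.where(mask, "$$", "na")
print(chained_ok, labels.tolist())
```

The parentheses matter because `&` binds more tightly than the comparison operators.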

【Discussion】:

【Solution 4】:

Here is an explicit solution using the .loc accessor.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(11, 4)), columns=list('ABCD'))
n = len(df.index)

# Start with a placeholder, then overwrite each quarter of the rows via boolean masks
df['ticker'] = 'na'
df.loc[df.index <= n/4, 'ticker'] = '$'
df.loc[(n/4 < df.index) & (df.index <= n/2), 'ticker'] = '$$'
df.loc[(n/2 < df.index) & (df.index <= n*3/4), 'ticker'] = '$$$'
df.loc[df.index > n*3/4, 'ticker'] = '$$$$'

#      A   B   C   D ticker
# 0   47  64   7  46      $
# 1   53  55  75   3      $
# 2   93  95  28  47      $
# 3   35  88  16   7     $$
# 4   99  66  88  84     $$
# 5   75   2  72  90     $$
# 6    6  53  36  92    $$$
# 7   83  58  54  67    $$$
# 8   49  83  46  54    $$$
# 9   69   9  96  73   $$$$
# 10  84  42  11  83   $$$$

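【Discussion】:

"$$$$" doesn't show up. What am I missing?
That's strange. When I try print(df), I see the output shown in my post.
Your solution works. I'm not sure why it wouldn't show up in my df. When I saved it as a csv, I was able to see "$$$$". Thanks @jpp
@sow, no problem. If it solved your problem, feel free to accept it (the tick on the left).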
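
The thread never establishes why "$$$$" was present in the CSV but missing from the displayed frame. One plausible cause, offered here purely as an assumption, is notebook rendering: in Jupyter, MathJax can treat dollar signs in an HTML table as math delimiters. A hedged sketch of two ways to check, using a small stand-in frame:

```python
import pandas as pd

# Stand-in frame containing only the labels in question.
df = pd.DataFrame({"ticker": ["$", "$$", "$$$", "$$$$"]})

# Plain-text output is not processed by MathJax, so the labels appear verbatim.
plain_text = df.to_string()
print(plain_text)

# Ask pandas not to let Jupyter process HTML table contents with MathJax.
pd.set_option("display.html.use_mathjax", False)
```

If the labels appear with `print(df)` but not in the rich notebook display, that points to rendering and not to the assignment logic.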