【Question title】: Automate graph generation with Seaborn using Pandas dataframe
【Posted】: 2018-12-05 14:42:08
【Question description】:

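Seaborn's FacetGrid lets me generate multiple plots for one person. However, I can't modify the code so that a loop repeats the same process for 20 different people.

It breaks when I try to refer to each person's dataframe. The problem is that I end up with a string containing the dataframe's name rather than the dataframe itself. How can I fix this?

I start from one very large dataframe and build a separate dataframe for each person from it. When I try to iterate over the per-person dataframes, I can't get at the dataframe itself.

These lines seem to be where the problem lies: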

for i in Person_u:
    output_file=(i + '.png')
    input_file=(i + '.csv')
    title=i
    db=('df_' + i)

Below I have included the code that works for one person and the code that fails when looping over several people.

# import libraries ...
# import data from csv file ...
#create data frame from values in the csv file
df = pd.read_csv(input_file, sep=',', delimiter=None, header='infer', 
    names=['LH', 'RevID', 'OrigID', 'Person', 'Date', 'File', 
    'Threshold', 'StepSize', 'RevNum', 'WL', 'RevPos', 'ExpNum', 'Light', 'ThExp'], 
    usecols=['OrigID', 'Person', 'Date', 'Threshold', 'RevNum', 'WL', 'RevPos', 'ExpNum', 'ThExp'], 
    engine='python', skiprows=1, infer_datetime_format=True)

# By Experiment
# Experiment 1, 2, 3, 4 (hundreds of rows, etc.)
df_TLR_1 = df.loc[(df.Person == 'TLR') & (df.ExpNum == 1)]
df_KJE_1 = df.loc[(df.Person == 'KJE') & (df.ExpNum == 1)]
df_NMP_2 = df.loc[(df.Person == 'NMP') & (df.ExpNum == 2)]
df_SFO_2 = df.loc[(df.Person == 'SFO') & (df.ExpNum == 2)]
df_MTC_3 = df.loc[(df.Person == 'MTC') & (df.ExpNum == 3)]
df_ZBL_3 = df.loc[(df.Person == 'ZBL') & (df.ExpNum == 3)]
df_MTC_4 = df.loc[(df.Person == 'MTC') & (df.ExpNum == 4)]
df_RJI_4 = df.loc[(df.Person == 'RJI') & (df.ExpNum == 4)]

Person_u = df.Person.unique()
ExpNum_u = df.ExpNum.unique()
WL_u = df.WL.unique()
ThExp_u = df.ThExp.unique()

# seaborn set style
sns.set(style="ticks")
grid = sns.FacetGrid(df_TLR_1, col="WL", hue="ThExp", col_wrap=4, size=4)
grid.map(plt.axhline, y=0, ls=":", c=".5")
# Draw a horizontal line showing min max constraints of staircase
if df_TLR_1.iloc[0,7] == 1:
    grid.map(plt.axhline, y=-60, ls=":", c=".5")
    grid.map(plt.axhline, y=40, ls=":", c=".5")
    grid.map(plt.plot, "RevNum", "RevPos", marker="o", ms=4)
    grid.set(xticks=np.arange(13), yticks=[-65, -60, -40, -20, 0, 20, 40, 45], xlim=(-.5, 12.5), ylim=(-65, 45))
elif df_TLR_1.iloc[0,7] == 4:
    grid.map(plt.axhline, y=-50, ls=":", c=".5")
    grid.map(plt.axhline, y=50, ls=":", c=".5")
    grid.map(plt.plot, "RevNum", "RevPos", marker="o", ms=4)
    grid.set(xticks=np.arange(13), yticks=[-60, -40, -20, 0, 20, 40, 60], xlim=(-.5, 12.5), ylim=(-65, 65))
else:
    print('Error. Experiment Number not 1-4.')
# Draw a line plot to show reversals of staircase
# Adjust the arrangement of the plots
grid.fig.tight_layout(w_pad=.5)
this_name=df_TLR_1.iloc[0,1]
th_experiment=df_TLR_1.iloc[0,8]
this_experiment=th_experiment[-4:8]

#plt.suptitle(df_TLR_1.iloc[0,1] + df_TLR_1.iloc[0,8], fontsize=20)
plt.suptitle(this_name + ' ' + this_experiment, fontsize=20, ha='right')
plt.savefig(this_name + ' ' + this_experiment + '.png')
plt.show()

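When I try to change this to loop over each unique person, I can't append the three letters and the experiment number to df_XXX_X, for example to switch from df_RJI_1 to df_MTC_3, and so on.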

for i in Person_u:
    output_file=(i + '.png')
    input_file=(i + '.csv')
    title=i
    db=('df_' + i)
    #seaborn set style
    sns.set(style="ticks")
    grid = sns.FacetGrid(db, col="WL", hue="ThExp", col_wrap=5, size=4)
    grid.map(plt.axhline, y=0, ls=":", c=".5")
    # Draw a horizontal line showing min max constraints of staircase
    if db[0,7] == 1:
        grid.map(plt.axhline, y=-40, ls=":", c=".5")
        grid.map(plt.axhline, y=60, ls=":", c=".5")
    elif db[0,7] == 4:
        grid.map(plt.axhline, y=-50, ls=":", c=".5")
        grid.map(plt.axhline, y=50, ls=":", c=".5")
    else:
        print('Error. Experiment Number not 1-4.')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-11-2b9785282fe1> in <module>()
      6     #seaborn set style
      7     sns.set(style="ticks")
----> 8     grid = sns.FacetGrid(db, col="WL", hue="ThExp", col_wrap=5, size=4)
      9     grid.map(plt.axhline, y=0, ls=":", c=".5")
     10     # Draw a horizontal line showing min max constraints of staircase

c:\users\rijekah\appdata\local\programs\python\python35\lib\site-packages\seaborn\axisgrid.py in __init__(self, data, row, col, hue, col_wrap, sharex, sharey, size, aspect, palette, row_order, col_order, hue_order, hue_kws, dropna, legend_out, despine, margin_titles, xlim, ylim, subplot_kws, gridspec_kws)
    235             hue_names = None
    236         else:
--> 237             hue_names = utils.categorical_order(data[hue], hue_order)
    238 
    239         colors = self._get_palette(data, hue, hue_order, palette)

TypeError: string indices must be integers
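
The traceback can be reproduced without seaborn. A minimal sketch, assuming a hypothetical person code 'TLR': FacetGrid evaluates data[hue] internally, and when data is the string 'df_TLR' rather than a DataFrame, indexing it with the column name "ThExp" fails.

```python
# Hypothetical person code, standing in for one element of Person_u.
i = "TLR"

# This builds a str that merely spells a variable name; it is not the DataFrame.
db = "df_" + i
print(type(db).__name__)  # str

# seaborn's FacetGrid does data[hue]; on a str that is a character lookup,
# which only accepts integer positions.
try:
    db["ThExp"]
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)
```

The same reasoning explains why db[0,7] in the loop would also fail: a str cannot be indexed with a tuple either.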

Here is an example that produces a working plot:

# seaborn set style
sns.set(style="ticks")
grid = sns.FacetGrid(df_TLR_1, col="WL", hue="ThExp", col_wrap=3, size=6)
grid.map(plt.axhline, y=0, ls=":", c=".5")
# Draw a horizontal line showing min max constraints of staircase
if df_TLR_1.iloc[0,7] == 1:
    grid.map(plt.axhline, y=-60, ls=":", c=".5")
    grid.map(plt.axhline, y=40, ls=":", c=".5")
    grid.map(plt.plot, "RevNum", "RevPos", marker="o", ms=4)
    grid.set(xticks=np.arange(13), yticks=[-65, -60, -40, -20, 0, 20, 40, 45], xlim=(-.5, 15.5), ylim=(-65, 45))
elif df_TLR_1.iloc[0,7] == 4:
    grid.map(plt.axhline, y=-50, ls=":", c=".5")
    grid.map(plt.axhline, y=50, ls=":", c=".5")
    grid.map(plt.plot, "RevNum", "RevPos", marker="o", ms=4)
    grid.set(xticks=np.arange(13), yticks=[-60, -40, -20, 0, 20, 40, 60], xlim=(-.5, 12.5), ylim=(-65, 65))
else:
    print('Error. Experiment Number not 1-4.')

# Adjust the arrangement of the plots
grid.fig.tight_layout(w_pad=.5)
this_name=df_TLR_1.iloc[0,1]
th_experiment=df_TLR_1.iloc[0,9]
this_experiment=th_experiment[-4:8]

# add figure title and save figure
plt.suptitle(this_name + ' ' + this_experiment, fontsize=20, ha='right')
plt.savefig(this_name + ' ' + this_experiment + '.png')
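
The if/elif branches differ only in their numbers, so for a loop over many people they could be held in a lookup table instead. A sketch using the values from the working example above; only experiments 1 and 4 are filled in because those are the only ones the question gives, and the names STAIRCASE_LIMITS and limits_for are hypothetical.

```python
# Axis settings per experiment number, copied from the branches of the working example.
# Experiments 2 and 3 are not specified in the question, so they are left out.
STAIRCASE_LIMITS = {
    1: {"lines": (-60, 40), "yticks": [-65, -60, -40, -20, 0, 20, 40, 45], "ylim": (-65, 45)},
    4: {"lines": (-50, 50), "yticks": [-60, -40, -20, 0, 20, 40, 60], "ylim": (-65, 65)},
}


def limits_for(exp_num):
    """Return the settings for exp_num, or None if the experiment is not in the table."""
    return STAIRCASE_LIMITS.get(int(exp_num))


settings = limits_for(4)
print(settings["lines"])
```

Inside the plotting loop, each value in settings["lines"] would be passed as y to grid.map(plt.axhline, ...), and a None result would take the place of the else branch.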

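【Comments】:

  • The problem is that the db variable you create is a string. You want it to be the dataframe you are referring to, but it is set to a string with the same name as the dataframe.
  • Yes! How do I make it the dataframe itself instead of a string?

Tags: python python-3.x pandas data-visualization seaborn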


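【Solution 1】:

You are creating a string rather than referring to the actual variable. The built-in eval function can resolve this.

Replace your current db line with

db = eval('df_'+i)

This should solve your problem. eval evaluates the string as a Python expression, so db ends up holding the dataframe with that variable name. The string has to match an existing variable name exactly, so with names like df_TLR_1 the experiment number must be part of the string as well.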
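
eval works here, but it executes whatever string it is given and relies on one hand-named variable existing per person. A common alternative, sketched below under the assumption that the frame has Person and ExpNum columns as in the question (the sample rows are made up for illustration), is to keep the per-person frames in a dict built with groupby and look them up by key.

```python
import pandas as pd

# Made-up sample rows for illustration; the real data comes from the CSV file.
df = pd.DataFrame({
    "Person": ["TLR", "TLR", "KJE", "MTC", "MTC"],
    "ExpNum": [1, 1, 1, 3, 4],
    "RevNum": [0, 1, 0, 0, 0],
    "RevPos": [-20, 10, 5, 0, 15],
})

# One sub-frame per (Person, ExpNum) pair, keyed by a tuple instead of a variable name.
frames = {key: group for key, group in df.groupby(["Person", "ExpNum"])}

for (person, exp_num), db in frames.items():
    # db is a real DataFrame here, so it could be passed to sns.FacetGrid(db, ...).
    print(person, exp_num, len(db))
```

With this layout the loop never needs to build a variable name, and a single frame can still be fetched directly, for example frames[("TLR", 1)].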
