How to make multiline graph with matplotlib subplots and pandas?

【Question Title】: How to make multiline graph with matplotlib subplots and pandas?
【Posted】: 2016-11-15 09:39:28
【Question】:

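I am pretty new to coding (completely self-taught) and have started using it in my job as a research assistant in a cancer lab. I need some help setting up a few line graphs in matplotlib.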

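I have a dataset containing next-generation sequencing data for about 80 patients. For each patient we have different time points of analysis, different genes detected (out of 40 in total), and the associated mutation percentage for each gene.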

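My goal is to write two scripts. The first will produce a "by patient" plot: a line graph with y = % mutation and x = time of measurement, with a differently colored line for each of that patient's genes. The second will be "by gene": one plot containing differently colored lines, one per patient, showing the x/y values for that specific gene.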

Here is an example dataframe for the scripts above (all rows belong to one patient, pt# 3):

gene    yaxis   xaxis   pt# gene#
ASXL1-3 34  1   3   1
ASXL1-3 0   98  3   1
IDH1-3  24  1   3   11
IDH1-3  0   98  3   11
RUNX1-3 38  1   3   21
RUNX1-3 0   98  3   21
U2AF1-3 33  1   3   26
U2AF1-3 0   98  3   26

I set up a groupby script that, when I iterate over it, gives me one dataframe per patient containing every gene time point for that patient.

grouped = df.groupby('pt #')
for groupObject in grouped:
    group = groupObject[1]

For patient 1, this gives the following output:

        y     x   gene  patientnumber patientgene  genenumber  dxtotransplant  \
0    40.0  1712  ASXL1              1     ASXL1-1           1            1857   
1    26.0  1835  ASXL1              1     ASXL1-1           1            1857   
302   7.0  1835  RUNX1              1     RUNX1-1          21            1857

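I need help writing a script that creates either of the plots above. Taking the by-patient example, my general idea is that I need a different subplot for each gene the patient has, where each subplot is the line graph for that gene.

The furthest I have gotten with matplotlib is this: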

plt.figure()

grouped = df.groupby('patient number')

for groupObject in grouped:
    group = groupObject[1]
    df = group #may need to remove this
    for element in range(len(group)): 
        xs = np.array(df[df.columns[1]]) #"x" column
        ys= np.array(df[df.columns[0]]) #"y" column
        gene = np.array(df[df.columns[2]])[element] #"gene" column
        plt.subplot(1,1,1) 
        plt.scatter(xs,ys, label=gene)
        plt.plot(xs,ys, label=gene)
        plt.legend()
    plt.show()

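This produces the following output:

In this output, the circled line should not be connected to the other 2 points. In this case, this is patient 1, who has the following data points: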

x       y   gene
1712    40  ASXL1
1835    26  ASXL1
1835    7   RUNX1

Using seaborn, I have gotten close to the graph I want with this code:

grouped = df.groupby(['patientnumber'])
for groupObject in grouped:
    group = groupObject[1]
    g = sns.FacetGrid(group, col="patientgene", col_wrap=4, size=4, ylim=(0,100))  
    g = g.map(plt.scatter, "x", "y", alpha=0.5)
    g = g.map(plt.plot, "x", "y", alpha=0.5)
    plt.title= "gene:%s"%element

With this code, I get the following:

If I adjust this line:

g = sns.FacetGrid(group, col="patientnumber", col_wrap=4, size=4, ylim=(0,100))

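I get the following result:

As you can see in the 2nd example, the plot treats every point on my graph as if it came from the same line (but they are actually 4 separate lines).

How can I adjust my iteration so that each patient gene is treated as a separate line on the same graph?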

【Comments】:

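This may be a little broad as it stands. You have a good level of detail, but the tag communities on Stack Overflow generally try to discourage posts that ask for broad guidance or help getting started. Since you have spent some time on this, would you show us what you have tried, even if it did not work? Also, I have removed your deadline; anything that tries to hurry volunteers is usually not well received ;-)
@halfer Hey, I am pretty new to this site (and to the coding community in general), so thanks for calling me out on my faux pas. I have tried using seaborn, matplotlib, and bokeh, but they all seem to hit the same problem (i.e., every point on my line graph is treated as connected, rather than as data from multiple lines). I will update my question with details of what I tried and what the output was. Thanks. I did not mean to rush the community, only to say that I urgently need help. I am sorry it came across that way.
No worries. Yes, if you can update your question with your approach, that usually guides the answers, because it shows the strategy you are taking, and perhaps someone will spot the mistake.
@halfer I just added 3 code examples I wrote, their output, and how they differ from what I am trying to accomplish. I also shortened my prose a bit and tried to remove some unnecessary detail. I hope this is more in line with what is expected of a post. Thanks again, and hopefully this makes it easier for volunteers to help.
Great effort, this is what we like to see. I can't advise on this topic myself, but this looks like good preparation.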

Tags: python pandas matplotlib plot subplot

【Solution 1】:

I wrote a subplot function that should help you out. I modified the data a little to help illustrate the plotting functionality.

gene,yaxis,xaxis,pt #,gene #
ASXL1-3,34,1,3,1
ASXL1-3,3,98,3,1
IDH1-3,24,1,3,11
IDH1-3,7,98,3,11
RUNX1-3,38,1,3,21
RUNX1-3,2,98,3,21
U2AF1-3,33,1,3,26
U2AF1-3,0,98,3,26
ASXL1-3,39,1,4,1
ASXL1-3,8,62,4,1
ASXL1-3,0,119,4,1
IDH1-3,27,1,4,11
IDH1-3,12,62,4,11
IDH1-3,1,119,4,11
RUNX1-3,42,1,4,21
RUNX1-3,3,62,4,21
RUNX1-3,1,119,4,21
U2AF1-3,16,1,4,26
U2AF1-3,1,62,4,26
U2AF1-3,0,119,4,26
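
To experiment with this sample without first saving a data.csv file, the rows can be parsed straight from a string. This is only a minimal sketch: the variable name `CSV_TEXT` is made up for illustration, and the data is the sample above copied verbatim.

```python
import io

import pandas as pd

# Sample rows copied from the CSV above; CSV_TEXT is a hypothetical name.
CSV_TEXT = """gene,yaxis,xaxis,pt #,gene #
ASXL1-3,34,1,3,1
ASXL1-3,3,98,3,1
IDH1-3,24,1,3,11
IDH1-3,7,98,3,11
RUNX1-3,38,1,3,21
RUNX1-3,2,98,3,21
U2AF1-3,33,1,3,26
U2AF1-3,0,98,3,26
ASXL1-3,39,1,4,1
ASXL1-3,8,62,4,1
ASXL1-3,0,119,4,1
IDH1-3,27,1,4,11
IDH1-3,12,62,4,11
IDH1-3,1,119,4,11
RUNX1-3,42,1,4,21
RUNX1-3,3,62,4,21
RUNX1-3,1,119,4,21
U2AF1-3,16,1,4,26
U2AF1-3,1,62,4,26
U2AF1-3,0,119,4,26
"""

# read_csv accepts any file-like object, so an in-memory buffer works
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Quick sanity checks on what was loaded
print(df.shape)                    # (rows, columns)
print(df.groupby("pt #").size())   # number of rows per patient
```

Note that the column names contain a space ("pt #", "gene #"), so they have to be accessed with bracket syntax such as `df["pt #"]`.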

Here is the subplot function... with some extra bells and whistles :)

def plotByGroup(df, group, xCol, yCol, title = "", xLabel = "", yLabel = "", lineColors = ["red", "orange", "yellow", "green", "blue", "purple"], lineWidth = 2, lineOpacity = 0.7, plotStyle = 'ggplot', showLegend = False):
    """
    Plot multiple lines from a Pandas Data Frame for each group using DataFrame.groupby() and MatPlotLib PyPlot.
    @params
        df          - Required  - Data Frame    - Pandas Data Frame
        group       - Required  - String        - Column name to group on           
        xCol        - Required  - String        - Column name for X axis data
        yCol        - Required  - String        - Column name for y axis data
        title       - Optional  - String        - Plot Title
        xLabel      - Optional  - String        - X axis label
        yLabel      - Optional  - String        - Y axis label
        lineColors  - Optional  - List          - Colors to plot multiple lines
        lineWidth   - Optional  - Integer       - Width of lines to plot
        lineOpacity - Optional  - Float         - Alpha of lines to plot
        plotStyle   - Optional  - String        - MatPlotLib plot style
        showLegend  - Optional  - Boolean       - Show legend
    @return
        MatPlotLib Plot Object

    """
    # Import MatPlotLib Plotting Function & Set Style
    from matplotlib import pyplot as plt
    plt.style.use(plotStyle)                # pyplot exposes style; no separate matplotlib import needed
    figure = plt.figure()                   # Initialize Figure
    axis = figure.add_subplot(1,1,1)        # Create the single axes once; every line is drawn on it
    grouped = df.groupby(group)             # Set Group
    i = 0                                   # Set iteration to determine line color indexing
    for idx, grp in grouped:
        colorIndex = i % len(lineColors)    # Define line color index
        lineLabel = grp[group].values[0]    # Get a group label from first position
        xValues = grp[xCol]                 # Get x vector
        yValues = grp[yCol]                 # Get y vector
        # One plot call per group, so points are only connected within their own group
        axis.plot(xValues, yValues, label = lineLabel, color = lineColors[colorIndex], lw = lineWidth, alpha = lineOpacity)
        i += 1
    # Plot legend (once, after all lines exist)
    if showLegend:
        axis.legend()
    # Set title & Labels
    axis.set_title(title)
    axis.set_xlabel(xLabel)
    axis.set_ylabel(yLabel)
    # Return pyplot for saving, showing, etc.
    return plt
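
Stripped of the options, the idea behind the function above is to call plot once per group instead of once for the whole frame. The following is a minimal sketch using the three patient 1 points quoted in the question; the names `patient1`, `fig`, and `ax` are made up for illustration, and the Agg backend is selected only so the sketch also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Patient 1's three data points from the question
patient1 = pd.DataFrame({
    "x": [1712, 1835, 1835],
    "y": [40, 26, 7],
    "gene": ["ASXL1", "ASXL1", "RUNX1"],
})

fig, ax = plt.subplots()

# One plot call per gene: points are joined only within their own gene,
# so the RUNX1 point is no longer connected to the ASXL1 line.
for gene, gene_df in patient1.groupby("gene"):
    ax.plot(gene_df["x"], gene_df["y"], marker="o", label=gene)

ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
# Save or show as needed, e.g. fig.savefig("patient1.png")
```

Passing the whole patient frame to a single plot call, as in the question, draws one line through all three points, which is presumably where the unwanted connection comes from.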

Then to use it...

import pandas

# Load the Data into Pandas
df = pandas.read_csv('data.csv')    

#
# Plotting - by Patient
#

# Create Patient Grouping
patientGroup = df.groupby('pt #')

# Iterate Over Groups
for idx, patientDf in patientGroup:
    # Let's give them specific titles
    plotTitle = "Gene Frequency over Time by Gene (Patient %s)" % str(patientDf['pt #'].values[0])
    # Call the subplot function
    plot = plotByGroup(patientDf, 'gene', 'xaxis', 'yaxis', title = plotTitle, xLabel = "Days", yLabel = "Gene Frequency")
    # Add Vertical Lines at Assay Timepoints
    timepoints = set(patientDf.xaxis.values)
    for timepoint in timepoints:
        plot.axvline(x = timepoint, linewidth = 1, linestyle = "dashed", color='gray', alpha = 0.4)
    # Let's see it
    plot.show()
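
As an alternative for the by-patient case, the same picture can be sketched without an explicit loop by reshaping one patient's rows to wide format and letting pandas draw one line per column. This is not part of the function above, just a hedged sketch; `patient3` and `wide` are made-up names, and the values are patient 3's rows from the sample data.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit this line for interactive use
import pandas as pd

# Patient 3's rows from the sample data above
patient3 = pd.DataFrame({
    "gene":  ["ASXL1-3", "ASXL1-3", "IDH1-3", "IDH1-3",
              "RUNX1-3", "RUNX1-3", "U2AF1-3", "U2AF1-3"],
    "yaxis": [34, 3, 24, 7, 38, 2, 33, 0],
    "xaxis": [1, 98, 1, 98, 1, 98, 1, 98],
})

# Rows become time points, columns become genes
wide = patient3.pivot(index="xaxis", columns="gene", values="yaxis")

# DataFrame.plot draws one line per column and returns the matplotlib Axes
ax = wide.plot(marker="o")
ax.set_xlabel("Days")
ax.set_ylabel("Gene Frequency")
```

One caveat: `pivot` raises an error if the same gene has two measurements at the same time point, so the explicit groupby loop is the safer choice for messy data.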

Of course, we can do this by gene as well.

#
# Plotting - by Gene
#

# Create Gene Grouping
geneGroup   = df.groupby('gene')

# Generate Plots for Groups
for idx, geneDf in geneGroup:
    plotTitle = "%s Gene Frequency over Time by Patient" % str(geneDf['gene'].values[0])
    # Keyword names must match the function signature (xLabel / yLabel)
    plot = plotByGroup(geneDf, 'pt #', 'xaxis', 'yaxis', title = plotTitle, xLabel = "Days", yLabel = "Frequency")
    plot.show()
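
If a grid of subplots is preferred over one figure per gene, as the question title suggests, the by-gene plots can be laid out side by side. This is a rough sketch under stated assumptions, not part of the function above: it uses a two-gene subset of the sample data, and the names `genes`, `fig`, and `axes` are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# A two-gene, two-patient subset of the sample data above
df = pd.DataFrame({
    "gene":  ["ASXL1-3"] * 5 + ["IDH1-3"] * 5,
    "yaxis": [34, 3, 39, 8, 0, 24, 7, 27, 12, 1],
    "xaxis": [1, 98, 1, 62, 119, 1, 98, 1, 62, 119],
    "pt #":  [3, 3, 4, 4, 4, 3, 3, 4, 4, 4],
})

genes = sorted(df["gene"].unique())

# One panel per gene; squeeze=False keeps `axes` 2-D even for a single gene
fig, axes = plt.subplots(1, len(genes), sharey=True, squeeze=False,
                         figsize=(4 * len(genes), 4))

for ax, gene in zip(axes[0], genes):
    gene_df = df[df["gene"] == gene]
    # One line per patient inside each gene's panel
    for patient, patient_df in gene_df.groupby("pt #"):
        patient_df = patient_df.sort_values("xaxis")  # keep time points in order
        ax.plot(patient_df["xaxis"], patient_df["yaxis"],
                marker="o", label="Patient %s" % patient)
    ax.set_title(gene)
    ax.set_xlabel("Days")
    ax.legend()

axes[0][0].set_ylabel("Gene Frequency")
fig.tight_layout()
# Save or show as needed, e.g. fig.savefig("by_gene.png")
```

With 40 genes a single row would be too wide, so the grid shape (here 1 by the number of genes) would need adjusting for the full dataset.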

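If this is not what you are looking for, please clarify and I will take another crack at it.

【Comments】:

You are a rock star, I can't wait to get to work tomorrow and try this out. Thank you!
Thanks! If it works, please accept the answer :)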