【Question title】: Pandas Seaborn Swarmplot doesn't plot
【Posted】: 2016-09-09 14:02:52
【Question】:

I am trying to draw a seaborn swarmplot where col[2] is the frequency and col[3] is the class to group by. The input and code are given below.

Input:

tweetcricscore,51,high active
tweetcricscore,46,event based
tweetcricscore,12,event based
tweetcricscore,46,event based
tweetcricscore,1,viewers 
tweetcricscore,178,viewers
tweetcricscore,46,situational
tweetcricscore,23,situational
tweetcricscore,1,situational
tweetcricscore,8,situational
tweetcricscore,56,situational

Code:

import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(style="whitegrid", color_codes=True)

df = pd.read_csv('input.csv', header = None)

df.columns = ['keyword','freq','class']

ax = sns.swarmplot(x="class", y="freq", data=df)

plt.show()

The code does not draw anything and does not raise any error. Any suggestions for improving the code?

【Question comments】:

    Tags: python pandas matplotlib data-visualization seaborn


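    【Solution 1】:

    I think you first need read_csv, then create the new column class by concatenating the two parts with fillna, and finally strip the leftover whitespace: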

    import pandas as pd
    import matplotlib as mpl
    import matplotlib.pyplot as plt
    import seaborn as sns
    import io
    
    temp=u"""tweetcricscore 51 high active
    tweetcricscore 46 event based
    tweetcricscore 12 event based
    tweetcricscore 46 event based
    tweetcricscore 1 viewers 
    tweetcricscore 178 viewers
    tweetcricscore 46 situational
    tweetcricscore 23 situational
    tweetcricscore 1 situational
    tweetcricscore 8 situational
    tweetcricscore 56 situational"""
    # after testing, replace io.StringIO(temp) with the filename
    df = pd.read_csv(io.StringIO(temp),
                     sep=r"\s+",  # separator is arbitrary whitespace
                     names=['keyword','freq','class1','class2'])  # set new column names
    
    # join the two class parts; a missing second part is filled with ''
    df['class'] = df['class1'] + ' ' + df['class2'].fillna('')
    df['class'] = df['class'].str.strip()
    print(df)
               keyword  freq       class1  class2        class
    0   tweetcricscore    51         high  active  high active
    1   tweetcricscore    46        event   based  event based
    2   tweetcricscore    12        event   based  event based
    3   tweetcricscore    46        event   based  event based
    4   tweetcricscore     1      viewers     NaN      viewers
    5   tweetcricscore   178      viewers     NaN      viewers
    6   tweetcricscore    46  situational     NaN  situational
    7   tweetcricscore    23  situational     NaN  situational
    8   tweetcricscore     1  situational     NaN  situational
    9   tweetcricscore     8  situational     NaN  situational
    10  tweetcricscore    56  situational     NaN  situational
    
    sns.set(style="whitegrid", color_codes=True)
    ax = sns.swarmplot(x="class", y="freq", data=df)
    plt.show()
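
The role of fillna in the concatenation above can be seen on a two-row frame. This is only a minimal sketch: the frame `parts` and the names `naive` and `joined` are made up for illustration and are not part of the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame: one class with two words, one with a single word.
parts = pd.DataFrame({'class1': ['high', 'viewers'],
                      'class2': ['active', np.nan]})

# Concatenating a string with a missing value propagates the missing value,
# so the single-word class would be lost.
naive = parts['class1'] + ' ' + parts['class2']

# Filling the gap with '' first keeps the row; strip() then removes the
# trailing space left behind by the ' ' separator.
joined = (parts['class1'] + ' ' + parts['class2'].fillna('')).str.strip()

print(naive.tolist())
print(joined.tolist())
```

Comparing the two printed lists shows that the second row survives only in the fillna version.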
    

    Solution when the class column does not contain whitespace:

    import pandas as pd
    import matplotlib as mpl
    import matplotlib.pyplot as plt
    import seaborn as sns
    import io
    
    temp=u"""tweetcricscore 51 highactive
    tweetcricscore 46 eventbased
    tweetcricscore 12 eventbased
    tweetcricscore 46 eventbased
    tweetcricscore 1 viewers 
    tweetcricscore 178 viewers
    tweetcricscore 46 situational
    tweetcricscore 23 situational
    tweetcricscore 1 situational
    tweetcricscore 8 situational
    tweetcricscore 56 situational"""
    # after testing, replace io.StringIO(temp) with the filename
    df = pd.read_csv(io.StringIO(temp),
                     sep=r"\s+",  # separator is arbitrary whitespace
                     names=['keyword','freq','class'])  # set new column names
    print(df)
    
               keyword  freq        class
    0   tweetcricscore    51   highactive
    1   tweetcricscore    46   eventbased
    2   tweetcricscore    12   eventbased
    3   tweetcricscore    46   eventbased
    4   tweetcricscore     1      viewers
    5   tweetcricscore   178      viewers
    6   tweetcricscore    46  situational
    7   tweetcricscore    23  situational
    8   tweetcricscore     1  situational
    9   tweetcricscore     8  situational
    10  tweetcricscore    56  situational
    
    sns.set(style="whitegrid", color_codes=True)
    ax = sns.swarmplot(x="class", y="freq", data=df)
    plt.show()
    

    EDIT2:

    If the separator is a comma (,), use:

    import pandas as pd
    import matplotlib as mpl
    import matplotlib.pyplot as plt
    import seaborn as sns
    import io
    
    temp=u"""tweetcricscore,51,high active
    tweetcricscore,46,event based
    tweetcricscore,12,event based
    tweetcricscore,46,event based
    tweetcricscore,1,viewers
    tweetcricscore,178,viewers
    tweetcricscore,46,situational
    tweetcricscore,23,situational
    tweetcricscore,1,situational
    tweetcricscore,8,situational
    tweetcricscore,56,situational"""
    # after testing, replace io.StringIO(temp) with the filename
    # sep=',' is the default, so it does not need to be passed
    df = pd.read_csv(io.StringIO(temp), names=['keyword','freq','class'])
    
    print(df)
               keyword  freq        class
    0   tweetcricscore    51  high active
    1   tweetcricscore    46  event based
    2   tweetcricscore    12  event based
    3   tweetcricscore    46  event based
    4   tweetcricscore     1      viewers
    5   tweetcricscore   178      viewers
    6   tweetcricscore    46  situational
    7   tweetcricscore    23  situational
    8   tweetcricscore     1  situational
    9   tweetcricscore     8  situational
    10  tweetcricscore    56  situational
    
    sns.set(style="whitegrid", color_codes=True)
    ax = sns.swarmplot(x="class", y="freq", data=df)
    plt.show()
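
One detail worth checking with comma-separated input: the sample in the question has a trailing space after one "viewers" entry, which the EDIT2 sample does not. The sketch below, using a few rows copied from the question, suggests how such stray whitespace could be detected and removed before plotting. The names `rows`, `n_before` and `n_after` are illustrative only.

```python
import io
import pandas as pd

# A few rows from the question's input; the second row keeps the trailing
# space after "viewers" that appears in the original sample.
rows = [
    "tweetcricscore,51,high active",
    "tweetcricscore,1,viewers ",
    "tweetcricscore,178,viewers",
    "tweetcricscore,46,situational",
]
df = pd.read_csv(io.StringIO("\n".join(rows)),
                 names=['keyword', 'freq', 'class'])

# Before cleaning, "viewers " and "viewers" are counted as different classes.
n_before = df['class'].nunique()

# Strip leading/trailing whitespace so both spellings fall into one class.
df['class'] = df['class'].str.strip()
n_after = df['class'].nunique()

print(n_before, n_after)
```

If the two counts differ, the uncleaned column would have produced extra categories on the x axis.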
    

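    【Comments】:

    • Yes, exactly. And sep=',' is the default separator, so you can use df = pd.read_csv('filename', names=['keyword','freq','class'])
    • Oops, I did not notice the separator change. Please check EDIT2.
    • Glad I could help! Good luck!
    • Yes, no problem. Good idea.
    • OK, I think this is the problem. See the docs: "although it does not scale as well to large numbers of observations"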
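    【Solution 2】:

    After trying to draw a swarmplot with a dataset of more than 8-10k rows, and after several attempts with jezreal's continued help and suggestions, we concluded that seaborn's categorical swarmplot does not scale to large data the way other seaborn plots do, as mentioned in the tutorial documentation. I therefore switched to a bokeh scatter plot, with the numeric values on the y axis and the grouped class names on the x axis, which more or less solved my problem of plotting univariate data by category.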

    import pandas as pd
    
    # Note: bokeh.charts only exists in old bokeh releases; it was later
    # moved out of bokeh into the separate bkcharts package.
    from bokeh.charts import Scatter, output_file, show
    
    df = pd.read_csv('input.csv', header=None)
    
    df.columns = ['user','freq','class']
    
    # class on the x axis, freq on the y axis, color and marker chosen per class
    scatter = Scatter(df, x='class', y='freq', color='class', marker='class',
                      title=' User classification', legend=False)
    
    output_file('output.html', title='output')
    
    show(scatter)
    

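    This groups the points by the class column and assigns colors and markers per group. freq is plotted along the y axis.

    Note: this may only work by accident, because the data is discrete.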
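
For readers who would rather stay with swarmplot, one possible workaround for the scaling problem described in this answer is to cap the number of rows per class before plotting. The sketch below is an assumption and not something the answers above used: the helper `downsample_per_class`, the cap of 300 rows and the synthetic frame `big` are all hypothetical, and downsampling changes what the plot shows.

```python
import numpy as np
import pandas as pd

def downsample_per_class(df, column, max_rows, seed=0):
    """Return at most max_rows randomly chosen rows for each value of column."""
    parts = []
    for _, group in df.groupby(column):
        n = min(len(group), max_rows)
        parts.append(group.sample(n=n, random_state=seed))
    return pd.concat(parts).reset_index(drop=True)

# Synthetic stand-in for a large input file (NOT the asker's data).
rng = np.random.RandomState(0)
classes = ['high active', 'event based', 'viewers', 'situational']
big = pd.DataFrame({
    'freq': rng.randint(1, 200, size=10000),
    'class': rng.choice(classes, size=10000),
})

# Keep at most 300 points per class (an arbitrary, hypothetical cap).
small = downsample_per_class(big, 'class', max_rows=300)
print(small['class'].value_counts())
```

The reduced frame can then be passed to the same call as in the question, sns.swarmplot(x="class", y="freq", data=small). Whether the cap is small enough depends on the data and the figure size.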
