【Question title】: Setting a range of values as variables for a barplot - python
【Posted】: 2021-10-29 14:08:49
【Question】:

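In my data, I have a column that holds one of the following: 'NOT_TESTED', 'NOT_COMPLETED', 'TOO_LOW', or a value between 150 and 190 in steps of 5 (so 150, 155, 160, etc.).
I'm trying to plot a bar chart that shows how many times each of these appears in the column, including each individual number.
So the bar chart should have these variables on the x-axis: 'NOT_TESTED', 'NOT_COMPLETED', 'TOO_LOW', 150, 155, 160 and so on.
The height of each bar should be the number of times that value appears in the column.
This is the code I've tried, and it gets me closest to my goal. However, every number (150-190) gets a count of 1, so all of those bars have the same height.
This doesn't match the data, and I don't know how to move forward.
I'm new to this, so any guidance would be greatly appreciated!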

num_range = list(range(150,191, 5))
OUTCOMES = ['NOT_TESTED', 'NOT_COMPLETED', 'TOO_LOW']
OUTCOMES.extend(num_range)
df = df.append(pd.DataFrame(num_range, 
       columns=['PT1']),
       ignore_index = True)
df["outcomes_col"] = df["PT1"].astype ("category")
df["outcomes_col"].cat.set_categories(OUTCOMES , inplace = True )
sns.countplot(x= "outcomes_col", data=df, palette='Magma')
plt.xticks(rotation = 90)
plt.ylabel('Amount')
plt.xlabel('Outcomes')
plt.title("Outcomes per Testing")
plt.show()


pd.DataFrame({'ID': {0: 'GF342',  1: 'IF874',  2: 'FH386',  3: 'KJ190',  4: 'TY748',  5: 'YT947',  6: 'DF063',  7: 'ET512',  8: 'GC714',  9: 'SD978',  10: 'EF472',  11: 'PL489',  12: 'AZ315',  13: 'OL821',  14: 'HN765',  15: 'ED589'}, 'Location': {0: 'Q1',  1: 'Q3',  2: 'Q1',  3: 'Q3',  4: 'Q3',  5: 'Q4',  6: 'Q3',  7: 'Q1',  8: 'Q2',  9: 'Q3',  10: 'Q1',  11: 'Q2',  12: 'Q1',  13: 'Q1',  14: 'Q3',  15: 'Q1'}, 'NEW': {0: 'YES',  1: 'NO',  2: 'NO',  3: 'YES',  4: 'YES',  5: 'NO',  6: 'NO',  7: 'YES',  8: 'NO',  9: 'NO',  10: 'NO',  11: 'YES',  12: 'NO',  13: 'YES',  14: 'YES',  15: 'YES'}, 'YEAR': {0: 2021,  1: 2018,  2: 2019,  3: 2021,  4: 2021,  5: 2019,  6: 2019,  7: 2021,  8: 2018,  9: 2019,  10: 2018,  11: 2021,  12: 2018,  13: 2021,  14: 2021,  15: 2021}, 'PT1': {0: '',  1: 'NOT_TESTED',  2: '',  3: 'NOT_FINISHED',  4: '165',  5: '',  6: '180',  7: '145',  8: '155',  9: '',  10: '',  11: '',  12: 'TOO_LOW',  13: '150',  14: '155',  15: ''}, 'PT2': {0: '',  1: '',  2: '',  3: '',  4: '',  5: 'TOO_LOW',  6: '',  7: '',  8: '160',  9: 'TOO_LOW',  10: '',  11: '',  12: '',  13: '',  14: '',  15: ''}, 'PT3': {0: '',  1: 'TOO_LOW',  2: '',  3: 'TOO_LOW',  4: '',  5: '',  6: '',  7: '',  8: '',  9: '',  10: '',  11: 'NOT_FINISHED',  12: '',  13: '185',  14: '',  15: '165'}, 'PT4': {0: '',  1: '',  2: '',  3: '',  4: '',  5: 165.0,  6: '',  7: '',  8: '',  9: '',  10: '',  11: '',  12: 180.0,  13: '',  14: '',  15: ''}})

This is not the whole dataset, just part of it.

【Question discussion】:

    Tags: python pandas dataframe matplotlib seaborn


    【Solution 1】:

    Starting from this dataframe:
    (I replaced NOT_FINISHED with NOT_COMPLETED to match the code in your question; let me know if that replacement is correct.)

           ID Location  NEW  YEAR            PT1      PT2            PT3  PT4
    0   GF342       Q1  YES  2021                                            
    1   IF874       Q3   NO  2018     NOT_TESTED                 TOO_LOW     
    2   FH386       Q1   NO  2019                                            
    3   KJ190       Q3  YES  2021  NOT_COMPLETED                 TOO_LOW     
    4   TY748       Q3  YES  2021            165                             
    5   YT947       Q4   NO  2019                 TOO_LOW                 165
    6   DF063       Q3   NO  2019            180                             
    7   ET512       Q1  YES  2021            145                             
    8   GC714       Q2   NO  2018            155      160                    
    9   SD978       Q3   NO  2019                 TOO_LOW                    
    10  EF472       Q1   NO  2018                                            
    11  PL489       Q2  YES  2021                          NOT_COMPLETED     
    12  AZ315       Q1   NO  2018        TOO_LOW                          180
    13  OL821       Q1  YES  2021            150                     185     
    14  HN765       Q3  YES  2021            155                             
    15  ED589       Q1  YES  2021                                    165     
    

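    If you want a count plot of the 'PT1' column, you first have to define the categories to plot. You can use pandas.CategoricalDtype, which also lets you set the order of those categories.
    So you define a new 'outcomes_col' column: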

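    from pandas.api.types import CategoricalDtype

    num_range = list(range(150, 191, 5))
    OUTCOMES = ['NOT_TESTED', 'NOT_COMPLETED', 'TOO_LOW']
    # 'PT1' stores the numbers as text ('165'), so the categories must be strings too
    OUTCOMES.extend([str(num) for num in num_range])
    # Values not listed here (e.g. '' or '145') become NaN and are not counted
    OUTCOMES = CategoricalDtype(OUTCOMES, ordered=True)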
    df["outcomes_col"] = df["PT1"].astype(OUTCOMES)
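    Converting with str(num) is the key difference from the code in the question. 'PT1' holds the numbers as text ('165', not 165), and a value is only counted if it matches a category exactly. Here is a minimal sketch on a small hypothetical series, not the real data, that shows the mismatch:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical sample: numbers stored as strings, as in the 'PT1' column
values = pd.Series(["165", "155", "155", "TOO_LOW", ""])

# Integer categories (like the question's OUTCOMES): '165' does not match 165
int_dtype = CategoricalDtype(["TOO_LOW", 150, 155, 160, 165], ordered=True)
# String categories (like this answer's OUTCOMES): '165' matches '165'
str_dtype = CategoricalDtype(["TOO_LOW", "150", "155", "160", "165"], ordered=True)

as_int = values.astype(int_dtype)
as_str = values.astype(str_dtype)

# Values that match no category become NaN and are not counted
print(as_int.isna().sum())  # 4: only 'TOO_LOW' matched
print(as_str.isna().sum())  # 1: only the blank '' did not match
```

    In the question's code, the only values that match the integer categories are the rows appended with df.append, one per number. That would explain why every numeric bar had a height of 1.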
    

    Then you can plot this column:

    # 'magma' is the Matplotlib colormap name (lowercase)
    sns.countplot(x="outcomes_col", data=df, palette='magma')
    plt.xticks(rotation=90)
    plt.ylabel('Amount')
    plt.xlabel('Outcomes')
    plt.title("Outcomes per Testing")
    plt.show()
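    To check the bar heights as numbers, you can call value_counts(sort=False) on the categorical column. It gives one count per category, in category order, and includes zeros for categories that never occur. The sketch below uses the 'PT1' values from the sample dataframe (with NOT_COMPLETED, as in this answer) instead of the full df:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

num_range = list(range(150, 191, 5))
OUTCOMES = ['NOT_TESTED', 'NOT_COMPLETED', 'TOO_LOW'] + [str(num) for num in num_range]
outcomes_dtype = CategoricalDtype(OUTCOMES, ordered=True)

# 'PT1' values from the sample dataframe
pt1 = pd.Series(['', 'NOT_TESTED', '', 'NOT_COMPLETED', '165', '', '180', '145',
                 '155', '', '', '', 'TOO_LOW', '150', '155', ''])
outcomes_col = pt1.astype(outcomes_dtype)

# One count per category, in category order; '' and '145' are NaN and excluded
counts = outcomes_col.value_counts(sort=False)
print(counts)
```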
    

    Full code

    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    from pandas.api.types import CategoricalDtype
    
    
    df = pd.DataFrame({'ID': {0: 'GF342',  1: 'IF874',  2: 'FH386',  3: 'KJ190',  4: 'TY748',  5: 'YT947',  6: 'DF063',  7: 'ET512',  8: 'GC714',  9: 'SD978',  10: 'EF472',  11: 'PL489',  12: 'AZ315',  13: 'OL821',  14: 'HN765',  15: 'ED589'}, 'Location': {0: 'Q1',  1: 'Q3',  2: 'Q1',  3: 'Q3',  4: 'Q3',  5: 'Q4',  6: 'Q3',  7: 'Q1',  8: 'Q2',  9: 'Q3',  10: 'Q1',  11: 'Q2',  12: 'Q1',  13: 'Q1',  14: 'Q3',  15: 'Q1'}, 'NEW': {0: 'YES',  1: 'NO',  2: 'NO',  3: 'YES',  4: 'YES',  5: 'NO',  6: 'NO',  7: 'YES',  8: 'NO',  9: 'NO',  10: 'NO',  11: 'YES',  12: 'NO',  13: 'YES',  14: 'YES',  15: 'YES'}, 'YEAR': {0: 2021,  1: 2018,  2: 2019,  3: 2021,  4: 2021,  5: 2019,  6: 2019,  7: 2021,  8: 2018,  9: 2019,  10: 2018,  11: 2021,  12: 2018,  13: 2021,  14: 2021,  15: 2021}, 'PT1': {0: '',  1: 'NOT_TESTED',  2: '',  3: 'NOT_COMPLETED',  4: '165',  5: '',  6: '180',  7: '145',  8: '155',  9: '',  10: '',  11: '',  12: 'TOO_LOW',  13: '150',  14: '155',  15: ''}, 'PT2': {0: '',  1: '',  2: '',  3: '',  4: '',  5: 'TOO_LOW',  6: '',  7: '',  8: '160',  9: 'TOO_LOW',  10: '',  11: '',  12: '',  13: '',  14: '',  15: ''}, 'PT3': {0: '',  1: 'TOO_LOW',  2: '',  3: 'TOO_LOW',  4: '',  5: '',  6: '',  7: '',  8: '',  9: '',  10: '',  11: 'NOT_COMPLETED',  12: '',  13: '185',  14: '',  15: '165'}, 'PT4': {0: '',  1: '',  2: '',  3: '',  4: '',  5: 165.0,  6: '',  7: '',  8: '',  9: '',  10: '',  11: '',  12: 180.0,  13: '',  14: '',  15: ''}})
    
    num_range = list(range(150, 191, 5))
    OUTCOMES = ['NOT_TESTED', 'NOT_COMPLETED', 'TOO_LOW']
    # 'PT1' stores the numbers as text, so the categories must be strings too
    OUTCOMES.extend([str(num) for num in num_range])
    # Values not in OUTCOMES (e.g. '' or '145') become NaN and are not counted
    OUTCOMES = CategoricalDtype(OUTCOMES, ordered=True)
    df["outcomes_col"] = df["PT1"].astype(OUTCOMES)
    
    # 'magma' is the Matplotlib colormap name (lowercase)
    sns.countplot(x="outcomes_col", data=df, palette='magma')
    plt.xticks(rotation=90)
    plt.ylabel('Amount')
    plt.xlabel('Outcomes')
    plt.title("Outcomes per Testing")
    
    plt.show()
    

    【Discussion】:

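    • This works well. Is it possible to set 'total_samples' to a value higher than the actual one? (I'm not sure how many numbers appear in the column because some entries are blank, so would it be OK to set 'total_samples' to the total number of rows, i.e. 'len(df.index)'?) Edit: this doesn't actually work, because the values don't match up. I'll try counting all the values and then using that.
    • I picked total_sample=100 arbitrarily. This value is the number of samples that actually have a numeric value (150-190), so you should set it to the number of entries that are not 'NOT_TESTED', 'NOT_COMPLETED' or 'TOO_LOW'.
    • Please provide the actual dataset.
    • I edited the original question to include the dataframe.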
    • I updated my answer. Let me know if it meets your requirements.
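    Following the suggestion above, here is a minimal sketch that counts the 'PT1' entries that are numbers rather than 'NOT_TESTED', 'NOT_COMPLETED', 'TOO_LOW' or blank. It assumes the numbers are stored as numeric strings, as in the sample data. The names n_numeric and n_in_range are illustrative only and are not part of the answer's code.

```python
import pandas as pd

# 'PT1' values from the sample dataframe
pt1 = pd.Series(['', 'NOT_TESTED', '', 'NOT_COMPLETED', '165', '', '180', '145',
                 '155', '', '', '', 'TOO_LOW', '150', '155', ''])

# Non-numeric text (the outcome labels and blanks) becomes NaN
numeric = pd.to_numeric(pt1, errors='coerce')

# Every numeric entry, including 145, which is outside the 150-190 range
n_numeric = numeric.notna().sum()
# Only the values that get a bar (150-190 in steps of 5)
n_in_range = numeric.isin(list(range(150, 191, 5))).sum()
print(n_numeric, n_in_range)
```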