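【Question title】: How can I get matplotlib to plot ALL the outliers in a different colour, not just one?
【Posted】: 2020-03-25 01:47:47
【Question】:

I have a basic scatter plot and want to show all the outliers in a different colour. I define an outlier as a point more than 2 standard deviations from the mean. The code I wrote shows only one outlier, whereas I want every outlier drawn in the different colour: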

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
data = pd.read_csv('1fXr31hcEemkYxLyQ1aU1g_50fc36ee697c4b158fe26ade3ec3bc24_Banknote-authentication-dataset- (1).csv')
data = np.array(data)
mean = np.mean(data, 0)
min = np.min(data,0)
max = np.max(data,0)
normed = (data - min) / (max - min)
mean = np.mean(normed, 0)
std_dev = np.std (normed, 0)
fig, graph = plt.subplots()
graph.scatter(normed [:,0], normed [:,1])
graph.scatter(mean[0], mean[1])
outliers = normed[normed>2*std_dev]
graph.scatter(outliers [0], outliers [1], c='red')
plt.show

【Question comments】:

    Tags: pandas filter colors scatter-plot outliers


    【Solution 1】:

    A simple way to do this is to add a new column to the DataFrame that flags the outliers, then pass that column to the c parameter of plt.scatter():

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    
    df = pd.DataFrame({'x' : np.random.normal(0, size = 100),
                       'y' : np.random.normal(0, size = 100)})
    
    # Means of x and y
    x_mean = df['x'].mean()
    y_mean = df['y'].mean()
    
    # Lower and upper bounds: mean -/+ 2 standard deviations
    x_lo, x_hi = x_mean - 2*df['x'].std(), x_mean + 2*df['x'].std()
    y_lo, y_hi = y_mean - 2*df['y'].std(), y_mean + 2*df['y'].std()
    
    # Flag a point as an outlier if x or y falls outside its bounds
    df['outlier'] = ~(df['x'].between(x_lo, x_hi) & df['y'].between(y_lo, y_hi))
    
    # Colour the points by the flag: with the reversed colormap, outliers (True) are red and the rest green
    plt.scatter(df['x'],
                df['y'],
                s = 15,
                c = df['outlier'],
                cmap = 'RdYlGn_r')
    plt.xlabel('X')
    plt.ylabel('Y')
    
    # Reference lines at the means (dashed) and at the mean +/- 2 standard deviations (dotted)
    plt.axvline(x_mean, linestyle = '--', color = 'k')
    plt.axhline(y_mean, linestyle = '--', color = 'k')
    plt.axvline(x_hi, linestyle = ':', color = 'k')
    plt.axhline(y_hi, linestyle = ':', color = 'k')
    plt.axvline(x_lo, linestyle = ':', color = 'k')
    plt.axhline(y_lo, linestyle = ':', color = 'k')
    plt.show()
    

    Final output: [scatter plot image]
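The same idea can be applied directly to the NumPy array from the question. There, `normed[normed>2*std_dev]` indexes with an element-wise mask, which returns a flattened 1-D array, so `outliers[0]` and `outliers[1]` are just two scalars and only one point gets drawn. It also compares the values themselves with 2 standard deviations instead of their distance from the mean. Below is a minimal sketch of a row-wise mask. The helper names `outlier_mask` and `plot_outliers` are made up for illustration, and it assumes that only the first two columns (the ones being plotted) decide whether a row is an outlier.

```python
import numpy as np
import matplotlib.pyplot as plt


def outlier_mask(data, k=2.0):
    """Return a boolean row mask: True where any column of a row lies
    more than k standard deviations from that column's mean."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    # Compare each value with its own column's mean, then collapse per row
    return (np.abs(data - mean) > k * std).any(axis=1)


def plot_outliers(data, k=2.0):
    """Scatter the first two columns, drawing outlier rows in red."""
    data = np.asarray(data, dtype=float)
    # Assumption: only the two plotted columns define an outlier
    mask = outlier_mask(data[:, :2], k)
    fig, ax = plt.subplots()
    ax.scatter(data[~mask, 0], data[~mask, 1], label='inliers')
    ax.scatter(data[mask, 0], data[mask, 1], c='red', label='outliers')
    ax.legend()
    return fig, ax
```

With the question's variables, calling `plot_outliers(normed)` and then `plt.show()` (with the parentheses, which the original `plt.show` is missing) should draw every flagged row in red. Min-max normalisation does not change which points are flagged, because it rescales each column's mean and standard deviation by the same factor.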

    【Comments】:
