Pandas Python - Create subplots from 2 CSV columns: answers

【Question title】: Pandas Python - Create subplots from 2 CSV columns
【Posted】: 2022-01-03 10:54:40
【Question description】:

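I am trying to create subplots: the first is a pie chart (which works), the second is a bar chart (which I have not managed to get working):

These are the columns:

My code: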

top_series = all_data.head(50).groupby('Top Rated ')['Top Rated '].count()
top_values = top_series.values.tolist()
top_index = ['Top Rated', 'Not Top Rated']
top_colors = ['#27AE60', '#E74C3C']

all_data['Rating_Cat'] = all_data['Rating'].apply(lambda x : 'High' if (x > 10000000 ) else 'Low')

rating_series = all_data.head(50).groupby('Rating_Cat')['Rating_Cat'].count()
rating_values = rating_series.values.tolist()
rating_index = ['High' , 'Low']
rating_colors = ['#F1C40F', '#27AE60']

fig, axs = plt.subplots(1,2, figsize=(16,5))
axs[0].pie(top_values, labels=top_index, autopct='%1.1f%%', shadow=True, startangle=90,
           explode=(0.05, 0.05), radius=1.2, colors=top_colors, textprops={'fontsize':12})

all_data['Rating_Cat'].value_counts().plot(kind = 'bar', ax=axs[1])
fig.suptitle('Does "Rating" really affect on Top Sellers ?' , fontsize=17)

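My question:
How do I create the second plot so that it produces output like this:

X axis = 1, 2, 3, 4 ... 50 + Top Rated / No (according to the current column)
Y axis = rating from 0 to 7603388.0

I have tried a lot of things, but I am a bit lost here... please help!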

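【Comments】:

I'm not sure what you are trying to do, so I'll try to restate the question: you have 2 columns, Rating and Top rated. You want every value in Rating plotted from first to last, colored according to Top rated?
@DanielWlazło Yes! But only the first 100 rows.. How can I do that?
Hi OmerB, would you mind providing a minimal reproducible example?

Tags: python pandas csv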

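【Solution 1】:

In the first plot, you take the first 50 rows of the dataset and plot the share of each value in the Top Rated column.

If I understand what you want in the second plot (each Rating among the first 100 rows plotted from first to last, colored according to Top Rated):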

#taking first 100 rows
rating_series = all_data.head(100).copy()
#assigning color to the values, so you could use it in bar() plot
rating_series["color"] = rating_series["Top Rated "].map({"Top Rated": "#27AE60", "No": "#E74C3C"})
#plotting the values
axs[1].bar(rating_series.index, rating_series["Rating"], color = rating_series["color"])
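To try the snippet above without the original CSV, here is a minimal self-contained sketch. The sample values and the 1-based bar positions (to match the "1, 2, 3 ..." x axis asked for in the question) are assumptions for illustration; only the column names Rating and "Top Rated " (with its trailing space) come from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample data; the real values live in the asker's CSV.
all_data = pd.DataFrame({
    "Rating": [7603388, 120000, 5400000, 980000, 43000],
    "Top Rated ": ["Top Rated", "No", "Top Rated", "No", "No"],
})

# Same idea as the answer: take the first rows and map each label to a color.
subset = all_data.head(100).copy()
colors = subset["Top Rated "].map({"Top Rated": "#27AE60", "No": "#E74C3C"})

# Bar positions 1..N instead of the 0-based DataFrame index.
positions = list(range(1, len(subset) + 1))

fig, ax = plt.subplots(figsize=(8, 4))
bars = ax.bar(positions, subset["Rating"], color=colors.tolist())
ax.set_xlabel("Row number")
ax.set_ylabel("Rating")
# fig.savefig("rating_bars.png")  # or plt.show() with an interactive backend
```

With real data, replace the sample DataFrame with the one loaded from the CSV; the rest should carry over as long as the column contains only the two mapped labels.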

If you want to add a legend to the plot, you have to build it manually:

import matplotlib.patches as mpatches
axs[1].legend(handles=[mpatches.Patch(color='#27AE60', label='Top Rated'),
               mpatches.Patch(color='#E74C3C', label='Not Top Rated')])
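The legend is built by hand because a single bar() call produces one bar container, so per-bar colors do not turn into separate legend entries on their own. One way to keep the legend labels and colors from drifting apart is to generate the patches from a single dictionary. The sketch below assumes placeholder bar heights and a hypothetical legend_colors dictionary; the labels and color codes are the ones used in the answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so the sketch runs without a display
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt

# Legend label -> bar color (hypothetical name; values taken from the answer).
legend_colors = {"Top Rated": "#27AE60", "Not Top Rated": "#E74C3C"}

fig, ax = plt.subplots()
# Placeholder bars, only here so the legend has something to describe.
ax.bar([1, 2, 3], [5, 3, 4], color=["#27AE60", "#E74C3C", "#E74C3C"])

# One proxy patch per legend entry, in dictionary order.
handles = [mpatches.Patch(color=color, label=label)
           for label, color in legend_colors.items()]
legend = ax.legend(handles=handles)
```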

Edit: my full code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # needed for plt.subplots below
import matplotlib.patches as mpatches
import random
df = pd.DataFrame(
    {
        "Rating": np.random.randint(0,7603388,size=200),
        "Top Rated ": [random.choice(['Top Rated', 'No']) for rated in range(0,200)]
    }
)

#taking first 100 rows
rating_series = df.head(100).copy()
#assigning color to the values, so you could use it in bar() plot
rating_series["color"] = rating_series["Top Rated "].map({"Top Rated": "#27AE60", "No": "#E74C3C"})
#checking if there were no NaNs
rating_series["color"].value_counts(dropna=False)

#Output:

#E74C3C    53
#27AE60    47
#Name: color, dtype: int64

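#1st plot
#groupby sorts its keys alphabetically ('No' before 'Top Rated'),
#so reindex to match the order of the labels and colors below
top_series = rating_series.groupby('Top Rated ')['Top Rated '].count().reindex(['Top Rated', 'No'])
top_index = ['Top Rated', 'Not Top Rated']
top_colors = ['#27AE60', '#E74C3C']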

fig, axs = plt.subplots(1,2, figsize=(16,5))
axs[0].pie(top_series.values, labels=top_index, autopct='%1.1f%%', shadow=True, startangle=90,
           explode=(0.05, 0.05), radius=1.2, colors=top_colors, textprops={'fontsize':12})

#2nd plot
axs[1].bar(rating_series.index, rating_series["Rating"], color = rating_series["color"])
axs[1].legend(handles=[mpatches.Patch(color='#27AE60', label='Top Rated'),
               mpatches.Patch(color='#E74C3C', label='Not Top Rated')])

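【Comments】:

Thank you very much! How do I deal with ValueError: Invalid RGBA argument: nan? What is wrong with the RGBA length?
Every element of rating_series["color"] (or of whatever other iterable you pass as the color argument of bar) must be a color value. You can check this with value_counts(dropna=False).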
? #plotting the values rating_series.value_counts(dronpa=False) axs[1].bar(rating_series.index, rating_series["Rating"], color = rating_series["color"])
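I have uploaded my full code. As you can see, when running rating_series["color"].value_counts(dropna=False), the output should contain no NaN (only color codes). If NaN is present, it means the "Top Rated " column contains values other than "Top Rated" and "No" (perhaps with extra whitespace or something similar), and the dictionary passed to map should be updated to cover them. I hope the full code helps, as I am done with this question.
Oh my, God bless you!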
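For readers who hit the same "Invalid RGBA argument: nan" error, the check described in the comments can be sketched as follows. This assumes the cause is stray whitespace around the labels, which the comments only suggest as a possibility; map_colors and COLOR_MAP are hypothetical names introduced for this example.

```python
import pandas as pd

# Label -> color, as in the answer.
COLOR_MAP = {"Top Rated": "#27AE60", "No": "#E74C3C"}

def map_colors(labels: pd.Series, color_map: dict) -> pd.Series:
    """Strip whitespace, map labels to colors, and fail loudly on unmapped labels."""
    cleaned = labels.astype(str).str.strip()
    colors = cleaned.map(color_map)
    # Any label missing from color_map shows up as NaN after map().
    unmapped = cleaned[colors.isna()].unique().tolist()
    if unmapped:
        raise ValueError(f"No color defined for: {unmapped}")
    return colors

# Made-up labels with stray spaces, similar to what a CSV export might contain.
raw = pd.Series(["Top Rated ", " No", "Top Rated", "No "])

# Mapping the raw labels leaves NaN for every value that is not an exact key.
nan_count = raw.map(COLOR_MAP).isna().sum()

# After stripping, every label maps to a color that bar() can accept.
colors = map_colors(raw, COLOR_MAP)
```

If the ValueError lists labels that are genuinely different categories rather than whitespace variants, they need their own entries in the dictionary.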