Boxplot
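The boxplot is probably one of the most common chart types, and it summarizes the distribution of a variable well. The ends of the box mark the lower and upper quartiles (Q1 and Q3), and the line inside the box marks the median. The whiskers extend to the lowest and highest values that are not outliers. With seaborn's default `whis=1.5`, any point lying more than 1.5 × IQR (the interquartile range, Q3 - Q1) beyond the box is treated as an outlier and drawn as a separate marker. In seaborn, boxplots are drawn with the `boxplot` function. This chapter covers: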
- Basic boxplot and input format
- Custom boxplot appearance
- Control colors of boxplot
- Grouped boxplot
- Control order of boxplot
- Add jitter over boxplot
- Show number of observations on boxplot
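The box and whisker rules described above can be sketched in plain Python. This is a minimal illustration, not seaborn's own code. `box_stats` is a hypothetical helper that assumes two things: the default 1.5 × IQR whisker rule, and linearly interpolated quartiles. The interpolation matches `numpy.percentile`'s default, which matplotlib uses when it computes box statistics for seaborn.

```python
import statistics


def box_stats(data, whis=1.5):
    """Quartiles, whisker ends and outliers of a sample (1.5 x IQR rule)."""
    values = sorted(data)
    # method="inclusive" gives the same linear interpolation as numpy.percentile
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_limit = q1 - whis * iqr
    high_limit = q3 + whis * iqr
    # Whiskers end at the most extreme data points still inside the limits
    inside = [v for v in values if low_limit <= v <= high_limit]
    outliers = [v for v in values if v < low_limit or v > high_limit]
    return {"q1": q1, "median": median, "q3": q3,
            "whiskers": (min(inside), max(inside)), "outliers": outliers}


print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 30]))
# → {'q1': 3.0, 'median': 5.0, 'q3': 7.0, 'whiskers': (1, 8), 'outliers': [30]}
```

The whisker ends are actual data points, not the limits themselves. In the example, the upper whisker stops at 8 even though the limit is 13.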
```python
import seaborn as sns

# Load the iris example dataset (fetched from seaborn's online data repository)
df = sns.load_dataset('iris')
df.head()
```
|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
1. Basic boxplot and input format
- One numerical variable only
- One numerical variable and several groups
- Several numerical variables
- Horizontal boxplot
```python
# One numerical variable: a single box
sns.boxplot(y=df["sepal_length"]);
```
![Boxplot of sepal_length](/default/index/img?u=L2RlZmF1bHQvaW5kZXgvaW1nP3U9YUhSMGNITTZMeTl3YVdGdWMyaGxiaTVqYjIwdmFXMWhaMlZ6THpJME1pODNNRGcyTldabFpETTFaamhtWVRSak4yRTNORGM1TmpZeU1qaGhORGhoTWk1d2JtYz0=)
```python
# One numerical variable split by a categorical variable: one box per species
sns.boxplot(x=df["species"], y=df["sepal_length"]);
```
![Boxplots of sepal_length by species](/default/index/img?u=L2RlZmF1bHQvaW5kZXgvaW1nP3U9YUhSMGNITTZMeTl3YVdGdWMyaGxiaTVqYjIwdmFXMWhaMlZ6THpNNE1pODFNekl5TWpWbU5HSm1aVEUxWkRGbU16WmlOREppTm1ZeE5HRmpaREF3Tmk1d2JtYz0=)
```python
# Wide-form input: one box per column (here sepal_length and sepal_width)
sns.boxplot(data=df.iloc[:, 0:2]);
```
![Boxplots of sepal_length and sepal_width](/default/index/img?u=L2RlZmF1bHQvaW5kZXgvaW1nP3U9YUhSMGNITTZMeTl3YVdGdWMyaGxiaTVqYjIwdmFXMWhaMlZ6THpNNE9TODBNakl6TldVeFlqVXdORFZqTkdKbU1UYzJaakZpTm1NelpqWXlZalV3TlM1d2JtYz0=)
```python
# Put the categorical variable on the y axis to get horizontal boxes
sns.boxplot(y=df["species"], x=df["sepal_length"]);
```
![Horizontal boxplots of sepal_length by species](/default/index/img?u=L2RlZmF1bHQvaW5kZXgvaW1nP3U9YUhSMGNITTZMeTl3YVdGdWMyaGxiaTVqYjIwdmFXMWhaMlZ6THpjME1TOWtPREkyTkRFMVpERmpPVE5qTkRBeFl6RTJOREEyTXpGbE5EVTJNMkl6WkM1d2JtYz0=)
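The wide-form call `sns.boxplot(data=df.iloc[:,0:2])` draws one box per column. Seaborn can draw the same boxes from long-form data, where one column holds the variable name and another holds the value. Below is a minimal pandas sketch of that reshaping with `DataFrame.melt`. It uses a tiny hypothetical frame in place of the full iris data.

```python
import pandas as pd

# A tiny hypothetical wide-form frame: one column per numerical variable
wide_df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7],
    "sepal_width": [3.5, 3.0, 3.2],
})

# melt stacks the columns into long form: one row per (variable, value) pair
long_df = wide_df.melt(var_name="variable", value_name="value")
print(long_df)
```

Passing `long_df` as `sns.boxplot(x="variable", y="value", data=long_df)` should give the same boxes as `sns.boxplot(data=wide_df)`. The long form is also what you need when you want to add a `hue` grouping.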
2. Custom boxplot appearance
- Custom line width
- Add notch
- Control box size
```python
# linewidth sets the width of the box outline, whiskers and median line
sns.boxplot(x=df["species"], y=df["sepal_length"], linewidth=5);
```
![Boxplots with linewidth=5](/default/index/img?u=L2RlZmF1bHQvaW5kZXgvaW1nP3U9YUhSMGNITTZMeTl3YVdGdWMyaGxiaTVqYjIwdmFXMWhaMlZ6THpFNE1pODVNalZoTVdRMlpHTTJOamN5TURWbFpXUmhNemRqTkdZNU1tRmtNamcxWlM1d2JtYz0=)
```python
# notch=True narrows each box around its median
sns.boxplot(x=df["species"], y=df["sepal_length"], notch=True);
```
![Notched boxplots](/default/index/img?u=L2RlZmF1bHQvaW5kZXgvaW1nP3U9YUhSMGNITTZMeTl3YVdGdWMyaGxiaTVqYjIwdmFXMWhaMlZ6THpRd0x6TXlZelppTmpNeE0yTTBZbVZtWlRVd09URXdPV1F6T1dVNU5tTmpNemd3TG5CdVp3PT0=)
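The notch shows the uncertainty in the median. Matplotlib draws seaborn's boxes, and by default it places the notch ends at median ± 1.57 × IQR / √n. Two notches that do not overlap are commonly read as rough evidence that the medians differ. Below is a pure-Python sketch of that interval. `notch_interval` is a hypothetical helper, and it assumes the same linearly interpolated quartiles as `numpy.percentile`.

```python
import math
import statistics


def notch_interval(data):
    """Median +/- 1.57 * IQR / sqrt(n), matplotlib's default notch extent."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    half_width = 1.57 * (q3 - q1) / math.sqrt(len(data))
    return median - half_width, median + half_width


low, high = notch_interval([1, 2, 3, 4, 5, 6, 7, 8, 30])
print(round(low, 3), round(high, 3))  # → 2.907 7.093
```

Because the half-width shrinks with √n, small groups get wide notches. The notch can even extend past the box edges, which matplotlib then draws as a folded-over shape.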
```python
# width sets each box's width as a fraction of the space per category (default 0.8)
sns.boxplot(x=df["species"], y=df["sepal_length"], width=0.3);
```
![Boxplots with width=0.3](/default/index/img?u=L2RlZmF1bHQvaW5kZXgvaW1nP3U9YUhSMGNITTZMeTl3YVdGdWMyaGxiaTVqYjIwdmFXMWhaMlZ6THpZM05pODRZV0ZqWVdRMU9UaGtOakV6TURjMk1EQmhaams1TXpJek9XUTFNbUkzWXk1d2JtYz0=)
3. Control colors of boxplot
- Use a color palette
- Uniform color
- Specific color for each group
- Highlight a group
- Add transparency to color
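```python
# Color the boxes with shades from the "Blues" palette, one per species.
# seaborn >= 0.13 warns when palette is used without hue; there, adding
# hue=df["species"], legend=False gives the same plot without the warning.
sns.boxplot(x=df["species"], y=df["sepal_length"], palette="Blues");
```
![Boxplots colored with the Blues palette](/default/index/img?u=L2RlZmF1bHQvaW5kZXgvaW1nP3U9YUhSMGNITTZMeTl3YVdGdWMyaGxiaTVqYjIwdmFXMWhaMlZ6THpNeE1pOWxaRFU1WkRBeE9UVmhOV1EyTldabE9Ua3lZMk0xTWpFek5ESm1NemM1TUM1d2JtYz0=)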
```python
# A single color for every box
sns.boxplot(x=df["species"], y=df["sepal_length"], color="skyblue");
```
![Boxplots in a uniform skyblue color](/default/index/img?u=L2RlZmF1bHQvaW5kZXgvaW1nP3U9YUhSMGNITTZMeTl3YVdGdWMyaGxiaTVqYjIwdmFXMWhaMlZ6THpnNU1TOHlZbU0yTlRVek5HWTRORFl5TURjMFlqUTVZbUpoWXpVeVlXSTBOV1JpTXk1d2JtYz0=)
```python
# Dict palette: map each category name to a matplotlib color
# ("g" = green, "b" = blue, "m" = magenta)
my_pal = {"versicolor": "g", "setosa": "b", "virginica": "m"}
sns.boxplot(x=df["species"], y=df["sepal_length"], palette=my_pal);
```
![Boxplots with a specific color for each species](/default/index/img?u=L2RlZmF1bHQvaW5kZXgvaW1nP3U9YUhSMGNITTZMeTl3YVdGdWMyaGxiaTVqYjIwdmFXMWhaMlZ6THpFd0x6VTNPR1V4Wmpka09EYzJaR1UyWm1aak0yUTFNMkZpTkdJeFlUTTJNRGxoTG5CdVp3PT0=)
```python
# Highlight one group: versicolor in red, every other species in blue
my_pal = {species: "r" if species == "versicolor" else "b" for species in df.species.unique()}
sns.boxplot(x=df["species"], y=df["sepal_length"], palette=my_pal);
```
![Boxplots with versicolor highlighted](/default/index/img?u=L2RlZmF1bHQvaW5kZXgvaW1nP3U9YUhSMGNITTZMeTl3YVdGdWMyaGxiaTVqYjIwdmFXMWhaMlZ6THpFd055ODNOekJpTnpJMlpEaGlPV1F4TUdNM1lXRTNNV0ZpWkRaallqWXdOV0l5TXk1d2JtYz0=)
```python
ax = sns.boxplot(x='species', y='sepal_length', data=df)
# Lower the alpha of each box's face color to 0.3.
# Recent seaborn/matplotlib versions keep the boxes in ax.patches;
# older versions stored them in ax.artists instead.
for patch in ax.patches:
    r, g, b, a = patch.get_facecolor()
    patch.set_facecolor((r, g, b, .3))
```
![Boxplots with transparent box colors](/default/index/img?u=L2RlZmF1bHQvaW5kZXgvaW1nP3U9YUhSMGNITTZMeTl3YVdGdWMyaGxiaTVqYjIwdmFXMWhaMlZ6THpZMk1pOWlOemxpTm1FM09XWmxNVFUwTnpobFpUaGhaamxsT1RrMlltUmtZV1ZtWlM1d2JtYz0=)
4. Grouped boxplot
```python
# Load the tips example dataset
df_tips = sns.load_dataset('tips')
df_tips.head()
```
|   | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
```python
# hue splits each day into one box per smoker category, drawn side by side
sns.boxplot(x="day", y="total_bill", hue="smoker", data=df_tips, palette="Set1");
```
![Boxplots of total_bill by day, grouped by smoker](/default/index/img?u=L2RlZmF1bHQvaW5kZXgvaW1nP3U9YUhSMGNITTZMeTl3YVdGdWMyaGxiaTVqYjIwdmFXMWhaMlZ6THpZNE5TOW1PR00yWlRObVlqaGtNVEJtTVRsaE56bGxNalkzTXpFMVpqWmlaR1E1WkM1d2JtYz0=)
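Each box in a grouped plot summarizes one combination of the `x` category and the `hue` category. Below is a small pandas sketch of the medians such a plot would draw. It uses a few hypothetical rows shaped like `tips`; the bill values are illustrative, not taken from the dataset.

```python
import pandas as pd

# Hypothetical rows with the same columns the grouped plot uses
tips = pd.DataFrame({
    "day": ["Sun", "Sun", "Sun", "Sat", "Sat", "Sat"],
    "smoker": ["No", "Yes", "No", "Yes", "No", "Yes"],
    "total_bill": [10.0, 40.0, 30.0, 15.0, 25.0, 35.0],
})

# One median per (day, smoker) pair: the median line of each box
medians = tips.groupby(["day", "smoker"])["total_bill"].median()
print(medians)
```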
5. Control order of boxplot
```python
# order fixes the left-to-right sequence of the categories
sns.boxplot(x='species', y='sepal_length', data=df, order=["virginica", "versicolor", "setosa"]);
```
![Boxplots in a custom species order](/default/index/img?u=L2RlZmF1bHQvaW5kZXgvaW1nP3U9YUhSMGNITTZMeTl3YVdGdWMyaGxiaTVqYjIwdmFXMWhaMlZ6THprME55OWhNRFl3WVRGa1lXWmxabUpoWW1ObE5tRXdPVFJoTm1JMFptVXlNRGRsTXk1d2JtYz0=)
```python
# Order the boxes by median sepal_length, highest first.
# Sorting the medians is required; reversing the groupby result (.iloc[::-1])
# only reverses the alphabetical order of the species names.
my_order = df.groupby("species")["sepal_length"].median().sort_values(ascending=False).index
sns.boxplot(x='species', y='sepal_length', data=df, order=my_order);
```
![Boxplots ordered by median sepal_length](/default/index/img?u=L2RlZmF1bHQvaW5kZXgvaW1nP3U9YUhSMGNITTZMeTl3YVdGdWMyaGxiaTVqYjIwdmFXMWhaMlZ6THpnME9TOHhaVGd5WW1RMU5HVTVObVZtT0RFeU56UXlNV1V6WXpFek9UWXdNMk0yT1M1d2JtYz0=)
6. Add jitter over boxplot
```python
ax = sns.boxplot(x='species', y='sepal_length', data=df)
# Overlay the individual observations; swarmplot shifts points sideways so they do not overlap
ax = sns.swarmplot(x='species', y='sepal_length', data=df, color="grey")
```
![Boxplots with a swarmplot of the observations overlaid](/default/index/img?u=L2RlZmF1bHQvaW5kZXgvaW1nP3U9YUhSMGNITTZMeTl3YVdGdWMyaGxiaTVqYjIwdmFXMWhaMlZ6THpJeU1pOHpZVE0zWkRkbFlUazRaVE00T1RRMVpUazFaV1UzTnpkbVpUTXhPVGM0Tmk1d2JtYz0=)
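`swarmplot` places points deterministically so they do not overlap. The classic alternative is random jitter, which seaborn offers as `sns.stripplot(..., jitter=True)`. With jitter, each point keeps its value on the numeric axis and gets a small random offset along the categorical axis. Below is a minimal sketch of that idea. `jitter_positions` is hypothetical, and the uniform ±0.1 offset is an assumption, not seaborn's exact default.

```python
import random


def jitter_positions(category_index, n_points, amount=0.1, seed=0):
    """Spread n_points around x = category_index with uniform random offsets."""
    rng = random.Random(seed)  # seeded so the layout is reproducible
    return [category_index + rng.uniform(-amount, amount) for _ in range(n_points)]


# x positions for 5 points belonging to the second category (index 1)
xs = jitter_positions(category_index=1, n_points=5)
print([round(x, 3) for x in xs])
```

Random jitter can still let points overlap when a group is large. That is the trade-off swarmplot avoids, at the cost of wider point clouds.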
7. Show number of observations on boxplot
```python
# Fix the category order so each label can be matched to its box
order = df["species"].unique()
ax = sns.boxplot(x="species", y="sepal_length", data=df, order=order)

# Median and number of observations per species, indexed by species name
# (value_counts() is sorted by frequency, so look counts up by name, not by position)
medians = df.groupby("species")["sepal_length"].median()
nobs = df["species"].value_counts()

# Write "n: <count>" just above each median line, centered on its box
for pos, species in enumerate(order):
    ax.text(pos, medians[species] + 0.03, f"n: {nobs[species]}",
            horizontalalignment='center', size='x-small', color='w', weight='semibold')
```
![Boxplots annotated with the number of observations](/default/index/img?u=L2RlZmF1bHQvaW5kZXgvaW1nP3U9YUhSMGNITTZMeTl3YVdGdWMyaGxiaTVqYjIwdmFXMWhaMlZ6THpJeU9DOWlOemN5TWpSaFlXRTNPV1V4T1Rrek1tTTRaalUyTWpGak1EbGpOR014WXk1d2JtYz0=)
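Each label is the count for its category, looked up by name in the same order as the boxes. Order matters here. A frequency table sorted by count, such as the ones `value_counts()` or `Counter.most_common()` return, does not generally follow the plot order. For plain string categories, seaborn's default plot order is the order of first appearance. Below is a stdlib sketch with a hypothetical category list.

```python
from collections import Counter

# Hypothetical category column, as it might appear in a DataFrame
species = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]

counts = Counter(species)
# Follow the order of first appearance, not Counter.most_common()
order = list(dict.fromkeys(species))
labels = [f"n: {counts[s]}" for s in order]
print(labels)  # → ['n: 2', 'n: 1', 'n: 3']
```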