A simple visualization of a classification problem

Load the data

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data=pd.read_csv('LogiReg_data.txt',header=None,names=['Exam 1','Exam 2','Admitted'])
data.head()
      Exam 1     Exam 2  Admitted
0  34.623660  78.024693         0
1  30.286711  43.894998         0
2  35.847409  72.902198         0
3  60.182599  86.308552         1
4  79.032736  75.344376         1
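As a minimal sketch of what `header=None` and `names=` do in the `read_csv` call above, the snippet below parses two made-up rows from an in-memory string instead of `LogiReg_data.txt`. The values and the `demo` name are illustrative assumptions, not part of the original data set.

```python
import io
import pandas as pd

# Two made-up rows in a comma-separated layout with no header line
raw = io.StringIO("34.6,78.0,0\n60.2,86.3,1\n")

# header=None: the first line is data, not column names
# names=[...]: supply the column names explicitly
demo = pd.read_csv(raw, header=None, names=["Exam 1", "Exam 2", "Admitted"])

print(demo.shape)
print(list(demo.columns))
```

Without `header=None`, pandas would treat the first data row as the header and silently drop one sample.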

Plot a bar chart of the label counts

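# Split the features from the label column
X=data.loc[:,data.columns!="Admitted"]
y=data.loc[:,data.columns=="Admitted"]
# Select the admitted (1) and rejected (0) samples
y_true=y.loc[y["Admitted"]==1,"Admitted"]
y_false=y.loc[y["Admitted"]==0,"Admitted"]
# Heights follow the tick order: x=0 is label 0, x=1 is label 1
plt.bar([0,1],[len(y_false),len(y_true)],0.3)
plt.xticks([0,1])
plt.show()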
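The per-label counts used for the bar heights can also be computed in one step with `Series.value_counts()`. This is a small sketch on a made-up label column; the `labels` values are an assumption for illustration, not the real `Admitted` data.

```python
import pandas as pd

# Hypothetical stand-in for the "Admitted" column (not the real data)
labels = pd.Series([0, 0, 0, 1, 1], name="Admitted")

# value_counts() tallies each label; sort_index() orders the result as 0, 1
counts = labels.value_counts().sort_index()

print(counts.to_dict())
```

Passing `counts.index` and `counts.values` to `plt.bar` keeps each bar aligned with its own label, which avoids ordering the heights by hand.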


Plot a scatter chart of the sample distribution

# Exam scores of admitted students (label 1)
x_1=data.loc[data["Admitted"]==1,"Exam 1"]
y_1=data.loc[data["Admitted"]==1,"Exam 2"]

# Exam scores of rejected students (label 0)
x_2=data.loc[data["Admitted"]==0,"Exam 1"]
y_2=data.loc[data["Admitted"]==0,"Exam 2"]

plt.scatter(x_1,y_1,color='red',label="Admitted")
plt.scatter(x_2,y_2,color="blue",label="Not admitted")

plt.xlabel("Exam 1")
plt.ylabel("Exam 2")
plt.title("Scatter")
plt.legend()

plt.show()


Train the model

import matplotlib as mpl
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

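# Features are the two exam scores; the label is the admission decision
X=data.loc[:,data.columns!="Admitted"]
Y=data.loc[:,data.columns=="Admitted"]

# A regression tree is fitted on the 0/1 labels here; an unpruned tree
# usually reproduces the training labels, so its outputs are 0.0 or 1.0
dr=DecisionTreeRegressor()
dr.fit(X,Y.values.ravel())
# Predict on the training data itself, so the recall below is a training score
y=dr.predict(X)
score=recall_score(Y,y)
score
1.0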
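The recall above is measured on the same samples the tree was fitted on, so it says little about unseen data. The sketch below shows one way to evaluate on a held-out split with `train_test_split`. It is not the author's method: it swaps in `DecisionTreeClassifier`, and it uses synthetic scores with a made-up admission rule (`X_demo`, `y_demo`) in place of the real data set.

```python
import numpy as np
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the two exam scores (not the real data set)
rng = np.random.default_rng(0)
X_demo = rng.uniform(30, 100, size=(200, 2))
# Hypothetical rule: admitted when the two scores sum to more than 130
y_demo = (X_demo.sum(axis=1) > 130).astype(int)

# Hold out 30% of the samples; stratify keeps the class ratio in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo)

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

# Recall on the training part versus recall on the held-out part
train_recall = recall_score(y_train, clf.predict(X_train))
test_recall = recall_score(y_test, clf.predict(X_test))
print(train_recall, test_recall)
```

Comparing the two numbers gives a rough sense of how much the unpruned tree overfits; limiting `max_depth` is one common way to narrow the gap.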

Plot the classification regions

# Build a 500 x 500 grid of sample points
N=500
M=500

# Use the observed samples to set the range of the grid
x1_min,x2_min=X.min()
x1_max,x2_max=X.max()
t1=np.linspace(x1_min,x1_max,N) # horizontal axis (feature 1)
t2=np.linspace(x2_min,x2_max,M) # vertical axis (feature 2)

x1,x2=np.meshgrid(t1,t2) # expand the two axes into a 500 x 500 grid; matching elements of x1 and x2 are the coordinates of one grid point

x_show=np.stack((x1.ravel(),x2.ravel()),axis=1) # flatten the grid into one (feature 1, feature 2) row per point, to predict on
y_predict=dr.predict(x_show) # model prediction at every grid point

cm_light = mpl.colors.ListedColormap(['#A0FFA0', '#FFA0A0'])
cm_dark = mpl.colors.ListedColormap(['g', 'r'])

# Color each grid cell by its predicted value
plt.pcolormesh(x1,x2,y_predict.reshape(x1.shape),cmap=cm_light)
#plt.scatter(X['Exam 1'],X['Exam 2'],Y,cmap=cm_dark,marker='o',edgecolors='k')
x_1=data.loc[data["Admitted"]==1,"Exam 1"]
y_1=data.loc[data["Admitted"]==1,"Exam 2"]

x_2=data.loc[data["Admitted"]==0,"Exam 1"]
y_2=data.loc[data["Admitted"]==0,"Exam 2"]

plt.scatter(x_1,y_1,color='red')
plt.scatter(x_2,y_2,color="blue")

plt.xlabel("Exam 1")
plt.ylabel("Exam 2")
plt.title("Scatter")

plt.show()
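To make the grid construction easier to follow, here is the same `np.linspace` / `np.meshgrid` / `np.stack` pattern on a tiny 3 x 2 grid. The axis ranges are arbitrary values chosen for illustration.

```python
import numpy as np

t1 = np.linspace(0.0, 1.0, 3)    # 3 values along feature 1
t2 = np.linspace(10.0, 20.0, 2)  # 2 values along feature 2

# Both outputs have shape (len(t2), len(t1)) = (2, 3)
x1, x2 = np.meshgrid(t1, t2)

# One row per grid point: column 0 is feature 1, column 1 is feature 2
points = np.stack((x1.ravel(), x2.ravel()), axis=1)

print(points.shape)
print(points)
```

Each row of `points` is one coordinate pair that can be passed to `predict`, and reshaping the predictions back to `x1.shape` restores the grid layout that `plt.pcolormesh` expects.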
