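Task: Using the data from Anscombe's quartet, compute for each dataset the mean and variance of x and y, the correlation coefficient between x and y, and the least-squares regression line, and plot the results.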
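For reference, these are the definitions of the quantities the task asks for. The variance uses the n − 1 denominator because pandas' `Series.var()` defaults to `ddof=1`. The OLS fit estimates the same β₀ and β₁ that statsmodels prints as `const` and `x`:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2

r = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i} (x_i - \bar{x})^2 \, \sum_{i} (y_i - \bar{y})^2}}

\hat{y} = \beta_0 + \beta_1 x, \qquad
\beta_1 = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i} (x_i - \bar{x})^2}, \qquad
\beta_0 = \bar{y} - \beta_1 \bar{x}

% Worked check for x: in datasets I-III, x takes each value 4, 5, ..., 14 once;
% in IV, x is 8 ten times and 19 once. In all four cases \bar{x} = 99/11 = 9 and
\sum_{i} (x_i - 9)^2 = 110 \quad\Rightarrow\quad s_x^2 = 110 / 10 = 11
```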
Task reference:
https://nbviewer.jupyter.org/github/schmit/cme193-ipython-notebooks-lecture/blob/master/Exercises.ipynb
Code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm

sns.set_context("talk")  # larger fonts and markers for the plots

# Expected columns: dataset (I, II, III, IV), x, y
anscombe = pd.read_csv('data/anscombe.csv')
anscombe.head()  # preview; only displays in a notebook

def summary(data):
    # Mean and sample variance (pandas .var() uses ddof=1) of x and y
    return pd.Series([data['x'].mean(), data['x'].var(), data['y'].mean(), data['y'].var()],
                     index=['x mean', 'x var', 'y mean', 'y var'])

# groupby yields (name, sub-DataFrame) pairs in sorted order: I, II, III, IV
for name, data in anscombe.groupby('dataset'):
    print()
    print(name)
    print(pd.DataFrame(summary(data)))
    print("Correlation coefficient")
    # Restrict to the numeric columns: the string 'dataset' column makes
    # DataFrame.corr() raise an error in pandas >= 2.0
    print(data[['x', 'y']].corr())

    # Ordinary least squares y = const + slope * x; add_constant supplies the intercept column
    x = data['x']
    y = data['y']
    results = sm.OLS(y, sm.add_constant(x)).fit()
    print(results.params)
    y_fit = results.fittedvalues

    # Scatter plot of the data with the fitted regression line
    fig, ax = plt.subplots()
    ax.plot(x, y, 'o', label='data')
    ax.plot(x, y_fit, 'r-', label='OLS')
    ax.set_title(name)
    ax.legend(loc='best')
    plt.show()
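As a cross-check that needs neither the CSV file nor statsmodels, here is a minimal NumPy sketch that computes every printed number directly from its definition. It assumes the quartet values as they are commonly reproduced (for example in seaborn's `anscombe` example dataset) and that they match `data/anscombe.csv`. The names `QUARTET` and `describe` are illustrative and not part of the original code.

```python
import numpy as np

# Anscombe's quartet as commonly reproduced (assumed to match data/anscombe.csv).
# Datasets I-III share the same x values.
X_I_TO_III = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I":   (X_I_TO_III, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (X_I_TO_III, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_I_TO_III, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def describe(x, y):
    """Mean, sample variance, Pearson r and OLS line, computed from their definitions."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    dx, dy = x - x.mean(), y - y.mean()          # deviations from the means
    sxx, syy, sxy = dx @ dx, dy @ dy, dx @ dy    # sums of squares and cross-products
    slope = sxy / sxx
    return {
        "x mean": x.mean(),
        "x var": sxx / (n - 1),                  # ddof=1, like pandas .var()
        "y mean": y.mean(),
        "y var": syy / (n - 1),
        "r": sxy / np.sqrt(sxx * syy),
        "const": y.mean() - slope * x.mean(),    # intercept
        "slope": slope,
    }

stats = {name: describe(x, y) for name, (x, y) in QUARTET.items()}
for name, s in stats.items():
    print(name, {k: round(float(v), 6) for k, v in s.items()})
```

Computing the deviations once and reusing them for the variance, the correlation and the slope keeps the code close to the formulas. The values should agree with the statsmodels and pandas output to the six decimals pandas prints.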
Results:
I
                0
x mean   9.000000
x var   11.000000
y mean   7.500909
y var    4.127269
Correlation coefficient
          x         y
x  1.000000  0.816421
y  0.816421  1.000000
const    3.000091
x        0.500091
dtype: float64

II
                0
x mean   9.000000
x var   11.000000
y mean   7.500909
y var    4.127629
Correlation coefficient
          x         y
x  1.000000  0.816237
y  0.816237  1.000000
const    3.000909
x        0.500000
dtype: float64

III
               0
x mean   9.00000
x var   11.00000
y mean   7.50000
y var    4.12262
Correlation coefficient
          x         y
x  1.000000  0.816287
y  0.816287  1.000000
const    3.002455
x        0.499727
dtype: float64

IV
                0
x mean   9.000000
x var   11.000000
y mean   7.500909
y var    4.123249
Correlation coefficient
          x         y
x  1.000000  0.816521
y  0.816521  1.000000
const    3.001727
x        0.499909
dtype: float64
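The four result blocks agree to about three decimal places. Only the plots tell the datasets apart. The loop in the code opens one figure per dataset. A common alternative is a 2×2 grid with shared axes, which puts the four shapes side by side on the same scale. The sketch below makes that assumption and hardcodes the same commonly reproduced quartet values so it runs on its own. It fits each line with `np.polyfit(x, y, 1)` instead of statsmodels, which gives the same least-squares line. The output file name `anscombe_grid.png` is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line and call plt.show() to view interactively
import matplotlib.pyplot as plt
import numpy as np

# Commonly reproduced Anscombe values (assumed to match data/anscombe.csv)
X_I_TO_III = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I":   (X_I_TO_III, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (X_I_TO_III, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_I_TO_III, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Shared axes put all four panels on the same scale
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True, figsize=(8, 6))
for ax, (name, (x, y)) in zip(axes.flat, QUARTET.items()):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, const = np.polyfit(x, y, 1)        # degree-1 least squares; highest power first
    line_x = np.array([x.min(), x.max()])     # two endpoints are enough for a straight line
    ax.plot(x, y, "o", label="data")
    ax.plot(line_x, const + slope * line_x, "r-", label="OLS")
    ax.set_title(name)
axes[0, 0].legend(loc="best")
fig.tight_layout()
fig.savefig("anscombe_grid.png")  # hypothetical file name
```

With identical means, variances, correlations and fitted lines, any visible difference between the four panels comes from the shape of the data: curvature, an outlier, or a single high-leverage point.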