【Posted】: 2016-12-06 15:19:30
【Question】:
I have this data:
new.csv:
X Y
230.1 22.1
44.5 10.4
17.2 9.3
151.5 18.5
180.8 12.9
8.7 7.2
57.5 11.8
120.2 13.2
8.6 4.8
199.8 10.6
66.1 8.6
214.7 17.4
23.8 9.2
97.5 9.7
204.1 19
195.4 22.4
67.8 12.5
281.4 24.4
69.2 11.3
I am trying to fit a linear regression model to it.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model, cross_validation
import random
data = pd.read_csv('./data/new.csv', names=['X', 'Y'], header=0)
fig = plt.figure()
ax = plt.axes()
x = data.loc[:,'X'].to_frame()
y = data.loc[:,'Y'].to_frame()
x_train, x_test, y_train, y_test = cross_validation.train_test_split(x, y, test_size=0.3, random_state=0)
regr = linear_model.LinearRegression()
regr.fit(x_train, y_train)
ax.set(xlabel='X', ylabel='Y', title='X vs Y')
ax.scatter(x_test,y_test, alpha=0.5, cmap='viridis')
ax.plot(x_test, regr.predict(x_test), color='red', linewidth=2)
Up to this point everything works fine. The error appears the moment I try to add error bars:
plt.errorbar(x_test,y_test, yerr=1, fmt='o');
I get the error in the title.
The full traceback is:
TypeError Traceback (most recent call last)
<ipython-input-29-cf35bd0c650f> in <module>()
29
30 #dy=1
---> 31 plt.errorbar(x_test,y_test, yerr=1);
../anaconda2/envs/python3/lib/python3.5/site-packages/matplotlib/pyplot.py in errorbar(x, y, yerr, xerr, fmt, ecolor, elinewidth, capsize, barsabove, lolims, uplims, xlolims, xuplims, errorevery, capthick, hold, data, **kwargs)
2835 xlolims=xlolims, xuplims=xuplims,
2836 errorevery=errorevery, capthick=capthick, data=data,
-> 2837 **kwargs)
2838 finally:
2839 ax.hold(washold)
../anaconda2/envs/python3/lib/python3.5/site-packages/matplotlib/__init__.py in inner(ax, *args, **kwargs)
1817 warnings.warn(msg % (label_namer, func.__name__),
1818 RuntimeWarning, stacklevel=2)
-> 1819 return func(ax, *args, **kwargs)
1820 pre_doc = inner.__doc__
1821 if pre_doc is None:
../anaconda2/envs/python3/lib/python3.5/site-packages/matplotlib/axes/_axes.py in errorbar(self, x, y, yerr, xerr, fmt, ecolor, elinewidth, capsize, barsabove, lolims, uplims, xlolims, xuplims, errorevery, capthick, **kwargs)
2924
2925 if yerr is not None:
-> 2926 lower, upper = extract_err(yerr, y)
2927 # select points without upper/lower limits in y and
2928 # draw normal errorbars for these points
../anaconda2/envs/python3/lib/python3.5/site-packages/matplotlib/axes/_axes.py in extract_err(err, data)
2873 # using list comps rather than arrays to preserve units
2874 low = [thisx - thiserr for (thisx, thiserr)
-> 2875 in cbook.safezip(data, err)]
2876 high = [thisx + thiserr for (thisx, thiserr)
2877 in cbook.safezip(data, err)]
../anaconda2/envs/python3/lib/python3.5/site-packages/matplotlib/axes/_axes.py in <listcomp>(.0)
2872 "dimensions as x, or 2xN.")
2873 # using list comps rather than arrays to preserve units
-> 2874 low = [thisx - thiserr for (thisx, thiserr)
2875 in cbook.safezip(data, err)]
2876 high = [thisx + thiserr for (thisx, thiserr)
TypeError: unsupported operand type(s) for -: 'str' and 'int'
【Comments】:
-
It would help if you posted the entire traceback. It makes the problem easier to debug.
-
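It works perfectly for me. But it seems some of your data is not numeric; check it with df.dtypes. Then try converting to float with df['X'] = df['X'].astype(float), or, if that raises a ValueError, use df['X'] = pd.to_numeric(df['X'], errors='coerce'), which converts non-numeric values to NaN.
-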
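The dtype check and conversion suggested in the comment above can be sketched roughly as follows. The frame here is hypothetical: the stray "n/a" entry is invented purely to show what errors='coerce' does and is not taken from new.csv.

```python
import pandas as pd

# Hypothetical frame: X holds strings, one of which is not a number.
df = pd.DataFrame({
    "X": pd.Series(["230.1", "44.5", "n/a"], dtype=object),
    "Y": [22.1, 10.4, 9.3],
})

print(df.dtypes)  # X is not a float column at this point

# astype(float) would raise ValueError on "n/a", so coerce instead:
# anything that cannot be parsed as a number becomes NaN.
df["X"] = pd.to_numeric(df["X"], errors="coerce")

print(df.dtypes)
print(df)
```

If both columns already report float64, as the asker states in the next comment, this step changes nothing and the cause has to be looked for elsewhere.
-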
@jezrael: I am using the X and Y values above. They are all numeric. (I also tested this, and it returns float64.)
-
@jezrael: I don't understand why it works for you. I am using the exact numbers above, and it only works when I use x_test.values
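The x_test.values workaround mentioned here can be sketched as below. One plausible reading of the traceback is that matplotlib 1.5.1 zips over `data` element by element, and iterating over a pandas DataFrame yields its column labels (strings) rather than its rows, which would be consistent with the 'str' and 'int' operands in the error. The four rows are copied from new.csv; the Agg backend and the names labels, x_arr, y_arr and bars are assumptions made only for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# A few rows from new.csv, standing in for x_test / y_test.
x_test = pd.DataFrame({"X": [230.1, 44.5, 17.2, 151.5]})
y_test = pd.DataFrame({"Y": [22.1, 10.4, 9.3, 18.5]})

# Iterating over a DataFrame yields its column labels, not its values.
labels = list(y_test)
print(labels)

# Flatten each single-column frame to a 1-D NumPy array before plotting.
x_arr = x_test.values.ravel()
y_arr = y_test.values.ravel()

fig, ax = plt.subplots()
bars = ax.errorbar(x_arr, y_arr, yerr=1, fmt="o")
```

Selecting the columns as Series in the first place (data['X'] rather than data.loc[:, 'X'].to_frame()) may avoid the problem for the plotting call, although scikit-learn's fit still expects a 2-D X.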
-
I thought about it. Maybe it is a bug. I am using Python 3 on Windows with pandas: 0.19.1, matplotlib: 1.5.1, scipy: 0.17.0 and numpy: 1.11.0, so upgrading may help.
Tags: python pandas scikit-learn