How do I implement Naive Bayes on the following data? Answers

【Question title】: How do I implement Naive Bayes on the following data?
【Posted】: 2017-11-14 22:09:04
【Question】:

My data is in a csv file, in the following format:

45,45,34,34,34,56,52,88,50,46,46,1

28,26,23,22,32,36,21,18,8,28,40,0

28,46,57,42,46,51,48,48,40,46,34,1

11,11,11,34,17,13,11,46,11,33,40,0

42,36,46,32,28,51,48,56,38,46,40,1

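and so on.

I am trying to build a binary classifier that classifies input data like that shown in the first 11 columns; the 12th column indicates accept (1) or reject (0). I am using Python's pandas and numpy modules. How do I implement Naive Bayes on this data?

I get a data conversion error:

 ValueError: could not convert string to float

Here is my code so far: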

import pandas as pd
import numpy as np
from sklearn.cross_validation import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

df = pd.read_csv(r'file.csv')
features=df.values[:,:11]
target=df.values[:,12]

features_train, features_test, target_train, target_test = train_test_split(
    features, target, test_size=0.33, random_state=10)

clf=GaussianNB()
clf.fit(features_train, target_train)
target_pred = clf.predict(features_test)

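【Comments】:

If your code raises an error, please include all the code that produces it. We don't have access to your computer. Include the code that loads this csv, the code that tries to train/fit on this data, and so on.
Also, there is an empty cell at position 1 in the first row. float('') fails.
@EdChum Added the code
@Kendas Fixed it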
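
The empty-cell failure mentioned in the comments can be reproduced in isolation. The sketch below is only an illustration: the one-row CSV string is made up for the example, and it assumes default `pd.read_csv` settings, under which an empty field is read as a missing value rather than an empty string.

```python
import io

import pandas as pd

# float() cannot parse an empty string, which is what an empty CSV cell
# looks like once the row has been split on commas.
try:
    float("")
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# With default settings, pandas reads the same empty field as NaN instead.
# The one-row CSV here is invented purely for illustration.
row = pd.read_csv(io.StringIO("45,,34\n"), header=None)
print(row.isna().iloc[0, 1])
```

Either way, the cell has to be filled in or dropped before the values are passed to a classifier.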

Tags: python pandas numpy scikit-learn anaconda

【Solution 1】:

You are reading the csv incorrectly; you need to do the following:

df = pd.read_csv(r'file.csv', skipinitialspace=True, header=None)

There are spaces after the comma separators and there is no header row, so these options yield:

Out[18]: 
   0   1   2   3   4   5   6   7   8   9   10  11
0  45  45  34  34  34  56  52  88  50  46  46   1
1  28  26  23  22  32  36  21  18   8  28  40   0
2  28  46  57  42  46  51  48  48  40  46  34   1

The dtypes are now numeric:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 12 columns):
0     3 non-null int64
1     3 non-null int64
2     3 non-null int64
3     3 non-null int64
4     3 non-null int64
5     3 non-null int64
6     3 non-null int64
7     3 non-null int64
8     3 non-null int64
9     3 non-null int64
10    3 non-null int64
11    3 non-null int64
dtypes: int64(12)
memory usage: 368.0 bytes
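
Combining the corrected read with the classifier from the question gives a minimal end-to-end sketch. It rests on a few assumptions: a recent scikit-learn, where `train_test_split` is imported from `sklearn.model_selection` rather than the `sklearn.cross_validation` module used in the question; the five sample rows read from an in-memory string (named `SAMPLE_CSV` here, with a space after each comma to mimic the original file) instead of `file.csv`; and the label taken from column index 11, since a 12-column frame has no index 12.

```python
import io

import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# The five sample rows from the question. The space after each comma is an
# assumption about how the original file was laid out.
SAMPLE_CSV = """\
45, 45, 34, 34, 34, 56, 52, 88, 50, 46, 46, 1
28, 26, 23, 22, 32, 36, 21, 18, 8, 28, 40, 0
28, 46, 57, 42, 46, 51, 48, 48, 40, 46, 34, 1
11, 11, 11, 34, 17, 13, 11, 46, 11, 33, 40, 0
42, 36, 46, 32, 28, 51, 48, 56, 38, 46, 40, 1
"""

# No header row, and leading spaces after the delimiter are skipped.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), skipinitialspace=True, header=None)

features = df.iloc[:, :11].to_numpy()  # columns 0..10
target = df.iloc[:, 11].to_numpy()     # column 11 holds accept (1) / reject (0)

features_train, features_test, target_train, target_test = train_test_split(
    features, target, test_size=0.33, random_state=10
)

clf = GaussianNB()
clf.fit(features_train, target_train)
target_pred = clf.predict(features_test)

# With only five rows the score is not meaningful; it just shows the call.
print(accuracy_score(target_test, target_pred))
```

To run this against the real file, replace `io.StringIO(SAMPLE_CSV)` with the path to `file.csv`.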

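【Comments】:

I get only one column when I use df.info()
Edit your question with real data or a link; I can only advise on what has been posted
Fixed. Now the data has no spaces after the commas.
Can you try loading this with numpy: df = np.loadtxt(r'file.csv', delimiter=',')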
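
The numpy route suggested in the last comment can be sketched as follows; note that the keyword argument is spelled `delimiter`. This assumes the file is purely numeric with no spaces after the commas, as the preceding comment states, and it reads the sample rows from an in-memory string (named `SAMPLE_CSV` here) instead of `file.csv`.

```python
import io

import numpy as np

# Sample rows from the question, without spaces after the commas.
SAMPLE_CSV = """\
45,45,34,34,34,56,52,88,50,46,46,1
28,26,23,22,32,36,21,18,8,28,40,0
28,46,57,42,46,51,48,48,40,46,34,1
11,11,11,34,17,13,11,46,11,33,40,0
42,36,46,32,28,51,48,56,38,46,40,1
"""

# loadtxt returns a 2-D float array by default.
data = np.loadtxt(io.StringIO(SAMPLE_CSV), delimiter=",")

features = data[:, :11]           # first 11 columns
target = data[:, 11].astype(int)  # last column: accept (1) / reject (0)

print(data.shape)
```

The resulting `features` and `target` arrays can be passed to `train_test_split` and `GaussianNB` in the same way as the pandas-based arrays.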