How to add another feature when NOT using a pandas DataFrame? Answers

【Question title】: How to add another feature when NOT using panda dataframe?
【Posted】: 2018-06-02 16:32:43
【Question description】:

I am using a Python API to return data from a table in MySQL. The code below works for 1 feature. zip returns one tuple per selected column.

listtrain = twenty4hours.return_select("SELECT feature_1, class FROM justext")
f1, trgt = zip(*listtrain) 
X_train, X_test, y_train, y_test = train_test_split(f1, trgt, random_state=0)

The code above works.

listtrain = twenty4hours.return_select("SELECT feature_1, feature_2, class FROM justext")
f1, f2, trgt = zip(*listtrain) 
X_train, X_test, y_train, y_test = train_test_split(fX, trgt, random_state=0)

How do I add another feature/column so that it can be passed to train_test_split? How do I build fX from f1 and f2?

Thanks.

【Comments】:

Tags: python-3.x machine-learning mysql-python

【Solution 1】:

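I feel like I am going in circles, so I will use a pandas DataFrame. After all, it is made for this, and there are plenty of examples online. I wanted to use lists/tuples because that is what the SQL query API returns.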
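A minimal sketch of the DataFrame route, assuming the query returns a list of row tuples in the column order of the SELECT. The sample rows below are made-up stand-ins for the result of return_select, and the column names are simply copied from the query.

```python
import pandas as pd

# Hypothetical stand-in for:
# twenty4hours.return_select("SELECT feature_1, feature_2, class FROM justext")
listtrain = [(1.0, 10.0, 0), (2.0, 20.0, 1), (3.0, 30.0, 0), (4.0, 40.0, 1)]

# One DataFrame row per SQL row; columns named after the SELECT list.
df = pd.DataFrame(listtrain, columns=["feature_1", "feature_2", "class"])

fX = df[["feature_1", "feature_2"]]  # 2-D feature table, one column per feature
trgt = df["class"]                   # 1-D target column
```

fX and trgt can then be passed to train_test_split(fX, trgt, random_state=0) in the same way as in the question.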

According to the doc, the allowed inputs for train_test_split() are lists, numpy arrays, scipy-sparse matrices or pandas dataframes.
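Since plain lists are accepted, fX can also be built without pandas. The sketch below assumes every row has the form (feature_1, ..., feature_n, class); the helper names split_features_and_target and split_rows are hypothetical, and the sample rows are made-up stand-ins for the query result.

```python
def split_features_and_target(rows):
    """Turn rows of (feature_1, ..., feature_n, target) into (fX, trgt)."""
    fX = [list(row[:-1]) for row in rows]  # one inner list of features per sample
    trgt = [row[-1] for row in rows]       # last column is the class label
    return fX, trgt


def split_rows(rows, random_state=0):
    """Build fX and trgt, then hand them to scikit-learn's train_test_split."""
    # Imported here so the helper above stays usable without scikit-learn.
    from sklearn.model_selection import train_test_split

    fX, trgt = split_features_and_target(rows)
    return train_test_split(fX, trgt, random_state=random_state)


# Hypothetical stand-in for the rows returned by return_select(...)
listtrain = [(1.0, 10.0, 0), (2.0, 20.0, 1), (3.0, 30.0, 0), (4.0, 40.0, 1)]
fX, trgt = split_features_and_target(listtrain)
```

With the variables from the question, the same result comes from fX = list(zip(f1, f2)): each sample becomes one entry holding both feature values, which is the 2-D shape that estimators expect for more than one feature.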

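【Comments】:

Because I am using SQL, I decided to use CONCAT(F1, F2) to put the 2 inputs together. The accuracy is better. I will see whether combining a string input with a numeric input improves the prediction accuracy.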