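How to use a for loop to append the data frames generated by a function in Python

【Question title】: How to use a for loop to append the data frame generated from a function in python
【Posted】: 2014-08-28 16:29:15
【Question】:

My problem is that I wrote a function that stores the 10-fold cross-validation scores for each stepwise model of each classifier. For example, for naive Bayes I have two models: one uses only one variable, while the other uses two. The same goes for the decision tree model. The function looks like this: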

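# Imports this snippet relies on; sklearn.cross_validation was later replaced by sklearn.model_selection
import numpy as np
import pandas as pd
from numpy import array
from sklearn import cross_validation

# modelDataFullnew (the asker's dataset) must already be defined: column 0 is the target, the rest are features
def crossV(clf):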
    cvOutcome=pd.DataFrame()
    index=pd.DataFrame()
    classifier=pd.DataFrame()
    for i in range(4)[2:]:
        tt=array(tuple(x[1:i] for x in modelDataFullnew))
        qq=array(tuple(x[0] for x in modelDataFullnew))
        scores=cross_validation.cross_val_score(clf, tt, qq, cv=10)*100
        index_i=list(np.repeat(i-1,10))
        classifier_i=list(np.repeat(str(clf)[:-2],10))
        scores=list(scores)
        cvOutcome=cvOutcome.append(scores)
        index=index.append(index_i)
        classifier=classifier.append(classifier_i)
    merge=pd.concat([index,cvOutcome,classifier],axis=1)
    merge.columns=['model','rate','classifier']
    return(merge)

from sklearn.naive_bayes import GaussianNB as gnb
clf_nb=gnb()
from sklearn import tree
clf_dt=tree.DecisionTreeClassifier()

If I run crossV(clf_nb), it gives me this result:

    model   rate    classifier
   1     92.558679   GaussianNB
   1     92.558381   GaussianNB
   1     92.558381   GaussianNB
   1     92.558381   GaussianNB
   1     92.558381   GaussianNB

My question is how to apply this function to several classifiers and append their results into one long data frame, for example:

    model   rate    classifier
   1     92.558679   GaussianNB
   1     92.558381   GaussianNB
   1     92.558381   GaussianNB
   1     92.558381   GaussianNB
   1     92.558381   GaussianNB
   1     93.25       DecisionTree
   1     93.25       DecisionTree

I tried this code, but it doesn't work:

hhh=[clf_nb,clf_dt]

g=pd.DataFrame()
while i in hhh:
    g=g.append(crossV(i))

I also tried the map function over the classifiers, for example:

map(crossV,(clf_nb,clf_dt))

It works, but it just gives me a bigger list, and I don't know how to convert it into a data frame.

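【Comments】:

Possible duplicate of add one row in a pandas.DataFrame
I tried that, but I got no result. I don't know what went wrong here.
Have you tried df = pd.concat( (crossV(clf_nb), crossV(clf_dt)) )?
That code works, but what if I have 20 classifiers? I'd like to write a general function to do this, but somehow it doesn't work for me...

Tags: python pandas append scikit-learn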

【Solution 1】:

clf = [clf_nb, clf_dt]

cross_clf = [ crossV(x) for x in clf ]

df = pd.concat( cross_clf )
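
A self-contained sketch of the same pattern may help if you want to try it without scikit-learn or the original data. It assumes a hypothetical stand-in, `fake_cross_v`, in place of `crossV`, and the rates it produces are made-up placeholders, not real cross-validation scores. `ignore_index=True` is an optional addition that renumbers the rows of the combined frame from 0.

```python
import pandas as pd

def fake_cross_v(name, n_folds=3):
    """Hypothetical stand-in for crossV: one row per fold for one classifier."""
    return pd.DataFrame({
        "model": [1] * n_folds,
        "rate": [90.0 + k for k in range(n_folds)],  # placeholder scores, not real results
        "classifier": [name] * n_folds,
    })

# Works the same for 2 or 20 entries: build one frame per item, then concatenate once.
names = ["GaussianNB", "DecisionTreeClassifier"]
frames = [fake_cross_v(n) for n in names]
df = pd.concat(frames, ignore_index=True)
print(df)
```

Concatenating once at the end also avoids growing a data frame inside the loop, which copies the accumulated rows on every step.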

Edit:

An example for the question in the comments:

You need i = clf_nb or i = clf_dt before the while loop can start

hhh = [clf_nb, clf_dt]

g = pd.DataFrame()

i = clf_nb

while i in hhh: # if `clf_nb` is still on the list `hhh` then ...
    g.append( crossV(i) ) # append `clf_nb` to the `g`

But i is always equal to clf_nb, and clf_nb is always in the list hhh, so you get an infinite loop that keeps adding clf_nb to g
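
For comparison, here is a minimal sketch of the `for` version, which terminates because `for ... in` takes each element of the list in turn. It uses a hypothetical stand-in, `make_frame`, instead of `crossV`, and plain strings instead of the classifier objects. The frames are collected in a list and concatenated once, because `DataFrame.append` returned a new frame rather than changing `g` in place, and it was removed in pandas 2.0.

```python
import pandas as pd

def make_frame(label):
    # Hypothetical stand-in for crossV(clf): a tiny frame tagged with the label.
    return pd.DataFrame({"model": [1, 1], "classifier": [label, label]})

hhh = ["clf_nb", "clf_dt"]  # strings stand in for the classifier objects

# `in` as a membership test (as used by `while` and `if`):
# it only evaluates to True or False and never advances through the list.
i = "clf_nb"
still_member = i in hhh

# `in` inside `for`: assigns each element of hhh to i in turn, then stops.
frames = []
for i in hhh:
    frames.append(make_frame(i))

g = pd.concat(frames, ignore_index=True)
print(g)
```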

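【Comments】:

Thanks! It works!!! But could you point out why the while loop doesn't work here? Thanks again!
while i in hhh means: repeat as long as the value of i is present in the list hhh. It is the same test as if i in hhh.
Thanks! But when i is in hhh, I'm still a bit confused about why it doesn't do the concat or append the way your code does. Thanks again! :) @furas
First, you have no variable i. Second, i would have to be set with i = clf_nb or i = clf_dt. Third, in works differently in for than it does in while and if. In while, in only checks "if 'i' is still in the list 'hhh', then ..."; in for, in means "take the next element from 'hhh' and assign it to 'i', then ..."
I'll add some examples to the answer.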