How to loop through a pandas DataFrame to extract specific rows and selected columns

【Question Title】: How to loop through pandas data frame to extract specific rows and selected columns
【Posted】: 2016-04-06 04:20:23
【Question】:

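I have Whole_mat as a pandas df, and corpus_index holds the valid rows that I want to copy into New_mat. I only want column numbers 1, 4 and 7, but in the order 7, 1, 4. Below is what I tried, but I get TypeError: unhashable type: 'list'. Whole_mat has shape Nx10, and I want New_mat to be nx3.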

New_mat = []
for i in range(len(corpus_index)):
    index = corpus_index[i]
    New_mat.append(Whole_mat[[index], [7,1,4]])  # raises TypeError: unhashable type: 'list'
print(New_mat)
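For comparison, a minimal sketch of a loop that does run. Plain Whole_mat[...] looks up columns only, so the (rows, columns) pair inside it is treated as one column key; in the pandas of the time that tuple of lists could not be hashed, hence the TypeError. A row-and-column lookup needs an indexer such as .iloc (positions) or .loc (labels). The sketch assumes corpus_index is a list of integer row positions and uses made-up data in place of Whole_mat.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for Whole_mat: N=6 rows, 10 columns with default labels 0..9.
Whole_mat = pd.DataFrame(np.arange(60).reshape(6, 10))
# Assumed: corpus_index is a list of integer row positions to keep.
corpus_index = [0, 2, 5]

frames = []
for index in corpus_index:
    # .iloc is position-based on both axes. Wrapping the row in a list ([index])
    # keeps the result a 1 x 3 DataFrame with columns in the order 7, 1, 4.
    frames.append(Whole_mat.iloc[[index], [7, 1, 4]])

# Stack the one-row frames into an n x 3 DataFrame.
New_mat = pd.concat(frames)
print(New_mat.shape)  # (3, 3)
```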

Is there a better way to solve this?

【Question Comments】:

Perhaps you should use New_mat.append(Whole_mat.loc[index, [7,1,4]]) if 7, 1 and 4 are your column names.
This throws the error "None of [[7, 1, 4]] are in the [columns]". Do I have to give the column names as strings? Like ["user_id", "phone_no"]?
Yes, you should. You can also go through columns: New_mat.append(Whole_mat.loc[index, Whole_mat.columns[7,1,4]]). Note that indexing starts at 0.
Sorry for the silly question, but I'm new to this :/. I get "too many indices for array" on that same line, New_mat.append(Whole_mat.loc[index, Whole_mat.columns[7,1,4]])
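The comments above mix up labels and positions. .loc is label-based, so with string column names it needs the names themselves; to select by position, look the labels up through Whole_mat.columns with a list of positions, i.e. columns[[7, 1, 4]] with an inner list. Writing columns[7, 1, 4] passes a tuple instead, which is what produced "too many indices for array". A minimal sketch with made-up column names c0 to c9 (the real labels of Whole_mat are not known) and default row labels 0..2:

```python
import numpy as np
import pandas as pd

# Hypothetical column names c0..c9 standing in for Whole_mat's real string labels.
Whole_mat = pd.DataFrame(np.arange(30).reshape(3, 10),
                         columns=[f"c{i}" for i in range(10)])

# Label-based selection: pass the column names themselves.
by_name = Whole_mat.loc[[0, 2], ["c7", "c1", "c4"]]

# Position-based column choice: a *list* of positions inside the brackets
# returns the matching labels, in the requested order.
wanted = Whole_mat.columns[[7, 1, 4]]
by_position = Whole_mat.loc[[0, 2], wanted]

print(list(wanted))                  # ['c7', 'c1', 'c4']
print(by_name.equals(by_position))   # True
```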

Tags: python pandas

【Solution 1】:

I don't think you need a for loop at all. You can try this:

New_mat = Whole_mat.loc[corpus_index, Whole_mat.columns[[7, 1, 4]]]
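Because .loc is label-based, this one-liner assumes corpus_index holds row labels from Whole_mat.index, while Whole_mat.columns[[7, 1, 4]] converts column positions into column labels. If corpus_index holds row positions instead, .iloc (position-based on both axes) does the same job without the label lookup. A sketch on made-up data whose row labels deliberately differ from the positions:

```python
import numpy as np
import pandas as pd

# Made-up data: 5 rows labelled 10..50 (not 0..4), 10 columns labelled 0..9.
Whole_mat = pd.DataFrame(np.arange(50).reshape(5, 10),
                         index=[10, 20, 30, 40, 50])

# Assumed: corpus_index holds row *labels*, so .loc is the right indexer.
corpus_index = [20, 50]
New_mat = Whole_mat.loc[corpus_index, Whole_mat.columns[[7, 1, 4]]]

# The same rows by *position* (counting from 0) with .iloc.
positions = [1, 4]
same = Whole_mat.iloc[positions, [7, 1, 4]]

print(New_mat.equals(same))  # True
```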

Note: column positions start at 0.

【Comments】:

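You usually don't need to iterate over the rows of a DataFrame, and more importantly you shouldn't, because it is typically much slower than the vectorized operations built into pandas.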
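A rough way to check the speed claim yourself, as a sketch: it times a row-by-row loop against a single vectorized .iloc call on made-up data and first confirms that both give the same frame. Absolute timings depend on the machine and pandas version, so no numbers are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Made-up data: 2,000 rows x 10 columns; keep every other row.
Whole_mat = pd.DataFrame(np.random.rand(2_000, 10))
corpus_index = list(range(0, 2_000, 2))


def with_loop():
    # One indexing call (and one small DataFrame) per selected row.
    return pd.concat([Whole_mat.iloc[[i], [7, 1, 4]] for i in corpus_index])


def vectorized():
    # One indexing call for all selected rows at once.
    return Whole_mat.iloc[corpus_index, [7, 1, 4]]


assert with_loop().equals(vectorized())
print("loop:      ", timeit.timeit(with_loop, number=3))
print("vectorized:", timeit.timeit(vectorized, number=3))
```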

【Solution 2】:

All you need is simple indexing. For example:

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: df = pd.DataFrame([np.random.rand(10) for _ in range(10)])

In [4]: df.ix[[1,4,5],[3,4,5]]
Out[4]:
          3         4         5
1  0.523302  0.104327  0.672953
4  0.303693  0.785685  0.080759
5  0.955738  0.987779  0.410638

Whenever you use pandas, avoid loops wherever possible (they are rarely needed). The whole point of using pandas is vectorization.
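.ix, used in the session above, was deprecated in pandas 0.20 and removed in pandas 1.0, so that line fails on current versions. A sketch of the same selection with the explicit indexers: .iloc for positions and .loc for labels. With the default 0..9 labels both give the same result.

```python
import numpy as np
import pandas as pd

# Same kind of example as above: 10 x 10 random floats, default integer labels.
df = pd.DataFrame(np.random.rand(10, 10))

by_position = df.iloc[[1, 4, 5], [3, 4, 5]]  # positions on both axes
by_label = df.loc[[1, 4, 5], [3, 4, 5]]      # labels on both axes

# With default labels 0..9, positions and labels coincide.
print(by_position.equals(by_label))  # True
```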

【Comments】:

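Thanks for the advice, I'll study it. I used the method above and got <pandas.core.indexing._IXIndexer at 0x7fab62a734d0>. What kind of object is New_mat now, and how do I use it? I tried New_mat.head() and print New_mat, but neither helped.
You probably used .ix() instead of .ix[]. You need square brackets, not parentheses.
Great! It works. But what's the difference between .ix() and .ix[]?
I'm not quite sure what the pandas.core.indexing._IXIndexer object is for. If you're curious, there is more information here: stackoverflow.com/questions/30447719/… Also, if this solved your problem, please accept it as the answer ;)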
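The <..._IXIndexer ...> repr is what you see when you hold the indexer object itself instead of subscribing it with square brackets; only the bracket form performs the selection. Since .ix is gone from current pandas, this sketch shows the same distinction with .iloc. The internal class name is an implementation detail, so the check below only tests that it is not a DataFrame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 10))

indexer = df.iloc                       # no brackets: an internal indexer object
print(type(indexer).__name__)           # not a DataFrame

subset = df.iloc[[1, 4, 5], [3, 4, 5]]  # square brackets: the actual selection
print(type(subset).__name__)            # DataFrame
```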