Filter and keep only rows with the same index

【Question title】: Filter and keep only rows with the same index
【Posted】: 2021-12-21 16:16:33
【Question】:

I have two dataframes: X_oos_top_10 and y_oos_top_10. I need to filter both of them by the condition X_oos_top_10["comm"] == 1.

This works for the first one:

X_oos_top_10_comm1 = X_oos_top_10[X_oos_top_10["comm"] == 1]

But for the other one I get an error: IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

y_oos_top_10_comm1 = y_oos_top_10[X_oos_top_10["comm"] == 1]

I don't know how to do this.

【Comments】:

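Please use backticks to format your code as code blocks.
What you can do is join X and y together, filter the combined frame, and then split it back apart.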

Tags: python pandas filter

【Solution 1】:

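Assuming X and y have the same length and share the same index labels, you can use the index of the filtered X to select the matching rows of y.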

Set up a minimal reproducible example:

import numpy as np
import pandas as pd

X_oos_top_10 = pd.DataFrame({'comm': np.random.randint(1, 10, 10)})
y_oos_top_10 = pd.DataFrame(np.random.randint(1, 10, (10, 4)), columns=list('ABCD'))

print(X_oos_top_10)

# Output:
   comm
0     5
1     6
2     2
3     6
4     1
5     6
6     1
7     4
8     5
9     8

print(y_oos_top_10)

# Output:
   A  B  C  D
0  2  9  1  6
1  9  8  5  4
2  1  6  7  6
3  6  3  6  5
4  2  6  8  3
5  2  6  6  5
6  4  4  3  5
7  6  3  7  5
8  2  8  8  7
9  4  9  1  4

First method: select by the index of the filtered X

idx = X_oos_top_10[X_oos_top_10["comm"] == 1].index
out = y_oos_top_10.loc[idx]
print(out)

# Output:
   A  B  C  D
4  2  6  8  3
6  4  4  3  5
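
The first method relies on X and y sharing index labels. The IndexingError in the question usually suggests that they do not, since pandas tries to align the boolean Series with y by label. If the two frames have the same length and the same row order but different indexes, one possible workaround is to drop the index from the mask and select by position. This is only a sketch: the small frames `X` and `y` below are made-up data, not the asker's.

```python
import pandas as pd

# Hypothetical data: same length and row order, but different index labels
X = pd.DataFrame({"comm": [5, 1, 2, 1]}, index=[0, 1, 2, 3])
y = pd.DataFrame({"A": [10, 20, 30, 40]}, index=[10, 11, 12, 13])

# Convert the boolean Series to a plain NumPy array so that no
# label alignment is attempted; rows are then selected by position
mask = (X["comm"] == 1).to_numpy()
y_comm1 = y[mask]

print(y_comm1)
```

This is only safe when row i of X really corresponds to row i of y. If the row order differs, the positional mask silently picks the wrong rows.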

Second method: join X and y, then filter

Xy_oos_top_10 = X_oos_top_10.join(y_oos_top_10)
out = Xy_oos_top_10[Xy_oos_top_10['comm'] == 1]
print(out)

# Output:
   comm  A  B  C  D
4     1  2  6  8  3
6     1  4  4  3  5
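
If separate X and y frames are needed afterwards, as the comment under the question suggests, the joined result can be split back using the original column lists. A minimal sketch, assuming X and y have no column names in common (otherwise `join` needs a suffix) and using made-up frames `X` and `y`:

```python
import pandas as pd

# Hypothetical data with a shared default index
X = pd.DataFrame({"comm": [5, 1, 2, 1]})
y = pd.DataFrame({"A": [10, 20, 30, 40], "B": [1, 2, 3, 4]})

# Join on the index, then filter the combined frame once
Xy = X.join(y)
Xy_comm1 = Xy[Xy["comm"] == 1]

# Split back into the two parts using the original column names
X_comm1 = Xy_comm1[X.columns]
y_comm1 = Xy_comm1[y.columns]

print(X_comm1)
print(y_comm1)
```

Both results keep the original row labels, so they stay aligned with each other.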

【Discussion】:

@ViolettaDavydova. Does this solve your problem?