How to sample from a dataset and get the indices of the samples in the initial dataset (answers)

【Question title】: How to sample from a dataset and get the indices of the samples in the initial dataset
【Posted】: 2021-02-15 14:36:42
【Question】:

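I have a dataset A with shape (1000, 10). I want to sample from it like this:

B = pd.DataFrame(A).sample(frac=0.2)

How can I get the indices of the rows of A that ended up in B? Or, how can I sort A based on B so that these 200 rows of B come first in A?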

I have tried this code, but I don't understand why it gives me an error:

I = np.argwhere((A == B[:, None]).all(axis=2))[:, 1]

Or this:

np.arange(A.shape[0])[np.isin(A,B).all(axis=1)]
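The first attempt fails because B is a pandas DataFrame, and B[:, None] is NumPy-style indexing that a DataFrame does not support (the exact exception depends on the pandas version). A minimal sketch, assuming A is a NumPy array of shape (1000, 10) (random data stands in for it here), converts B with to_numpy() so the broadcasting comparison runs. The second attempt, np.isin(A, B), tests each element on its own rather than whole rows, so it can also flag rows that merely share values with B.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
A = rng.random((1000, 10))  # stand-in for the question's A (assumption)
B = pd.DataFrame(A).sample(frac=0.2, random_state=0)

# B is a DataFrame, so convert it to an array before NumPy-style indexing.
# Shapes broadcast as (200, 1, 10) against (1000, 10) -> (200, 1000, 10).
matches = (A == B.to_numpy()[:, None]).all(axis=2)  # matches[i, j]: row i of B equals row j of A

# argwhere returns (i, j) pairs in row-major order, so column 1 lists
# the index into A for each row of B, in B's order
I = np.argwhere(matches)[:, 1]

print(I[:5])
print(np.array_equal(I, B.index.to_numpy()))  # holds when A has no duplicate rows
```

Every row of B is compared with every row of A, so memory grows with len(B) × len(A) × 10; if A contains duplicate rows, one row of B can match several rows of A.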

Thanks

【Question discussion】:

Tags: python dataframe numpy sorting sampling

【Solution 1】:

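1. Create a boolean column in A that indicates whether each row is in B.
2. Fill it with A.index.isin(B.index), which is True for every row of A whose index label also appears in B's index.
3. Sort by the new column and then drop it.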

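# after defining A and B; A must be a DataFrame here,
# e.g. A = pd.DataFrame(A) and then B = A.sample(frac=0.2)
# step 1, 2: True for rows of A whose index label also appears in B
A["isinB"] = A.index.isin(B.index)
# step 3: a descending sort puts Trues at the front and Falses at the end
# (kind="stable" keeps the original row order within each group)
A_sorted = A.sort_values("isinB", ascending=False, kind="stable").drop(columns="isinB")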
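If A should stay a NumPy array, as in the question, the helper column is not strictly needed. A minimal sketch, assuming B was created exactly as in the question with pd.DataFrame(A).sample(frac=0.2) (random data stands in for A): the wrapped DataFrame gets a default RangeIndex 0..999, so B.index already holds the row positions of the sampled rows in A.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
A = rng.random((1000, 10))  # stand-in for the question's A (assumption)
B = pd.DataFrame(A).sample(frac=0.2, random_state=0)

# pd.DataFrame(A) has a default RangeIndex, so these are positions in A
idx = B.index.to_numpy()

# positions that were not sampled, in ascending order
rest = np.setdiff1d(np.arange(A.shape[0]), idx)

# sampled rows first (in B's order), then the remaining rows
order = np.concatenate([idx, rest])
A_sorted = A[order]

print(A_sorted.shape)
print(np.array_equal(A_sorted[:len(idx)], B.to_numpy()))
```

This relies on A not having been given a custom index before sampling; with a custom index, B.index holds labels rather than positions, and A.index.get_indexer(B.index) would be needed to map them back to positions.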

【Discussion】: