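【Posted】: 2018-11-30 06:14:50
【Question】:
I am trying to iterate over three DataFrames to find the differences between them. I have a master DataFrame that contains everything, and two other DataFrames that each contain part of the master. I am trying to write Python code that identifies what is missing from the other two files. The master file looks like this: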
ID Name
1 Mike
2 Dani
3 Scott
4 Josh
5 Nate
6 Sandy
The second DataFrame looks like this:
ID Name
1 Mike
2 Dani
3 Scott
6 Sandy
The third DataFrame looks like this:
ID Name
1 Mike
2 Dani
3 Scott
4 Josh
5 Nate
So there will be two output DataFrames. The desired output for the second DataFrame looks like this:
ID Name
4 Josh
5 Nate
The desired output for the third DataFrame looks like this:
ID Name
6 Sandy
I have not found anything similar on Google. I tried this:
for i in second['ID'], third['ID']:
    if i not in master['ID']:
        print(i)
It returns all the data in the master file.
If I try this code:
import pandas as pd
names = ["Mike", "Dani", "Scott", "Josh", "Nate", "Sandy"]
ids = [1, 2, 3, 4, 5, 6]
master = pd.DataFrame({"ID": ids, "Name": names})
# print(master)
names_second = ["Mike", "Dani", "Scott", "Sandy"]
ids_second = [1, 2, 3, 6]
second = pd.DataFrame({"ID": ids_second, "Name": names_second})
# print(second)
names_third = ["Mike", "Dani", "Scott", "Josh", "Nate"]
ids_third = [1, 2, 3, 4, 5]
third = pd.DataFrame({"ID": ids_third, "Name": names_third})
# print(third)
for i in master['ID']:
    if i not in second["ID"]:
        print("NOT IN SECOND", i)
    if i not in third["ID"]:
        print("NOT IN THIRD", i)
Output:
NOT IN SECOND 4
NOT IN SECOND 5
NOT IN THIRD 5
NOT IN SECOND 6
NOT IN THIRD 6
Why does it show NOT IN SECOND 6 and NOT IN THIRD 5?
Any suggestions? Thanks in advance.
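On the last question: the `in` operator applied to a pandas Series tests the index labels, not the stored values. `second` has the default index 0..3 and `third` has 0..4, so 6 is reported as missing from `second` and 5 as missing from `third`, even though both appear as values. The sketch below, assuming ID is a regular column as in the code above, shows the difference and one possible way to get the two desired outputs with `Series.isin`. The names `missing_from_second` and `missing_from_third` are made up for illustration.

```python
import pandas as pd

master = pd.DataFrame({"ID": [1, 2, 3, 4, 5, 6],
                       "Name": ["Mike", "Dani", "Scott", "Josh", "Nate", "Sandy"]})
second = pd.DataFrame({"ID": [1, 2, 3, 6],
                       "Name": ["Mike", "Dani", "Scott", "Sandy"]})
third = pd.DataFrame({"ID": [1, 2, 3, 4, 5],
                      "Name": ["Mike", "Dani", "Scott", "Josh", "Nate"]})

# `in` on a Series looks at the index labels (0..3 for `second`), not the values.
print(6 in second["ID"])         # False: 6 is not an index label
print(6 in second["ID"].values)  # True: 6 is one of the stored values

# Keep the rows of master whose ID does not occur among the other frame's values.
# (Hypothetical variable names, chosen for this example.)
missing_from_second = master[~master["ID"].isin(second["ID"])]
missing_from_third = master[~master["ID"].isin(third["ID"])]

print(missing_from_second)  # rows with ID 4 and 5
print(missing_from_third)   # row with ID 6
```

`isin` compares values element by element, so no explicit loop is needed, and the result keeps the Name column alongside the ID.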
【Comments】:
-
Is ID the index or a column?
-
Probably a column, judging by how it is referenced in the attempts.
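If ID turned out to be the index rather than a column, the comparison could be done on index labels instead. A minimal sketch, assuming the frames are re-indexed with `set_index("ID")`; the names `master_by_id`, `second_by_id`, `third_by_id` and the two result variables are hypothetical:

```python
import pandas as pd

master = pd.DataFrame({"ID": [1, 2, 3, 4, 5, 6],
                       "Name": ["Mike", "Dani", "Scott", "Josh", "Nate", "Sandy"]})
second = pd.DataFrame({"ID": [1, 2, 3, 6],
                       "Name": ["Mike", "Dani", "Scott", "Sandy"]})
third = pd.DataFrame({"ID": [1, 2, 3, 4, 5],
                      "Name": ["Mike", "Dani", "Scott", "Josh", "Nate"]})

# Assumption for this example: ID is used as the index of each frame.
master_by_id = master.set_index("ID")
second_by_id = second.set_index("ID")
third_by_id = third.set_index("ID")

# Index.difference returns the labels present in master but absent from the other index.
missing_from_second = master_by_id.loc[master_by_id.index.difference(second_by_id.index)]
missing_from_third = master_by_id.loc[master_by_id.index.difference(third_by_id.index)]

print(missing_from_second)
print(missing_from_third)
```

With ID as the index, a membership test such as `4 in second_by_id.index` checks the ID labels themselves.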
Tags: python pandas loops dataframe iterator